Hi Christian,
I actually prefer normal files myself, but had a need to use the files from the
NASA OceanColor site, some of which are provided as .main + subordinate
(linked) files. I have used a subordinate file structure with HDF5 in the
past, but when I did so I was working directly with the HDF files (via their
HDF Java api), so the linking wasn’t an issue. The primary reason I’m aware of
for using subordinates is to keep the size of any single file smaller – though
I think this is a somewhat antiquated reason that’s a holdover from the days of
2GB file limits.
As to my particular problem, I’ve been able to incorporate the aforementioned
HDF Java library into our application, which has allowed us to read the
linked-fileset without issue. The downside is that we incur a requirement
for platform-specific binaries, but we don’t have much other option! :)
Fortunately, we’re able to segregate the code into a “pre-process”, which means
we don’t need to worry about distributing the platform-specific portions.
It’s understandable that there is no support for linked HDF files in the
NetCDF-Java library; as you said, it’s probably not a frequently required
feature. However, it may be worth trying to find a way to at
least recognize that a particular dataset is backed by a linked-file so that an
appropriate error can be thrown. The concern I have is that, as it stands now,
the NetCDF-Java library returns data without any indication that the data is
incorrect. While in theory, someone should know what they’re dealing with and
recognize that the data is incorrect, I could envision a scenario where it
could become a problem.
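As a rough illustration of the kind of check described above, here is a minimal sketch that relies only on file naming. It assumes that subordinate files sit next to the .main file and follow the ".x00", ".x01", ... pattern seen in the OceanColor fileset. The class and method names are hypothetical and are not part of NetCDF-Java or the HDF Java library, and the sketch does not inspect the HDF4 file contents at all.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: not part of NetCDF-Java or the HDF Java library.
final class LinkedFilesetCheck {

    // Returns sibling files named <base>.xNN that sit next to <base>.main.
    // Assumption: subordinates follow the ".x00", ".x01", ... naming pattern.
    static List<Path> findSubordinates(Path mainFile) throws IOException {
        List<Path> found = new ArrayList<>();
        String name = mainFile.getFileName().toString();
        if (!name.endsWith(".main")) {
            return found; // not a "main" file by this naming convention
        }
        String prefix = name.substring(0, name.length() - "main".length()) + "x";
        Path dir = mainFile.toAbsolutePath().getParent();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String n = p.getFileName().toString();
                // Accept only <base>.x followed by exactly two digits.
                if (n.startsWith(prefix)
                        && n.substring(prefix.length()).matches("\\d{2}")) {
                    found.add(p);
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    // Fails loudly instead of letting a reader return data silently.
    static void requireSingleFile(Path file) throws IOException {
        List<Path> subs = findSubordinates(file);
        if (!subs.isEmpty()) {
            throw new IOException(file.getFileName() + " appears to be backed by "
                    + subs.size() + " subordinate file(s); linked HDF files are not supported");
        }
    }
}
```

Calling LinkedFilesetCheck.requireSingleFile(path) before opening a file would turn the silent bad read into an explicit error for filesets that follow this naming pattern. A more robust check would need to look inside the HDF4 file itself, which this sketch does not attempt.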
Best,
Chris
From: Christian Ward-Garrison <cwardgar@xxxxxxxx>
Date: Friday, August 1, 2014 at 7:17 PM
To: Christopher Mueller <cmueller@xxxxxxxxxxxxxx>
Cc: "netcdf-java@xxxxxxxxxxxxxxxx" <netcdf-java@xxxxxxxxxxxxxxxx>
Subject: Re: [netcdf-java] Erroneous data from linked HDF files
Hi Chris,
First off, let me just say that this is an absolutely fantastic bug report. I
wish I had better news for you, but the simple answer is that NetCDF-Java
doesn't support linked HDF files. Frankly, you're the first user that's even
mentioned them to us. Is there a particular reason that you prefer linked files
to normal files?
Regards,
Christian
On Tue, Jul 15, 2014 at 1:29 PM, Christopher Mueller
<cmueller@xxxxxxxxxxxxxx> wrote:
tl;dr There appears to be a bug in NetCDF Java when reading linked HDF4 files:
data read from the linked (subordinate) file(s) comes back erroneous, with no
error raised.
Resources
* ToolsUI
* HDFView
* The files mentioned below can be retrieved directly from
OceanColor<http://oceancolor.gsfc.nasa.gov/cgi/l3/A20021822013212.L3b_MC_RRS.main.bz2?sub=bin>
(one at a time), or (for convenience) as one tar.gz file
from here<https://drive.google.com/uc?id=0B6UT7Mn4GZQhMjdLNDBBMFE0TTA&export=download>
Details
I'm reading data from the Aqua MODIS L3 Binned products available from the NASA
OceanColor<http://oceancolor.gsfc.nasa.gov/> website. It should be noted that
these files are HDF4 (4.2.9 according to NetCDF Java - ncdump). Many of the
products, such as chlorophyll, Particulate Inorganic Carbon, and Sea Surface
Temperature, come as a single file. The NetCDF library reads these files
without any difficulty.
However, one of the datasets of interest is the Remote Sensing Reflectance
data, which is NOT provided as a single file, but as a "main" file and a set of
subordinate files which are read via the "main" file as needed (see here for
more information<http://oceancolor.gsfc.nasa.gov/PRODUCTS/modis_binned.html>):
* A20021822013212.L3b_MC_RRS.main
* A20021822013212.L3b_MC_RRS.x00
* A20021822013212.L3b_MC_RRS.x01
* A20021822013212.L3b_MC_RRS.x02
* A20021822013212.L3b_MC_RRS.x03
* A20021822013212.L3b_MC_RRS.x04
* A20021822013212.L3b_MC_RRS.x05
* A20021822013212.L3b_MC_RRS.x06
* A20021822013212.L3b_MC_RRS.x07
* A20021822013212.L3b_MC_RRS.x08
* A20021822013212.L3b_MC_RRS.x09
* A20021822013212.L3b_MC_RRS.x10
* A20021822013212.L3b_MC_RRS.x11
NetCDF Java (via ToolsUI) loads the .main file without issue, and permits
reading of data variables (e.g. Rrs_412) without raising any errors. However,
the data returned is not accurate. Below is a comparison of the data returned
by ToolsUI and the same data returned by HDFView (which uses the HDF-java
JNI<http://www.hdfgroup.org/products/java/JNI/> library):
Retrieving the first 10 values for variable "Rrs_412"
HDFView
Screen Capture<http://cl.ly/WZnD>
Opening the .main file in HDFView and looking at the Rrs_412 dataset gives a
very different set of data:
0.0055423053, 0.0106070135, 0.006894292, -0.0040368317, -0.0020879991,
-0.0020279996, 0.009794002, 0.011879213, 0.010874448, 0.012330733
ToolsUI
Screen Capture<http://cl.ly/WZMW>
Opening the .main file and performing an Ncdump Data of variable:
"Level-3_Binned_Data/Rrs_412(0:10:1).Rrs_412_sum"
Returns:
float Rrs_412_sum;
data:
{1.86057E-40, 9.403955E-38, 6.4099753E-10, 2.6076459E-9, 1.0297978E21,
5.6431478E-11, 0.0, -2.9699963E36, 4.59183E-40, 3.67343E-40, 2.60329423E11}
Also, in ToolsUI, the other data variables (e.g. angstrom, aot_869 and Rrs_*)
all display very similar (in most cases identical) values to those of
Rrs_412. This is not the case in HDFView.
Incidentally, reading the data via OceanColor's
SeaDas<http://seadas.gsfc.nasa.gov/> application (which uses NetCDF Java under
the hood) results in the same data as ToolsUI.
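The difference between the two sets of values above is large enough that a crude plausibility check would catch it. The following is only a sketch: the bound of 1.0 on the magnitude is an assumption chosen for illustration, not a documented limit for Rrs_412_sum, and the class name is hypothetical. It flags NaN, infinite, subnormal, and out-of-range values.

```java
// Hypothetical helper for spotting obviously implausible float reads.
final class ReflectanceSanityCheck {

    // Assumed plausibility bound on |value|; illustrative only.
    static final float MAX_ABS = 1.0f;

    static boolean isSuspicious(float v) {
        if (Float.isNaN(v) || Float.isInfinite(v)) {
            return true;
        }
        float a = Math.abs(v);
        // Non-zero values below Float.MIN_NORMAL are subnormal, which often
        // indicates bytes that were never meant to be read as a float.
        boolean subnormal = a > 0f && a < Float.MIN_NORMAL;
        return subnormal || a > MAX_ABS;
    }

    static int countSuspicious(float[] values) {
        int count = 0;
        for (float v : values) {
            if (isSuspicious(v)) {
                count++;
            }
        }
        return count;
    }
}
```

Under these assumptions, the check flags 6 of the 11 values returned by ToolsUI and none of the 10 values shown by HDFView.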
Wrap-up
The evidence above appears to indicate that there is a bug in NetCDF Java
related to linked HDF files which results in incorrect data reads from linked
files.
Does anyone have any idea:
a) what could be causing the issue?
b) how it could be addressed?
Thanks in advance,
Chris