On 6/20/2011 11:07 AM, Comiskey, Glenn wrote:
Hi John,
Thank you kindly for the feedback, much appreciated.
Regarding storing data in NetCDF format, this is not something we
want to do locally, as the source files are GRIB v2 and it
would require manual intervention to create the NetCDF file. The
purpose of the conversion/ncdump header file sent was to show how we
want the GRIB file to be published, i.e. three dimensions and 16
distinct data variables.
Currently, if the GRIB file is published in its native form it is read
by NetCDF-Java as a four dimensional/3 data variable file, i.e.
dimensions ordered_sequence_of_data, time, lat, lon and only SWELL,
SWDIR, SWPER data variables. This differs from earlier versions that
read the file as a three dimensional/13 data variable file, i.e.
dimensions time, lat, lon and only SWELL, SWDIR, SWPER data variables
that were "2 in sequence" - that is to say, the "1 in sequence" data
variables were ignored.
It's a complicated problem. Without manual intervention, there's no way
for the library to know that those variables should remain 2
dimensional. Here's a blog about it if you are interested:
http://www.unidata.ucar.edu/blogs/developer/en/entry/dataset_schemas_are_lost_in
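To make the two layouts discussed above concrete, here is a minimal Python sketch of splitting one 4-D variable (ordered_sequence_of_data, time, lat, lon) into separate 3-D variables (time, lat, lon), one per sequence entry. It is only an illustration of the idea, not what NetCDF-Java does internally: the function name, the "SWELL_1"/"SWELL_2" naming and the grid values are all made up for the example.

```python
def split_by_sequence(name, data):
    """Split data[seq][time][lat][lon] into one 3-D variable per sequence entry.

    Returns a dict mapping hypothetical names such as 'SWELL_1' to the
    corresponding data[time][lat][lon] nested list.
    """
    return {f"{name}_{i + 1}": block for i, block in enumerate(data)}


# Tiny made-up grid: 2 sequence entries, 1 time step, 2 lats, 2 lons.
swell_4d = [
    [[[0.5, 0.6], [0.7, 0.8]]],  # "1 in sequence"
    [[[1.5, 1.6], [1.7, 1.8]]],  # "2 in sequence"
]

variables = split_by_sequence("SWELL", swell_4d)
print(sorted(variables))
```

The split itself is trivial; the hard part described in the thread is that the library cannot know, without outside information, that the sequence dimension should be treated this way.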
The thing I find most odd is that given my understanding of the GRIB
v2 file specification, the octet that defines
"ordered_sequence_of_data" is a locally defined value. In the file
sent, it has the value 241 (decimal) as defined by NCEP/NOAA
(http://www.nco.ncep.noaa.gov/pmb/docs/grib2/grib2_table4-5.shtml),
so I wouldn't have thought NetCDF-Java would be able to determine
the significance of the octet value.
We have NCEP local tables, so we know what this local variable means. I'm
hoping NCEP (soon!) will publish their local tables so we can stop
maintaining our own version. That goes for all WMO centers using local
tables.
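As a rough illustration of why a local table is needed, the following Python sketch resolves a GRIB2 Code Table 4.5 value. It assumes the WMO convention that values 192-254 are reserved for local use and 255 means missing. The dictionary is a hypothetical one-entry excerpt (only the 241 entry comes from this thread), and the function name is invented for the example; it is not NetCDF-Java's actual lookup code.

```python
# Hypothetical excerpt of a centre's local table; only 241 is taken from the thread.
NCEP_LOCAL_LEVEL_TYPES = {241: "Ordered Sequence of Data"}


def describe_level_type(code, local_table=None):
    """Return a description for a Code Table 4.5 value.

    Values 192-254 are local use, so they can only be resolved if the
    originating centre's table is supplied.
    """
    if 192 <= code <= 254:
        if local_table and code in local_table:
            return local_table[code]
        return f"local use ({code}), meaning unknown without the centre's table"
    if code == 255:
        return "missing"
    return f"WMO-defined or reserved ({code})"


print(describe_level_type(241))
print(describe_level_type(241, NCEP_LOCAL_LEVEL_TYPES))
```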
Thanks for the info regarding conversion to NetCDF-4 being only a
factor of 2. My current 'wgrib2' only allows conversion to NetCDF-3
(classic), hence the issue of disk capacity. I will source alternative
software to be able to convert to NetCDF-4.
Eventually, we hope!
Kind regards,
Glenn
------------------------------------------------------------------------
*From:* John Caron [mailto:caron@xxxxxxxxxxxxxxxx]
*Sent:* 20 June 2011 17:02
*To:* Comiskey, Glenn
*Cc:* netcdf-java@xxxxxxxxxxxxxxxx
*Subject:* Re: [netcdf-java] GRIB v2 files
On 6/20/2011 8:11 AM, Comiskey, Glenn wrote:
John,
While it is possible to present the data in this format having
converted the GRIB v2 file to a NetCDF file, as you'll note from the
quoted file sizes at the top of header.txt it results in an 11-fold
increase in file size. If this was to be used for all GRIB v2 files
it would require an enormous increase in storage capacity.
Regards,
Glenn
Hi Glenn:
I've taken the liberty of cc'ing this to the netcdf-java list, as
others may want to hear about this also.
1) The CDM reads the data in its native (GRIB2) format and does the
conversion on the fly.
2) If you want to store the data in netCDF, you will get a factor of
10 or more increase in size for netCDF-3 format. The netCDF-4 library
(built on HDF-5) allows one to store the data compressed. Our
experiments indicate that GRIB2 compression still outperforms this by
about a factor of 2. So currently we can reduce your factor of 11 to
a factor of 2, if you switch to netCDF-4.
3) AFAIK, mostly this factor of 2 is due to GRIB JPEG-2000 wavelet
compression. E.g., this is what the data you sent me uses for encoding.
We are working on adding this kind of compression to the netcdf-4 C
library, and HDF-5 is interested in including this also. Our intention
is to make the netCDF-4 format as space efficient as GRIB2. I'm not
sure if we will run into any roadblocks on this, but we are motivated
to remove obstacles for netCDF adoption. I personally think that
GRIB-2 should not be used as a long-term archive format, due to
problems with tables, and also the kinds of problems that you have
reported.
John
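The size factors quoted in this thread (about 11x for netCDF-3 and about 2x for compressed netCDF-4, both relative to GRIB2) can be turned into a quick storage estimate. The Python sketch below is back-of-the-envelope arithmetic only; the 500 GB archive size is a made-up figure, and the factors are the approximate ones from the messages above, not measurements.

```python
# Approximate size factors relative to GRIB2, as quoted in the thread.
SIZE_FACTORS = {"GRIB2": 1, "netCDF-3": 11, "netCDF-4 (compressed)": 2}


def estimated_sizes(grib2_bytes):
    """Estimate the storage needed in each format for a given GRIB2 volume."""
    return {fmt: grib2_bytes * factor for fmt, factor in SIZE_FACTORS.items()}


archive_bytes = 500 * 10**9  # hypothetical 500 GB GRIB2 archive
for fmt, size in estimated_sizes(archive_bytes).items():
    print(f"{fmt}: {size / 10**12:.1f} TB")
```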