In the ideal world Unidata will do the packing in such a way that it
is essentially invisible to the user. I know they have wanted to do it
for many years. I don't know how fast it is actually going to happen.
In the interim I have considered a kludge, which would require an extra
set of agreed-upon conventions, and maybe a wrapper to the Unidata
netCDF API.
Imagine a 4-D array of floats T(x,y,z,t). We define a set of scale
factors and offsets, float T-scale(z,t) and T-offset(z,t). The values
of the factors are chosen so that an array short T-short(x,y,z,t) can
be defined such that

    T-approx(x,y,z,t) = T-short(x,y,z,t) * T-scale(z,t) + T-offset(z,t)

where T-approx is an approximation to the original array T.
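To make the arithmetic concrete, here is a rough C sketch of how the
factors for one x-y slab might be chosen (the function name and layout
are mine for illustration, not part of any existing API or convention):

    #include <limits.h>
    #include <math.h>
    #include <stddef.h>

    /* Pack one x-y slab of n floats into shorts; computes the scale
       and offset needed to approximately recover the original values
       via  value ~= packed*scale + offset. */
    void pack_slab(const float *slab, size_t n,
                   short *packed, float *scale, float *offset)
    {
        /* Find the range of this slab. */
        float lo = slab[0], hi = slab[0];
        for (size_t i = 1; i < n; i++) {
            if (slab[i] < lo) lo = slab[i];
            if (slab[i] > hi) hi = slab[i];
        }
        /* Map [lo,hi] onto the full range of a signed short. */
        *scale  = (hi > lo) ? (hi - lo) / (float)(SHRT_MAX - SHRT_MIN)
                            : 1.0f;
        *offset = lo - (float)SHRT_MIN * *scale;
        for (size_t i = 0; i < n; i++)
            packed[i] = (short)lroundf((slab[i] - *offset) / *scale);
    }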
It is trivial to produce a netCDF file storing T-short, T-scale, and
T-offset, and the file will be about a factor of two smaller than one
storing the original array T (shorts are 2 bytes versus 4 for floats,
and the T-scale and T-offset arrays add little overhead since they lack
the x and y dimensions). If one used a byte array it would be a factor
of four smaller. It is obviously not an optimal setup, because it
doesn't support arbitrary bit lengths, but it would frequently result
in dramatic reductions in the file length.
The scale and offset arrays can be chosen to optimize the fidelity of
the packed data. For example, I have chosen here to define a set of
scalings suited to data that have similar characteristics on x-y
surfaces.
The main problem is that we would need to establish a convention so
files could be shared within the community, and maybe a set of useful
wrappers so that each programmer would not need to redo conformance
checking, etc.
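For instance, a minimal read-side wrapper might look roughly like the
following, assuming the T-short/T-scale/T-offset convention above with
hyphens mapped to underscores (netCDF names cannot contain '-'); the
function name and dimension ordering are my assumptions, and error
checking is omitted for brevity:

    #include <netcdf.h>
    #include <stdlib.h>

    /* Read one x-y slab at indices (t,z) and return unpacked floats. */
    int get_T_slab(int ncid, size_t t, size_t z,
                   size_t ny, size_t nx, float *out)
    {
        int vid_short, vid_scale, vid_offset;
        nc_inq_varid(ncid, "T_short",  &vid_short);
        nc_inq_varid(ncid, "T_scale",  &vid_scale);
        nc_inq_varid(ncid, "T_offset", &vid_offset);

        /* Assume dimension order (t,z,y,x) for T_short
           and (t,z) for the factor arrays. */
        size_t start4[4] = { t, z, 0, 0 }, count4[4] = { 1, 1, ny, nx };
        size_t start2[2] = { t, z },       count2[2] = { 1, 1 };

        short *raw = malloc(nx * ny * sizeof *raw);
        float scale, offset;
        nc_get_vara_short(ncid, vid_short,  start4, count4, raw);
        nc_get_vara_float(ncid, vid_scale,  start2, count2, &scale);
        nc_get_vara_float(ncid, vid_offset, start2, count2, &offset);

        for (size_t i = 0; i < nx * ny; i++)
            out[i] = raw[i] * scale + offset;  /* T-approx */
        free(raw);
        return NC_NOERR;
    }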
There may be a better way to do this kind of thing, and I am open to
suggestions. I toss this out to stimulate some discussion.
Phil
On Thu, Oct 03, 2002 at 07:57:34AM -0600, John Caron wrote:
> We have thought a lot about packing in netCDF and considered various
> schemes for how to do it. Variable-length compression schemes would
> probably interfere with efficient subsetting, but it seems to me that
> fixed-bit-size packing should be doable. We have this on our wish list,
> I mean to-do list, for the next round of netCDF development.
>
> Since COARDS is lat/lon only, you can't do arbitrary GRIB conversion. CF
> seems like a good candidate, however. Has anyone investigated the
> feasibility of this?
>
> gribtonc uses NUWG, and has various limitations. We are interested in
> possibly upgrading gribtonc. If anyone can make use of more flexible GRIB
> to netCDF conversion, I'd like to hear about your "use-case".
>
> I think that in order to correctly group the GRIB records into 3- or
> 4-dimensional netCDF variables, you will need (for the general case) some
> sort of configuration info for the converter, although I suppose a "common
> case" could be assumed. Anyone have any thoughts on that?
>
>
> Timothy Hume wrote:
>
> >Hi,
> >
> >This discussion reminded me of how GRIB packs data. Ideally, it would be
> >nice for netCDF to be able to handle data with an arbitrary number of
> >bits. Many meteorological data can be packed into only 9 or 10 bits
> >(often fewer), so packing them into 16-bit short integers is "wasteful".
> >Aside from that, many satellite data are "naturally" 10-bit, and widening
> >them to 16 bits can cause the file size to increase by tens of megabytes
> >per image.
> >
> >By the way, does anyone know of software that can convert GRIB data to
> >the COARDS or CF conventions? gribtonc converts GRIB to the NUWG
> >conventions, doesn't it?
> >
> >Tim Hume
> >
> >On Wed, 2 Oct 2002, Mark A Ohrenschall wrote:
> >
> >>Hello,
> >>
> >>In the case of a packed variable (in which scale_factor and add_offset
> >>are used) both the COARDS and CF conventions indicate that missing_value
> >>and _FillValue should be likewise packed:
> >>
> >>COARDS: "In cases where the data variable is packed via the scale_value
> >>attribute this implies that the missing_value flag is likewise packed."
> >>CF: "The missing values of a variable with scale_factor and/or
> >>add_offset attributes (see section 8.1) are interpreted relative to the
> >>variable's external values, i.e., the values stored in the netCDF file."
> >>
> >>I'm assuming that for the sake of consistency, this means that all
> >>statistical variable attributes should be packed as well, e.g.,
> >>valid_range and actual_range, as well as mean and standard_deviation. Is
> >>this true?
> >>
> >>So for example, if I have real-world data values for temperature between
> >>-1.6 and 31.4 and I'm applying a scale_factor of 0.1, then I would say
> >>the valid_range is -16, 314 and the mean is 116 (not 11.6)?
> >>
> >>Thanks,
> >>
> >>Mark
> >>
--
Phil Rasch, Climate Modeling Section, National Center for Atmospheric Research
Mail --> P.O. Box 3000, Boulder CO 80307
Shipping --> 1850 Table Mesa Dr, Boulder, CO 80305
email: pjr@xxxxxxxx, Web: http://www.cgd.ucar.edu/cms/pjr Phone: 303-497-1368,
FAX: 303-497-1324
Date: Thu, 3 Oct 2002 08:56:41 -0700 (PDT)
From: Charlie Zender <zender@xxxxxxx>
To: netCDF Mailing Group <netcdfgroup@xxxxxxxxxxxxxxxx>
Subject: packed data in NCO
Hi,
For what it's worth, recent versions of NCO (http://nco.sf.net)
support interpreting packed data to the following extent:
All arithmetic operators (claim to) support packed data.
This means multiple packed data files can, for example, easily be averaged.
ncap can read and write packed data, i.e., it will pack the data for you.
The read functions support data packed into any type, so data packed
into NC_CHAR (8 bits) should work fine.
I hope people will exercise the packing/unpacking functionality
and let me know how it works for them.
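For example, packing a variable on write might look something like
this (I am assuming the pack() function syntax from the NCO
documentation; details may differ between versions, so check the
manual for yours):

    ncap -O -s 'T=pack(T)' in.nc out.nc

which should store T as a short with scale_factor and add_offset
attributes attached.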
Charlie
--
Charlie Zender, zender at uci dot edu, (949) 824-2987, Department of
Earth System Science, University of California, Irvine CA 92697-3100