Re: [netcdfgroup] NetCDF4 for Fusion Data

John Storrs <john.storrs@xxxxxxxxxxxx> writes:

> Hi Ed
>
> Further to my previous postings about 'unlimited' dimensions, I now
> understand the semantics better, and it's apparent that there is a
> mismatch with the needs of our application.
>
> As previously described, we need to archive, say, 96 digitizer channels which
> have the same sample times but potentially different sample counts. From a
> logical point of view, the channel measurements share a single time
> dimension - some move further along it than others, that's all. They should
> clearly all reference a single time coordinate variable. Also, we may want to
> stick with our present compression strategy for time, storing it as a
> (sequence of) triple: start time, time increment, and count. We might put
> these values in an attribute of the time coordinate variable, leaving the
> variable itself empty. Potentially all the variables might have different
> initialized sizes.
>
> The 'unlimited' semantics go only half way to matching this requirement. At
> the HDF5 storage level, all is well. H5dump shows that the stored size of
> each variable is its own initialized size, not the maximum initialized size
> across all the variables sharing the dimension, which is evidently what the
> dimension itself is set to. So far so good, but ncdump shows all the data
> padded to that maximum size, reducing its usefulness. This is presumably
> because the dimension provides the only size exposed by the API, unless I'm
> overlooking something. HDF5 knows about the initialized sizes, but NetCDF
> doesn't expose them, so we cannot easily read the data and nothing but
> the data. Do you have an initialized size inquiry function tucked away
> somewhere, or do we have to store the value as an attribute with each
> variable?
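
For what it's worth, the attribute approach you describe might look roughly
like the untested sketch below; the file and attribute names ("shot.nc",
"compressed_time", "initialized_size") are just placeholders:

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(s) do { int _s = (s); if (_s != NC_NOERR) { \
      fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

int
main(void)
{
   int ncid, time_dimid, time_varid, chan_varid;
   /* Time compression triple: start time, increment, sample count. */
   double time_triple[3] = {0.0, 1.0e-6, 1000.0};
   int init_size = 1000;
   double samples[1000];
   size_t start[1] = {0}, count[1] = {1000};
   int i;

   for (i = 0; i < 1000; i++)
      samples[i] = (double)i;

   CHECK(nc_create("shot.nc", NC_NETCDF4, &ncid));

   /* One shared, unlimited time dimension. */
   CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &time_dimid));

   /* Empty time coordinate variable; the triple lives in an attribute. */
   CHECK(nc_def_var(ncid, "time", NC_DOUBLE, 1, &time_dimid, &time_varid));
   CHECK(nc_put_att_double(ncid, time_varid, "compressed_time",
                           NC_DOUBLE, 3, time_triple));

   /* A digitizer channel, recording its own initialized size as an
    * attribute, since the shared dimension only exposes the maximum. */
   CHECK(nc_def_var(ncid, "channel_01", NC_DOUBLE, 1, &time_dimid,
                    &chan_varid));
   CHECK(nc_put_att_int(ncid, chan_varid, "initialized_size",
                        NC_INT, 1, &init_size));

   CHECK(nc_put_vara_double(ncid, chan_varid, start, count, samples));
   CHECK(nc_close(ncid));
   return 0;
}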

If it is any consolation to you, netcdf-4 does not actually attempt to
write or read the extra fill values.

That is, although the increase in the time dimension seems to cause
all the variables that share this dimension to increase in size, in
fact no writes or reads take place for those other variables (as
would happen with classic netcdf format). The netcdf-4 library just
pretends that the other variables have increased in size and, if you
try to read such values, hands you arrays of the fill value.

However, as John points out, the semantics of netCDF objects are such
that, logically, all the variables share the dimension, and it must be
the maximum size needed to hold data from any of the variables that
share it.
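
To illustrate, here is a minimal untested sketch of that behavior: two
variables share the unlimited time dimension, one is written further along
it than the other, and reading the shorter one out to the full dimension
length hands back NC_FILL_DOUBLE for the part that was never written:

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(s) do { int _s = (s); if (_s != NC_NOERR) { \
      fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

int
main(void)
{
   int ncid, dimid, varid1, varid2;
   double long_data[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
   double short_data[5] = {0, 1, 2, 3, 4};
   double readback[10];
   size_t start[1] = {0}, count10[1] = {10}, count5[1] = {5};
   size_t dimlen;
   int i;

   CHECK(nc_create("fill_demo.nc", NC_NETCDF4, &ncid));
   CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &dimid));
   CHECK(nc_def_var(ncid, "long_channel", NC_DOUBLE, 1, &dimid, &varid1));
   CHECK(nc_def_var(ncid, "short_channel", NC_DOUBLE, 1, &dimid, &varid2));

   /* Write 10 values to one variable but only 5 to the other. */
   CHECK(nc_put_vara_double(ncid, varid1, start, count10, long_data));
   CHECK(nc_put_vara_double(ncid, varid2, start, count5, short_data));

   /* The shared dimension reports the maximum, i.e. 10. */
   CHECK(nc_inq_dimlen(ncid, dimid, &dimlen));
   printf("time dimension length: %lu\n", (unsigned long)dimlen);

   /* Reading all 10 elements of the short variable succeeds; elements
    * 5..9 come back as the fill value, not as data stored on disk. */
   CHECK(nc_get_vara_double(ncid, varid2, start, count10, readback));
   for (i = 5; i < 10; i++)
      printf("readback[%d] = %g (fill is %g)\n", i, readback[i],
             NC_FILL_DOUBLE);

   CHECK(nc_close(ncid));
   return 0;
}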

> I don't think I want to explore VLEN to crack this, because it's new and
> would complicate things. It seems to me that this is a use case others will
> encounter, which needs a tidy solution. Any thoughts? I have to present a
> strong case for NetCDF here next week, to counter an HDF5 proposal which
> doesn't have this problem, though it has many others.

I would suggest that the HDF5 solution will probably involve VLENs, as
they are a natural fit for the data structure you describe. But are you
content to always read/write the VLEN as a unit? That is, you can't
read or write part of a VLEN; you have to do the whole VLEN at once. This
might make it unsuitable.
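
For comparison, a VLEN layout might look roughly like the untested sketch
below (the names are placeholders); note that each call moves a whole
variable-length element at once:

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(s) do { int _s = (s); if (_s != NC_NOERR) { \
      fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

int
main(void)
{
   int ncid, chan_dimid, varid;
   nc_type vlen_typeid;
   double samples[5] = {0.1, 0.2, 0.3, 0.4, 0.5};
   nc_vlen_t element;
   size_t index[1] = {0};

   CHECK(nc_create("vlen_demo.nc", NC_NETCDF4, &ncid));

   /* One VLEN element per digitizer channel. */
   CHECK(nc_def_dim(ncid, "channel", 96, &chan_dimid));

   /* A variable-length type with a double base type. */
   CHECK(nc_def_vlen(ncid, "sample_vlen", NC_DOUBLE, &vlen_typeid));
   CHECK(nc_def_var(ncid, "samples", vlen_typeid, 1, &chan_dimid, &varid));

   /* The whole VLEN for channel 0 is written in one call; there is no
    * way to write or read just part of it. */
   element.len = 5;
   element.p = samples;
   CHECK(nc_put_var1(ncid, varid, index, &element));

   CHECK(nc_close(ncid));
   return 0;
}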

> Another point: nc_inq_ncid returns NC_NOERR if the named group doesn't
> exist. Is this intentional?

That sounds like a bug. Let me check this out...
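
A minimal check might look something like the untested sketch below; for a
group name that does not exist, one would expect an error return such as
NC_ENOGRP rather than NC_NOERR.

#include <netcdf.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK(s) do { int _s = (s); if (_s != NC_NOERR) { \
      fprintf(stderr, "%s\n", nc_strerror(_s)); exit(1); } } while (0)

int
main(void)
{
   int ncid, grpid, missing_grpid, status;

   CHECK(nc_create("group_demo.nc", NC_NETCDF4, &ncid));
   CHECK(nc_def_grp(ncid, "real_group", &grpid));

   /* Looking up a group that exists should succeed... */
   CHECK(nc_inq_ncid(ncid, "real_group", &grpid));

   /* ...but looking up one that doesn't should not return NC_NOERR. */
   status = nc_inq_ncid(ncid, "no_such_group", &missing_grpid);
   printf("nc_inq_ncid on a missing group returned: %s\n",
          nc_strerror(status));

   CHECK(nc_close(ncid));
   return 0;
}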

Thanks,

Ed

--
Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx


