Re: [netcdfgroup] Proposed changes to netcdf-c library: Detecting netCDF versus HDF5

Howdy All!

Pedro asks:
"what is the reason for using the HDF5 Dimension Scales API inside netCDF?"

The dimension scales API was added to HDF5 to support netCDF dimensions. So
I used it in netCDF-4. The idea was that other HDF5 users might use
dimension scales to express shared dimension info, and then netCDF-4 would
understand those files too.

NetCDF-4 can also read HDF5 files that do not have dimension scales, so
dimension scales are not required to read a file with netCDF-4.

Thanks,
Ed



On Tue, May 3, 2016 at 12:30 AM, Pedro Vicente <
pedro.vicente@xxxxxxxxxxxxxxxxxx> wrote:

> @Dennis
> @Ed
> @Kent Yang
> @Ward
>
>
> > I am about to commit a pull request for the netcdf-c library
>
> I don't see any commits in the master branch. Is this the right place?
>
> https://github.com/Unidata/netcdf-c
>
>
> > First, there is a hidden, persistent attribute named _NCProperties.  It
> > specifies the library versions of the netcdf library
>
> This covers all future files, OK.
>
>
> > Second, there are two special, non-persistent attributes that are
> > computed from information already in the file.
>
> This seems to deal with "old" files, which do not have the attribute from
> the first case above.
>
> OK, looks good to me.
>
>
> Now for something off-topic, but related to this:
>
> > At some point, it needs to be added to the netcdf NASA spec
> > (https://earthdata.nasa.gov/standards/netcdf-4hdf5-file-format).
>
> The doc says, in the section "Dimensions with HDF5 Dimension Scales":
>
> "Until version 1.8, HDF5 did not have any capability to represent shared
> dimensions. With the 1.8 release, HDF5 introduced the dimension scale
> feature to allow shared dimensions in HDF5 files.
> The HDF5 dimension scale is not exactly equivalent to the netCDF shared
> dimension. This leads to a number of compromises in the design of netCDF
> A netCDF shared dimension consists solely of a length and a name. An HDF5
> dimension scale also includes values for each point along the dimension.
> This additional information is (optionally) included in a netCDF
> coordinate variable."
>
> Maybe Ed or Kent Yang from the HDF Group know better, but
>
> *** what is the reason for using the HDF5 Dimension Scales API inside
> netCDF? ***
>
> Hi Kent, how are you?
> Do you remember why Dimension Scales was chosen?
>
> A netCDF dimension consists solely of a length and a name, exactly.
> So it seems that it would be trivial to add just these two scalar pieces
> of metadata to each variable, as attributes for example,
> or to have an attribute at the root that holds a list of all the
> dimensions, something like pairs of name/size:
>
> /group1/lat 90
> /group2/lon 180
>
> The index in the array specifies the dimension ID. Every time a new
> dimension is created, it would just be a matter of a table lookup to see
> whether that same absolute name already exists.
>
> Since netCDF has at most 1024 dimensions, there are no scaling issues here.
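
The root-level table proposed above could be sketched roughly as follows. This only illustrates the lookup idea and is not how netCDF or HDF5 stores dimensions. The class name DimensionTable, its methods, and the MAX_DIMS constant (taken from the 1024 figure quoted above) are hypothetical.

```python
MAX_DIMS = 1024  # assumption: the limit quoted in the message above


class DimensionTable:
    """Hypothetical root-level table of (absolute name, size) pairs."""

    def __init__(self):
        self._entries = []  # the list index doubles as the dimension ID
        self._ids = {}      # absolute name -> dimension ID

    def define(self, name, size):
        """Return the ID for `name`, creating the entry if it is new."""
        if name in self._ids:
            dim_id = self._ids[name]
            if self._entries[dim_id][1] != size:
                raise ValueError(f"{name} already defined with a different size")
            return dim_id
        if len(self._entries) >= MAX_DIMS:
            raise OverflowError("dimension table is full")
        self._entries.append((name, size))
        self._ids[name] = len(self._entries) - 1
        return self._ids[name]

    def lookup(self, dim_id):
        """Return the (name, size) pair stored under a dimension ID."""
        return self._entries[dim_id]


table = DimensionTable()
lat_id = table.define("/group1/lat", 90)
lon_id = table.define("/group2/lon", 180)
# Defining the same absolute name again returns the existing ID.
again_id = table.define("/group1/lat", 90)
print(lat_id, lon_id, again_id)
```

The dictionary is only a convenience for the lookup. With at most 1024 entries, a linear scan of the list would serve just as well.
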
>
> The HDF5 dimension scale is a mechanism to associate HDF5 datasets, one to
> one or one to many.
> At one end is a "regular" dataset, the same as the netCDF variable.
> At the other end is a dataset called a "dimension scale", which is just
> another dataset that holds the spatial information for each element in the
> array (say, for a latitude of size 90, the array would have all the
> element values from 0 to 89).
> These "dimension scale" datasets can be added or removed through the API,
> which even makes it possible to add several scales as alternative options
> (say, latitude with coarser values).
>
> It is actually one of the most well-designed and powerful APIs in HDF5.
>
> It is a much more complex mechanism than netCDF needs.
>
> By adding it to netCDF, it seems you just added extra complexity where none
> was needed.
>
> And the fact that HDF5 has something related to dimensions does not mean
> you have to use it. In fact, the HDF5 dimension scale is not even part of
> the HDF5 format; it's just a "high-level" (optional) abstraction.
> Even if it were part of the format, that still would not mean that you
> have to use it.
>
>
> @Ward, Dennis
>
> If you agree with the above, that it does not make sense to use HDF5
> dimension scales in netCDF, maybe something you should consider one day is
> to remove the HDF5 dimension scales from netCDF completely?
>
> Or is there any other reason for keeping them?
>
> regards to sunny Colorado
>
> ----------------------
> Pedro Vicente
> pedro.vicente@xxxxxxxxxxxxxxxxxx
> https://twitter.com/_pedro__vicente
> http://www.space-research.org/
>
>
> ----- Original Message -----
> From: <dmh@xxxxxxxx>
> To: <netcdfgroup@xxxxxxxxxxxxxxxx>
> Sent: Friday, April 29, 2016 10:32 PM
> Subject: [netcdfgroup] Proposed changes to netcdf-c library: Detecting
> netCDF versus HDF5
>
>
>
>> I am about to commit a pull request for the netcdf-c library having to
>> do with identifying the provenance and format of netcdf-4 files,
>> and specifically targeted at detecting netcdf-4 files from HDF5 files.
>>
>> This provenance consists of the following information.  First, there is
>> a hidden, persistent attribute named _NCProperties.  It specifies the
>> library versions of the netcdf library and the hdf5 library used to
>> create the file. This attribute never changes during the lifetime of the
>> file (unless modified deliberately thru the hdf5 API).
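
The message does not spell out the layout of the _NCProperties value. As a rough sketch, assuming it is a single string of key=value pairs separated by "|" (the form seen in ncdump -s output from netcdf-c releases of that period), it could be split like this. The function name parse_ncproperties and the version numbers in the sample string are made up for illustration.

```python
def parse_ncproperties(value, sep="|"):
    """Split an _NCProperties-style string into a dict of key -> value.

    Assumption: fields look like "key=value" and are joined by `sep`.
    """
    props = {}
    for field in value.split(sep):
        if not field:
            continue  # ignore empty fields, e.g. from an empty string
        key, _, val = field.partition("=")
        props[key.strip()] = val.strip()
    return props


# Hypothetical sample value; the version numbers are illustrative only.
sample = "version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17"
props = parse_ncproperties(sample)
print(props["netcdflibversion"], props["hdf5libversion"])
```
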
>>
>> Second, there are two special, non-persistent, attributes that are
>> computed from information already in the file.
>> 1. _SuperblockVersion
>> 2. _IsNetcdf4
>> Non-persistence means these attributes do not actually appear in the
>> file and are computed from other info already in the file.
>>
>> The _SuperblockVersion attribute is a single integer giving the version
>> number (currently 0-3) of the superblock in the hdf5/netcdf-4 file.
>>
>> The _IsNetcdf4 attribute is a single integer 0/1 indicating if the file
>> has various tags indicating it was produced thru the netcdf-4 API. This
>> is computed by using the HDF5 API to walk the file to look for
>> attributes specific to netcdf-4.  False negatives are possible for a
>> small subset of netcdf-4 files, especially those not containing
>> dimensions. False positives are (I think) only possible by deliberate
>> modifications to an existing HDF5 file thru the HDF5 API. For files with
>> the _NCProperties attribute, this attribute is redundant. For files
>> created prior to the introduction of the _NCProperties attribute, this
>> may be a useful indicator of the provenance of the file.
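
The walk described above can be pictured with a small in-memory model. This is only a sketch. The real check goes through the HDF5 API, whereas here a file is modelled as nested dictionaries, and the marker attribute names in NETCDF4_MARKERS are assumptions chosen for illustration, not a statement of what the library tests for.

```python
# Assumption: attribute names treated as netCDF-4 markers in this sketch.
NETCDF4_MARKERS = {"_nc3_strict", "_Netcdf4Dimid", "_Netcdf4Coordinates"}


def is_netcdf4(group):
    """Return 1 if any object under `group` carries a marker attribute, else 0."""
    if NETCDF4_MARKERS & set(group.get("attrs", {})):
        return 1
    for dataset in group.get("datasets", {}).values():
        if NETCDF4_MARKERS & set(dataset.get("attrs", {})):
            return 1
    for child in group.get("groups", {}).values():
        if is_netcdf4(child):
            return 1
    return 0


# A file whose dimension dataset carries a marker attribute is detected.
with_dims = {
    "attrs": {},
    "datasets": {"lat": {"attrs": {"_Netcdf4Dimid": 0}}},
    "groups": {},
}

# A file with no dimensions has nothing to find: the false-negative case.
no_dims = {
    "attrs": {"title": "example"},
    "datasets": {"scalar": {"attrs": {}}},
    "groups": {},
}

print(is_netcdf4(with_dims), is_netcdf4(no_dims))
```

In this model a file written through netCDF-4 but containing no dimensions looks the same as a plain HDF5 file, which is the false-negative case mentioned above.
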
>>
>> These three attributes are hidden in the sense that they can only be
>> accessed thru the netcdf-C api calls via the name. They have no
>> attribute number and will not be counted in the number of global
>> attributes in the root group.
>>
>> The simplest way to view these attributes is to use the -s flag to the
>> ncdump command.
>>
>> Comments are welcome.
>>
>> =Dennis Heimbigner
>> Unidata
>>
>> _______________________________________________
>> netcdfgroup mailing list
>> netcdfgroup@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe,  visit:
>> http://www.unidata.ucar.edu/mailing_lists/
>>