Re: [netcdfgroup] Proposed changes to netcdf-c library: Detecting netCDF versus HDF5

@Dennis
@Ed
@Kent Yang
@Ward


>I am about to commit a pull request for the netcdf-c library


I don't see any commits in the master branch. Is this the right place?

https://github.com/Unidata/netcdf-c


>First, there is
>a hidden, persistent, attribute named _NCProperties.  It specifies the
>library versions of the netcdf library

This covers all future files, OK.


>Second, there are two special, non-persistent, attributes that are
>computed from information already in the file.

This seems to handle the case of "old" files, which do not have the _NCProperties attribute from the first case above.

OK, looks good to me.


Now for something off-topic, but related to this.

At some point, these attributes need to be added to the NASA netCDF-4/HDF5 spec
(https://earthdata.nasa.gov/standards/netcdf-4hdf5-file-format).

The doc says, in the section

"Dimensions with HDF5 Dimension Scales"

"Until version 1.8, HDF5 did not have any capability to represent shared dimensions. With the 1.8 release, HDF5 introduced the dimension scale feature to allow shared dimensions in HDF5 files.
The HDF5 dimension scale is not exactly equivalent to the netCDF shared dimension. This leads to a number of compromises in the design of netCDF.
A netCDF shared dimension consists solely of a length and a name. An HDF5 dimension scale also includes values for each point along the dimension. This additional information is (optionally) included in a netCDF coordinate variable."

Maybe Ed or Kent Yang from the HDF Group know better, but

*** what is the reason for using the HDF5 Dimension Scales API inside netCDF? ***

Hi Kent, how are you?
Do you remember why Dimension Scales was chosen?

A netCDF dimension consists solely of a length and a name, exactly.
So it seems that it would be trivial to add just these 2 scalar pieces of metadata to each variable,
as attributes, for example.
Or have an attribute at root that holds a list of all the dimensions, something like pairs of name / size:

/group1/lat 90
/group2/lon 180

The index in the array specifies the dimension ID. Every time a new dimension is created, it would be just a matter
of a table lookup to see if that same absolute name already exists.

Since netCDF has at most 1024 dimensions, there are no scaling issues here.
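
To make the table idea concrete, here is a minimal C sketch of such a lookup. It is only an illustration under stated assumptions: the names dim_table, dim_entry, dim_find and dim_define, the MAX_NAME cap, and the choice to reject a redefinition with a different length are all hypothetical and are not part of netcdf-c or HDF5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIMS 1024   /* the limit mentioned in this message */
#define MAX_NAME 256    /* hypothetical cap on an absolute dimension name */

/* One entry per dimension: absolute name plus length. */
typedef struct {
    char   name[MAX_NAME + 1];
    size_t len;
} dim_entry;

/* Hypothetical in-memory form of the root-level table.
   The index of an entry in dims[] is its dimension ID. */
typedef struct {
    dim_entry dims[MAX_DIMS];
    int       ndims;
} dim_table;

/* Return the ID of the dimension with this absolute name, or -1. */
int dim_find(const dim_table *t, const char *name)
{
    for (int i = 0; i < t->ndims; i++)
        if (strcmp(t->dims[i].name, name) == 0)
            return i;
    return -1;
}

/* Define a dimension. If the absolute name already exists with the same
   length, return its existing ID. Return -1 if the name exists with a
   different length, if the name is too long, or if the table is full. */
int dim_define(dim_table *t, const char *name, size_t len)
{
    assert(t != NULL && name != NULL);
    int id = dim_find(t, name);
    if (id >= 0)
        return t->dims[id].len == len ? id : -1;
    if (t->ndims >= MAX_DIMS || strlen(name) > MAX_NAME)
        return -1;
    strcpy(t->dims[t->ndims].name, name);
    t->dims[t->ndims].len = len;
    return t->ndims++;
}
```

Starting from an empty (zeroed) table, defining /group1/lat with length 90 gives ID 0 and /group2/lon with length 180 gives ID 1; defining /group1/lat with length 90 a second time just returns the existing ID 0. A linear scan is enough here because the table never holds more than 1024 entries.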

The HDF5 dimension scale is a mechanism to associate HDF5 datasets one to one or one to many.
At one end is a "regular" dataset, the same as the netCDF variable.
At the other end is a dataset that is called a "dimension scale", which is just another dataset that holds the spatial information for each element along a dimension (say, for a latitude of size 90, the array would have all the element values from 0 to 89).
These "dimension scale" datasets can be added or removed through the API, making it possible even to attach many scales as potential options
(say, latitude with coarser values).

It is actually one of the most well designed and powerful APIs in HDF5.

It is a much more complex mechanism than netCDF needs.

By adding it to netCDF it seems you just added extra complexity where none was needed.

And the fact that HDF5 has something related to dimensions does not mean you have to use it. In fact, the HDF5 dimension scale is not even part of the HDF5 format; it's just a "high-level" (optional) abstraction. Even if it were part of the format, that still would not mean that you have to use it.


@Ward, Dennis

If you agree with the above, that it does not make sense to use HDF5 dimension scales in netCDF, maybe something you should consider one day is to remove the HDF5 dimension scales
from netCDF completely?

Or is there any other reason for keeping them?

regards to sunny Colorado

----------------------
Pedro Vicente
pedro.vicente@xxxxxxxxxxxxxxxxxx
https://twitter.com/_pedro__vicente
http://www.space-research.org/


----- Original Message -----
From: <dmh@xxxxxxxx>
To: <netcdfgroup@xxxxxxxxxxxxxxxx>
Sent: Friday, April 29, 2016 10:32 PM
Subject: [netcdfgroup] Proposed changes to netcdf-c library: Detecting netCDF versus HDF5


I am about to commit a pull request for the netcdf-c library having to
do with identifying the provenance and format of netcdf-4 files,
and specifically targeted at detecting netcdf-4 files from HDF5 files.

This provenance consists of the following information.  First, there is
a hidden, persistent, attribute named _NCProperties.  It specifies the
library versions of the netcdf library and the hdf5 library used to
create the file. This attribute never changes during the lifetime of the
file (unless modified deliberately thru the hdf5 API).

Second, there are two special, non-persistent, attributes that are
computed from information already in the file.
1. _SuperblockVersion
2. _IsNetcdf4
Non-persistence means these attributes do not actually appear in the
file; they are computed from other info already in the file.

The _SuperblockVersion attribute is a single integer giving the version
number (currently 0-3) of the superblock in the hdf5/netcdf-4 file.
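
As an illustration of where this number lives, here is a rough C sketch that reads it straight from the raw bytes of a file image. It assumes the HDF5 file format layout in which the superblock starts with an 8-byte signature located at offset 0, 512, 1024, 2048, and so on, and the byte right after the signature is the superblock version. The function name superblock_version is hypothetical, and this is not claimed to be how the library computes _SuperblockVersion, which this message does not describe.

```c
#include <stddef.h>
#include <string.h>

/* The 8-byte HDF5 format signature: \211 H D F \r \n \032 \n */
static const unsigned char HDF5_SIG[8] =
    { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

/* Scan a file image for the HDF5 signature at offsets 0, 512, 1024, ...
   Return the superblock version byte that follows the signature,
   or -1 if no signature (with a version byte after it) is found. */
int superblock_version(const unsigned char *buf, size_t n)
{
    size_t off = 0;
    while (off + 9 <= n) {
        if (memcmp(buf + off, HDF5_SIG, 8) == 0)
            return buf[off + 8];
        off = (off == 0) ? 512 : off * 2;   /* next candidate offset */
    }
    return -1;
}
```

A caller would read the beginning of the file into a buffer and pass it in; a classic (non-HDF5) netCDF file has no such signature, so the function returns -1 for it.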

The _IsNetcdf4 attribute is a single integer 0/1 indicating if the file
has various tags indicating it was produced thru the netcdf-4 API. This
is computed by using the HDF5 API to walk the file to look for
attributes specific to netcdf-4.  False negatives are possible for a
small subset of netcdf-4 files, especially those not containing
dimensions. False positives are (I think) only possible by deliberate
modifications to an existing HDF5 file thru the HDF5 API. For files with
the _NCProperties attribute, this attribute is redundant. For files
created prior to the introduction of the _NCProperties attribute, this
may be a useful indicator of the provenance of the file.

These three attributes are hidden in the sense that they can only be
accessed thru the netcdf-C api calls via the name. They have no
attribute number and will not be counted in the number of global
attributes in the root group.

The simplest way to view these attributes is to use the -s flag to the
ncdump command.

Comments are welcome.

=Dennis Heimbigner
Unidata
