Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5

Hmmm. Is there any big reason NOT to try to read a netCDF-produced HDF5 file 
with the native HDF5 library if someone so chooses?

As far as detecting the data producer goes, I have a similar problem with my 
Silo library. Silo can write to HDF5. It can also write to PDB (that's 'Portable 
Database', https://wci.llnl.gov/codes/pact/pdb.html, not the Protein Data Bank).

And, attempting to read an HDF5 file produced by Silo using just the HDF5 
library (i.e., without Silo) is a major pain.

To handle detection of Silo/HDF5 and Silo/PDB, there are a couple of things I do.

First, I augment the Linux 'file' utility, calling the result 'silofile'...

#!/bin/sh
#
# Use octal dump (od) command to examine first few bytes of file.
# If do not find expected bytes of any of the formats we'd like
# to identify here, fall back to using the good ole' file command.
#
for f in "$@"; do
    if test -f "$f"; then
        headerBytes=$(od -a -N 10 "$f")
        if test -n "$(echo $headerBytes | tr -d ' ' | grep '<<PDB:')"; then
            echo "$f: Portable Database (PDB) data"
        elif test -n "$(echo $headerBytes | tr -d ' \\' | grep 'HDFcrnl')"; then
            echo "$f: Hierarchical Data Format version 5 (HDF5) data"
        else
            headerBytes=$(od -t x1 -N 4 "$f")
            if test -n "$(echo $headerBytes | grep '0000000 0e 03 13 01')"; then
                echo "$f: Hierarchical Data Format version 4 (HDF4) data"
            else
                file "$f"
            fi
        fi
    else # not a regular file
        file "$f"
    fi
done

Now, this won't tell a user whether the file was produced by Silo, but it will 
tell a user whether the file appears to be HDF5, PDB, or HDF4, and that is 
usually sufficient for Silo users.

Now, from within C code, it's sufficient for me to just attempt to open the file 
using Silo's open routines. That process involves looking for telltale signs 
that the file was produced by Silo. It turns out the Silo library creates a 
couple of somewhat uniquely named character datasets in the root group of the 
file, "_silolibinfo" and "_hdf5libinfo". So, if Silo's open succeeds, it's a 
fairly certain sign the file was actually produced by Silo.
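For anyone who wants that same check without linking Silo at all, a minimal sketch using only the HDF5 C API might look like the following (assuming libhdf5 is available, e.g. compiling with h5cc; the function name is mine, not Silo's):

```c
#include <stdio.h>
#include "hdf5.h"

/* Sketch: guess whether an HDF5 file was written by Silo by looking for
 * the marker datasets Silo puts in the root group. Uses only libhdf5. */
static int looks_like_silo(const char *path)
{
    hid_t file = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return 0;                      /* not HDF5, or unreadable */
    /* H5Lexists returns > 0 when the named link exists */
    int silo = H5Lexists(file, "_silolibinfo", H5P_DEFAULT) > 0 &&
               H5Lexists(file, "_hdf5libinfo", H5P_DEFAULT) > 0;
    H5Fclose(file);
    return silo;
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        printf("%s: %s\n", argv[i],
               looks_like_silo(argv[i]) ? "probably Silo" : "not Silo");
    return 0;
}
```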

In a cursory look over the libsrc4 sources in the netCDF distribution, I see a 
few things that might give a hint a file was created with netCDF. . .

First, in NC_CLASSIC_MODEL, an attribute named "_nc3_strict" gets attached to 
the root group. So, the existence of an attribute by that name on the root group 
would suggest the HDF5 file was generated by netCDF.
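A minimal sketch of that check (assuming libhdf5, compiled with something like h5cc) could be:

```c
#include <stdio.h>
#include "hdf5.h"

/* Sketch: report whether the root group carries the "_nc3_strict"
 * attribute that netCDF-4 writes in NC_CLASSIC_MODEL. Absence proves
 * nothing; files written without the classic model won't have it. */
int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    hid_t file = H5Fopen(argv[1], H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0)
        return 1;
    hid_t root = H5Gopen2(file, "/", H5P_DEFAULT);
    if (H5Aexists(root, "_nc3_strict") > 0)
        printf("%s: has _nc3_strict; likely netCDF\n", argv[1]);
    else
        printf("%s: no _nc3_strict attribute\n", argv[1]);
    H5Gclose(root);
    H5Fclose(file);
    return 0;
}
```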

Also, I tested a simple case of nc_open, nc_def_dim, etc., followed by 
nc_close, to see what it produced.

It appears to produce a dataset for each 'dimension' defined, with two 
attributes named "CLASS" and "NAME". The value of "CLASS" is the 16-char 
null-terminated string "DIMENSION_SCALE", and the value of "NAME" is a 64-char 
null-terminated string of the form 
"This is a netCDF dimension but not a netCDF variable.        %d"

Finally, if someone does an nc_open followed immediately by nc_close, then I 
don't think the resulting HDF5 file has anything to suggest it might have been 
created by netCDF. OTOH, the file is also devoid of any objects in that case, 
so who cares whether netCDF produced it.

Hope that helps.

Mark


From: Hdf-forum <hdf-forum-bounces@xxxxxxxxxxxxxxxxxx> on behalf of John Shalf <jshalf@xxxxxxx>
Reply-To: HDF Users Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>
Date: Wednesday, March 2, 2016 1:02 PM
To: HDF Users Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>
Cc: "netcdfgroup@xxxxxxxxxxxxxxxx" <netcdfgroup@xxxxxxxxxxxxxxxx>, Ward Fisher <wfisher@xxxxxxxx>
Subject: Re: [Hdf-forum] Detecting netCDF versus HDF5

Perhaps NetCDF (and other higher-level APIs that are built on top of HDF5) 
should include an attribute attached to the root group that identifies the name 
and version of the API that created the file? (Adopt this as a convention.)
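Writing such an attribute takes only a few HDF5 calls. A sketch (assuming libhdf5; the attribute name "_creator_api" is made up here purely for illustration, since no convention exists yet):

```c
#include <string.h>
#include "hdf5.h"

/* Sketch of the suggested convention: attach a scalar string attribute
 * to the root group naming the creating API and its version. The
 * attribute name "_creator_api" is hypothetical. */
static void tag_creator(hid_t file, const char *api_and_version)
{
    hid_t space = H5Screate(H5S_SCALAR);
    hid_t type = H5Tcopy(H5T_C_S1);
    H5Tset_size(type, strlen(api_and_version) + 1);
    H5Tset_strpad(type, H5T_STR_NULLTERM);
    hid_t attr = H5Acreate2(file, "_creator_api", type, space,
                            H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(attr, type, api_and_version);
    H5Aclose(attr);
    H5Tclose(type);
    H5Sclose(space);
}
```

A reader could then simply test H5Aexists on the root group for that name before deciding which API to open the file with.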

-john

On Mar 2, 2016, at 12:55 PM, Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx> wrote:
Hi Ward,
As you know, Data Explorer is going to be a general-purpose data reader for 
many formats, including HDF5 and netCDF. Here:
http://www.space-research.org/
Regarding the handling of both HDF5 and netCDF, there seems to be a potential 
issue: how to tell whether a given HDF5 file was saved by the HDF5 API or by 
the netCDF API?
It seems to me that this is not possible. Is this correct?
netCDF uses an internal function NC_check_file_type to examine the first few 
bytes of a file, and for example for any HDF5 file the test is
/* Look at the magic number */
/* Ignore the first byte for HDF */
if(magic[1] == 'H' && magic[2] == 'D' && magic[3] == 'F') {
    *filetype = FT_HDF;
    *version = 5;
The problem is that this test passes for any HDF5 file and for any netCDF-4 
file, which makes it impossible to tell which is which, and hence impossible 
for any general-purpose data reader to decide whether to use the netCDF API or 
the HDF5 API.
I have a possible solution for this, but before going any further, I would 
just like to confirm:
1)      that this is indeed not possible;
2)      whether you have a solid workaround for this, excluding the dumb ones, 
for example deciding based on an extension (.nc or .h5), or traversing the HDF5 
file to see whether it is a non-netCDF-conforming one. Yes, to further 
complicate things, it is possible that the above test says OK for an HDF5 file, 
but then a read by the netCDF API fails because the file is HDF5 but not netCDF 
conformant.
Thanks
----------------------
Pedro Vicente
pedro.vicente@xxxxxxxxxxxxxxxxxx
http://www.space-research.org/
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@xxxxxxxxxxxxxxxxxx
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5



