Re: [netcdfgroup] Detecting netCDF versus HDF5

  • To: Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
  • Subject: Re: [netcdfgroup] Detecting netCDF versus HDF5
  • From: Elena Pourmal <epourmal@xxxxxxxxxxxx>
  • Date: Mon, 25 Apr 2016 00:10:26 +0000
All,

I am probably missing something in this discussion. Since Pedro asked me to 
chime in and answer his question, I’ll try... [I am referring to Pedro’s 
initial question "Regarding the handling of both HDF5 and netCDF, it seems 
there is a potential issue, which is, how to tell if any HDF5 file was saved by 
the HDF5 API or by the netCDF API?”] 

A netCDF-4 file is an HDF5 file. netCDF-4 is not a file format but a convention
for how to store data described by the netCDF-4 data model in HDF5.
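
A minimal C sketch of why a byte-level check cannot settle the question. The only thing a reader can look for in the first bytes of a file is the HDF5 format signature, and a netCDF-4 file carries exactly the same signature as any other HDF5 file. The function name is hypothetical; the search offsets (0, then 512, 1024, 2048, ...) follow the superblock location rule in the HDF5 file format specification.

```c
#include <stddef.h>
#include <string.h>

/* The 8-byte HDF5 format signature: \211 H D F \r \n \032 \n */
static const unsigned char HDF5_SIGNATURE[8] = {
    0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'
};

/* Return the offset of the HDF5 superblock signature in buf, or -1 if
   there is none. The superblock starts at offset 0, or at 512, 1024,
   2048, ... when the file begins with a user block. */
long hdf5_signature_offset(const unsigned char *buf, size_t len)
{
    size_t off = 0;

    while (off + sizeof HDF5_SIGNATURE <= len) {
        if (memcmp(buf + off, HDF5_SIGNATURE, sizeof HDF5_SIGNATURE) == 0)
            return (long)off;
        off = (off == 0) ? 512 : off * 2;   /* next candidate offset */
    }
    return -1;
}
```

A result of 0 or 512 only says "this is HDF5"; telling a netCDF-4 file apart still requires looking at what is stored inside it.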

I don’t think there is a solution to the problem of telling which API wrote the
file. One can write a pure C program that calls neither the HDF5 nor the netCDF-4
library but writes an HDF5 file according to the HDF5 file format and to the
netCDF-4 convention, making it a netCDF-4 file.

One should probably have a checker function that traverses an HDF5 file and
tells whether the file is compliant with the netCDF-4 convention. Adding attributes,
etc., really will not help. I can add an attribute to a “non-netCDF-4” HDF5
file and fool the netCDF-4 library. I can also write a netCDF-4 file using just the
pure HDF5 library by following the conventions of the netCDF-4 library.

I think the tool should follow the Common Data Model and shield data formats from
the user. What am I missing?

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org   
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




On Apr 24, 2016, at 6:08 PM, Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx> 
wrote:

> All
> 
> I posted some code on github that solves the issue for older netCDF files, 
> see below
> 
> In reply to previous comments
> 
> @ John Caron
> 
>>> Here are the blogs:
>>> 
>>> http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions
> 
> I had seen some of your blogs but not the one above.
> By looking at the netCDF code I came up with the code below, which detects
> one of the "hidden" attributes described in that blog and another one that
> is not described there.
> 
> 
> @ David Brown
> 
>>> But this is not ideal, because we only
>> want to open files that are explicitly written using NetCDF4 as
>> NetCDF
> 
> Hi David, yes, that's the issue.
> 
> I think this piece of code I posted on github is the best possible solution
> for this:
> 
> https://github.com/pedro-vicente/netcdf-detect
> 
> @ Ed Hartnett
> 
> I wrote that code by reading the comments you wrote on the files nc4file.c 
> and nc4hdf.c
> 
> here
> 
> https://github.com/Unidata/netcdf-c/tree/master/libsrc4
> 
> do you agree with the solution?
> 
> 
> anyone feel free to use that code
> 
> the C function is called is_netcdf()
> 
> the netCDF API writes, if variables and dimensions are present in the file:
> 
> 1) an attribute named "_Netcdf4Dimid" (in some cases)
> 2) an attribute named "NAME", (always), saved by the HDF5 Dimension Scales 
> API,
> that contains the string "This is a netCDF dimension but not a netCDF 
> variable."
> 
> This utility tries to detect both attributes by traversing the HDF5 file; if
> either one is found, it returns a value of 1.
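
The per-attribute test in the approach above can be sketched in plain C. This is not the code from the netcdf-detect repository; `is_netcdf_marker` is a hypothetical name, and the HDF5 traversal that would supply each attribute name and string value (for example with H5Lvisit and H5Aiterate) is left out so the matching logic stands on its own.

```c
#include <string.h>

/* Hypothetical helper for an is_netcdf()-style checker: given one
   attribute found while traversing an HDF5 file, return 1 if it is one
   of the netCDF-4 "signature" attributes, else 0. attr_value is the
   attribute's string value, or NULL if it is not a string attribute. */
int is_netcdf_marker(const char *attr_name, const char *attr_value)
{
    /* Value stored in NAME for a netCDF dimension without a variable.
       Matched as a prefix in case the library appends extra characters
       (an assumption, not checked against every netCDF release). */
    static const char dim_wo_var[] =
        "This is a netCDF dimension but not a netCDF variable.";

    if (attr_name == NULL)
        return 0;
    if (strcmp(attr_name, "_Netcdf4Dimid") == 0)
        return 1;
    if (strcmp(attr_name, "NAME") == 0 && attr_value != NULL &&
        strncmp(attr_value, dim_wo_var, sizeof dim_wo_var - 1) == 0)
        return 1;
    /* CLASS="DIMENSION_SCALE" alone is not enough: plain HDF5 files that
       use the Dimension Scales API carry it too. */
    return 0;
}
```

A checker would call this for every attribute it visits and stop at the first hit; as noted in this message, a file created by nc_create/nc_close alone yields no hit, and a file can be forged to produce one.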
> 
> the program includes 3 test cases: 2 cases that generate a file with 1) and
> 2) above (they are mutually exclusive, it seems), and a third one that simply
> does
> 
> nc_create
> nc_close
> 
> in this case, the above attributes are not written, so the test will fail,
> as someone posted here before.
> I would say that if someone writes this kind of file, it is irrelevant whether
> HDF5 or netCDF was used; the files are virtually identical.
> 
> *Another* case that would give a false positive is where someone
> tries to be a spoiler and uses the HDF5 API
> to write these 2 attributes (the second being the value of "NAME"):
> "_Netcdf4Dimid"
> "This is a netCDF dimension but not a netCDF variable."
> 
> The only real "spoiler proof", 100% solution is SOLUTION 1, which I posted
> before: have HDF5 save a byte in the file that explicitly tells which derived
> API wrote it.
> This would be done by a private HDF5 function called by the derived API, say
> in
> nc_create()
> so it does not deal with attributes written by public APIs at all.
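
To make SOLUTION 1 concrete, here is a small C sketch of how a reader might interpret such a byte. It is entirely hypothetical: no field of the HDF5 format is assigned to this today, no offset is specified, and the enum and function names are illustrative only; the values follow the example numbering in the RFC (0 = HDF5 only, 1 = netCDF, 2 = another HDF5-based format).

```c
/* Hypothetical registry of "HDF5 compatible formats" (SOLUTION 1).
   The byte would live in a "reserved for future use" field; no such
   field is actually assigned in the HDF5 format. */
enum hdf5_derived_format {
    HDF5_DERIVED_NONE   = 0, /* written by the HDF5 API only */
    HDF5_DERIVED_NETCDF = 1, /* written by the netCDF API (using HDF5) */
    HDF5_DERIVED_OTHER  = 2  /* another HDF5-based format, and so on */
};

/* Map the stored byte to the API a reader should use. */
const char *api_for_format_byte(unsigned char b)
{
    switch (b) {
    case HDF5_DERIVED_NONE:   return "HDF5";
    case HDF5_DERIVED_NETCDF: return "netCDF";
    case HDF5_DERIVED_OTHER:  return "other";
    default:                  return "unknown"; /* value not registered */
    }
}
```

The point of the design is that the value is set by a private call inside the derived library, so ordinary attribute writes through the public API cannot change it.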
> 
> @ Elena Pourmal
> 
> Hi Elena, how are you?
> 
> Any chance of discussing this solution?
> 
> by the way, some of my emails on this thread sent to the hdf-forum last
> Friday are waiting for approval:
> 
> "Your mail to 'Hdf-forum' with the subject...
> Is being held until the list moderator can review it for approval.
> "
> 
> The hdf-forum now requires approval by a moderator?
> That does not work very well on weekends, for example.
> 
> 
> ----------------------
> Pedro Vicente
> pedro.vicente@xxxxxxxxxxxxxxxxxx
> https://twitter.com/_pedro__vicente
> http://www.space-research.org/
> 
> 
> 
> ----- Original Message ----- From: "David Brown" <dbrown@xxxxxxxx>
> To: <netcdfgroup@xxxxxxxxxxxxxxxx>
> Sent: Saturday, April 23, 2016 3:06 PM
> Subject: Re: [netcdfgroup] netcdfgroup Digest, Vol 1126, Issue 2
> 
> 
>> Since Pedro asked earlier about how NCL distinguishes between NetCDF4
>> and HDF5, I'm going to add my 2 cents to what now appears to be the
>> longest thread ever on this mailing list.
>> 
>> First a bit of background. Traditionally NCL has distinguished among
>> file formats based solely on file extensions. If a file name ends with
>> ".nc" then it is considered to be a NetCDF file and will be opened
>> using the NetCDF library calls. Additionally there is an idiosyncratic
>> feature where you can add a "virtual" extension to a file name to
>> specify the format you want to use. For example, if the file is named
>> "test", you can open it as "test.h5" to open it using HDF5 calls.
>> Given this name NCL will look first for a file called "test.h5" and if
>> that is not found then it will look for "test". You can even add
>> extensions to files that already have them to open a file using
>> another format: e.g. test.hdf.nc.
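
The name-based lookup David describes can be sketched as two small string helpers. This is a minimal C sketch of the rule as stated in the message, not NCL's actual code; the function names are hypothetical, and the caller is assumed to try the requested name first and the stripped name second.

```c
#include <stddef.h>
#include <string.h>

/* Extension that selects the format: "test.hdf.nc" -> "nc".
   Returns NULL if the base name has no extension. */
const char *virtual_extension(const char *name)
{
    const char *slash = strrchr(name, '/');
    const char *base = slash ? slash + 1 : name;  /* ignore dots in dirs */
    const char *dot = strrchr(base, '.');
    return (dot != NULL && dot != base && dot[1] != '\0') ? dot + 1 : NULL;
}

/* Name to try when `name` itself does not exist:
   "test.h5" -> "test", "test.hdf.nc" -> "test.hdf".
   Writes the result into out and returns 1, or returns 0 on failure. */
int strip_virtual_extension(const char *name, char *out, size_t outlen)
{
    const char *ext = virtual_extension(name);
    size_t n;

    if (ext == NULL)
        return 0;
    n = (size_t)(ext - 1 - name);                 /* length up to the dot */
    if (n >= outlen)
        return 0;
    memcpy(out, name, n);
    out[n] = '\0';
    return 1;
}
```

For "test.hdf.nc" the format comes from "nc", and if no file with that exact name exists the reader falls back to the real file "test.hdf".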
>> 
>> But recent versions of NCL attempt to figure out the format of files
>> that do not have recognized extensions. And that means we have
>> definitely run into the issue that Pedro originally brought up. We
>> want our HDF5 module to handle HDF5 files on their own terms,
>> including, e.g., recognizing reference types. For now, we first try to
>> see if the file can be opened using the NetCDF library, and if not, we
>> try various versions of HDF. But this is not ideal, because we only
>> want to open files that are explicitly written using NetCDF4 as
>> NetCDF. So it is indeed welcome news that there will be global
>> attributes added to explicitly identify the file as NetCDF4. However,
>> it also would be nice if nc_inq_format or nc_inq_format_extended could
>> be adjusted to give a definitive answer as to whether the file was
>> created as NetCDF4. I have to admit I was quite surprised to discover
>> that nc_inq_format_extended would not answer this seemingly obvious
>> (to me at least) question.
>> -Dave Brown
>> NCL technical architect
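
The policy David asks for (open a file as NetCDF only when it was explicitly written as NetCDF4, otherwise use the HDF5 module) can be written as a small decision function once the evidence has been collected. A sketch under assumptions: the struct, enum, and function names are hypothetical, and how each flag gets filled in (signature check, _NCProperties lookup, attribute scan) is not shown.

```c
/* Hypothetical summary of what a reader found when probing a file. */
struct file_evidence {
    int is_hdf5;          /* HDF5 format signature present */
    int has_ncproperties; /* root-group attribute _NCProperties present */
    int has_nc_marker;    /* e.g. a _Netcdf4Dimid attribute was found */
};

enum reader_api { READER_UNKNOWN, READER_NETCDF, READER_HDF5 };

/* Open with netCDF only when there is evidence netCDF wrote the file;
   otherwise hand HDF5 files to the HDF5 reader. */
enum reader_api choose_api(const struct file_evidence *e)
{
    if (!e->is_hdf5)
        return READER_UNKNOWN;   /* not HDF5: other format checks apply */
    if (e->has_ncproperties)
        return READER_NETCDF;    /* explicit marker, newer files */
    if (e->has_nc_marker)
        return READER_NETCDF;    /* heuristic, older files */
    return READER_HDF5;          /* no netCDF evidence: plain HDF5 */
}
```

Ordering the explicit marker before the heuristic keeps the heuristic as a fallback for files written before such an attribute existed.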
>> 
>> 
>> On Sat, Apr 23, 2016 at 10:21 AM,  <netcdfgroup-request@xxxxxxxxxxxxxxxx>
>> wrote:
>>> Date: Fri, 22 Apr 2016 21:57:51 -0600
>>> From: John Caron <jcaron1129@xxxxxxxxx>
>>> To: Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
>>> Cc: cf-metadata@xxxxxxxxxxxx,   NetCDF-Java community
>>>        <netcdf-java@xxxxxxxxxxxxxxxx>, netcdfgroup@xxxxxxxxxxxxxxxx
>>> Subject: Re: [netcdfgroup] [CF-metadata] [Hdf-forum] Detecting netCDF
>>>        versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS
>>> 
>>> Here are the blogs:
>>> 
>>> http://www.unidata.ucar.edu/blogs/developer/en/entry/dimensions_scales
>>> http://www.unidata.ucar.edu/blogs/developer/en/entry/dimension_scale2
>>> http://www.unidata.ucar.edu/blogs/developer/en/entry/dimension_scales_part_3
>>> http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_shared_dimensions
>>> http://www.unidata.ucar.edu/blogs/developer/en/entry/netcdf4_use_of_dimension_scales
>>> 
>>> On Fri, Apr 22, 2016 at 7:57 AM, Pedro Vicente <
>>> pedro.vicente@xxxxxxxxxxxxxxxxxx> wrote:
>>> 
>>>> John
>>>> 
>>>> >>>i have written various blogs on the unidata site about why netcdf4 !=
>>>> hdf5, and what the unique signature for shared dimensions looks like, in
>>>> >>>case you want details.
>>>> 
>>>> yes, I am interested, I had the impression by looking at the code some
>>>> years ago that netCDF writes some unique name attributes somewhere
>>>> 
>>>> ----------------------
>>>> Pedro Vicente
>>>> pedro.vicente@xxxxxxxxxxxxxxxxxx
>>>> https://twitter.com/_pedro__vicente
>>>> http://www.space-research.org/
>>>> 
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>> *From:* John Caron <jcaron1129@xxxxxxxxx>
>>>> *To:* Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
>>>> *Cc:* cf-metadata@xxxxxxxxxxxx ; Discussion forum for the NeXus data
>>>> format <nexus@xxxxxxxxxxxxxxx> ; netcdfgroup@xxxxxxxxxxxxxxxx ; Dennis
>>>> Heimbigner <dmh@xxxxxxxx> ; NetCDF-Java community
>>>> <netcdf-java@xxxxxxxxxxxxxxxx>
>>>> *Sent:* Thursday, April 21, 2016 11:11 PM
>>>> *Subject:* Re: [CF-metadata] [netcdfgroup] [Hdf-forum] Detecting netCDF
>>>> versus HDF5 -- PROPOSED SOLUTIONS --REQUEST FOR COMMENTS
>>>> 
>>>> 1) I completely agree with the idea of adding system metadata that
>>>> indicates the library version(s) that wrote the file.
>>>> 
>>>> 2) the way shared dimensions are implemented by netcdf4 is a unique
>>>> signature that would likely identify (100 - epsilon) % of real data
>>>> files
>>>> in the wild. One could add such detection to the netcdf4 and/or hdf5
>>>> libraries, and/or write a utility program to detect.
>>>> 
>>>> there are 2 variants:
>>>> 
>>>> 2.1) one could write a netcdf4 file without shared dimensions, though I'm
>>>> pretty sure no one does. But you could argue then that it's fine to just
>>>> treat it as an hdf5 file and read it through the hdf5 library.
>>>> 
>>>> 2.2) one could write a netcdf4 file with the hdf5 library, if you knew what
>>>> you were doing. I have heard of this happening. But then you could argue
>>>> that it's really a netcdf4 file and you should use the netcdf library to
>>>> read it.
>>>> 
>>>> i have written various blogs on the unidata site about why netcdf4 !=
>>>> hdf5, and what the unique signature for shared dimensions looks like, in
>>>> case you want details.
>>>> 
>>>> On Thu, Apr 21, 2016 at 4:18 PM, Pedro Vicente <
>>>> pedro.vicente@xxxxxxxxxxxxxxxxxx> wrote:
>>>> 
>>>>>> If you have hdf5 files that should be readable, then I will undertake to
>>>>>> look at them and see what the problem is.
>>>>>> 
>>>>> 
>>>>> 
>>>>> ok, thank you
>>>>> 
>>>>>> WRT old files: We could produce a utility that would redef the file
>>>>>> and insert the _NCProperties attribute. This would allow someone to
>>>>>> wholesale mark old files.
>>>>>> 
>>>>> 
>>>>> 
>>>>> Excellent idea, Dennis
>>>>> 
>>>>> ----------------------
>>>>> Pedro Vicente
>>>>> pedro.vicente@xxxxxxxxxxxxxxxxxx
>>>>> https://twitter.com/_pedro__vicente
>>>>> http://www.space-research.org/
>>>>> 
>>>>> 
>>>>> ----- Original Message ----- From: <dmh@xxxxxxxx>
>>>>> To: "Pedro Vicente" <pedro.vicente@xxxxxxxxxxxxxxxxxx>; <
>>>>> cf-metadata@xxxxxxxxxxxx>; "Discussion forum for the NeXus data format"
>>>>> <
>>>>> nexus@xxxxxxxxxxxxxxx>; <netcdfgroup@xxxxxxxxxxxxxxxx>
>>>>> Sent: Thursday, April 21, 2016 5:02 PM
>>>>> Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus HDF5 --
>>>>> PROPOSED SOLUTIONS --REQUEST FOR COMMENTS
>>>>> 
>>>>> 
>>>>>> If you have hdf5 files that should be readable, then I will undertake to
>>>>>> look at them and see what the problem is.
>>>>>> WRT old files: We could produce a utility that would redef the file
>>>>>> and insert the _NCProperties attribute. This would allow someone to
>>>>>> wholesale mark old files.
>>>>>> =Dennis Heimbigner
>>>>>>  Unidata
>>>>>> 
>>>>>> 
>>>>>> On 4/21/2016 2:17 PM, Pedro Vicente wrote:
>>>>>> 
>>>>>>> Dennis
>>>>>>> 
>>>>>>>> I am in the process of adding a global attribute in the root group
>>>>>>>> that captures both the netcdf library version and the hdf5 library
>>>>>>>> version whenever a netcdf file is created. The current form is
>>>>>>>> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..."
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> ok, good to know, thank you
>>>>>>> 
>>>>>>> 
>>>>>>>> 1. I am open to suggestions about changing the format or adding
>>>>>>>> info to it.
>>>>>>> 
>>>>>>> I personally don't care, anything that uniquely identifies a netCDF
>>>>>>> file (HDF5 based) as such will work
>>>>>>> 
>>>>>>> 
>>>>>>>> 2. Of course this attribute will not exist in files written using
>>>>>>>> older versions of the netcdf library, but at least the process will
>>>>>>>> have begun.
>>>>>>>> 
>>>>>>> 
>>>>>>> yes
>>>>>>> 
>>>>>>> 
>>>>>>>> 3. This technically does not address the original issue because there
>>>>>>>> exist hdf5 files not written by netcdf that are still compatible with
>>>>>>>> and can be read by netcdf. Not sure this case is important or not.
>>>>>>>> 
>>>>>>> 
>>>>>>> there will always be HDF5 files not written by netcdf that netCDF
>>>>>>> will read, as is the case now.
>>>>>>> 
>>>>>>> this is not really the issue, but you just made a further issue :-)
>>>>>>> 
>>>>>>> the issue is that I would like an application that reads a netCDF
>>>>>>> (HDF5 based) file to be able to decide whether to use the netCDF or the
>>>>>>> HDF5 API.
>>>>>>> your attribute writing will do, for future files.
>>>>>>> for older netCDF files there may be a way to detect the current
>>>>>>> attributes and data structures to see if we can make a file "identify
>>>>>>> itself" as netCDF. A bit of debugging will confirm that; since Dimension
>>>>>>> Scales are used, that would be an (imperfect maybe) way to do it
>>>>>>> 
>>>>>>> regarding the "further issue " above
>>>>>>> 
>>>>>>> you could go one step further and, for any HDF5 files not written by
>>>>>>> netcdf, make netCDF refuse to read the file,
>>>>>>> because it's not "netCDF compliant".
>>>>>>> Since having netCDF read pure HDF5 files is not a problem (at least
>>>>>>> for me), I don't know if you would want to do this, just an idea.
>>>>>>> In my mind, taking complexity and ambiguity out of problems is always a
>>>>>>> good thing
>>>>>>> 
>>>>>>> 
>>>>>>> ah, I forgot one thing, related to this
>>>>>>> 
>>>>>>> 
>>>>>>> In the past I have found several pure HDF5 files that netCDF failed
>>>>>>> to read.
>>>>>>> Since netCDF is HDF5 binary compatible, one would expect that all HDF5
>>>>>>> files could be read by netCDF,
>>>>>>> unless you specifically wrote something in the code that makes it
>>>>>>> fail if some condition is not met.
>>>>>>> This was a while ago; I'll try to find those cases and send a bug
>>>>>>> report to the bug report email
>>>>>>> 
>>>>>>> ----------------------
>>>>>>> Pedro Vicente
>>>>>>> pedro.vicente@xxxxxxxxxxxxxxxxxx
>>>>>>> https://twitter.com/_pedro__vicente
>>>>>>> http://www.space-research.org/
>>>>>>> 
>>>>>>> ----- Original Message ----- From: <dmh@xxxxxxxx>
>>>>>>> To: "Pedro Vicente" <pedro.vicente@xxxxxxxxxxxxxxxxxx>; "HDF Users
>>>>>>> Discussion List" <hdf-forum@xxxxxxxxxxxxxxxxxx>; <
>>>>>>> cf-metadata@xxxxxxxxxxxx>; "Discussion forum for the NeXus data
>>>>>>> format" <nexus@xxxxxxxxxxxxxxx>; <netcdfgroup@xxxxxxxxxxxxxxxx>
>>>>>>> Cc: "John Shalf" <jshalf@xxxxxxx>; <Richard.E.Ullman@xxxxxxxx>;
>>>>>>> "Marinelli, Daniel J. (GSFC-5810)" <daniel.j.marinelli@xxxxxxxx>;
>>>>>>> "Miller, Mark C." <miller86@xxxxxxxx>
>>>>>>> Sent: Thursday, April 21, 2016 2:30 PM
>>>>>>> Subject: Re: [netcdfgroup] [Hdf-forum] Detecting netCDF versus
>>>>>>> HDF5 --
>>>>>>> PROPOSED SOLUTIONS --REQUEST FOR COMMENTS
>>>>>>> 
>>>>>>> 
>>>>>>>> I am in the process of adding a global attribute in the root group
>>>>>>>> that captures both the netcdf library version and the hdf5 library
>>>>>>>> version whenever a netcdf file is created. The current form is
>>>>>>>> _NCProperties="version=...|netcdflibversion=...|hdflibversion=..."
>>>>>>>> Where version is the version of the _NCProperties attribute and the
>>>>>>>> others are e.g. 1.8.18 or 4.4.1-rc1.
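
A reader that wants to use the proposed attribute only has to split it on "|" and "=". A minimal C sketch, assuming exactly the "key=value|key=value" form quoted above (which was still open to change at the time); `ncprops_get` is a hypothetical name, and reading the attribute's text out of the file is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up `key` in an _NCProperties-style string of
   the form "version=...|netcdflibversion=...|hdflibversion=..." and copy
   its value into out (NUL-terminated, truncated to fit).
   Returns 1 if the key was found, 0 otherwise. */
int ncprops_get(const char *props, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = props;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, '|');            /* end of this field */
        size_t flen = end ? (size_t)(end - p) : strlen(p);

        /* The field matches if it starts with "key=" */
        if (flen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = flen - klen - 1;
            if (outlen == 0)
                return 0;
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;                    /* next field */
    }
    return 0;
}
```

Matching on the whole "key=" prefix matters here, because "version" is also a suffix of the other two keys.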
>>>>>>>> Issues:
>>>>>>>> 1. I am open to suggestions about changing the format or adding info
>>>>>>>> to it.
>>>>>>>> 2. Of course this attribute will not exist in files written using
>>>>>>>> older versions of the netcdf library, but at least the process will
>>>>>>>> have begun.
>>>>>>>> 3. This technically does not address the original issue because there
>>>>>>>> exist hdf5 files not written by netcdf that are still compatible with
>>>>>>>> and can be read by netcdf. Not sure this case is important or not.
>>>>>>>> =Dennis Heimbigner
>>>>>>>>   Unidata
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 4/21/2016 9:33 AM, Pedro Vicente wrote:
>>>>>>>> 
>>>>>>>>> DETECTING HDF5 VERSUS NETCDF GENERATED FILES
>>>>>>>>> REQUEST FOR COMMENTS
>>>>>>>>> AUTHOR: Pedro Vicente
>>>>>>>>> 
>>>>>>>>> AUDIENCE:
>>>>>>>>> 1) HDF, netcdf developers,
>>>>>>>>> Ed Hartnett
>>>>>>>>> Kent Yang
>>>>>>>>> 2) HDF, netcdf users, that replied to this thread
>>>>>>>>> Miller, Mark C.
>>>>>>>>> John Shalf
>>>>>>>>> 3 ) netcdf tools developers
>>>>>>>>> Mary Haley  , NCL
>>>>>>>>> 4) HDF, netcdf managers and sponsors
>>>>>>>>> David Pearah  , CEO HDF Group
>>>>>>>>> Ward Fisher, UCAR
>>>>>>>>> Marinelli, Daniel J., Richard Ullman, Christopher Lynnes, NASA
>>>>>>>>> 5) [CF-metadata] list
>>>>>>>>> After this thread started 2 months ago, there was an announcement on
>>>>>>>>> the [CF-metadata] mail list about
>>>>>>>>> "a meeting to discuss current and future netCDF-CF efforts and
>>>>>>>>> directions. The meeting will be held on 24-26 May 2016 in Boulder, CO,
>>>>>>>>> USA at the UCAR Center Green facility."
>>>>>>>>> This would be a good topic to put on the agenda, maybe?
>>>>>>>>> THE PROBLEM:
>>>>>>>>> Currently it is impossible to detect if an HDF5 file was generated by
>>>>>>>>> the HDF5 API or by the netCDF API.
>>>>>>>>> See previous email about the reasons why.
>>>>>>>>> WHY THIS MATTERS:
>>>>>>>>> Software applications that need to handle both netCDF and HDF5 files
>>>>>>>>> cannot decide which API to use.
>>>>>>>>> This includes popular visualization tools like IDL, Matlab, NCL, HDF
>>>>>>>>> Explorer.
>>>>>>>>> SOLUTIONS PROPOSED: 2
>>>>>>>>> SOLUTION 1: Add a flag to HDF5 source
>>>>>>>>> The hdf5 format specification, listed here
>>>>>>>>> https://www.hdfgroup.org/HDF5/doc/H5.format.html
>>>>>>>>> describes a sequence of bytes in the file layout that have special
>>>>>>>>> meaning for the HDF5 API. It is common practice, when designing a data
>>>>>>>>> format, to leave some fields "reserved for future use".
>>>>>>>>> This solution makes use of one of these empty "reserved for future
>>>>>>>>> use" spaces to save a byte (for example) that describes an enumerator
>>>>>>>>> of "HDF5 compatible formats".
>>>>>>>>> An "HDF5 compatible format" is a data format that uses the HDF5 API
>>>>>>>>> at a lower level (usually hidden from the user of the upper API)
>>>>>>>>> and provides its own API.
>>>>>>>>> This category can still be divided into 2 formats:
>>>>>>>>> 1) A "pure HDF5 compatible format". Example, NeXus
>>>>>>>>> http://www.nexusformat.org/
>>>>>>>>> NeXus just writes some metadata (attributes) on top of the HDF5 API,
>>>>>>>>> which have some special meaning for the NeXus community
>>>>>>>>> 2) A "non pure HDF5 compatible format". Example, netCDF
>>>>>>>>> Here, the format adds some extra feature besides HDF5. In the case of
>>>>>>>>> netCDF, these are shared dimensions between variables.
>>>>>>>>> This sub-division between 1) and 2) is irrelevant for the problem and
>>>>>>>>> solution in question
>>>>>>>>> The solution consists of writing a different enumerator value in the
>>>>>>>>> "reserved for future use" space. For example
>>>>>>>>> Value decimal 0 (current value): This file was generated by the HDF5
>>>>>>>>> API (meaning the HDF5 only API)
>>>>>>>>> Value decimal 1: This file was generated by the netCDF API (using HDF5)
>>>>>>>>> Value decimal 2: This file was generated by <put here another HDF5
>>>>>>>>> based format>
>>>>>>>>> and so on
>>>>>>>>> The advantage of this solution is that this process involves 2
>>>>>>>>> parties: the HDF Group and the other format's organization.
>>>>>>>>> This allows the HDF Group to "keep track" of new HDF5 based formats.
>>>>>>>>> It also allows the other format to be made "HDF5 certified".
>>>>>>>>> SOLUTION 2: Add some metadata to the other API on top of HDF5
>>>>>>>>> This is what NeXus uses.
>>>>>>>>> A NeXus file on creation writes several attributes on the root group,
>>>>>>>>> like "NeXus_version" and other numeric data.
>>>>>>>>> This is done using the public HDF5 API calls.
>>>>>>>>> The solution for netCDF consists of the same approach: just write
>>>>>>>>> some specific attributes, and a special netCDF API to write/read them.
>>>>>>>>> This solution just requires the work of one party (the netCDF group)
>>>>>>>>> END OF RFC
>>>>>>>>> In reply to people that commented in the thread
>>>>>>>>> @John Shalf
>>>>>>>>> >>Perhaps NetCDF (and other higher-level APIs that are built on top
>>>>>>>>> >>of
>>>>>>>>> HDF5) should include an attribute attached
>>>>>>>>> >>to the root group that identifies the name and version of the API
>>>>>>>>> that created the file?  (adopt this as a convention)
>>>>>>>>> yes, that's one way to do it, Solution 2 above
>>>>>>>>> @Mark Miller
>>>>>>>>> >>>Hmmm. Is there any big reason NOT to try to read a netCDF
>>>>>>>>> >>>produced
>>>>>>>>> HDF5 file with the native HDF5 library if someone so chooses?
>>>>>>>>> It's possible to read a netCDF file using HDF5, yes.
>>>>>>>>> There are 2 things that you will miss doing this:
>>>>>>>>> 1) the ability to inquire about shared netCDF dimensions.
>>>>>>>>> 2) the ability to read remotely with openDAP.
>>>>>>>>> Reading with HDF5 also exposes metadata that is supposed to be
>>>>>>>>> private to netCDF. See below
>>>>>>>>> >>>> And, attempting  to read an HDF5 file produced by Silo using
>>>>>>>>> >>>> just
>>>>>>>>> the HDF5 library (e.g. w/o Silo) is a major pain.
>>>>>>>>> This I don't understand. Why not read the Silo file with the Silo API?
>>>>>>>>> That's the whole purpose of this issue: each higher level API on top of
>>>>>>>>> HDF5 should be able to detect "itself".
>>>>>>>>> I am not familiar with Silo, but if Silo cannot do this, then you
>>>>>>>>> have the same design flaw that netCDF has.
>>>>>>>>> 
>>>>>>>>> >>> In a cursory look over the libsrc4 sources in netCDF distro, I
>>>>>>>>> >>> see
>>>>>>>>> a few things that might give a hint a file was created with netCDF.
>>>>>>>>> .
>>>>>>>>> .
>>>>>>>>> >>>> First, in NC_CLASSIC_MODEL, an attribute gets attached to the
>>>>>>>>> root group named "_nc3_strict". So, the existence of an attribute
>>>>>>>>> on
>>>>>>>>> the root group by that name would suggest the HDF5 file was
>>>>>>>>> generated by
>>>>>>>>> netCDF.
>>>>>>>>> I think this is done only by the "old" netCDF3 format.
>>>>>>>>> >>>>> Also, I tested a simple case of nc_open, nc_def_dim, etc.
>>>>>>>>> nc_close to see what it produced.
>>>>>>>>> >>>> It appears to produce datasets for each 'dimension' defined
>>>>>>>>> >>>> with
>>>>>>>>> two attributes named "CLASS" and "NAME".
>>>>>>>>> This is because netCDF uses the HDF5 Dimension Scales API internally
>>>>>>>>> to keep track of shared dimensions. These are internal attributes
>>>>>>>>> of Dimension Scales. This approach would not work because an HDF5
>>>>>>>>> only file with Dimension Scales would have the same attributes.
>>>>>>>>> 
>>>>>>>>> >>>> I like John's suggestion here.
>>>>>>>>> >>>>>But, any code you add to any applications now will work *only*
>>>>>>>>> for files that were produced post-adoption of this convention.
>>>>>>>>> yes. there are 2 actions to take here.
>>>>>>>>> 1) fix the issue for the future
>>>>>>>>> 2) try to retroactively have some workaround that makes it possible
>>>>>>>>> now to differentiate HDF5/netCDF files made before the adopted
>>>>>>>>> convention
>>>>>>>>> see below
>>>>>>>>> 
>>>>>>>>> >>>> In VisIt, we support >140 format readers. Over 20 of those are
>>>>>>>>> different variants of HDF5 files (H5part, Xdmf, Pixie, Silo,
>>>>>>>>> Samrai,
>>>>>>>>> netCDF, Flash, Enzo, Chombo, etc., etc.)
>>>>>>>>> >>>>When opening a file, how does VisIt figure out which plugin to
>>>>>>>>> use? In particular, how do we avoid one poorly written reader
>>>>>>>>> plugin
>>>>>>>>> (which may be the wrong one for a given file) from preventing the
>>>>>>>>> correct one from being found. It's kinda a hard problem.
>>>>>>>>> 
>>>>>>>>> Yes, that's the problem we are trying to solve. I have to say, that
>>>>>>>>> is quite a list of HDF5 based formats there.
>>>>>>>>> >>>> Some of our discussion is captured here. . .
>>>>>>>>> http://www.visitusers.org/index.php?title=Database_Format_Detection
>>>>>>>>> I'll check it out, thank you for the suggestions
>>>>>>>>> @Ed Hartnett
>>>>>>>>> >>>I must admit that when putting netCDF-4 together I never
>>>>>>>>> >>>considered
>>>>>>>>> that someone might want to tell the difference between a "native"
>>>>>>>>> HDF5 file and a netCDF-4/HDF5 file.
>>>>>>>>> >>>>>Well, you can't think of everything.
>>>>>>>>> This is a major design flaw.
>>>>>>>>> If you are in the business of designing data file formats, one of the
>>>>>>>>> things you have to do is make it possible to distinguish your format
>>>>>>>>> from other formats.
>>>>>>>>> 
>>>>>>>>> >>> I agree that it is not possible to canonically tell the
>>>>>>>>> difference. The netCDF-4 API does use some special attributes to
>>>>>>>>> track named dimensions,
>>>>>>>>> >>>>and to tell whether classic mode should be enforced. But it can
>>>>>>>>> easily produce files without any named dimensions, etc.
>>>>>>>>> >>>So I don't think there is any easy way to tell.
>>>>>>>>> I remember you wrote that code together with Kent Yang from the HDF
>>>>>>>>> Group.
>>>>>>>>> At the time I was with the HDF Group but unfortunately I did not
>>>>>>>>> follow closely what you were doing.
>>>>>>>>> I don't remember any design document being circulated that explains
>>>>>>>>> the internals of the "how to" make the netCDF (classic) model of
>>>>>>>>> shared
>>>>>>>>> dimensions
>>>>>>>>> use the hierarchical group model of HDF5.
>>>>>>>>> I know this was done using the HDF5 Dimension Scales (that I
>>>>>>>>> wrote),
>>>>>>>>> but is there any design document that explains it?
>>>>>>>>> Maybe just some internal email exchange between you and Kent Yang?
>>>>>>>>> Kent, how are you?
>>>>>>>>> Do you remember having any design document that explains this?
>>>>>>>>> Maybe something like a unique private attribute that is written
>>>>>>>>> somewhere in the netCDF file?
>>>>>>>>> 
>>>>>>>>> @Mary Haley, NCL
>>>>>>>>> NCL is a widely used tool that handles both netCDF and HDF5
>>>>>>>>> Mary, how are you?
>>>>>>>>> How does NCL deal with the case of reading both pure HDF5 files and
>>>>>>>>> netCDF files that use HDF5?
>>>>>>>>> Would you be interested in joining a community based effort to deal
>>>>>>>>> with this, in case this is an issue for you?
>>>>>>>>> 
>>>>>>>>> @David Pearah  , CEO HDF Group
>>>>>>>>> I volunteer to participate in the effort of this RFC together with
>>>>>>>>> the HDF Group (and netCDF Group).
>>>>>>>>> Maybe we could make a "task force" between HDF Group, netCDF Group
>>>>>>>>> and any volunteer (such as tools developers that happen to be in
>>>>>>>>> these mailing lists)?
>>>>>>>>> The "task force" would have 2 tasks:
>>>>>>>>> 1) make a HDF5 based convention for the future and
>>>>>>>>> 2) try to retroactively salvage the current design issue of netCDF
>>>>>>>>> My phone is 217-898-9356, you are welcome to call in anytime.
>>>>>>>>> ----------------------
>>>>>>>>> Pedro Vicente
>>>>>>>>> pedro.vicente@xxxxxxxxxxxxxxxxxx <mailto:
>>>>>>>>> pedro.vicente@xxxxxxxxxxxxxxxxxx>
>>>>>>>>> https://twitter.com/_pedro__vicente
>>>>>>>>> http://www.space-research.org/
>>>>>>>>> 
>>>>>>>>>    ----- Original Message -----
>>>>>>>>>    *From:* Miller, Mark C. <mailto:miller86@xxxxxxxx>
>>>>>>>>>    *To:* HDF Users Discussion List <mailto:
>>>>>>>>> hdf-forum@xxxxxxxxxxxxxxxxxx>
>>>>>>>>>    *Cc:* netcdfgroup@xxxxxxxxxxxxxxxx
>>>>>>>>>    <mailto:netcdfgroup@xxxxxxxxxxxxxxxx> ; Ward Fisher
>>>>>>>>>    <mailto:wfisher@xxxxxxxx>
>>>>>>>>>    *Sent:* Wednesday, March 02, 2016 7:07 PM
>>>>>>>>>    *Subject:* Re: [Hdf-forum] Detecting netCDF versus HDF5
>>>>>>>>> 
>>>>>>>>>    I like John's suggestion here.
>>>>>>>>> 
>>>>>>>>>    But, any code you add to any applications now will work *only*
>>>>>>>>> for
>>>>>>>>>    files that were produced post-adoption of this convention.
>>>>>>>>> 
>>>>>>>>>    There are probably a bazillion files out there at this point
>>>>>>>>> that
>>>>>>>>>    don't follow that convention and you probably still want your
>>>>>>>>>    applications to be able to read them.
>>>>>>>>> 
>>>>>>>>>    In VisIt, we support >140 format readers. Over 20 of those are
>>>>>>>>>    different variants of HDF5 files (H5part, Xdmf, Pixie, Silo,
>>>>>>>>>    Samrai, netCDF, Flash, Enzo, Chombo, etc., etc.) When opening a
>>>>>>>>>    file, how does VisIt figure out which plugin to use? In
>>>>>>>>>    particular, how do we avoid one poorly written reader plugin
>>>>>>>>>    (which may be the wrong one for a given file) from preventing the
>>>>>>>>>    correct one from being found. It's kinda a hard problem.
>>>>>>>>> 
>>>>>>>>>    Some of our discussion is captured here. . .
>>>>>>>>> 
>>>>>>>>> http://www.visitusers.org/index.php?title=Database_Format_Detection
>>>>>>>>> 
>>>>>>>>>    Mark
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>    From: Hdf-forum <hdf-forum-bounces@xxxxxxxxxxxxxxxxxx
>>>>>>>>>    <mailto:hdf-forum-bounces@xxxxxxxxxxxxxxxxxx>> on behalf of
>>>>>>>>> John
>>>>>>>>>    Shalf <jshalf@xxxxxxx <mailto:jshalf@xxxxxxx>>
>>>>>>>>>    Reply-To: HDF Users Discussion List
>>>>>>>>> <hdf-forum@xxxxxxxxxxxxxxxxxx
>>>>>>>>>    <mailto:hdf-forum@xxxxxxxxxxxxxxxxxx>>
>>>>>>>>>    Date: Wednesday, March 2, 2016 1:02 PM
>>>>>>>>>    To: HDF Users Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx
>>>>>>>>>    <mailto:hdf-forum@xxxxxxxxxxxxxxxxxx>>
>>>>>>>>>    Cc: "netcdfgroup@xxxxxxxxxxxxxxxx
>>>>>>>>>    <mailto:netcdfgroup@xxxxxxxxxxxxxxxx>"
>>>>>>>>>    <netcdfgroup@xxxxxxxxxxxxxxxx
>>>>>>>>>    <mailto:netcdfgroup@xxxxxxxxxxxxxxxx>>, Ward Fisher
>>>>>>>>>    <wfisher@xxxxxxxx <mailto:wfisher@xxxxxxxx>>
>>>>>>>>>    Subject: Re: [Hdf-forum] Detecting netCDF versus HDF5
>>>>>>>>> 
>>>>>>>>>        Perhaps NetCDF (and other higher-level APIs that are built on
>>>>>>>>>        top of HDF5) should include an attribute attached to the root
>>>>>>>>>        group that identifies the name and version of the API that
>>>>>>>>>        created the file?  (adopt this as a convention)
>>>>>>>>> 
>>>>>>>>>        -john
>>>>>>>>> 
>>>>>>>>>            On Mar 2, 2016, at 12:55 PM, Pedro Vicente
>>>>>>>>>            <pedro.vicente@xxxxxxxxxxxxxxxxxx
>>>>>>>>>            <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx>> wrote:
>>>>>>>>>            Hi Ward
>>>>>>>>>            As you know, Data Explorer is going to be a general
>>>>>>>>>            purpose data reader for many formats, including HDF5 and
>>>>>>>>>            netCDF.
>>>>>>>>>            Here: http://www.space-research.org/
>>>>>>>>>            Regarding the handling of both HDF5 and netCDF, it seems
>>>>>>>>>            there is a potential issue, which is, how to tell if any
>>>>>>>>>            HDF5 file was saved by the HDF5 API or by the netCDF API?
>>>>>>>>>            It seems to me that this is not possible. Is this correct?
>>>>>>>>>            netCDF uses an internal function, NC_check_file_type, to
>>>>>>>>>            examine the first few bytes of a file; for example, for
>>>>>>>>>            any HDF5 file the test is:
>>>>>>>>>            /* Look at the magic number */
>>>>>>>>>               /* Ignore the first byte for HDF */
>>>>>>>>>               if(magic[1] == 'H' && magic[2] == 'D' && magic[3] == 'F') {
>>>>>>>>>                 *filetype = FT_HDF;
>>>>>>>>>                 *version = 5;
>>>>>>>>>               }
>>>>>>>>>            The problem is that this test succeeds for any HDF5 file
>>>>>>>>>            and for any netCDF-4 file, which makes it impossible to
>>>>>>>>>            tell which is which. This in turn makes it impossible for
>>>>>>>>>            a general-purpose data reader to decide whether to use the
>>>>>>>>>            netCDF API or the HDF5 API.
>>>>>>>>>            I have a possible solution for this, but before going any
>>>>>>>>>            further, I would just like to
>>>>>>>>>            1)      confirm that this is indeed not possible
>>>>>>>>>            2)      see if you have a solid workaround for this,
>>>>>>>>>            excluding the dumb ones, for example deciding on an
>>>>>>>>>            extension .nc or .h5, or traversing the HDF5 file to see
>>>>>>>>>            if it's a non-netCDF-conforming one. Yes, to further
>>>>>>>>>            complicate things, it is possible that the above test says
>>>>>>>>>            OK for an HDF5 file, but then the read by the netCDF API
>>>>>>>>>            fails because the file is HDF5 but not netCDF-conformant.
>>>>>>>>>            Thanks
>>>>>>>>>            ----------------------
>>>>>>>>>            Pedro Vicente
>>>>>>>>>            pedro.vicente@xxxxxxxxxxxxxxxxxx
>>>>>>>>>            <mailto:pedro.vicente@xxxxxxxxxxxxxxxxxx>
>>>>>>>>>            http://www.space-research.org/
>>>>>>>>>            _______________________________________________
>>>>>>>>>            Hdf-forum is for HDF software users discussion.
>>>>>>>>>            Hdf-forum@xxxxxxxxxxxxxxxxxx
>>>>>>>>>            <mailto:Hdf-forum@xxxxxxxxxxxxxxxxxx>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>>>>>>>>            Twitter: https://twitter.com/hdf5
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> netcdfgroup mailing list
>>>>>>>>> netcdfgroup@xxxxxxxxxxxxxxxx
>>>>>>>>> For list information or to unsubscribe,  visit:
>>>>>>>>> http://www.unidata.ucar.edu/mailing_lists/
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> _______________________________________________
>>>>> CF-metadata mailing list
>>>>> CF-metadata@xxxxxxxxxxxx
>>>>> http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
>>>>> 
>>>> 
>>>> 
>>> 
>>> End of netcdfgroup Digest, Vol 1126, Issue 2
>>> ********************************************
>> 
>> 
> 
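
The byte-level test Pedro quotes from NC_check_file_type can be written as a small standalone C function, which also makes its limit concrete. This is a minimal sketch, not netCDF's or HDF5's actual code: the names classify_file, file_kind and read_at are hypothetical, and it relies only on the published file signatures. A classic netCDF file begins with "CDF" plus a version byte of 1, 2 or 5. An HDF5 file carries the 8-byte signature \211HDF\r\n\032\n at offset 0, 512, 1024, 2048, ... (the later offsets allow for a user block). Because a netCDF-4 file is an HDF5 file, both end up in FILE_HDF5; separating them requires walking the file's contents, as discussed in the thread.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification; not part of the netCDF or HDF5 libraries. */
typedef enum {
    FILE_UNKNOWN = 0,
    FILE_NETCDF_CLASSIC, /* "CDF" + version byte 1, 2 or 5 */
    FILE_HDF5            /* plain HDF5 *or* netCDF-4: indistinguishable here */
} file_kind;

/* 8-byte HDF5 format signature: \211 H D F \r \n \032 \n */
static const unsigned char HDF5_SIGNATURE[8] =
    {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

/* Read exactly n bytes at the given offset; return 1 on success, 0 otherwise. */
static int read_at(FILE *fp, long offset, unsigned char *buf, size_t n) {
    if (fseek(fp, offset, SEEK_SET) != 0) return 0;
    return fread(buf, 1, n, fp) == n;
}

/* Hypothetical helper: classify an open file by its leading bytes only. */
file_kind classify_file(FILE *fp) {
    unsigned char buf[8];

    /* Classic netCDF: 'C' 'D' 'F' followed by a version byte. */
    if (read_at(fp, 0, buf, 4) && memcmp(buf, "CDF", 3) == 0 &&
        (buf[3] == 1 || buf[3] == 2 || buf[3] == 5))
        return FILE_NETCDF_CLASSIC;

    /* HDF5: look for the signature at 0, then 512, 1024, 2048, ...
       until a read fails (end of file) or the offset would overflow. */
    long offset = 0;
    while (read_at(fp, offset, buf, sizeof buf)) {
        if (memcmp(buf, HDF5_SIGNATURE, sizeof buf) == 0)
            return FILE_HDF5;
        if (offset > LONG_MAX / 2) break;
        offset = (offset == 0) ? 512 : offset * 2;
    }
    return FILE_UNKNOWN;
}
```

A caller would open the file with fopen(path, "rb") and pass the stream to classify_file. A FILE_HDF5 result only says which family of APIs can open the file; it does not say whether the netCDF API will succeed on it.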


