Re: [netcdfgroup] How to dump netCDF to JSON?

Hey,

The hdf5-json code is here: https://github.com/HDFGroup/hdf5-json and docs are 
here:  http://hdf5-json.readthedocs.io/en/latest/.

The package is both a library of HDF5 <-> JSON conversion functions and some
simple scripts for converting HDF5 to JSON and vice versa. E.g.
$ python h5tojson.py -D <hdf5-file>
outputs JSON minus the dataset data values.

While it may not be the most elegant JSON schema, it’s designed with the 
following goals in mind:

1. Complete fidelity to all HDF5 features (i.e. the goal is that you should be
able to take any HDF5 file, convert it to JSON, convert it back to HDF5, and
wind up with a file that is semantically equivalent to what you started with).

2. Support group graphs that are not simple trees (HDF5 links can even form
cycles). E.g. a group structure where <root> links to A and B, and both A and B
link to C. The output should produce only one representation of C.

Since NetCDF doesn't use all these features, it's certainly possible to come up
with something simpler for just netCDF files.
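Goal 2 is what pushes the schema toward ids rather than plain nesting: if C were simply nested inside both A and B, a naive converter would write it out twice, and a true cycle would never terminate. Below is a minimal Python sketch of the id-based idea, assuming a toy graph where each group is a dict mapping link names to child groups. It is not hdf5-json's code or schema; the "g0"-style ids and the "groups"/"links" keys are hypothetical.

```python
import json

def build_example():
    """Toy graph from goal 2: <root> links to A and B; both link to C."""
    c = {}
    a = {"C": c}
    b = {"C": c}
    return {"A": a, "B": b}

def to_json_graph(root):
    """Write each group exactly once; links point at groups by id."""
    ids = {id(root): "g0"}      # Python object identity -> assigned id
    groups = {}
    stack = [root]
    while stack:
        group = stack.pop()
        links = {}
        for name, child in group.items():
            if id(child) not in ids:        # first visit: assign an id, push it
                ids[id(child)] = f"g{len(ids)}"
                stack.append(child)
            links[name] = ids[id(child)]    # shared or cyclic targets reuse the id
        groups[ids[id(group)]] = {"links": links}
    return {"root": "g0", "groups": groups}

def from_json_graph(doc):
    """Rebuild the graph so shared groups become one shared object again."""
    objs = {gid: {} for gid in doc["groups"]}
    for gid, group in doc["groups"].items():
        for name, target in group["links"].items():
            objs[gid][name] = objs[target]
    return objs[doc["root"]]

doc = to_json_graph(build_example())
text = json.dumps(doc, indent=2)            # C appears once under "groups"
rebuilt = from_json_graph(json.loads(text))
```

Because from_json_graph resolves every link through one id table, the rebuilt A and B share a single C object, which is the kind of round trip goal 1 asks for, at least for the link structure.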

Suggestions, feedback, and pull requests are welcome!

Cheers,
John

From: Chris Barker <chris.barker@xxxxxxxx>
Date: Friday, October 14, 2016 at 12:32 PM
To: Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
Cc: netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>, Charlie Zender 
<zender@xxxxxxx>, John Readey <jreadey@xxxxxxxxxxxx>, HDF Users Discussion List 
<hdf-forum@xxxxxxxxxxxxxxxxxx>, David Pearah <David.Pearah@xxxxxxxxxxxx>
Subject: Re: [netcdfgroup] How to dump netCDF to JSON?

Pedro,

When I first started reading this thread, I thought "there should be a spec for 
how to represent netcdf in JSON"

and then I read:

> 1) The specification to convert netCDF/HDF5 to "a" JSON format (note the "a" here)

Awesome -- that's exactly what we need -- as you say, there is no single way to
represent netcdf data in JSON, and probably far more than one "obvious" way.

Without looking at your spec yet, I do think it should probably look as much 
like CDL as possible -- we are all familiar with that.
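One way to picture "as much like CDL as possible": give CDL's dimensions, variables, and global-attributes sections one JSON object each. The Python sketch below assumes the file's header has already been read into plain dicts; the function name, the key names, and the use of null for an UNLIMITED dimension are hypothetical choices, not taken from any existing spec.

```python
import json

def cdl_like_json(dims, variables, global_attrs):
    """Lay out a netCDF-style header in CDL's section order:
    dimensions, variables (each with its attributes), global attributes.
    Data values are left out, as in an `ncdump -h` listing."""
    doc = {
        # name -> length; None stands for UNLIMITED (an assumption)
        "dimensions": dict(dims),
        "variables": {
            name: {
                "type": v["type"],
                # dimension *names*, like `float temp(time, lat)` in CDL
                "dims": list(v["dims"]),
                "attributes": dict(v.get("attrs", {})),
            }
            for name, v in variables.items()
        },
        # CDL's "// global attributes:" section
        "attributes": dict(global_attrs),
    }
    return json.dumps(doc, indent=2)

# Roughly the JSON counterpart of this CDL header:
#   dimensions:
#       time = UNLIMITED ;
#       lat = 3 ;
#   variables:
#       float temp(time, lat) ;
#           temp:units = "K" ;
#   // global attributes:
#       :title = "example" ;
example = cdl_like_json(
    {"time": None, "lat": 3},
    {"temp": {"type": "float", "dims": ["time", "lat"], "attrs": {"units": "K"}}},
    {"title": "example"},
)
print(example)
```

Keeping variables keyed by name and listing dimension names (not lengths) per variable mirrors how CDL reads, so someone who knows ncdump output can scan the JSON the same way.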

> (why Python? HDF5 developer tools should be all about writing in C/C++)

Because Python is an excellent language with which to "drive" C/C++ libraries 
like HDF5 and netcdf4. If I were to do this, I'd sure use Python. Even if you 
want to get to a C++ implementation eventually, you'd probably benefit from 
prototyping and working out the kinks with a Python version first.

But whoever is writing the code....


> The specification is here:
>
> http://www.space-research.org/

Just took a quick look -- nice start.

I've only used HDF through the netcdf4 spec, so there may be richness needed
that I'm missing, but my first thought is to make greater use of "objects" in
JSON (key-value structures, hash tables, dicts in Python), rather than array
position for heterogeneous structures. For instance, you have:

 a dataset

{
"dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}

I would perhaps do that as something like:

{
...
"dset1":{"object_type": "dataset",
         "dtype": "INT32",
         "rank": 2,
         "dimensions": [3,4],
         "data": [[1,2,3,4],
                  [5,6,7,8],
                  [9,10,11,12]]
         }
...
}
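The conversion from the positional form to the keyed form is mechanical. A minimal Python sketch, assuming the positional entry is exactly [object type, dtype, rank, dimensions, flat data] as in the dset1 example above, with the flat data in row-major (C) order; nest and dataset_to_object are hypothetical names, and math.prod needs Python 3.8+.

```python
import json
import math

def nest(flat, dims):
    """Reshape a flat, row-major (C-order) list into nested lists."""
    if not dims:                      # scalar: a single value
        return flat[0]
    if len(dims) == 1:
        return list(flat)
    step = len(flat) // dims[0]       # elements per leading-index slice
    return [nest(flat[i * step:(i + 1) * step], dims[1:])
            for i in range(dims[0])]

def dataset_to_object(entry):
    """Turn ["dataset", dtype, rank, dims, flat_data] into a keyed object."""
    kind, dtype, rank, dims, flat = entry
    if kind != "dataset":
        raise ValueError(f"not a dataset entry: {kind!r}")
    if rank != len(dims) or len(flat) != math.prod(dims):
        raise ValueError("rank/dimensions do not match the data")
    return {
        "object_type": kind,
        "dtype": dtype,               # copied through unchanged
        "rank": rank,
        "dimensions": list(dims),
        "data": nest(flat, dims),
    }

example = dataset_to_object(
    ["dataset", "STAR_INT32", 2, [3, 4], list(range(1, 13))])
print(json.dumps({"dset1": example}))
```

The dtype string is passed through as-is; whether "STAR_INT32" should be spelled "INT32" in the JSON is a separate naming decision for the spec.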

NOTES:

* I used nested arrays, rather than flattening the 2-d array -- this maps
nicely to things like numpy arrays, for example -- not sure about the C++
world. (You can flatten and un-flatten numpy arrays easily, too, but this seems
like a better mapping to the structure.) And HDF is storing this all in chunks
and who knows what -- so it's not a direct mapping to the memory layout anyway.

* Do you need "rank"? -- can't you check the length of the dimensions array?

* Do you need "object_type" -- will it always be a dataset? Or you could have
something like:

{
...
"datasets": {"dset1": {the actual dataset object},
             "dset2": {another dataset object},
             ...
             }
...
}

Then you don't need "object_type" or a name inside each object.
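If "rank" and "object_type" are dropped as these notes suggest, a reader can still recover everything: the rank is the length of the dimensions list, the object type is implied by sitting under "datasets", and the nested data can be flattened back into a row-major buffer like the one a C/C++ caller would hand to HDF5. A minimal Python sketch, assuming the "datasets" layout just above and row-major (C) order; flatten and read_datasets are hypothetical names.

```python
import json

def flatten(nested, dims):
    """Flatten nested lists into a row-major list, checking the shape."""
    if not dims:                       # scalar
        return [nested]
    if len(nested) != dims[0]:
        raise ValueError(f"expected {dims[0]} items, got {len(nested)}")
    if len(dims) == 1:
        return list(nested)
    out = []
    for sub in nested:
        out.extend(flatten(sub, dims[1:]))
    return out

def read_datasets(doc):
    """Read datasets keyed by name, with no stored "object_type" or "rank"."""
    result = {}
    for name, ds in doc["datasets"].items():
        dims = ds["dimensions"]
        result[name] = {
            "dtype": ds["dtype"],
            "rank": len(dims),                  # derived, not stored
            "flat": flatten(ds["data"], dims),
        }
    return result

sample = json.loads("""
{"datasets": {"dset1": {"dtype": "INT32",
                        "dimensions": [3, 4],
                        "data": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]}}}
""")
datasets = read_datasets(sample)
```

One cost of nested arrays is that JSON itself cannot enforce a rectangular shape, so the sketch checks every level's length against the dimensions while flattening.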


(BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)

> I would like to make this some kind of "official" netCDF/HDF5 JSON format for
> the community, so I encourage anyone to read the specification.
>
> If you see any flaw in the design, or anything in the design that you would
> like to have changed, please let me know now.

done :-)

It would be really great to have this become an "official" spec -- if you want 
to get it there, you're probably going to need to develop it more out in the 
open with a wider community. These lists are the way to get that started, but I 
suggest:

1) put it up somewhere that people can collaborate on it, make suggestions,
capture the discussion, etc. GitHub is one really nice way to do that. See, for
example, the UGRID spec project:

  https://github.com/ugrid-conventions/ugrid-conventions

(NOTE that that one got put on GitHub after there was a pretty complete draft
spec, so there isn't THAT much discussion captured. But also note that that is
too bad -- there is no good record of the decision process that led to the spec.)

> At the moment it only (intentionally) uses common generic features of both
> netCDF and HDF5, which are the numeric atomic types and strings.

Good plan.

-Chris


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx