Re: [netcdfgroup] How to dump netCDF to JSON?

On Thu, Oct 20, 2016 at 12:02 PM, John Readey <jreadey@xxxxxxxxxxxx> wrote:

> So we came up with a scheme of Group, Dataset, and Datatype collections
> with a UUID to identify each object.  That way, if you have a reference to a
> specific UUID, you can always access the object regardless of what
> shenanigans may be happening with the links in the file.
>
>
>
> It’s true that this makes path lookups a bit more cumbersome, but it’s a
> more general way of specifying a directed graph (the HDF5 link structure) on a
> tree (the JSON hierarchy).
>

Hmm -- interesting. I hadn't realized that HDF was this flexible. For my
part, I've only really used netcdf.

This is making me think that we may want a spec for netcdf-json that would
be a subset of the hdf-json spec.

That way they can be as compatible as possible without "cluttering up" the
netcdf spec too much.
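
Just to make that concrete (a rough sketch, not a proposal -- the names,
attributes, and layout here are invented): a netcdf-json subset might only
need named dimensions, variables, and attributes, something like

{
  "dimensions": {"x": 3, "y": 4},
  "variables": {
    "dset1": {"dtype": "int32",
              "dimensions": ["x", "y"],
              "attributes": {"units": "1"},
              "data": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]}
  },
  "attributes": {"title": "example"}
}

while the full hdf-json spec keeps the UUID-keyed collections and links for
the things netCDF can't express.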

-CHB





>
>
> John
>
>
>
> *From: *Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
> *Date: *Tuesday, October 18, 2016 at 9:37 PM
> *To: *John Readey <jreadey@xxxxxxxxxxxx>, Chris Barker <
> chris.barker@xxxxxxxx>
> *Cc: *netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>, HDF Users
> Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>
>
> *Subject: *Re: [netcdfgroup] How to dump netCDF to JSON?
>
>
>
> @John
>
>
>
> >> 1.       Complete fidelity to all HDF5 features
>
> >> 2.       Support graphs that are not acyclic.
>
>
>
> ok, understood.
>
>
>
> In my case I needed a simple schema for a particular set of files.
>
>
>
> But why didn't you start with the official HDF5 DDL
>
>
>
> https://support.hdfgroup.org/HDF5/doc/ddl.html
>
>
>
> and try to adapt it to JSON?
>
>
>
> Same thing for netCDF: there is already an official CDL, so any JSON spec
> should be "identical" to it.
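>
> For example, the 3 x 4 integer dataset used in the JSON examples below is
> written in CDL roughly like this (the dimension names are invented just for
> illustration):
>
> netcdf example {
> dimensions:
>     dim0 = 3 ;
>     dim1 = 4 ;
> variables:
>     int dset1(dim0, dim1) ;
> data:
>     dset1 = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ;
> }
>
> so a JSON encoding mostly has to decide how to spell the same information.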
>
>
>
>
>
>
>
> @Chris
>
>
>
> {
> "dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9,
> 10, 11, 12]]
> }
>
>
>
> >> * Do you need "rank"?
>
>
>
> sometimes a bit of redundancy is useful, to make it visually clear
>
>
>
> >> BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)
>
>
>
> yes
>
>
>
> >>It would be really great to have this become an "official" spec -- if
> you want to get it there, you're probably going to need to develop it more
> out in the open with a wider community. These lists are the way to get that
> started, but I suggest
>
> >>1) put it up somewhere that people can collaborate on it, make
> suggestions, capture the discussion, etc. gitHub is one really nice way to
> do that. See, for example the UGRID spec project:
>
>
>
>
>
> ok, anyone interested, send me an off-list email
>
>
>
>
>
> -Pedro
>
>
>
>
>
>
>
> ----- Original Message -----
>
> *From:* John Readey <jreadey@xxxxxxxxxxxx>
>
> *To:* Chris Barker <chris.barker@xxxxxxxx> ; Pedro Vicente
> <pedro.vicente@xxxxxxxxxxxxxxxxxx>
>
> *Cc:* netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx> ; Charlie Zender
> <zender@xxxxxxx> ; HDF Users Discussion List
> <hdf-forum@xxxxxxxxxxxxxxxxxx> ; David Pearah <David.Pearah@xxxxxxxxxxxx>
>
> *Sent:* Tuesday, October 18, 2016 11:15 PM
>
> *Subject:* Re: [netcdfgroup] How to dump netCDF to JSON?
>
>
>
> Hey,
>
>
>
> The hdf5-json code is here: https://github.com/HDFGroup/hdf5-json and
> docs are here:  http://hdf5-json.readthedocs.io/en/latest/.
>
>
>
> The package is both a library of HDF5 <-> JSON conversion functions and
> some simple scripts for converting HDF5 to JSON and vice-versa.  E.g.
>
> $ python h5tojson.py -D <hdf5-file>
>
> outputs JSON minus the dataset data values.
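>
> (If you'd rather drive that from Python than the shell, something like this
> should work -- just a sketch: it assumes h5tojson.py is on your path, that
> the JSON goes to stdout, and "myfile.h5" is a stand-in for your own file.)
>
> import json
> import subprocess
> import sys
>
> # Run the converter with -D so data values are omitted, and capture the output.
> result = subprocess.run(
>     [sys.executable, "h5tojson.py", "-D", "myfile.h5"],
>     stdout=subprocess.PIPE, universal_newlines=True, check=True)
>
> # Parse the JSON description of the file's groups, datasets, and types.
> meta = json.loads(result.stdout)
> print(sorted(meta.keys()))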
>
>
>
> While it may not be the most elegant JSON schema, it’s designed with the
> following goals in mind:
>
> 1. Complete fidelity to all HDF5 features (i.e. the goal is that
> you should be able to take any HDF5 file, convert it to JSON, convert that
> back to HDF5, and wind up with a file that is semantically equivalent to
> what you started with).
>
> 2. Support graphs that are not acyclic, i.e. a group structure where
> <root> links to A and B, and both A and B link to C.  The output
> should only produce one representation of C (roughly as sketched below).
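>
> For illustration (simplified -- the key names and UUID placeholders here are
> made up, not the exact hdf5-json schema), the shared object shows up once
> under its UUID, and the groups merely hold links to that UUID:
>
> {
> "groups": {
>   "uuid-root": {"links": [{"title": "A", "id": "uuid-A"},
>                           {"title": "B", "id": "uuid-B"}]},
>   "uuid-A": {"links": [{"title": "C", "id": "uuid-C"}]},
>   "uuid-B": {"links": [{"title": "C", "id": "uuid-C"}]}
> },
> "datasets": {
>   "uuid-C": {"type": "H5T_STD_I32LE", "shape": [3, 4]}
> }
> }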
>
> Since NetCDF doesn’t use all these features, it’s certainly possible to
> come up with something simpler for just netCDF files.
>
>
>
> Suggestions, feedback, and pull requests are welcome!
>
>
>
> Cheers,
>
> John
>
>
>
> *From: *Chris Barker <chris.barker@xxxxxxxx>
> *Date: *Friday, October 14, 2016 at 12:32 PM
> *To: *Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
> *Cc: *netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>, Charlie Zender <
> zender@xxxxxxx>, John Readey <jreadey@xxxxxxxxxxxx>, HDF Users Discussion
> List <hdf-forum@xxxxxxxxxxxxxxxxxx>, David Pearah <
> David.Pearah@xxxxxxxxxxxx>
> *Subject: *Re: [netcdfgroup] How to dump netCDF to JSON?
>
>
>
> Pedro,
>
>
>
> When I first started reading this thread, I thought "there should be a
> spec for how to represent netcdf in JSON"
>
>
>
> and then I read:
>
>
>
> 1) The specification to convert netCDF/HDF5 to "a" JSON format (note the
> "a" here)
>
>
>
> Awesome -- that's exactly what we need -- as you say there is not one way
> to represent netcdf data in JSON, and probably far more than one "obvious"
> way.
>
>
>
> Without looking at your spec yet, I do think it should probably look as
> much like CDL as possible -- we are all familiar with that.
>
>
>
> (why Python? HDF5 developer tools should be all about writing in C/C++)
>
>
>
> Because Python is an excellent language with which to "drive" C/C++
> libraries like HDF5 and netcdf4. If I were to do this, I'd sure use Python.
> Even if you want to get to a C++ implementation eventually, you'd probably
> benefit from prototyping and working out the kinks with a Python version
> first.
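>
> Just to give a sense of how little code that takes, here's a toy sketch with
> the netCDF4-python package (nothing official -- the file name is made up, and
> it ignores groups, unlimited dimensions, fill values, etc.):
>
> import json
> from netCDF4 import Dataset
>
> def nc_to_json(path):
>     """Dump a flat, classic-model netCDF file to a JSON string."""
>     with Dataset(path) as nc:
>         out = {
>             "dimensions": {name: len(dim) for name, dim in nc.dimensions.items()},
>             "attributes": {a: getattr(nc, a) for a in nc.ncattrs()},
>             "variables": {},
>         }
>         for name, var in nc.variables.items():
>             out["variables"][name] = {
>                 "dtype": str(var.dtype),
>                 "dimensions": list(var.dimensions),
>                 "attributes": {a: getattr(var, a) for a in var.ncattrs()},
>                 "data": var[:].tolist(),
>             }
>     # default=str papers over numpy scalar types that json doesn't understand
>     return json.dumps(out, indent=2, default=str)
>
> print(nc_to_json("example.nc"))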
>
>
>
> But whoever is writing the code....
>
>
>
>
>
> The specification is here
>
> http://www.space-research.org/
>
>
>
> Just took a quick look -- nice start.
>
>
>
> I've only used HDF through the netcdf4 spec, so there may be richness
> needed that I'm missing, but my first thought is to make greater use of
> "objects" in JSON (key-value structures, hash tables, dicts in Python),
> rather than array position for heterogeneous structures. For instance, you
> have:
>
>
>
>  a dataset
>
>
> {
> "dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9,
> 10, 11, 12]]
> }
>
>
>
> I would perhaps do that as something like:
>
>
>
> {
>
> ...
>
> "dset1":{"object_type": "dataset",
>
>          "dtype": "INT32",
>
>          "rank": 2,
>
>          "dimensions": [3,4],
>
>          "data": [[1,2,3,4],
>
>                   [5,6,7,8],
>
>                   [9,10,11,12]]
>
>          }
>
> ...
>
> }
>
>
>
> NOTES:
>
>
>
> * I used nested arrays, rather than flattening the 2-d array -- this maps
> nicely to things like numpy arrays, for example (see the quick numpy sketch
> after these notes) -- not sure about the C++ world. You can flatten and
> un-flatten numpy arrays easily, too, but nesting seems like a better mapping
> to the structure. And HDF is storing this all in chunks and who knows what,
> so it's not a direct mapping to the memory layout anyway.
>
>
>
> * Do you need "rank"? -- can't you check the length of the dimensions
> array?
>
>
>
> * Do you need "object_type"? -- will it always be a dataset? Or you could
> have something like:
>
>
>
> {
>
> ...
>
> "datasets": {"dset1": {the actual dataset object},
>
>              "dset2": {another dataset object},
>
>              ....
>
>             }
>
> }
>
>
>
> Then you don't need object_type or a name
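>
> (And on the nested-vs-flat point in the first note: the numpy round trip
> really is a one-liner each way -- a quick sketch, using the same 3 x 4 values:)
>
> import numpy as np
>
> nested = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
>
> # nested JSON-style lists -> flat list (what the current spec stores)
> flat = np.array(nested).ravel().tolist()
>
> # flat list plus the dimensions -> nested lists again
> back = np.array(flat).reshape(3, 4).tolist()
>
> assert back == nested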
>
>
>
>
>
> (BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)
>
>
>
> I would like to make this some kind of "official" netCDF/HDF5 JSON format
> for the community, so I encourage anyone to read the specification
>
>
>
> If you see any flaw in the design, or anything in the design that you would
> like to have changed, please let me know now
>
>
>
> done :-)
>
>
>
> It would be really great to have this become an "official" spec -- if you
> want to get it there, you're probably going to need to develop it more out
> in the open with a wider community. These lists are the way to get that
> started, but I suggest:
>
>
>
> 1) put it up somewhere that people can collaborate on it, make
> suggestions, capture the discussion, etc. GitHub is one really nice way to
> do that. See, for example, the UGRID spec project:
>
>
>
>   https://github.com/ugrid-conventions/ugrid-conventions
>
>
>
> (NOTE that that one got put on GitHub after there was a pretty complete
> draft spec, so there isn't THAT much discussion captured. But note also
> that that is too bad -- there is no good record of the decision process
> that led to the spec)
>
>
>
> At the moment it only (intentionally) uses common generic features of both
> netCDF and HDF5, which are the numeric atomic types and strings.
>
>
>
> Good plan.
>
>
>
> -Chris
>
>
>
>
>
> --
>
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker@xxxxxxxx
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx