Re: [netcdfgroup] How to dump netCDF to JSON?

On Thu, Oct 20, 2016 at 3:00 PM, Pedro Vicente <
pedro.vicente@xxxxxxxxxxxxxxxxxx> wrote:

> >>> This is making me think that we may want a spec for netcdf-json that
> would be a subset of the hdf-json spec.
>
> that is one option;
> the other option is to make a JSON form of netCDF CDL, completely unaware of
> HDF5 (just like the netCDF API is)
>
> http://www.unidata.ucar.edu/software/netcdf/workshops/2011/utilities/CDL.html
>

yup.

Are they mutually exclusive approaches? My thought was to make a
netcdfJSON, then add features to make an hdfJSON (and netcdfJSON would
look a lot like CDL).

So a netcdfJSON file would be a valid hdfJSON file, but not the other way
around.

Like a netcdf4 file is a valid hdf5 file now.
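
A minimal sketch of that subset relationship, in Python. The object-type
names and the two is_valid_* helpers are made up for illustration; neither
spec exists in this form, so treat this as one possible reading only.

```python
# Hypothetical object types; neither set comes from a published spec.
NETCDF_JSON_TYPES = {"group", "dimension", "variable", "attribute"}
# hdfJSON would only add features on top of netcdfJSON, never remove any.
HDF_JSON_TYPES = NETCDF_JSON_TYPES | {"link", "committed_datatype", "reference"}


def object_types(doc):
    """Collect every "object_type" value found anywhere in a parsed JSON document."""
    found = set()
    if isinstance(doc, dict):
        if "object_type" in doc:
            found.add(doc["object_type"])
        for value in doc.values():
            found |= object_types(value)
    elif isinstance(doc, list):
        for item in doc:
            found |= object_types(item)
    return found


def is_valid_netcdf_json(doc):
    """True if the document uses only the (assumed) netcdfJSON object types."""
    return object_types(doc) <= NETCDF_JSON_TYPES


def is_valid_hdf_json(doc):
    """True if the document uses only the (assumed) hdfJSON object types."""
    return object_types(doc) <= HDF_JSON_TYPES


nc_doc = {"temp": {"object_type": "variable", "dimensions": [3, 4]}}
h5_doc = {"alias": {"object_type": "link", "target": "temp"}}

print(is_valid_netcdf_json(nc_doc), is_valid_hdf_json(nc_doc))  # True True
print(is_valid_netcdf_json(h5_doc), is_valid_hdf_json(h5_doc))  # False True
```

Because the hdfJSON type set is built as a superset, anything that passes
the netcdfJSON check passes the hdfJSON check too, but not the reverse.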

-CHB



> with the "data" part being optional, which was one of the goals of my
> design, to transmit just metadata over the web, for a quick remote
> inspection
>
> -Pedro
>
> ----- Original Message -----
> *From:* Chris Barker <chris.barker@xxxxxxxx>
> *To:* John Readey <jreadey@xxxxxxxxxxxx>
> *Cc:* Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx> ; netCDF Mail List
> <netcdfgroup@xxxxxxxxxxxxxxxx> ; HDF Users Discussion List
> <hdf-forum@xxxxxxxxxxxxxxxxxx>
> *Sent:* Thursday, October 20, 2016 4:48 PM
> *Subject:* Re: [netcdfgroup] How to dump netCDF to JSON?
>
> On Thu, Oct 20, 2016 at 12:02 PM, John Readey <jreadey@xxxxxxxxxxxx>
> wrote:
>
>> So we came up with a scheme of Group, Dataset, and Datatype collections
>> with a UUID to identify each object.  That way if you have a reference to a
>> specific UUID, you can always access the object regardless of what
>> shenanigans may be happening with the links in the file.
>>
>>
>>
>> It's true that this makes path lookups a bit more cumbersome, but it's a
>> more general way of specifying a directed graph (the HDF5 link structure) on a
>> tree (the JSON hierarchy).
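
To make the trade-off concrete, here is a rough Python sketch of a flat,
UUID-keyed collection with path lookup on top of it. The field names
("root", "groups", "links") are simplified for illustration and are not
claimed to match the hdf5-json schema exactly.

```python
import uuid

# Each object lives in a flat collection keyed by UUID; links are stored
# by name inside the group that owns them. Field names are illustrative.
root_id, a_id, b_id, c_id = (str(uuid.uuid4()) for _ in range(4))

doc = {
    "root": root_id,
    "groups": {
        root_id: {"links": {"A": a_id, "B": b_id}},
        a_id: {"links": {"C": c_id}},
        b_id: {"links": {"C": c_id}},
        c_id: {"links": {}},
    },
}


def resolve(doc, path):
    """Walk a slash-separated path from the root and return the UUID it reaches."""
    current = doc["root"]
    for name in filter(None, path.split("/")):
        current = doc["groups"][current]["links"][name]
    return current


# Two different paths reach the same object, which is stored only once.
print(resolve(doc, "/A/C") == resolve(doc, "/B/C"))  # True
```

The lookup costs one dictionary hop per path component, which is the
"more cumbersome" part; in exchange, a held UUID stays valid even if the
links that led to it are renamed or removed.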
>>
>
> Hmm -- interesting. I hadn't realized that HDF was this flexible. For my
> part, I've only really used netcdf.
>
> This is making me think that we may want a spec for netcdf-json that would
> be a subset of the hdf-json spec.
>
> That way they can be as compatible as possible without "cluttering up" the
> netcdf spec too much.
>
> -CHB
>
>
>
>
>
>>
>>
>> John
>>
>>
>>
>> *From: *Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
>> *Date: *Tuesday, October 18, 2016 at 9:37 PM
>> *To: *John Readey <jreadey@xxxxxxxxxxxx>, Chris Barker <
>> chris.barker@xxxxxxxx>
>> *Cc: *netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>, HDF Users
>> Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>
>>
>> *Subject: *Re: [netcdfgroup] How to dump netCDF to JSON?
>>
>>
>>
>> @John
>>
>>
>>
>> >> 1.       Complete fidelity to all HDF5 features
>>
>> >> 2.       Support graphs that are not acyclic.
>>
>>
>>
>> ok, understood.
>>
>>
>>
>> In my case I needed a simple schema for a particular set of files.
>>
>>
>>
>> But why didn't you start with the official HDF5 DDL
>>
>>
>>
>> https://support.hdfgroup.org/HDF5/doc/ddl.html
>>
>>
>>
>> and try to adapt to JSON?
>>
>>
>>
>> Same thing for netCDF, there is already an official CDL, so any JSON
>> spec should be "identical".
>>
>>
>>
>>
>>
>>
>>
>> @Chris
>>
>>
>>
>> {
>> "dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8,
>> 9, 10, 11, 12]]
>> }
>>
>>
>>
>> >> * Do you need "rank"?
>>
>>
>>
>> sometimes a bit of redundancy is useful, to make it visually clear
>>
>>
>>
>> >> BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)
>>
>>
>>
>> yes
>>
>>
>>
>> >>It would be really great to have this become an "official" spec -- if
>> you want to get it there, you're probably going to need to develop it more
>> out in the open with a wider community. These lists are the way to get that
>> started, but I suggest
>>
>> >>1) put it up somewhere that people can collaborate on it, make
>> suggestions, capture the discussion, etc. GitHub is one really nice way to
>> do that. See, for example the UGRID spec project:
>>
>>
>>
>>
>>
>> ok, anyone interested send me an off list  email
>>
>>
>>
>>
>>
>> -Pedro
>>
>>
>>
>>
>>
>>
>>
>> ----- Original Message -----
>>
>> *From:* John Readey <jreadey@xxxxxxxxxxxx>
>>
>> *To:* Chris Barker <chris.barker@xxxxxxxx> ; Pedro Vicente
>> <pedro.vicente@xxxxxxxxxxxxxxxxxx>
>>
>> *Cc:* netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx> ; Charlie Zender
>> <zender@xxxxxxx> ; HDF Users Discussion List
>> <hdf-forum@xxxxxxxxxxxxxxxxxx> ; David Pearah <David.Pearah@xxxxxxxxxxxx>
>>
>> *Sent:* Tuesday, October 18, 2016 11:15 PM
>>
>> *Subject:* Re: [netcdfgroup] How to dump netCDF to JSON?
>>
>>
>>
>> Hey,
>>
>>
>>
>> The hdf5-json code is here: https://github.com/HDFGroup/hdf5-json and
>> docs are here:  http://hdf5-json.readthedocs.io/en/latest/.
>>
>>
>>
>> The package is both a library of HDF5 <-> JSON conversion functions and
>> some simple scripts for converting HDF5 to JSON and vice-versa.  E.g.
>>
>> $ python h5tojson.py -D <hdf5-file>
>>
>> outputs JSON minus the dataset data values.
>>
>>
>>
>> While it may not be the most elegant JSON schema, it's designed with the
>> following goals in mind:
>>
>> 1.       Complete fidelity to all HDF5 features (i.e. the goal is that
>> you should be able to take any HDF5 file, convert it to JSON, convert back
>> to HDF5 and wind up with a file that is semantically equivalent to what you
>> started with).
>>
>> 2.       Support graphs that are not acyclic.  E.g. a group structure
>> where <root> links to A and B, and A and B both link to C.  The output
>> should only produce one representation of C.
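
One way the second goal could be met is sketched below in Python. The
Group class and dump_groups function are hypothetical stand-ins, not
hdf5-json code, and short "g0"-style ids replace UUIDs for readability.
The walk remembers which groups it has already seen, so a shared child
(or a true cycle) is written exactly once.

```python
import itertools


class Group:
    """Tiny stand-in for an HDF5 group: a name-to-child mapping."""

    def __init__(self):
        self.links = {}


def dump_groups(root):
    """Flatten a group graph into {id: {"links": {name: id}}}, visiting each group once."""
    counter = itertools.count()
    ids = {id(root): f"g{next(counter)}"}  # Python object identity -> assigned id
    out = {}
    stack = [root]
    while stack:
        group = stack.pop()
        entry = {"links": {}}
        out[ids[id(group)]] = entry
        for name, child in group.links.items():
            if id(child) not in ids:
                # First time this group is seen: give it an id and queue it.
                ids[id(child)] = f"g{next(counter)}"
                stack.append(child)
            entry["links"][name] = ids[id(child)]
    return out


root, a, b, c = Group(), Group(), Group(), Group()
root.links = {"A": a, "B": b}
a.links = {"C": c}
b.links = {"C": c}
c.links = {"back": root}  # a genuine cycle: C links back to the root

flat = dump_groups(root)
print(len(flat))  # 4
```

A naive recursive dump of the same structure would either emit C twice
or never terminate because of the link back to the root.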
>>
>> Since NetCDF doesn't use all these features, it's certainly possible to
>> come up with something simpler for just netCDF files.
>>
>>
>>
>> Suggestions, feedback, and pull requests are welcome!
>>
>>
>>
>> Cheers,
>>
>> John
>>
>>
>>
>> *From: *Chris Barker <chris.barker@xxxxxxxx>
>> *Date: *Friday, October 14, 2016 at 12:32 PM
>> *To: *Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
>> *Cc: *netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>, Charlie Zender <
>> zender@xxxxxxx>, John Readey <jreadey@xxxxxxxxxxxx>, HDF Users
>> Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>, David Pearah <
>> David.Pearah@xxxxxxxxxxxx>
>> *Subject: *Re: [netcdfgroup] How to dump netCDF to JSON?
>>
>>
>>
>> Pedro,
>>
>>
>>
>> When I first started reading this thread, I thought "there should be a
>> spec for how to represent netcdf in JSON"
>>
>>
>>
>> and then I read:
>>
>>
>>
>> 1) The specification to convert netCDF/HDF5 to "a" JSON format (note the
>> "a" here)
>>
>>
>>
>> Awesome -- that's exactly what we need -- as you say there is not one way
>> to represent netcdf data in JSON, and probably far more than one "obvious"
>> way.
>>
>>
>>
>> Without looking at your spec yet, I do think it should probably look as
>> much like CDL as possible -- we are all familiar with that.
>>
>>
>>
>> (why Python? HDF5 developer tools should be all about writing in C/C++)
>>
>>
>>
>> Because Python is an excellent language with which to "drive" C/C++
>> libraries like HDF5 and netcdf4. If I were to do this, I'd sure use Python.
>> Even if you want to get to a C++ implementation eventually, you'd probably
>> benefit from prototyping and working out the kinks with a Python version
>> first.
>>
>>
>>
>> But whoever is writing the code....
>>
>>
>>
>>
>>
>> The specification is here
>>
>> http://www.space-research.org/
>>
>>
>>
>> Just took a quick look -- nice start.
>>
>>
>>
>> I've only used HDF through the netcdf4 spec, so there may be richness
>> needed that I'm missing, but my first thought is to make greater use of
>> "objects" in JSON (key-value structures, hash tables, dicts in python),
>> rather than array position for heterogeneous structures. For instance, you
>> have:
>>
>>
>>
>>  a dataset
>>
>>
>> {
>> "dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8,
>> 9, 10, 11, 12]]
>> }
>>
>>
>>
>> I would perhaps do that as something like:
>>
>>
>>
>> {
>>
>> ...
>>
>> "dset1":{"object_type": "dataset",
>>
>>          "dtype": "INT32",
>>
>>          "rank": 2,
>>
>>          "dimensions": [3,4],
>>
>>          "data": [[1,2,3,4],
>>
>>                   [5,6,7,8],
>>
>>                   [9,10,11,12]]
>>
>>          }
>>
>> ...
>>
>> }
>>
>>
>>
>> NOTES:
>>
>>
>>
>> * I used nested arrays, rather than flattening the 2-d array -- this maps
>> nicely to things like numpy arrays, for example -- not sure about the C++
>> world. (you can flatten and un-flatten numpy arrays easily, too, but this
>> seems like a better mapping to the structure) And HDF is storing this all
>> in chunks and who knows what -- so it's not a direct mapping to the memory
>> layout anyway.
>>
>>
>>
>> * Do you need "rank"? -- can't you check the length of the dimensions
>> array?
>>
>>
>>
>> * Do you  need "object_type" -- will it always be a dataset? Or you could
>> have something like:
>>
>>
>>
>> {
>>
>> ...
>>
>> "datasets": {"dset1": {the actual dataset object},
>>
>>              "dset2": {another dataset object},
>>
>>  ....
>>
>> }
>>
>>
>>
>> Then you don't need object_type or a name
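
The notes above can be sketched in plain Python (no numpy required). The
shape_of, flatten, and unflatten helpers are hypothetical names written
for this illustration; the JSON text is the object form of "dset1" with
"rank" and "object_type" left out.

```python
import json

# The object form of "dset1", grouped under a "datasets" key.
text = """
{
  "datasets": {
    "dset1": {
      "dtype": "INT32",
      "dimensions": [3, 4],
      "data": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    }
  }
}
"""


def shape_of(nested):
    """Infer the shape of a regular (non-ragged) nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def flatten(nested):
    """Row-major flattening of a nested list."""
    if not isinstance(nested, list):
        return [nested]
    return [x for item in nested for x in flatten(item)]


def unflatten(flat, dims):
    """Rebuild the nested form from a flat list and its (non-zero) dimensions."""
    if len(dims) == 1:
        return list(flat)
    step = len(flat) // dims[0]
    return [unflatten(flat[i * step:(i + 1) * step], dims[1:]) for i in range(dims[0])]


dset = json.loads(text)["datasets"]["dset1"]
rank = len(dset["dimensions"])  # rank is recoverable, so storing it is redundant
print(rank, shape_of(dset["data"]))  # 2 [3, 4]
print(unflatten(flatten(dset["data"]), dset["dimensions"]) == dset["data"])  # True
```

With nested arrays the shape can also be cross-checked against
"dimensions", which a flat array cannot offer on its own.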
>>
>>
>>
>>
>>
>> (BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)
>>
>>
>>
>> I would like to make this some kind of "official" netCDF/HDF5 JSON format
>> for the community, so I encourage anyone to read the specification
>>
>>
>>
>> If you see any flaw in the design or anything in the design that you
>> would like to have change please let me know now
>>
>>
>>
>> done :-)
>>
>>
>>
>> It would be really great to have this become an "official" spec -- if you
>> want to get it there, you're probably going to need to develop it more out
>> in the open with a wider community. These lists are the way to get that
>> started, but I suggest:
>>
>>
>>
>> 1) put it up somewhere that people can collaborate on it, make
>> suggestions, capture the discussion, etc. GitHub is one really nice way to
>> do that. See, for example the UGRID spec project:
>>
>>
>>
>>   https://github.com/ugrid-conventions/ugrid-conventions
>>
>>
>>
>> (NOTE that that one got put on GitHub after there was a pretty complete
>> draft spec, so there isn't THAT much discussion captured. But also note
>> that that is too bad -- there is no good record of the decision process
>> that led to the spec)
>>
>>
>>
>> At the moment it only (intentionally) uses common generic features of
>> both netCDF and HDF5, which are the numeric atomic types and strings.
>>
>>
>>
>> Good plan.
>>
>>
>>
>> -Chris
>>
>>
>>
>>
>>
>> --
>>
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> Chris.Barker@xxxxxxxx
>>
>>
>
>
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx