Re: [netcdfgroup] How to dump netCDF to JSON?

>>my thought was to make a netcdfJSON, then add features to make an hdfJSON. 
>>(and netcdfJSON would look a lot like CDL) 
>>So a netcdfJSON file would be a valid hdfJSON file, but not the other way 
>>around.

On second thought, this design has a problem: netCDF has things that HDF5 does
not (named dimensions), and HDF5 has things that netCDF does not, so it's a bit
of a catch-22; maybe we should just keep them separate.

My design method is usually a bit of specification, then a bit of code; then,
when something new comes up that was not planned, I go back to step 1 and
rewrite the spec, and sometimes the code.

-Pedro


  ----- Original Message ----- 
  From: Pedro Vicente 
  To: Chris Barker 
  Cc: HDF Users Discussion List ; netCDF Mail List 
  Sent: Thursday, October 20, 2016 7:33 PM
  Subject: Re: [netcdfgroup] How to dump netCDF to JSON?


  >>my thought was to make a netcdfJSON, then add features to make an hdfJSON.
  >>(and netcdfJSON would look a lot like CDL)
  >>So a netcdfJSON file would be a valid hdfJSON file, but not the other way
  >>around.

  Yes, sounds like a good plan.
  I'll send you an email when I have things ready, thanks.
  -Pedro
    ----- Original Message ----- 
    From: Chris Barker 
    To: Pedro Vicente 
    Cc: John Readey ; netCDF Mail List ; HDF Users Discussion List 
    Sent: Thursday, October 20, 2016 6:17 PM
    Subject: Re: [netcdfgroup] How to dump netCDF to JSON?






    On Thu, Oct 20, 2016 at 3:00 PM, Pedro Vicente 
<pedro.vicente@xxxxxxxxxxxxxxxxxx> wrote:

      >>> This is making me think that we may want a spec for netcdf-json that
      >>> would be a subset of the hdf-json spec.

      That is one option;
      the other option is to make a JSON form of netCDF CDL, completely unaware
      of HDF5 (just as the netCDF API is):

      http://www.unidata.ucar.edu/software/netcdf/workshops/2011/utilities/CDL.html
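To make the "JSON form of CDL" option concrete, here is a minimal Python sketch under stated assumptions: the netcdfJSON layout (the "dimensions", "variables", "attributes" and "type" keys) is hypothetical and invented for illustration, not taken from any published spec. The function renders such a document back into a CDL header, one way to check that the JSON carries the same header information as CDL.

```python
import json

# Hypothetical netcdfJSON document. The key names are illustrative assumptions
# that mirror the sections of a CDL header; they are not from any published spec.
doc = {
    "dimensions": {"lat": 3, "lon": 4},
    "variables": {
        "temp": {
            "type": "float",
            "dimensions": ["lat", "lon"],
            "attributes": {"units": "K"},
        },
    },
    "attributes": {"title": "example"},
}


def to_cdl(name, doc):
    """Render the hypothetical netcdfJSON dict as CDL header text (no data)."""
    lines = [f"netcdf {name} {{", "dimensions:"]
    for dim, size in doc["dimensions"].items():
        lines.append(f"\t{dim} = {size} ;")
    lines.append("variables:")
    for var, info in doc["variables"].items():
        dims = info.get("dimensions", [])
        shape = f"({', '.join(dims)})" if dims else ""  # scalars have no ()
        lines.append(f"\t{info['type']} {var}{shape} ;")
        for att, val in info.get("attributes", {}).items():
            # json.dumps quotes strings and prints plain numbers, which is close
            # to CDL literal syntax for these simple cases
            lines.append(f"\t\t{var}:{att} = {json.dumps(val)} ;")
    if doc.get("attributes"):
        lines.append("")
        lines.append("// global attributes:")
        for att, val in doc["attributes"].items():
            lines.append(f"\t\t:{att} = {json.dumps(val)} ;")
    lines.append("}")
    return "\n".join(lines)


print(to_cdl("example", doc))
```

Because each JSON section maps onto one CDL section, a netcdfJSON designed this way would "look a lot like CDL" almost by construction.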


    yup.


    Are they mutually exclusive approaches? My thought was to make a
    netcdfJSON, then add features to make an hdfJSON (and netcdfJSON would look
    a lot like CDL).


    So a netcdfJSON file would be a valid hdfJSON file, but not the other way
    around.


    Like a netcdf4 file is a valid hdf5 file now.


    -CHB



      with the "data" part being optional, which was one of the goals of my
      design: to transmit just the metadata over the web for a quick remote
      inspection.

      -Pedro
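With "data" optional, a metadata-only view is simply the same document with the data members dropped. A minimal Python sketch, assuming a hypothetical netcdfJSON layout (variables keyed by name, each with an optional "data" member); the function name metadata_only is illustrative.

```python
import copy
import json


def metadata_only(doc):
    """Return a copy of a hypothetical netcdfJSON dict with every variable's
    "data" member removed, keeping dimensions, types and attributes."""
    meta = copy.deepcopy(doc)  # do not mutate the caller's document
    for var in meta.get("variables", {}).values():
        var.pop("data", None)  # "data" is optional, so a missing key is fine
    return meta


doc = {
    "dimensions": {"x": 3},
    "variables": {
        "v": {"type": "int", "dimensions": ["x"], "data": [1, 2, 3]},
    },
}

# Small payload suitable for a quick remote inspection
payload = json.dumps(metadata_only(doc))
print(payload)
```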
        ----- Original Message ----- 
        From: Chris Barker 
        To: John Readey 
        Cc: Pedro Vicente ; netCDF Mail List ; HDF Users Discussion List 
        Sent: Thursday, October 20, 2016 4:48 PM
        Subject: Re: [netcdfgroup] How to dump netCDF to JSON?


        On Thu, Oct 20, 2016 at 12:02 PM, John Readey <jreadey@xxxxxxxxxxxx> 
wrote:

          So we came up with a scheme of Group, Dataset, and Datatype
          collections with a UUID to identify each object. That way, if you have a
          reference to a specific UUID, you can always access the object regardless
          of what shenanigans may be happening with the links in the file.




          It's true that this makes path lookups a bit more cumbersome, but
          it's a more general way of representing a directed graph (the HDF5 link
          structure) within a tree (the JSON hierarchy).
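The trade-off described here can be sketched in a few lines of Python. The layout below is loosely modelled on the hdf5-json idea (flat "groups" and "datasets" collections keyed by UUID, with groups linking to children by UUID); the field names "root", "links", "title", "collection" and "id" are assumptions for illustration, so check the hdf5-json docs for the actual schema. Access by UUID is a single dictionary lookup, while a path lookup has to walk the links one component at a time.

```python
import uuid

# Simplified layout, loosely modelled on hdf5-json (field names are assumptions):
# every object lives in a flat collection keyed by UUID, and groups refer to
# their children only through links that carry the child's UUID.
root_id, g1_id, d1_id = (str(uuid.uuid4()) for _ in range(3))

doc = {
    "root": root_id,
    "groups": {
        root_id: {"links": [{"title": "g1", "collection": "groups", "id": g1_id}]},
        g1_id: {"links": [{"title": "dset1", "collection": "datasets", "id": d1_id}]},
    },
    "datasets": {
        d1_id: {"type": "INT32", "shape": [3, 4]},
    },
}


def lookup(doc, path):
    """Resolve an absolute path like "/g1/dset1" by following links from the root.

    Returns (collection, uuid). Each path component needs a scan of the
    current group's links, which is the 'more cumbersome' part.
    """
    collection, obj_id = "groups", doc["root"]
    for name in [part for part in path.split("/") if part]:
        if collection != "groups":
            raise KeyError(f"{name!r}: parent is not a group")
        links = doc["groups"][obj_id]["links"]
        match = next((link for link in links if link["title"] == name), None)
        if match is None:
            raise KeyError(path)
        collection, obj_id = match["collection"], match["id"]
    return collection, obj_id


coll, oid = lookup(doc, "/g1/dset1")
print(coll, doc[coll][oid])  # once you hold the UUID, access is direct
```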



        Hmm -- interesting. I hadn't realized that HDF was this flexible. For 
my part, I've only really used netcdf.


        This is making me think that we may want a spec for netcdf-json that 
would be a subset of the hdf-json spec.


        That way they can be as compatible as possible without "cluttering up" 
the netcdf spec too much.


        -CHB









          John



          From: Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
          Date: Tuesday, October 18, 2016 at 9:37 PM
          To: John Readey <jreadey@xxxxxxxxxxxx>, Chris Barker 
<chris.barker@xxxxxxxx>
          Cc: netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>, HDF Users 
Discussion List <hdf-forum@xxxxxxxxxxxxxxxxxx>


          Subject: Re: [netcdfgroup] How to dump netCDF to JSON?



          @John



          >> 1.       Complete fidelity to all HDF5 features

          >> 2.       Support graphs that are not acyclic.



          ok, understood.



          In my case I needed a simple schema for a particular set of files.



          But why didn't you start with the official HDF5 DDL,

          https://support.hdfgroup.org/HDF5/doc/ddl.html

          and try to adapt it to JSON?



          The same goes for netCDF: there is already an official CDL, so any
          JSON spec should be "identical" to it.







          @Chris



          {
          "dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
          }



          >> * Do you need "rank"? 



          Sometimes a bit of redundancy is useful, to make things visually clear.



          >> BTW, is a "dataset" in HDF the same thing as a "variable" in 
netcdf?)



          yes



          >>It would be really great to have this become an "official" spec -- 
if you want to get it there, you're probably going to need to develop it more 
out in the open with a wider community. These lists are the way to get that 
started, but I suggest 

          >>1) put it up somewhere that people can collaborate on it, make
          >>suggestions, capture the discussion, etc. GitHub is one really nice way
          >>to do that. See, for example the UGRID spec project:





          OK, anyone interested, send me an off-list email.





          -Pedro







          ----- Original Message ----- 

            From: John Readey 

            To: Chris Barker ; Pedro Vicente 

            Cc: netCDF Mail List ; Charlie Zender ; HDF Users Discussion List ; 
David Pearah 

            Sent: Tuesday, October 18, 2016 11:15 PM

            Subject: Re: [netcdfgroup] How to dump netCDF to JSON?



            Hey,



            The hdf5-json code is here: https://github.com/HDFGroup/hdf5-json 
and docs are here:  http://hdf5-json.readthedocs.io/en/latest/.  



            The package is both a library of HDF5 <-> JSON conversion functions
            and some simple scripts for converting HDF5 to JSON and vice versa. E.g.

            $ python h5tojson.py -D <hdf5-file>

            outputs JSON minus the dataset data values.



            While it may not be the most elegant JSON schema, it's designed
            with the following goals in mind:

            1.       Complete fidelity to all HDF5 features (i.e. the goal is
            that you should be able to take any HDF5 file, convert it to JSON,
            convert it back to HDF5, and wind up with a file that is semantically
            equivalent to the one you started with).

            2.       Support link graphs that are not simple trees (HDF5 links
            can even form cycles). E.g. a group structure where <root> links to A
            and B, and both A and B link to C: the output should produce only one
            representation of C.
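Goal 2 amounts to a graph traversal with a "visited" set: each object is written out once, and any later encounter (through a second link or a cycle) adds nothing new. A minimal Python sketch on a toy link structure, with plain group names standing in for HDF5 objects; this is an illustration of the technique, not code from hdf5-json.

```python
from collections import deque

# Toy link structure: each group name maps to the names it links to.
# C is reachable from both A and B, and C links back to root (a cycle).
links = {
    "root": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "C": ["root"],
}


def serialize(links, start="root"):
    """Breadth-first walk that emits each object exactly once.

    Objects go into a flat "objects" dict; repeated encounters (shared
    children or cycles) are skipped, so C appears only once and the walk
    terminates even though the graph contains a cycle.
    """
    objects = {}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        if name in objects:  # already emitted: only the link refers to it
            continue
        objects[name] = {"links": list(links.get(name, []))}
        queue.extend(links.get(name, []))
    return objects


out = serialize(links)
print(sorted(out))
```

Links are kept as references (names here, UUIDs in a real schema), so the shared child C is stored once and referenced from both A and B.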

            Since netCDF doesn't use all these features, it's certainly
            possible to come up with something simpler for just netCDF files.



            Suggestions, feedback, and pull requests are welcome!



            Cheers,

            John



            From: Chris Barker <chris.barker@xxxxxxxx>
            Date: Friday, October 14, 2016 at 12:32 PM
            To: Pedro Vicente <pedro.vicente@xxxxxxxxxxxxxxxxxx>
            Cc: netCDF Mail List <netcdfgroup@xxxxxxxxxxxxxxxx>, Charlie Zender 
<zender@xxxxxxx>, John Readey <jreadey@xxxxxxxxxxxx>, HDF Users Discussion List 
<hdf-forum@xxxxxxxxxxxxxxxxxx>, David Pearah <David.Pearah@xxxxxxxxxxxx>
            Subject: Re: [netcdfgroup] How to dump netCDF to JSON?



            Pedro, 



            When I first started reading this thread, I thought "there should 
be a spec for how to represent netcdf in JSON"



            and then I read:



              1) The specification to convert netCDF/HDF5 to "a" JSON format 
(note the "a" here)



            Awesome -- that's exactly what we need -- as you say there is not 
one way to represent netcdf data in JSON, and probably far more than one 
"obvious" way.



            Without looking at your spec yet, I do think it should probably 
look as much like CDL as possible -- we are all familiar with that.



              (why Python? HDF5 developer tools should be all about writing in 
C/C++)



            Because Python is an excellent language with which to "drive" C/C++ 
libraries like HDF5 and netcdf4. If I were to do this, I'd sure use Python. 
Even if you want to get to a C++ implementation eventually, you'd probably 
benefit from prototyping and working out the kinks with a Python version first.



            But whoever is writing the code....





              The specification is here

              http://www.space-research.org/



            Just took a quick look -- nice start. 



            I've only used HDF through the netcdf4 spec, so there may be
            richness needed that I'm missing, but my first thought is to make greater
            use of "objects" in JSON (key-value structures, hash tables, dicts in
            Python), rather than array position for heterogeneous structures. For
            instance, you have:



             a dataset


              {
              "dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
              }



            I would perhaps do that as something like:



            {
            ...
            "dset1": {"object_type": "dataset",
                      "dtype": "INT32",
                      "rank": 2,
                      "dimensions": [3, 4],
                      "data": [[1, 2, 3, 4],
                               [5, 6, 7, 8],
                               [9, 10, 11, 12]]
                     }
            ...
            }
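Converting the array-position form into the keyed form is mechanical. A minimal Python sketch, assuming the positional order in the example above is [object type, type name, rank, dimensions, row-major flat data]; the function names are illustrative, and the type string is passed through unchanged.

```python
import json
import math


def nest(flat, dims):
    """Turn a flat row-major list into nested lists with the given dimensions."""
    if len(dims) <= 1:
        return list(flat)
    step = len(flat) // dims[0]  # number of elements per slice of the first axis
    return [nest(flat[i * step:(i + 1) * step], dims[1:]) for i in range(dims[0])]


def positional_to_keyed(entry):
    """Convert ["dataset", dtype, rank, dims, flat_data] into a keyed object."""
    object_type, dtype, rank, dims, flat = entry
    if rank != len(dims):
        raise ValueError("rank does not match the number of dimensions")
    if len(flat) != math.prod(dims):
        raise ValueError("data length does not match the dimensions")
    return {
        "object_type": object_type,
        "dtype": dtype,
        "rank": rank,
        "dimensions": dims,
        "data": nest(flat, dims),
    }


positional = {"dset1": ["dataset", "STAR_INT32", 2, [3, 4], list(range(1, 13))]}
keyed = {name: positional_to_keyed(entry) for name, entry in positional.items()}
print(json.dumps(keyed, indent=2))
```

The keyed form is self-describing: a reader never has to remember that position 3 holds the dimensions.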



            NOTES:



            * I used nested arrays, rather than flattening the 2-d array -- 
this maps nicely to things like numpy arrays, for example -- not sure about the 
C++ world. (you can flatten and un-flatten numpy arrays easily, too, but this 
seems like a better mapping to the structure) And HDF is storing this all in 
chunks and who knows what -- so it's not a direct mapping to the memory layout 
anyway.



            * Do you need "rank"? -- can't you check the length of the 
dimensions array?



            * Do you need "object_type" -- will it always be a dataset? Or you
            could have something like:

            {
            ...
            "datasets": {"dset1": {the actual dataset object},
                         "dset2": {another dataset object},
                         ...
                        }
            }



            Then you don't need object_type or a name.
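These notes fit together: with nested arrays, the dimensions (and therefore the rank) can be recovered from the data itself, and flattening gives back the row-major list. A minimal Python sketch using a name-keyed "datasets" collection as suggested in the notes; the helper names infer_shape and flatten are illustrative.

```python
def infer_shape(data):
    """Infer the dimensions of nested lists, checking that they are rectangular."""
    if not isinstance(data, list):
        return []  # a scalar has no dimensions
    if not data:
        return [0]
    sub_shapes = [infer_shape(item) for item in data]
    if any(shape != sub_shapes[0] for shape in sub_shapes):
        raise ValueError("ragged nested array")
    return [len(data)] + sub_shapes[0]


def flatten(data):
    """Flatten nested lists back into a row-major flat list."""
    if not isinstance(data, list):
        return [data]
    return [value for item in data for value in flatten(item)]


# A "datasets" collection keyed by name: no object_type or name field needed.
doc = {
    "datasets": {
        "dset1": {
            "dtype": "INT32",
            "data": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
        },
    },
}

for name, dset in doc["datasets"].items():
    dims = infer_shape(dset["data"])
    # rank is just len(dims), so storing it separately is redundant
    print(name, "rank:", len(dims), "dimensions:", dims)
```

If "rank" and "dimensions" are kept for readability, a reader can still cross-check them against infer_shape to catch inconsistent files.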





            (BTW, is a "dataset" in HDF the same thing as a "variable" in 
netcdf?)



              I would like to make this some kind of "official" netCDF/HDF5 
JSON format for the community, so I encourage anyone to read the specification



              If you see any flaw in the design, or anything in the design that
              you would like to have changed, please let me know now



            done :-)



            It would be really great to have this become an "official" spec -- 
if you want to get it there, you're probably going to need to develop it more 
out in the open with a wider community. These lists are the way to get that 
started, but I suggest:



            1) put it up somewhere that people can collaborate on it, make
            suggestions, capture the discussion, etc. GitHub is one really nice
            way to do that. See, for example, the UGRID spec project:



              https://github.com/ugrid-conventions/ugrid-conventions



            (NOTE that that one got put on GitHub after there was a pretty
            complete draft spec, so there isn't THAT much discussion captured. But
            also note that that is too bad -- there is no good record of the
            decision process that led to the spec.)



              At the moment it only (intentionally) uses common generic 
features of both netCDF and HDF5, which are the numeric atomic types and 
strings.
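Restricting the format to the atomic types both libraries share makes validation straightforward. A minimal Python sketch: the left-hand names are the CDL atomic types; the right-hand names follow the "INT32" style used earlier in this thread but are otherwise a hypothetical naming choice, not taken from Pedro's specification.

```python
# Hypothetical mapping from CDL atomic type names to sized JSON type names.
# Only numeric atomic types and strings are covered; anything else
# (compound, enum, vlen, opaque, references) is rejected.
CDL_TO_JSON_TYPE = {
    "byte": "INT8", "ubyte": "UINT8",
    "short": "INT16", "ushort": "UINT16",
    "int": "INT32", "uint": "UINT32",
    "int64": "INT64", "uint64": "UINT64",
    "float": "FLOAT32", "double": "FLOAT64",
    "char": "CHAR", "string": "STRING",
}


def json_type(cdl_type):
    """Map a CDL type name to the JSON type name, or raise if unsupported."""
    try:
        return CDL_TO_JSON_TYPE[cdl_type]
    except KeyError:
        raise ValueError(f"type {cdl_type!r} is outside the common subset") from None


print(json_type("int"))
```

Rejecting everything outside this table up front keeps files written by such a converter readable by both netCDF-based and HDF5-based tools.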



            Good plan.



            -Chris





            -- 


            Christopher Barker, Ph.D.
            Oceanographer

            Emergency Response Division
            NOAA/NOS/OR&R            (206) 526-6959   voice
            7600 Sand Point Way NE   (206) 526-6329   fax
            Seattle, WA  98115       (206) 526-6317   main reception

            Chris.Barker@xxxxxxxx













------------------------------------------------------------------------------


  _______________________________________________
  NOTE: All exchanges posted to Unidata maintained email lists are
  recorded in the Unidata inquiry tracking system and made publicly
  available through the web.  Users who post to any of the lists we
  maintain are reminded to remove any personal information that they
  do not want to be made public.


  netcdfgroup mailing list
  netcdfgroup@xxxxxxxxxxxxxxxx
  For list information or to unsubscribe, visit:
  http://www.unidata.ucar.edu/mailing_lists/