Pedro,
When I first started reading this thread, I thought "there should be a spec
for how to represent netcdf in JSON"
and then I read:
> 1) The specification to convert netCDF/HDF5 to "a" JSON format (note the
> "a" here)
>
Awesome -- that's exactly what we need -- as you say, there is not just one
way to represent netcdf data in JSON, and probably more than one "obvious"
way.
Without looking at your spec yet, I do think it should probably look as
much like CDL as possible -- we are all familiar with that.
> (why Python? HDF5 developer tools should be all about writing in C/C++)
>
Because Python is an excellent language with which to "drive" C/C++
libraries like HDF5 and netcdf4. If I were to do this, I'd sure use Python.
Even if you want to get to a C++ implementation eventually, you'd probably
benefit from prototyping and working out the kinks with a Python version
first.
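For instance (not Pedro's code -- just a sketch, with a hypothetical file
name and dataset name), here's about all the Python it takes to drive the
HDF5 C library via h5py and dump a dataset to JSON:

import json

import h5py  # thin Python wrapper around the HDF5 C library
import numpy as np

# "example.h5" and "dset1" are hypothetical names, for illustration only.
with h5py.File("example.h5", "r") as f:
    dset = f["dset1"]
    record = {
        "dtype": str(dset.dtype),           # e.g. "int32"
        "dimensions": list(dset.shape),
        "data": np.asarray(dset).tolist(),  # nested lists -> nested JSON arrays
    }

print(json.dumps({"dset1": record}, indent=2))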
But whoever is writing the code....
> The specification is here
>
> http://www.space-research.org/
>
>
Just took a quick look -- nice start.
I've only used HDF through the netCDF4 API, so there may be richness
needed that I'm missing, but my first thought is to make greater use of
"objects" in JSON (key-value structures, hash tables, dicts in Python),
rather than array position, for heterogeneous structures. For instance, you
have:
> a dataset
>
> {
> "dset1" : ["dataset", "STAR_INT32", 2, [3, 4], [1, 2, 3, 4, 5, 6, 7, 8, 9,
> 10, 11, 12]]
> }
>
I would perhaps do that as something like:
{
...
"dset1": {"object_type": "dataset",
          "dtype": "INT32",
          "rank": 2,
          "dimensions": [3, 4],
          "data": [[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]]
         }
...
}
NOTES:
* I used nested arrays, rather than flattening the 2-d array -- this maps
nicely to things like numpy arrays, for example -- not sure about the C++
world. (You can flatten and un-flatten numpy arrays easily, too, but nesting
seems like a better mapping to the structure -- see the quick numpy sketch
after these notes.) And HDF is storing this all in chunks and who knows
what, so it's not a direct mapping to the memory layout anyway.
* Do you need "rank"? -- can't you check the length of the dimensions array?
* Do you need "object_type" -- will it always be a dataset? Or you could
have something like:
{
...
"datasets": {"dset1": {the actual dataset object},
             "dset2": {another dataset object},
             ...
            }
...
}
Then you don't need object_type or a name
(BTW, is a "dataset" in HDF the same thing as a "variable" in netcdf?)
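To make the numpy point concrete, here's a quick sketch (the entry below is
hypothetical, in the layout I suggested above) showing that nested JSON
arrays round-trip cleanly, and that "rank" falls out of "dimensions" for
free:

import json

import numpy as np

# A hypothetical entry, in the suggested layout:
entry = json.loads("""
{"dtype": "INT32",
 "dimensions": [3, 4],
 "data": [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]}
""")

# Nested JSON arrays map straight onto a numpy array:
arr = np.array(entry["data"], dtype=np.int32)
assert arr.shape == tuple(entry["dimensions"])

# "rank" is recoverable, so it needn't be stored:
assert len(entry["dimensions"]) == arr.ndim

# Flattening / un-flattening is easy if a flat layout is ever needed:
flat = arr.ravel()
assert (flat.reshape(entry["dimensions"]) == arr).all()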
> I would like to make this some kind of "official" netCDF/HDF5 JSON format
> for the community, so I encourage anyone to read the specification
> If you see any flaw in the design or anything in the design that you would
> like to have change please let me know now
>
done :-)
It would be really great to have this become an "official" spec -- if you
want to get it there, you're probably going to need to develop it more out
in the open with a wider community. These lists are the way to get that
started, but I suggest:
1) Put it up somewhere that people can collaborate on it, make suggestions,
capture the discussion, etc. GitHub is one really nice way to do that. See,
for example, the UGRID spec project:
https://github.com/ugrid-conventions/ugrid-conventions
(NOTE that that one got put on GitHub after there was a pretty complete
draft spec, so there isn't THAT much discussion captured. That's too bad,
though -- there is no good record of the decision process that led to the
spec.)
> At the moment it only (intentionally) uses common generic features of both
> netCDF and HDF5, which are the numeric atomic types and strings.
>
Good plan.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@xxxxxxxx