Dear NetCDF-Group,
About half a year ago, we discussed the integration of external links
into NetCDF.
Motivation:
In our institution, people are already working with multiple data files
(grid and data separated) to avoid replication of the grid when a file
only contains one timestep.
Here is a short summary of the last discussion:
1. Our implementation of external links is based on HDF5 Virtual
Datasets (VDS).
It allows a variable defined in another file to be used as one of the
dimensions.
2. Possible application fields are data deduplication and I/O optimization.
- When data and grid are stored in separate files, the grid can be reused.
No duplication of the grid is necessary.
- I/O optimization is achieved by saving storage space and network
bandwidth.
3. Until now, there was an implicit assumption that NetCDF files must
be self-contained, i.e., all data must be stored in one single file.
4. This feature is not mandatory, nor does it change anything inside the
regular NetCDF4 file format. It can be used when necessary.
5. Storage of data in multiple files has been discussed:
- What happens if one file is missing?
The conclusion was that the file is still valid, because in that case
the default values will be used, but the data file is useless for the
application, because the data cannot be interpreted.
- Are all files (data and grid files) valid NetCDF4 files?
The files using links are not backwards compatible.
6. We believe the single-file semantics must go away in the long term,
and this approach is an intermediate step.
We would like to see this feature added to the NetCDF standard.
We can provide a patch for configure to include support only when the
required HDF5 version is available.
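As a rough illustration of what such a check amounts to (a sketch only,
not the actual configure patch): VDS was introduced in HDF5 1.10.0, so
the test is a comparison of the detected library version against that
minimum. The helper name, the version-string format, and the choice of
1.10.0 as the threshold are assumptions made for this example; the patch
itself may require a different release.

```python
# Minimum HDF5 release assumed here to provide Virtual Datasets (VDS).
MIN_VDS_VERSION = (1, 10, 0)


def supports_external_links(hdf5_version: str) -> bool:
    """Return True if the given HDF5 version string is new enough for VDS.

    Hypothetical helper: expects a dotted "major.minor.release" string,
    optionally followed by a suffix such as "-patch1".
    """
    # Drop any suffix after the first dash, e.g. "1.10.0-patch1" -> "1.10.0".
    core = hdf5_version.split("-", 1)[0]
    parts = tuple(int(p) for p in core.split("."))
    # Pad short versions such as "1.10" to three components.
    parts = parts + (0,) * (3 - len(parts))
    return parts[:3] >= MIN_VDS_VERSION


if __name__ == "__main__":
    for version in ("1.8.16", "1.10.0", "1.10.0-patch1", "1.12.2"):
        print(version, supports_external_links(version))
```

With a gate of this kind, builds against an older HDF5 would simply
leave the external link support disabled.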
Is there anything else necessary to help integrate this feature
into NetCDF?
- Do we need a better understanding of saving data in multiple files?
- Shall we provide a well-tested and documented implementation?
- How large must the number of interested people be in order to justify
the integration of this feature?
You can find a patch on our website:
http://wr.informatik.uni-hamburg.de/research/projects/bullio/netcdf_external_links/start
We would like to reopen the discussion.
Please provide a clear rejection if, for some reason, this feature can
never be a part of NetCDF.
Regards,
Eugen