Eugen,
For reading only, how transparent are HDF5 virtual datasets (VDS) when
presented as single NetCDF-4 files? Is it now possible to have a VDS that can
be fully and transparently accessed through the NetCDF-C API, with the
appearance of a single NetCDF-4 file for all normal read-only purposes?
--Dave
On Wed, Feb 22, 2017 at 5:33 AM, Eugen Betke <betke@xxxxxxx> wrote:
> Dear NetCDF-Group,
>
> About half a year ago we discussed the integration of external links in
> NetCDF.
> Motivation:
> In our institution, people are already working with multiple data files
> (grid and data separated) to avoid replication of the grid when a file only
> contains one timestep.
>
> Here is a short summary of the last discussion:
> 1. Our implementation of external links is based on HDF5 Virtual Datasets
> (VDS).
> It allows a variable defined in another file to be used as one of the
> dimensions.
> 2. Possible application fields are data deduplication and I/O optimization.
> - When data and grid are stored in separate files, the grid can be reused.
> No duplication of the grid is necessary.
> - I/O optimization is achieved by saving storage space and network
> bandwidth.
> 3. Until now, there was an implicit assumption that NetCDF files must be
> self-contained, i.e., all data must be stored in a single file.
> 4. This feature is not mandatory, nor does it change anything inside the
> regular NetCDF4 file format. It can be used when necessary.
> 5. Storage of data in multiple files has been discussed:
> - What happens if one file is missing?
> The conclusion was that the file is still valid, because in that case
> the default values will be used, but the data file is useless for the
> application, because the data cannot be interpreted.
> - Are all files (data and grid files) valid NetCDF4 files?
> The files using links are not backwards compatible.
> 6. We believe the single-file semantics must go away in the long term,
> and this approach is an intermediate step.
>
> We would like to see this feature added to the NetCDF standard.
> We can provide a patch for configure to include support only when the
> required HDF5 version is available.
> Is there anything else needed to help integrate this feature into
> NetCDF?
> - Do we need a better understanding of saving data in multiple files?
> - Shall we provide a well-tested and documented implementation?
> - How large must the number of interested people be to justify
> integrating this feature?
>
> You can find a patch on our website:
> http://wr.informatik.uni-hamburg.de/research/projects/bullio/netcdf_external_links/start
>
> We would like to reopen the discussion.
> Please provide a clear rejection if for some reason this feature can
> never be a part of NetCDF.
>
> Regards,
> Eugen
>
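
For readers unfamiliar with the mechanism in points 1 and 5 of the quoted summary, here is a minimal sketch of a plain HDF5 virtual dataset, written with h5py rather than with the NetCDF patch discussed above. The file names (grid.h5, data.h5), the variable names (lat, temperature), and the fill value -9999.0 are assumptions made up for this illustration. It shows only how HDF5 maps a dataset stored in one file into another. It does not show that the NetCDF-C API can read the result.

```python
import os
import tempfile

import h5py
import numpy as np


def build_linked_files(workdir):
    """Write a grid file and a data file whose 'lat' is a VDS into the grid file."""
    # Hypothetical file names, chosen only for this sketch.
    grid_path = os.path.join(workdir, "grid.h5")
    data_path = os.path.join(workdir, "data.h5")

    # The grid is stored exactly once, in its own file.
    lat = np.linspace(-90.0, 90.0, 5)
    with h5py.File(grid_path, "w") as grid_file:
        grid_file.create_dataset("lat", data=lat)

    # Map the whole virtual dataset onto the 'lat' dataset of the grid file.
    layout = h5py.VirtualLayout(shape=lat.shape, dtype=lat.dtype)
    layout[:] = h5py.VirtualSource(grid_path, "lat", shape=lat.shape)

    with h5py.File(data_path, "w", libver="latest") as data_file:
        # The fill value is what a reader gets where no source data is found.
        data_file.create_virtual_dataset("lat", layout, fillvalue=-9999.0)
        data_file.create_dataset("temperature", data=np.zeros(lat.shape))

    return grid_path, data_path, lat


def read_lat(data_path):
    """Read 'lat' through the data file, as an unaware reader would."""
    with h5py.File(data_path, "r") as data_file:
        return data_file["lat"][:]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    grid_path, data_path, lat = build_linked_files(workdir)
    print("with grid file:   ", read_lat(data_path))
    os.remove(grid_path)  # simulate the "one file is missing" case
    print("without grid file:", read_lat(data_path))
```

With the grid file present, reading lat through data.h5 should return the grid values. Once the grid file is removed, the data file still opens and the read should return the fill value, which matches the "still valid but useless" conclusion in point 5.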