I see two levels of transparency that need discussion.
1. Transparency at the API level. Do we need to modify
nc_open to specify all the relevant files?
2. Transparency at the HDF5 library level. If the HDF5 API
behaves the same for files with links, then of course
the existing netcdf-c library would be able to read such
files with no changes whatsoever.
Without having looked in detail, I am guessing that the HDF5 API
is not transparent with respect to linked files. Correct?
=Dennis Heimbigner
Unidata
P.S. I have suggested elsewhere providing a single-file filesystem
model for netcdf-c files; this, or some equivalent such as .zip
files, would help mitigate the drawbacks of using multiple files.
On 2/22/2017 12:45 PM, Julian Kunkel wrote:
Dear Dave,
exactly that transparency is the idea: normal netCDF applications
don't notice any difference.
Regards
Julian
On 22.02.2017 at 8:31 PM, "Dave Allured - NOAA Affiliate"
<dave.allured@xxxxxxxx> wrote:
Eugen,
For reading only, how transparent are HDF5 virtual datasets as
single netCDF-4 files? Is it now possible to have a VDS that can be
fully and transparently accessed by the netCDF-C API, with the
appearance of a single netCDF-4 file for all normal read-only purposes?
--Dave
On Wed, Feb 22, 2017 at 5:33 AM, Eugen Betke <betke@xxxxxxx> wrote:
Dear NetCDF-Group,
about half a year ago, we discussed integrating external
links into NetCDF.
Motivation:
In our institution, people already work with multiple
data files (grid and data kept separate) to avoid replicating the
grid in every file when each file contains only one timestep.
Here is a short summary of the last discussion:
1. Our implementation of external links is based on HDF5 Virtual
Datasets (VDS).
It allows a variable defined in another file to be used as one
of the dimensions.
2. Possible application fields are data deduplication and I/O
optimization.
- When data and grid are stored in separate files, the grid can
be reused, so no duplication of the grid is necessary.
- I/O is optimized by saving storage space and network
bandwidth.
3. Until now, there has been an implicit assumption that NetCDF
files must be self-contained, i.e., all data must be stored in a
single file.
4. This feature is optional and does not change anything in
the regular NetCDF4 file format. It is used only when
needed.
5. Storage of data in multiple files has been discussed:
- What happens if one file is missing?
The conclusion was that the file is still valid, because in
that case the default values are used, but the data file is
useless to the application, because the data cannot be
interpreted.
- Are all files (data and grid files) valid NetCDF4 files?
Files that use links are not backward compatible.
6. We believe the single-file semantics must go away in the long
term; this approach is an intermediate step.
We would like to see this feature added to the NetCDF standard.
We can provide a configure patch that enables support only
when the required HDF5 version is available.
Is there anything else we can do to help integrate this
feature into NetCDF?
- Do we need a better understanding of storing data in multiple files?
- Shall we provide a well-tested and documented implementation?
- How many interested people are needed to justify integrating
this feature?
You can find a patch on our website:
http://wr.informatik.uni-hamburg.de/research/projects/bullio/netcdf_external_links/start
We would like to reopen the discussion.
Please give a clear rejection if, for some reason, this
feature can never be part of NetCDF.
Regards,
Eugen
_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are
recorded in the Unidata inquiry tracking system and made publicly
available through the web. Users who post to any of the lists we
maintain are reminded to remove any personal information that they
do not want to be made public.
netcdfgroup mailing list
netcdfgroup@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/