Re: [netcdfgroup] NetCDF external links

Julian et al,

Thank you.  I think that read transparency is very important.  I did not
have the right understanding of external links until I took a closer look
at your website.  For multiple data files, is this correct as a simplified
model for external links?

   data1.nc4 (VDS) ------> grid.nc4 (plain)
   data2.nc4 (VDS) ------> ^
   data3.nc4 (VDS) ------> ^

Presumably this model could be wrapped in another VDS for time aggregation,
which would be important in my applications.

Have you thought about an alternate model like this?

   VDS+mapping.h5/nc4 ------> grid.nc4  (plain)
            "         ------> data1.nc4 (plain)
            "         ------> data2.nc4 (plain)
            "         ------> data3.nc4 (plain)

Would this be possible with current, unmodified versions of HDF5 and
Netcdf-C?  This would require an HDF5 utility or custom program to create
the VDS+mapping file.  If that can be done, then it seems it would not be
necessary to change standard Netcdf.
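
As a rough illustration of the VDS+mapping idea at the HDF5 level only, here
is a minimal sketch using h5py (assuming h5py 2.9 or later built against
HDF5 1.10 or later).  The function names, the variable names "temp", "lat"
and "lon", the tiny grid size, and the fill value are all hypothetical.  The
source files are plain HDF5 stand-ins that merely carry the .nc4 names from
the diagram; they are not real Netcdf-4 files (no dimension scales or Netcdf
attributes).  Whether Netcdf-C would open the resulting mapping file is
exactly the open question above, and this sketch does not answer it.

```python
import os
import tempfile

import h5py
import numpy as np

NLAT, NLON = 2, 3  # tiny hypothetical grid, for illustration only


def write_plain_files(workdir, nsteps=3):
    """Create stand-in sources: one grid file and one data file per time step."""
    grid_path = os.path.join(workdir, "grid.nc4")
    with h5py.File(grid_path, "w") as f:
        f["lat"] = np.linspace(-45.0, 45.0, NLAT)
        f["lon"] = np.linspace(0.0, 240.0, NLON)

    data_paths = []
    for step in range(nsteps):
        path = os.path.join(workdir, f"data{step + 1}.nc4")
        with h5py.File(path, "w") as f:
            # Each data file holds a single 2-D time step filled with its index.
            f["temp"] = np.full((NLAT, NLON), float(step), dtype="f4")
        data_paths.append(path)
    return grid_path, data_paths


def write_mapping_file(workdir, grid_path, data_paths):
    """Build one mapping file: a VDS stacking the time steps, plus links to the grid."""
    mapping_path = os.path.join(workdir, "mapping.h5")

    # Virtual layout with a leading time axis; each source fills one slab.
    layout = h5py.VirtualLayout(shape=(len(data_paths), NLAT, NLON), dtype="f4")
    for step, path in enumerate(data_paths):
        layout[step] = h5py.VirtualSource(path, "temp", shape=(NLAT, NLON))

    with h5py.File(mapping_path, "w", libver="latest") as f:
        # The fill value is what a reader sees where a source file is missing.
        f.create_virtual_dataset("temp", layout, fillvalue=-999.0)
        # The grid variables are not copied, only linked.
        f["lat"] = h5py.ExternalLink(grid_path, "/lat")
        f["lon"] = h5py.ExternalLink(grid_path, "/lon")
    return mapping_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        grid_path, data_paths = write_plain_files(workdir)
        mapping_path = write_mapping_file(workdir, grid_path, data_paths)
        with h5py.File(mapping_path, "r") as f:
            print(f["temp"].shape, f["temp"].is_virtual)
            print(f["lat"][...])
```

The sketch uses absolute paths for the source files so that it does not
depend on how the HDF5 library resolves relative source names.  In this
layout none of the plain files is modified; only the mapping file knows
about the aggregation.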

--Dave


On Wed, Feb 22, 2017 at 12:45 PM, Julian Kunkel <juliankunkel@xxxxxxxxxxxxxx> wrote:

> Dear Dave,
> Exactly, that transparency is the idea. Normal netcdf applications don't
> realize that there is any difference.
>
> Regards
> Julian
>
> On 22 Feb 2017 at 8:31 PM, "Dave Allured - NOAA Affiliate" <
> dave.allured@xxxxxxxx> wrote:
>
>> Eugen,
>>
>> For reading only, how transparent are HDF5 virtual data sets as single
>> Netcdf-4 files?  Is it now possible to have a VDS that can be fully and
>> transparently accessed by the Netcdf-C API, with the appearance of a single
>> Netcdf-4 file for all normal read-only purposes?
>>
>> --Dave
>>
>>
>> On Wed, Feb 22, 2017 at 5:33 AM, Eugen Betke <betke@xxxxxxx> wrote:
>>
>>> Dear NetCDF-Group,
>>>
>>> about half a year ago we discussed the integration of external links in
>>> NetCDF.
>>> Motivation:
>>> In our institution, people are already working with multiple data files
>>> (grid and data separated) to avoid replication of the grid when a file only
>>> contains one timestep.
>>>
>>> Here is a short summary of the last discussion:
>>> 1. Our implementation of external links is based on HDF5 Virtual
>>> Datasets (VDS).
>>> It allows a variable defined in another file to be used as one of the
>>> dimensions.
>>> 2. Possible application fields are data deduplication and I/O
>>> optimization.
>>> - When data and grid are stored in separate files, the grid can be reused.
>>> No duplication of the grid is necessary.
>>> - I/O optimization is achieved by saving storage space and network
>>> bandwidth.
>>> 3. Until now, there was an implicit assumption that NetCDF files must
>>> be self-contained, i.e., all data must be stored in one single file.
>>> 4. This feature is not mandatory, nor does it change anything inside the
>>> regular NetCDF4 file format. It can be used when necessary.
>>> 5. Storage of data in multiple files has been discussed:
>>> - What happens if one file is missing?
>>> The conclusion was that the file is still valid, because in that case
>>> the default values will be used, but the data file is useless for the
>>> application, because the data cannot be interpreted.
>>> - Are all files (data and grid files) valid NetCDF4 files?
>>> The files using links are not backwards compatible.
>>> 6. We believe the single-file semantics must go away in the long term,
>>> and this approach is an intermediate step.
>>>
>>> We would like to see this feature added to the NetCDF standard.
>>> We can provide a patch for configure to include support only when the
>>> required HDF5 version is available.
>>> Is there anything else necessary to help in integrating this feature
>>> into NetCDF:
>>> - Do we need a better understanding of saving data in multiple files?
>>> - Shall we provide a well-tested and documented implementation?
>>> - How large must the number of interested people be in order to justify
>>> the integration of this feature?
>>>
>>> You can find a patch on our website:
>>> http://wr.informatik.uni-hamburg.de/research/projects/bullio/netcdf_external_links/start
>>>
>>> We would like to reopen the discussion.
>>> Please provide a clear rejection if, for some reason, this feature can
>>> never be a part of NetCDF.
>>>
>>> Regards,
>>> Eugen
>>
>>