Re: [netcdfgroup] NetCDF external links

Hello all,

In fact, there is already an accepted enhancement to the CF conventions for this kind of feature in CMIP6 datasets:

"Subconvention for associated files, proposed for use in CMIP6"

http://cf-trac.llnl.gov/trac/ticket/145

It saves storage space by keeping redundant information in separate files.

This new feature would be great.

Regards

Antonio


On 22/02/17 15:44, Ed Hartnett wrote:
I echo Eugen's call for a way to handle multi-file datasets in netCDF.

I understand and applaud the netCDF ideal that metadata and data belong in the same file, but that ideal cannot always be accommodated, and many of those who most need separate coordinate data files belong to netCDF's core community of weather and climate modeling.

The features in this proposal look very attractive.

Keep on netCDFing!

Ed


On Wed, Feb 22, 2017 at 5:33 AM, Eugen Betke <betke@xxxxxxx> wrote:

    Dear NetCDF-Group,

    about half a year ago we discussed the integration of external
    links in NetCDF.
    Motivation:
    In our institution, people are already working with multiple data
    files (grid and data separated) to avoid replication of the grid
    when a file only contains one timestep.

    Here is a short summary of last discussion:
    1. Our implementation of external links is based on HDF5 Virtual
    Datasets (VDS).
    It allows a variable defined in another file to be used as one of
    the dimensions.
    2. Possible applications are data deduplication and I/O
    optimization.
    - When data and grid are stored in separate files, the grid can be
    reused; no duplication of the grid is necessary.
    - I/O is optimized by saving storage space and network bandwidth.
    3. Until now, there has been an implicit assumption that NetCDF files
    must be self-contained, i.e., all data must be stored in a
    single file.
    4. This feature is optional and does not change anything in the
    regular NetCDF4 file format. It can be used when necessary.
    5. Storage of data in multiple files has been discussed:
    - What happens if one file is missing?
    The conclusion was that the file is still valid, because the
    default values are used in that case, but the data file is useless
    to the application, because the data cannot be interpreted.
    - Are all files (data and grid files) valid NetCDF4 files?
    The files using links are not backwards compatible.
    6. We believe the single-file semantics must go away in the long
    term; this approach is an intermediate step.
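Items 1 and 5 can be illustrated with plain HDF5 through h5py. This is a minimal sketch under stated assumptions, not the patch from the website: it bypasses the netCDF API and writes plain HDF5 files rather than netCDF-4 files, and the file names (grid.h5, data_t0.h5), variable names (lon, tas) and the 144-point grid are hypothetical. The data file maps its lon coordinate as a virtual dataset onto the single copy stored in a separate grid file; if the grid file goes missing, the data file still opens, but lon reads back as the fill value only.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example grid: 144 longitudes at 2.5-degree spacing.
LON = np.linspace(0.0, 357.5, 144)


def build_files(workdir):
    """Write a grid file and a data file whose 'lon' links into the grid file."""
    grid_path = os.path.join(workdir, "grid.h5")
    data_path = os.path.join(workdir, "data_t0.h5")

    # Grid file: the coordinate values are stored exactly once.
    with h5py.File(grid_path, "w") as g:
        g.create_dataset("lon", data=LON)

    # Data file: 'lon' is a virtual dataset whose whole extent maps onto
    # grid.h5:/lon, so no coordinate values are copied into this file.
    layout = h5py.VirtualLayout(shape=LON.shape, dtype=LON.dtype)
    layout[:] = h5py.VirtualSource(grid_path, "lon", shape=LON.shape, dtype=LON.dtype)
    with h5py.File(data_path, "w") as d:
        # Fill value returned for any region whose source cannot be read.
        d.create_virtual_dataset("lon", layout, fillvalue=np.nan)
        # Only the per-timestep field itself is written to the data file.
        d.create_dataset("tas", data=np.full(LON.shape, 288.0))
    return grid_path, data_path


def read_lon(data_path):
    """Read the (possibly virtual) 'lon' variable from the data file."""
    with h5py.File(data_path, "r") as d:
        return d["lon"][:]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        grid_path, data_path = build_files(tmp)
        print(read_lon(data_path)[:3])  # values resolved from grid.h5
        os.remove(grid_path)            # simulate a missing grid file
        print(read_lon(data_path)[:3])  # fill values only
```

Absolute source paths are used here to sidestep how HDF5 resolves relative VDS source paths; a real deployment would need a convention for where grid files live relative to data files.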

    We would like to see this feature added to the NetCDF standard.
    We can provide a patch for configure to include support only when
    the required HDF5 version is available.
    Is there anything else needed to help integrate this
    feature into NetCDF?
    - Do we need a better understanding of saving data in multiple files?
    - Shall we provide a well-tested and documented implementation?
    - How many interested people are needed to justify integrating
    this feature?

    You can find a patch on our website:
    http://wr.informatik.uni-hamburg.de/research/projects/bullio/netcdf_external_links/start

    We would like to reopen the discussion.
    Please give a clear rejection if, for some reason, this feature
    can never be part of NetCDF.


    Regards,
    Eugen

    _______________________________________________
    NOTE: All exchanges posted to Unidata maintained email lists are
    recorded in the Unidata inquiry tracking system and made publicly
    available through the web.  Users who post to any of the lists we
    maintain are reminded to remove any personal information that they
    do not want to be made public.


    netcdfgroup mailing list
    netcdfgroup@xxxxxxxxxxxxxxxx
    For list information or to unsubscribe,  visit:
    http://www.unidata.ucar.edu/mailing_lists/
