I'm glad that CF is considering URLs for associated files, which HDF5
VDS does not support.
Eugen, thanks for the patch. Can you extend it to support URLs
(i.e., files in the cloud)? It would be great if your patch could support
data sources beyond HDF5 [1] for Virtual Datasets.
[1] https://github.com/hyoklee/HDML
--
HDF: #1 Driver for Big, Deep, Fast data science.
On Wed, Feb 22, 2017 at 9:52 AM, Antonio S. Cofiño <cofinoa@xxxxxxxxx> wrote:
> Hello all,
>
> In fact, there is already an accepted enhancement to the CF conventions for
> this kind of feature in CMIP6 datasets:
>
> "Subconvention for associated files, proposed for use in CMIP6"
>
> http://cf-trac.llnl.gov/trac/ticket/145
>
> It saves storage space by storing redundant information in separate
> files.
>
> This new feature would be great.
>
> Regards
>
> Antonio
>
>
> On 22/02/17 15:44, Ed Hartnett wrote:
>
> I echo Eugen's call for a way to handle multi-file datasets in netCDF.
>
> I understand and applaud the netCDF ideal that metadata and data belong in
> the same file, but that ideal cannot always be accommodated, and many of
> those who most need separate coordinate data files belong to netCDF's
> core community of weather and climate modelers.
>
> The features in this proposal look very attractive.
>
> Keep on netCDFing!
>
> Ed
>
>
> On Wed, Feb 22, 2017 at 5:33 AM, Eugen Betke <betke@xxxxxxx> wrote:
>>
>> Dear NetCDF-Group,
>>
>> About half a year ago, we discussed integrating external links into
>> NetCDF.
>> Motivation:
>> In our institution, people already work with multiple data files
>> (grid and data separated) to avoid replicating the grid when each file
>> contains only one timestep.
>>
>> Here is a short summary of the last discussion:
>> 1. Our implementation of external links is based on HDF5 Virtual Datasets
>> (VDS). It allows a variable defined in another file to be used as one of
>> the dimensions.
>> 2. Possible application fields are data deduplication and I/O
>> optimization.
>> - When data and grid are stored in separate files, the grid can be reused;
>> no duplication of the grid is necessary.
>> - I/O is optimized by saving storage space and network bandwidth.
>> 3. Until now, there has been an implicit assumption that NetCDF files must
>> be self-contained, i.e., all data must be stored in a single file.
>> 4. This feature is optional and does not change anything in the regular
>> NetCDF4 file format. It can be used when necessary.
>> 5. Storage of data in multiple files has been discussed:
>> - What happens if one file is missing?
>> The conclusion was that the file is still valid, because default values
>> are used in that case, but the data file is useless to the application
>> because the data cannot be interpreted.
>> - Are all files (data and grid files) valid NetCDF4 files?
>> The files that use links are not backward compatible.
>> 6. We believe the single-file semantics must go away in the long term;
>> this approach is an intermediate step.
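A minimal sketch of the link semantics in points 1, 2 and 5 above, written in plain Python with no HDF5 or netCDF calls. The file names, variable names, `ExternalLink`, `resolve`, `stored_grid_values` and the fill value `-999.0` are all hypothetical stand-ins, not the API of Eugen's patch or of HDF5. Each timestep file holds a link to the grid variable instead of a copy; if the grid file is missing, resolving the link yields default (fill) values, so the data file still opens but its coordinates are meaningless.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical in-memory stand-in for files on disk:
# file name -> {variable name -> values}.
Files = Dict[str, Dict[str, List[float]]]

# Hypothetical default (fill) value returned for unavailable data.
FILL_VALUE = -999.0


@dataclass(frozen=True)
class ExternalLink:
    """Toy reference to a variable stored in another file (not the HDF5 VDS API)."""
    filename: str
    varname: str
    length: int


def resolve(link: ExternalLink, files: Files, fill: float = FILL_VALUE) -> List[float]:
    """Return the linked values, or `length` fill values if the target is missing."""
    source = files.get(link.filename, {}).get(link.varname)
    if source is None:
        # The linking file is still readable, but only default values come back,
        # so the data cannot be interpreted (point 5).
        return [fill] * link.length
    return list(source)


def stored_grid_values(n_files: int, grid_len: int, linked: bool) -> int:
    """Grid values written to disk: once if linked, once per file if copied (point 2)."""
    return grid_len if linked else n_files * grid_len


if __name__ == "__main__":
    files: Files = {"grid.nc": {"lat": [-45.0, 0.0, 45.0]}}
    lat = ExternalLink("grid.nc", "lat", 3)  # stored in each timestep file
    print(resolve(lat, files))  # [-45.0, 0.0, 45.0]
    del files["grid.nc"]        # simulate a missing grid file
    print(resolve(lat, files))  # [-999.0, -999.0, -999.0]
    print(stored_grid_values(100, 3, linked=False), stored_grid_values(100, 3, linked=True))  # 300 3
```

Returning fill values instead of raising an error is what keeps the linking file "valid" in the sense of point 5, while still leaving it useless without the grid file.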
>>
>> We would like to see this feature added to the NetCDF standard.
>> We can provide a patch for configure to include support only when the
>> required HDF5 version is available.
>> Is anything else needed to help integrate this feature into NetCDF?
>> - Do we need a better understanding of saving data in multiple files?
>> - Shall we provide a well-tested and documented implementation?
>> - How many interested people are needed to justify integrating this
>> feature?
>>
>> You can find a patch on our website:
>>
>> http://wr.informatik.uni-hamburg.de/research/projects/bullio/netcdf_external_links/start
>>
>> We would like to reopen the discussion.
>> Please give a clear rejection if, for some reason, this feature can
>> never be part of NetCDF.
>>
>>
>> Regards,
>> Eugen
>>
>> _______________________________________________
>> NOTE: All exchanges posted to Unidata maintained email lists are
>> recorded in the Unidata inquiry tracking system and made publicly
>> available through the web. Users who post to any of the lists we
>> maintain are reminded to remove any personal information that they
>> do not want to be made public.
>>
>>
>> netcdfgroup mailing list
>> netcdfgroup@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe, visit:
>> http://www.unidata.ucar.edu/mailing_lists/
>