Hi all,
I'd like to find a way to verify the contents of a given netCDF dataset
across different representations on disk. (Think of the dataset as being
defined by its CDL, with the different representations on disk arising
from different choices of format, deflation, chunking, etc., but with
identical CDL.)
There are tools that compare the contents of two netCDF files, such as
cdo's diff or nccmp. These tools, however, require both files to be
present on the same file system at the same time. A hash-based approach
that calculates checksums from the contents rather than from the binary
representation of the dataset would be a nice solution to the problem.
I've tried to collect all attempts made at verifying netCDF files in
https://github.com/willirath/netcdf-hash. (The most successful of these
centered on the possibility of including the functionality in `ncks` and
led to a pair of tools for calculating and verifying MD5 checksums that
are stored within the netCDF files themselves.)
There is also a demo outlining an approach that digests different
representations of the same netCDF dataset into a SHA-256 hash and
stores the hex value of this hash in a global attribute of the
respective files.
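
To make the idea concrete, here is a minimal sketch of a content-based
digest. It is not the code from the repository above. It assumes the
dataset has already been read into a plain Python dict (a hypothetical
stand-in for whatever a netCDF library returns), and the attribute name
`content_sha256` is made up for illustration. The point is only that the
hash is fed names, attributes and values in a canonical order, so format,
deflation and chunking never enter the digest.

```python
import hashlib
import struct

HASH_ATTR = "content_sha256"  # hypothetical name of the global attribute


def content_hash(dataset):
    """Return a SHA-256 hex digest of the logical content of `dataset`.

    `dataset` is an assumed in-memory layout, not a real netCDF API:
    {"dimensions": {name: size},
     "attributes": {name: value},
     "variables": {name: {"dims": (...), "attributes": {...},
                          "data": [float, ...]}}}
    """
    h = hashlib.sha256()

    def feed(label, value):
        # Length-prefix each field so adjacent fields cannot run together.
        payload = f"{label}={value!r}".encode("utf-8")
        h.update(struct.pack("<Q", len(payload)))
        h.update(payload)

    # Sort all names so that insertion order does not matter.
    for name in sorted(dataset["dimensions"]):
        feed("dim:" + name, dataset["dimensions"][name])
    for name in sorted(dataset["attributes"]):
        if name == HASH_ATTR:
            continue  # the stored hash must not influence the hash itself
        feed("gatt:" + name, dataset["attributes"][name])
    for name in sorted(dataset["variables"]):
        var = dataset["variables"][name]
        feed("var:" + name, tuple(var["dims"]))
        for att in sorted(var["attributes"]):
            feed("vatt:" + name + ":" + att, var["attributes"][att])
        # Pack values as little-endian float64, independent of how they
        # were stored on disk (an assumption: all data fit float64).
        data = list(var["data"])
        h.update(struct.pack("<Q", len(data)))
        h.update(struct.pack(f"<{len(data)}d", *data))
    return h.hexdigest()


# Two "representations" of the same content: different key order, and the
# second one already carries a stored hash attribute.
rep_a = {
    "dimensions": {"x": 3},
    "attributes": {"title": "demo", "institution": "example"},
    "variables": {"t": {"dims": ("x",), "attributes": {"units": "K"},
                        "data": [1.0, 2.0, 3.0]}},
}
rep_b = {
    "variables": {"t": {"data": [1.0, 2.0, 3.0], "attributes": {"units": "K"},
                        "dims": ("x",)}},
    "attributes": {"institution": "example", HASH_ATTR: "stale", "title": "demo"},
    "dimensions": {"x": 3},
}
print(content_hash(rep_a) == content_hash(rep_b))
```

With a digest like this stored in each file, verification needs only the
file at hand: recompute the digest from the contents and compare it with
the stored attribute. Typed data (integers, strings, fill values) would
need their own canonical encoding, which this sketch leaves out.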
I'd be very happy about any pointers to additional ideas (or perhaps
existing tools) solving the problem of netCDF-content verification,
about suggestions, remarks, etc.
Cheers
Willi
--
Willi Rath
Theorie und Modellierung
GEOMAR
Helmholtz-Zentrum für Ozeanforschung Kiel
Duesternbrooker Weg 20, Raum 422
24105 Kiel, Germany
------------------------------------------------------------
Tel. +49-431-600-4010
wrath@xxxxxxxxx
www.geomar.de
-----------------------