Thanks for the pointer! I guess the extra cost of SHA-256 will only
become noticeable for very large data sets.
So far I have not seen any significant performance difference between
the hash algorithms available in the Python standard library, but I
have only tested with very small files.
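
One rough way to check this on larger inputs is to time a few standard-library digests on the same in-memory buffer. The sketch below is only an illustration: the helper name `time_digests`, the choice of algorithms, and the 8 MiB buffer are assumptions, and a single timing pass is noisy.

```python
import hashlib
import time
import zlib


def time_digests(payload, names=("sha1", "sha256", "sha512", "blake2b")):
    """Return {algorithm name: seconds} for hashing `payload` once."""
    timings = {}
    for name in names:
        start = time.perf_counter()
        hashlib.new(name, payload).hexdigest()
        timings[name] = time.perf_counter() - start
    # zlib.crc32 is included as a non-cryptographic reference point.
    start = time.perf_counter()
    zlib.crc32(payload)
    timings["crc32"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    payload = bytes(8 * 1024 * 1024)  # 8 MiB of zeros, an arbitrary test size
    results = time_digests(payload)
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name:8s} {seconds:.4f} s")
```

For real files, the time spent reading and decompressing the data may well dominate the hashing itself, so the numbers from such a test should be read with care.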
Cheers
Willi
On 08/24/2017 08:04 PM, dmh@xxxxxxxx wrote:
A small note. Since the goal is equality testing rather than security,
you should be able to get by with CRC32 or CRC64 checksums.
SHA256 is overkill.
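
A minimal sketch of the CRC32 variant, using `zlib.crc32` from the Python standard library. The helper name `crc32_of_chunks` is made up for this example, and the sketch covers CRC32 only. Feeding the data in chunks gives the same checksum as feeding it in one piece, so large variables would not need to be held in memory at once.

```python
import zlib


def crc32_of_chunks(chunks):
    """Fold an iterable of bytes objects into one unsigned CRC32 value."""
    crc = 0
    for chunk in chunks:
        # The second argument carries the running checksum forward.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


whole = crc32_of_chunks([b"temperature data"])
split = crc32_of_chunks([b"temperature ", b"data"])
assert whole == split  # chunking does not change the result
print(f"{whole:08x}")
```

With only 32 bits, accidental collisions become plausible once many data sets are compared, which is worth weighing against the speed gain.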
=Dennis Heimbigner
Unidata
On 8/24/2017 12:00 PM, Willi Rath wrote:
Hi all,
I'd like to find a way to verify the contents of a given netCDF
dataset across different representations on disk. (Think of the data
set being defined by its CDL code and different representations on
disk being realised by different choices of format, deflation,
chunking, etc. but with identical CDL.)
There are tools that compare the contents of two netCDF files: cdo's
diff or nccmp. These tools do, however, rely on both files being
present on the same file system and at the same time. A hash-based
approach calculating checksums from the contents rather than the
binary representation of the data set would be a nice solution to the
problem.
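
To illustrate the difference between hashing the binary representation and hashing the contents, here is a small sketch that uses byte order as a stand-in for the different on-disk representations. The function name `digest_of_values` and the choice of big-endian float64 as the canonical form are assumptions made for this example, not part of any existing tool.

```python
import hashlib
import struct

CANONICAL = ">"  # assumed convention: always hash big-endian float64 values


def digest_of_values(raw, byteorder, count):
    """Decode `count` float64 values stored with `byteorder` ('<' or '>'),
    re-encode them in the canonical byte order, and hash the result."""
    values = struct.unpack(f"{byteorder}{count}d", raw)
    canonical = struct.pack(f"{CANONICAL}{count}d", *values)
    return hashlib.sha256(canonical).hexdigest()


values = [1.5, -2.0, 3.25]
little = struct.pack("<3d", *values)  # one "representation on disk"
big = struct.pack(">3d", *values)     # another one with the same contents

# The raw bytes differ, so hashing them directly gives different digests ...
assert hashlib.sha256(little).hexdigest() != hashlib.sha256(big).hexdigest()
# ... while hashing the decoded values gives the same digest for both.
assert digest_of_values(little, "<", 3) == digest_of_values(big, ">", 3)
```

A real implementation would also have to fix conventions for data types, fill values, and the order in which variables and attributes are visited.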
I've tried to collect all attempts made at verifying netCDF files
in: https://github.com/willirath/netcdf-hash (The most successful of
these centered on the possibility of including the functionality in
`ncks` and led to a pair of tools for calculating and verifying MD5
checksums of netCDF files, with the checksums stored within the
files.)
There is also a demo outlining an approach that digests different
representations of the same netCDF data set into a SHA-256 hash and
stores the hex value of this hash in a global attribute of the
respective files.
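
The store-and-verify cycle can be sketched without any netCDF library by modelling a data set as a plain dictionary. Everything here is hypothetical and is not the code from the demo: the attribute name `sha256`, the dictionary layout, and the function names `dataset_digest`, `stamp`, and `verify`. The one essential point is that the attribute holding the digest is excluded from the digest itself.

```python
import hashlib
import json
import struct

DIGEST_ATTR = "sha256"  # hypothetical name of the global attribute


def dataset_digest(dataset):
    """Hash attributes and variables in a fixed order, skipping the digest."""
    h = hashlib.sha256()
    attrs = {k: v for k, v in dataset["attrs"].items() if k != DIGEST_ATTR}
    h.update(json.dumps(attrs, sort_keys=True).encode("utf-8"))
    for name in sorted(dataset["variables"]):
        var = dataset["variables"][name]
        h.update(json.dumps([name, var["dims"]]).encode("utf-8"))
        # Values are hashed as big-endian float64, an assumed convention.
        h.update(struct.pack(f">{len(var['data'])}d", *var["data"]))
    return h.hexdigest()


def stamp(dataset):
    """Store the hex digest in the global attributes."""
    dataset["attrs"][DIGEST_ATTR] = dataset_digest(dataset)


def verify(dataset):
    """Recompute the digest and compare it with the stored one."""
    return dataset["attrs"].get(DIGEST_ATTR) == dataset_digest(dataset)


ds = {
    "attrs": {"title": "demo"},
    "variables": {"t": {"dims": ["time"], "data": [0.0, 1.0, 2.0]}},
}
stamp(ds)
assert verify(ds)
ds["variables"]["t"]["data"][1] = 1.5  # simulate a changed value
assert not verify(ds)
```

Because the digest travels inside the file, a copy can be checked on its own, without the original being present on the same file system.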
I'd be very happy about any pointers to additional ideas (or perhaps
existing tools) solving the problem of netCDF-content verification,
about suggestions, remarks, etc.
Cheers
Willi
_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are
recorded in the Unidata inquiry tracking system and made publicly
available through the web. Users who post to any of the lists we
maintain are reminded to remove any personal information that they
do not want to be made public.
netcdfgroup mailing list
netcdfgroup@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/
--
Willi Rath
Theorie und Modellierung
GEOMAR
Helmholtz-Zentrum für Ozeanforschung Kiel
Duesternbrooker Weg 20, Raum 422
24105 Kiel, Germany
------------------------------------------------------------
Tel. +49-431-600-4010
wrath@xxxxxxxxx
www.geomar.de
-----------------------