Re: ncdigest V1 #846

Tim Hume wrote:

> I like your idea. I threw together a quick pdksh script to implement
> something like you suggest. It assumes you have ncdump and the NetCDF
> operators (in particular the ncatted program). Basically, I ncdump the
> file, and calculate the MD5 sum. I then create a global attribute called
> md5sum. To check the file, I ncdump it again, being careful not to
> include the line containing the md5sum global attribute. If you look at
> the attached script you'll get the idea.
> 
> The script seems to work OK on my Linux box, but I guess it is slow and
> inefficient, especially on large NetCDF files. Perhaps someone has a
> better solution, or might refine the script a bit?

I think there are some good reasons to keep hashes such as MD5 or
SHA-1 external to files they are intended to check, rather than
embedded in the files:

 - If the digest is embedded, then something that corrupts the file
   might also corrupt the digest.

 - It's awkward to check an embedded hash, because the hash has to be
   stripped out of the file before the hash can be recomputed.

 - Updating an embedded hash whenever the file is updated may be
   inefficient, since the whole file has to be rehashed each time.

 - It's easier to protect an externally stored hash from modification
   or corruption than a large file; for example, the hash could be
   stored on write-once media.

However, if you want the convenience of a single file that contains
its own hash, I suggest just appending the hash to the end of the file.

Such a file will behave exactly like the original netCDF file with
respect to the netCDF interface, since nothing in the netCDF interface
lets you determine the size of the file or lets you read beyond the
last data written through the interface.  If you try to read an array
or record past the end of the netCDF data, you get the error "Index
exceeds dimension bound".  If you want to verify that appending to a
netCDF file won't damage it, just append some text to the end of a
netCDF file and run ncdump on it.  You should get the same output as
for the original file and no error messages.

The hash could easily be split off the end of the resulting file and
compared with the hash of the truncated file to verify it had not been
damaged.
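
As a rough illustration of the append-and-split idea, here is a minimal
Python sketch.  It is not taken from Tim's script.  The function names
(append_hash, verify_hash, md5_of_prefix) are made up for this example,
and it assumes the trailer is exactly the 32-character ASCII hex form
of an MD5 digest with nothing written after it.

```python
import hashlib
import os

# Assumption: the trailer is an MD5 digest in ASCII hex (32 characters).
DIGEST_LEN = 32


def md5_of_prefix(path, nbytes):
    """Return the MD5 hex digest of the first nbytes bytes of the file."""
    md5 = hashlib.md5()
    remaining = nbytes
    with open(path, "rb") as f:
        while remaining > 0:
            # Read in chunks of at most 1 MiB so large files are not
            # loaded into memory all at once.
            chunk = f.read(min(1 << 20, remaining))
            if not chunk:
                break
            md5.update(chunk)
            remaining -= len(chunk)
    return md5.hexdigest()


def append_hash(path):
    """Append the MD5 hex digest of the current file contents to the file."""
    digest = md5_of_prefix(path, os.path.getsize(path))
    with open(path, "ab") as f:
        f.write(digest.encode("ascii"))
    return digest


def verify_hash(path):
    """Compare the trailing digest with the hash of everything before it."""
    size = os.path.getsize(path)
    if size < DIGEST_LEN:
        return False
    with open(path, "rb") as f:
        f.seek(size - DIGEST_LEN)
        stored = f.read(DIGEST_LEN)
    computed = md5_of_prefix(path, size - DIGEST_LEN)
    return stored == computed.encode("ascii")
```

Calling append_hash("foo.nc") once and verify_hash("foo.nc") later
would be the intended use ("foo.nc" is a placeholder name).  The check
reads the file in place rather than truncating it, so the file on disk
is left unchanged.  Calling append_hash a second time would hash the
first trailer as if it were data, so the old trailer has to be removed
before the file is rehashed.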

--Russ
