Hi,
I like your idea. I threw together a quick pdksh script to implement
something like what you suggest. It assumes you have ncdump and the
NetCDF Operators (in particular the ncatted program). Basically, I
ncdump the file and calculate the MD5 sum of the output. I then store
that sum in a global attribute called md5sum. To check the file, I
ncdump it again, being careful not to include the line containing the
md5sum global attribute, and compare the result against the stored
attribute. If you look at the attached script you'll get the idea.
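For reference, once a file has been signed, its header (ncdump -h) ends
up with a global attribute along these lines (the hash value below is
just illustrative):

    // global attributes:
            :md5sum = "d41d8cd98f00b204e9800998ecf8427e" ;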
The script seems to work OK on my Linux box, but I expect it is slow and
inefficient, especially on large NetCDF files, since it has to dump the
whole file as text just to compute the sum. Perhaps someone has a better
solution, or might refine the script a bit?
Tim Hume
Bureau of Meteorology Research Centre
Melbourne
Australia
Script follows:
#!/bin/ksh
#
# A quick and dirty hack to incorporate an MD5 sum in a NetCDF file.
#
# Tim Hume.
# 4 February 2005.
export PATH=/bin:/usr/bin:/usr/local/bin:/arm/tph/bin
#
# Defaults.
#
action=checkmd5
ncfile=""
while [[ $# -ge 1 ]]
do
	case "${1}" in
	( "-C" | "-c" )
		action="checkmd5"
		shift
		;;
	( "-S" | "-s" )
		action="makemd5"
		shift
		;;
	( "-H" | "-h" )
		echo "Usage: sign_netcdf [ -C ] [ -S ] file.nc"
		exit 0
		;;
	( * )
		ncfile="${1}"
		shift
		;;
	esac
done
if [[ ! -f "${ncfile}" ]]
then
	echo "E: No such file: ${ncfile}"
	exit 1
fi
#
# Now check an existing MD5 sum, or create a new one.
#
# Compute the MD5 sum of the CDL dump, excluding any existing md5sum
# attribute line so that signing and checking hash the same text.
md5sum=$(ncdump "${ncfile}" | grep -E -v -e '[[:space:]]+:md5sum =' |
	md5sum | awk '{print $1}')
if [[ "${action}" == "makemd5" ]]
then
	# Store the sum in a global attribute. The -h flag stops ncatted from
	# appending to the history attribute, which would alter future dumps.
	ncatted -h -a md5sum,global,o,c,"${md5sum}" "${ncfile}"
else
	# Extract the sum recorded in the md5sum global attribute.
	md5sum_att=$(ncdump -h "${ncfile}" | grep -E -e '[[:space:]]+:md5sum =' |
		awk -F\" '{print $2}')
	if [[ "${md5sum}" == "${md5sum_att}" ]]
	then
		echo "Good MD5 sum: ${md5sum}"
	else
		echo "Bad MD5 sum. Actual sum is: ${md5sum}"
		echo "Attribute says it should be: ${md5sum_att}"
	fi
fi
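For example, assuming the script is saved as sign_netcdf somewhere on
your PATH and made executable, you would use it something like this
(the file name is only illustrative):

    sign_netcdf -S file.nc    # compute the MD5 sum and store it in :md5sum
    sign_netcdf -C file.nc    # recompute the sum and compare it to :md5sum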
On Thu, 3 Feb 2005 23:38:16 +0100
Reimar Bauer <R.Bauer@xxxxxxxxxxxxx> wrote:
> Dear all
>
> One of my colleagues had the idea of including in each netCDF
> file a checksum or signature to indicate whether the file has been
> changed by some kind of damage (a virus or a hardware failure).
>
> The reason we need such information is that the files are getting
> larger and larger, and you can't guarantee that a file which comes
> from a backup or from a file copy is identical to the original.
>
> We had a very interesting hardware failure on a hard disk on one of
> our systems, which corrupted the copy of a netCDF file in such a way
> that only one parameter of the file was unreadable. It took a long
> time to understand why this file was read correctly on another
> machine.
>
> If a kind of self-diagnostic were automatically included, it would be
> much easier to find out why some data does not look as expected. And
> you would know immediately that something is wrong!
>
> I think if there are ideas on how to implement this feature, it
> should be done. It is very important for all of us!
>
>
> cheers
> Reimar
>
> --
> Forschungszentrum Juelich
> email: R.Bauer@xxxxxxxxxxxxx
> http://www.fz-juelich.de/icg/icg-i/
> =================================================================
> a IDL library at ForschungsZentrum Juelich
> http://www.fz-juelich.de/icg/icg-i/idl_icglib/idl_lib_intro.html