On Tue, 2004-11-23 at 14:08 -0500, Rich Signell wrote:
> Folks,
>
> Here's a problem we encounter all the time in the
> modeling business: we have a model run (netcdf output,
> of course), and we can't quite figure out what version of
> the code actually was used to produce it. Oh sure,
> it says "version 2.1" but that was before we hacked
> the code to fix X, or put in that special little patch
> just for this run (or was it taken out)? Etc.
Hi Rich,
It sounds like you could use a finer-grained version number. And/or a
few more descriptive attributes that mention who is running the code and
what they have or have not done with it.
But getting back to your idea: for "medium" [1] and larger code bases,
embedding the entire source code into the model output isn't necessarily
a great idea. It doesn't scale.
Instead, have you considered embedding a hash of your model? For
instance, during the model build stage you could easily create a list of
the source files and their MD5SUMs and then embed that list into one or
more output files (which may or may not be in a NetCDF format).
Admittedly, it wouldn't tell you exactly what modifications were made to
the code. But it would tell you which files had been modified and it
would scale to work with much larger projects. And any hash(es) can be
thought of as a sort of "extended version number".
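To make the idea concrete, here is a minimal sketch in Python of what the build-stage step might look like. The function names (md5_of_file, build_manifest, changed_files) and the default file patterns are made up for illustration and do not come from any existing tool. The manifest uses the same "digest, two spaces, filename" layout that md5sum prints.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_dir, patterns=("*.F", "*.f90", "*.h", "*.c")):
    """Build an md5sum-style listing for the source files under source_dir.

    The patterns are only an assumed default; adjust them to the model.
    """
    root = Path(source_dir)
    files = sorted({p for pat in patterns for p in root.rglob(pat) if p.is_file()})
    return "\n".join(
        f"{md5_of_file(p)}  {p.relative_to(root).as_posix()}" for p in files
    )


def changed_files(old_manifest, new_manifest):
    """List files that were added, removed, or modified between two manifests."""
    def parse(text):
        pairs = (line.split("  ", 1) for line in text.splitlines() if line)
        return {name: digest for digest, name in pairs}

    old, new = parse(old_manifest), parse(new_manifest)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))
```

The string returned by build_manifest could then be stored with the run, for example as a global text attribute in the output file or as a small sidecar file. Comparing the stored manifest against one built from a clean checkout with changed_files would show which files differ, though not what changed inside them.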
If you'd like, I'd be happy to work with you on writing up a quick pair
of hash-list-embed and hash-list-extract utilities.
Ed
[1] let's define "medium" as "big enough to make the output
file(s) unwieldy even when the source code has been
compressed"
--
Edward H. Hill III, PhD
office: MIT Dept. of EAPS; Rm 54-1424; 77 Massachusetts Ave.
Cambridge, MA 02139-4307
emails: eh3@xxxxxxx ed@xxxxxxx
URLs: http://web.mit.edu/eh3/ http://eh3.com/
phone: 617-253-0098
fax: 617-253-4464