On Mon, Jun 21, 2010 at 10:33:12PM +0400, Constantine Khroulev wrote:
> It seems to me that case 1 is slow because NetCDF (Classic) keeps
> the file header as small as possible (Section 4 of the NetCDF User's
> Guide is perfectly clear about this).
You can use nc__enddef (the double-underscore version) to adjust this
behavior and pad out the header. Then, when the header grows because
you've added another variable, you won't have to rewrite the entire
dataset. A new variable only needs a few bytes in the header: by
reserving, say, 4k of headroom, you can store a lot of variables
without triggering a rewrite.
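Something like this (untested sketch; the file and variable names
are made up, and I'm leaving out error checking):

    #include <netcdf.h>

    int ncid, dimid, varid;
    nc_create("foo.nc", NC_CLOBBER, &ncid);
    nc_def_dim(ncid, "x", 1000, &dimid);
    nc_def_var(ncid, "A", NC_DOUBLE, 1, &dimid, &varid);
    /* plain nc_enddef(ncid) is equivalent to
     * nc__enddef(ncid, 0, 4, 0, 4); here we ask for 4096 bytes of
     * free space after the header (h_minfree) and keep the other
     * three tuning parameters at their defaults */
    nc__enddef(ncid, 4096, 4, 0, 4);
    /* ... write data ... */
    nc_close(ncid);

As long as later additions fit inside that headroom, re-entering
define mode and adding a variable won't move the data.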
> Case 2, on the other hand,
> seems to be slow because (please correct me if I'm wrong) variables
> are stored contiguously. (In other words: if variables A and B are
> defined in this order, then appending X bytes to A requires moving B
> over by X bytes.)
In parallel-netcdf land we take some (technically legal) liberties
with the file format so that you can pad out individual variables.
There might be a tuning option to do that in netcdf, but I don't know
it off the top of my head.
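For example, I believe PnetCDF honors an "nc_var_align_size" hint
that aligns the starting offset of each fixed-size variable, and the
gaps that alignment leaves act as per-variable padding. A rough
sketch (hint name and availability depend on your PnetCDF version,
and the file name is made up):

    #include <mpi.h>
    #include <pnetcdf.h>

    int ncid;
    MPI_Info info;
    MPI_Info_create(&info);
    /* align each fixed-size variable's start to a 1 MB boundary;
     * the gaps between variables act as per-variable padding */
    MPI_Info_set(info, "nc_var_align_size", "1048576");
    ncmpi_create(MPI_COMM_WORLD, "foo.nc", NC_CLOBBER, info, &ncid);
    /* ... define dims/vars, ncmpi_enddef, write, ... */
    ncmpi_close(ncid);
    MPI_Info_free(&info);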
> My question is:
>
> How does NetCDF-4 compare to NetCDF Classic in this regard? Would
> switching to it improve write performance? (This is two questions,
> really: I'm interested in cases 1 and 2 separately.)
I imagine the new HDF5-based format will handle both cases pretty
well, since it can store variables in chunks rather than
contiguously, but I'm not an expert. You'll pay a bit of a price
when you read back this data, but it sounds like that's not a big
deal for your workload.
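For instance, with chunked storage appending to A doesn't move B.
A rough sketch against the netCDF-4 C API (names invented, error
checking omitted; a variable with an unlimited dimension is chunked
by default, I'm just making the call explicit):

    #include <netcdf.h>

    int ncid, dimid, varid;
    size_t chunks[1] = {1024};
    nc_create("foo.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
    nc_def_dim(ncid, "x", NC_UNLIMITED, &dimid);
    nc_def_var(ncid, "A", NC_DOUBLE, 1, &dimid, &varid);
    /* chunked storage: space is allocated chunk by chunk, so
     * growing A, or adding a variable B later, doesn't require
     * moving data that's already on disk */
    nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
    nc_enddef(ncid);
    nc_close(ncid);

The read-back price I mentioned comes from the extra chunk indexing
(and any compression) that this layout implies.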
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA