Hi!
I am doing some simple run-time visualization of a model run that creates a
NetCDF file, adding records to state variables as time advances (time is the
unlimited dimension). The model runs on one machine, the visualization on
another; both share the data via NFS (served by yet another machine).
Now, the visualization is simply this: look at the NetCDF header to see whether
the time dimension grew and, if it did, extract and plot the last available
record. It is a shell script that does simple busy-waiting at intervals of a
second (or 10, which is basically the same ;-), calling one Fortran program to
look at the header and then another one to extract the data.
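For illustration, the polling logic amounts to roughly the following. This is a
minimal C sketch, not my actual shell script: the names wait_for_new_record and
record_count_fn are made up, and the header check is abstracted into a callback
(in the real setup that step is a separate Fortran program reading the record
dimension length). The stand-in callback fake_record_count exists only so the
sketch runs without a NetCDF file.

```c
#include <unistd.h>

/* Hypothetical callback type: returns the current length of the record
 * (time) dimension, or -1 on error. In a real setup this would open the
 * file and read its header. */
typedef long (*record_count_fn)(void *ctx);

/* Poll until more than last_seen records are reported, sleeping
 * interval_s seconds between polls. Gives up after max_tries polls.
 * Returns the new record count, or -1 on error or timeout. */
long wait_for_new_record(record_count_fn count, void *ctx,
                         long last_seen, unsigned interval_s, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        long n = count(ctx);
        if (n < 0)
            return -1;          /* header could not be read */
        if (n > last_seen)
            return n;           /* a new record showed up */
        sleep(interval_s);      /* plain busy-waiting, as described */
    }
    return -1;                  /* timed out */
}

/* Stand-in for the header check, for demonstration only: pretends the
 * file gains one record on every third poll. */
long fake_record_count(void *ctx)
{
    long *polls = ctx;
    *polls += 1;
    return *polls / 3;
}
```

Note that a loop like this only knows what the header claims; it says nothing
about whether the record data behind it is already visible to the reader.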
I have experienced issues with the NFS shares (or even the local file system)
before (e.g. builds failing because a later compilation step cannot read all
of the object data written in an earlier step), so I am suspicious of any
data inconsistencies and funky caching issues. But I first want to make sure
that the issue I am having with my simple run-time visualization is not simply
rooted in the way NetCDF writes its data.
The problem is this: My script sees that a new time is there, but the
subsequent data extraction does not yield a complete data set. There can be
several seconds between writing of the data and the attempt to extract it... so
I suspect some nasty caching at some point. It is not that much data, in the
megabyte range -- it does take a fraction of a second to write it.
I was about to ask about the order of write operations in NetCDF, i.e. whether
it first writes the new record of a variable and _then_ updates the header to
increase the record dimension length -- but this is really not my problem here
(and in addition, it is rather moot since I have to write my data and the "time"
variable itself, too, so there are always moments with inconsistent
header/variable values unless some locking is used for file access). My problem
is that there simply is a long delay (approaching the minute range, instead of
seconds) before the visualization machine sees the bytes written by the model
run, and even then it does not see full records.
I do call nf90_sync() in the writing program, just after appending a record.
From its documentation I gather that it should do exactly what I expect: make
sure any buffers are written to disk... make the file on disk consistent.
But, well... I see now that this does not work with NFS. Or should it?
I observe that I can trigger an apparent update of the data available for
visualization (also getting a consistent data excerpt) when running 'sync' on
the machine that runs the model code -- where nf90_sync() has been called
before.
So, perhaps it boils down to this: What kind of syncing is implied in
nf90_sync()? It just flushes internal NetCDF buffers to operating system
buffers, I presume... so there is no call to the C fsync() function in NetCDF,
for example? Or is there such a call and our NFS (with ZFS behind) setup is
simply broken? But then, 'sync' on the command line works...
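For reference, this is the kind of explicit flush I have in mind when I mention
fsync(). It is a minimal POSIX C sketch and not taken from the NetCDF sources:
the helper name append_and_fsync is made up for illustration, and whether
nf90_sync() does anything comparable internally is exactly what I do not know.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes from buf to the file at path and
 * ask the kernel to flush them to stable storage before returning.
 * Returns 0 on success, -1 on error. */
int append_and_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* write() may write fewer bytes than requested, so loop until done. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Without this call the data may sit in operating system buffers
     * for a while even though write() has already returned. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The point of the sketch is only the distinction between write() returning and
fsync() returning; how an NFS client on another machine caches what it reads is
a separate matter.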
Can someone enlighten me on the caching/synchronization strategy there?
Alrighty then,
Thomas.
PS: I may quickly check what kind of syncing NetCDF does by looking at the
code, but it may also be a good idea to have a bit of discussion about this...
or have me quickly pointed to the FAQ entry that explains it all;-)
--
Dipl. Phys. Thomas Orgis
Atmospheric Modelling
Alfred-Wegener-Institute for Polar and Marine Research