Re: [netcdfgroup] Wondering about when NetCDF data hits the disk...

  • To: Rob Ross <rross@xxxxxxxxxxx>
  • Subject: Re: [netcdfgroup] Wondering about when NetCDF data hits the disk...
  • From: Thomas Orgis <thomas.orgis@xxxxxx>
  • Date: Wed, 28 Oct 2009 14:50:35 +0100
On Wed, 28 Oct 2009 08:17:34 -0500,
Rob Ross <rross@xxxxxxxxxxx> wrote:

> NFS is not meant for the sort of concurrent access that you're  
> attempting. There is no coordination of the caching on one client
> with another. The sync() operation pushes data from the writing
> client back to the server, but there's no good way to ensure that the
> reader sees the up-to-date data.

Well, actually my case is a bit simpler than the generic one: I am only after 
_appended_ data, added to the existing NetCDF file.


> You could *try* closing and
> re-opening on the client, but there are no guarantees.

I am starting a new program instance for each read attempt, so I am 
closing/reopening enough, I guess.
Of course there are no guarantees without some other proper synchronization 
mechanism, but I have read the documentation of nf90_sync(), from 
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-f90/NF90_005fSYNC.html, 
where one can find these two phrases:

"The function NF90_SYNC offers a way to synchronize the disk copy of a netCDF 
dataset with in-memory buffers"
"For a writer, this flushes buffers to disk."

At least the documentation should be adapted, then, because -- so it seems to 
me -- NetCDF merely writes data from internal buffers, in fact handing over 
that data to operating system buffers in my case. "Synchronizing a disk copy" 
can too easily be misunderstood as "doing fsync() on the underlying file after 
writing data".
"Flushes buffers to disk" is even more problematic, since one could even be led 
to think that we are telling the hard disk to flush its buffers to the real 
storage (OK, that may be a little stretch, but it's imaginable).
The documentation should not imply that any hard disk activity is triggered any 
time soon.
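
To make the distinction concrete, here is a minimal sketch in plain POSIX C, 
with no NetCDF involved. The helper name append_durably() is made up for this 
illustration. The write() calls only hand data to the operating system; the 
explicit fsync() is the separate step that the phrase "flushes buffers to 
disk" suggests but that, as far as I can tell, nc_sync() does not perform.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of NetCDF): append len bytes to path and
 * ask the OS to push them to storage. Returns 0 on success, -1 on error. */
int append_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        /* write() only copies the data into operating system buffers. */
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() requests that those buffers be written to the device
     * (or, on a network file system, sent to the server). */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync() call the function would still return successfully, and a 
process on the same machine would see the data, which is why the difference is 
easy to overlook.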

> This is not a netCDF issue, but an NFS one.

True, this should be no issue for processes on one box (apart from theoretical 
safety from system crash by ensuring that data has been written as expected), 
but it is an issue coming up with NetCDF usage and as I see it, I need some 
help from the NetCDF side to solve it.
To do my visualization properly, I think I need two provisions:

1. Use file system locking to ensure that I do not try to extract data while it 
is being partially written. NFS does support locking (at least it tries;-). 
Actually I should only need to lock at the step looking for the current extent 
of the time dimension, because it does not hurt to append more data while I am 
reading the previous complete data set -- NetCDF does not move any existing 
data around when not redefining header stuff (right?).

2. Some way to tell NetCDF to do fsync() on the underlying file. You can argue 
that the normal nc_sync() should not include fsync() as it's just for 
synchronizing processes on the same box and not concerned with data integrity 
or network shares, but would it be a bad thing to add an API hook like 
nc_sync_files() (or _disk, _filesystem ...) that simply calls fsync()? My 
problem is that I do not see a better way to do it outside of NetCDF... I'd 
need to extract the file descriptor from NetCDF to do it myself in C, which 
violates the encapsulation of the API, also ... I am at a loss with standard 
Fortran 90, where I have yet to figure out if there is a portable/standard way 
to get fsync() behaviour. A hack might be to close the NetCDF dataset, open the 
file locally, do whatever triggers fsync(), close, reopen with netcdf(). 
Hacking NetCDF appears a lot cleaner than _that_.
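
For the locking in point 1, a minimal sketch with POSIX advisory locks via 
fcntl(), which is the locking interface NFS lock managers generally handle. 
The helper name lock_whole_file() is made up here, and the descriptor passed in 
is assumed to be one the application opened itself, since NetCDF does not 
expose its own.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: take or release an advisory lock on the whole file.
 * type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release).
 * Blocks until the lock is granted. Returns 0 on success, -1 on error. */
int lock_whole_file(int fd, short type)
{
    assert(fd >= 0);

    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;   /* 0 means "up to end of file", however far it grows */

    /* F_SETLKW waits instead of failing when another process holds
     * a conflicting lock. */
    return fcntl(fd, F_SETLKW, &fl);
}
```

The writer would hold F_WRLCK while appending a record, the reader F_RDLCK 
while querying the length of the time dimension. One caveat: fcntl() locks 
belong to the process, and closing any descriptor for the file drops all of 
that process's locks on it, so locking a separate lock file next to the 
dataset may be the safer variant.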

Well, a patch would be rather trivial (um, for UNIX...), I guess, so I ask: Is 
there opposition to adding an explicit fsync() facility to the NetCDF API? I 
think it would be wanted for the same reasons other database systems use fsync(), 
and I presented a (possibly silly;-) use case here.
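
To show how little is involved, here is a sketch of the workaround from point 
2, done outside the library: open the file by name, fsync() it, close it 
again. The name fsync_path() is made up. It rests on the assumption that an 
fsync() through a second descriptor also covers data written through the 
descriptor NetCDF holds; as far as I know that is how Linux behaves, but I 
would not rely on it elsewhere without checking.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the OS to write everything it has buffered
 * for the named file to storage. Returns 0 on success, -1 on error. */
int fsync_path(const char *path)
{
    assert(path != NULL);

    /* Opened for writing (without O_TRUNC, so nothing is modified),
     * because some systems refuse fsync() on read-only descriptors. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

The writer would call this right after nc_sync(), once the library has passed 
its own buffers on to the operating system. A hook inside NetCDF could skip the 
extra open() and call fsync() on the descriptor it already has.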

In any case, one should clarify the documentation...


Alrighty then,

Thomas.

-- 
Dipl. Phys. Thomas Orgis
Atmospheric Modelling
Alfred-Wegener-Institute for Polar and Marine Research


