Re: [netcdfgroup] Wondering about when NetCDF data hits the disk...

  • To: Thomas Orgis <thomas.orgis@xxxxxx>
  • Subject: Re: [netcdfgroup] Wondering about when NetCDF data hits the disk...
  • From: Rob Ross <rross@xxxxxxxxxxx>
  • Date: Wed, 28 Oct 2009 09:28:36 -0500
On Oct 28, 2009, at 8:50 AM, Thomas Orgis wrote:

Am Wed, 28 Oct 2009 08:17:34 -0500
schrieb Rob Ross <rross@xxxxxxxxxxx>:

NFS is not meant for the sort of concurrent access that you're
attempting. There is no coordination of the caching on one client
with another. The sync() operation pushes data from the writing
client back to the server, but there's no good way to ensure that the
reader sees the up-to-date data.

Well, actually my case is a bit simpler than the generic one: I am only after _appended_ data, added to the end of an existing NetCDF file.

It is a mistake to think that there is any rhyme or reason to the cache update and replacement policy in NFS. In fact it is ok for a client implementation to cache the file size and other metadata too, and return an out-of-date version to a process.

You could *try* closing and
re-opening on the client, but there are no guarantees.

I am starting a new program instance for each read attempt, so I am closing/reopening enough, I guess. Of course there are no guarantees without some other proper synchronization mechanism, but I have read the documentation of nf90_sync(), from http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-f90/NF90_005fSYNC.html , where one can find these two phrases:

"The function NF90_SYNC offers a way to synchronize the disk copy of a netCDF dataset with in-memory buffers"
"For a writer, this flushes buffers to disk."

It is supposedly synchronizing that process's in-memory buffers with the copy on the server (on disk). Actually, what is probably happening is that the dirty regions are being pushed to the server, but it's entirely possible that clean cached blocks are not checked to see if they are up-to-date.

At least the documentation should be amended, then, because -- so it seems to me -- NetCDF merely writes data from internal buffers, in fact handing that data over to operating system buffers in my case. "Synchronizing a disk copy" can too easily be misunderstood as "doing fsync() on the underlying file after writing data". "Flushes buffers to disk" is even more problematic, since one can even be led to think that we are telling the hard disk to flush its buffers to the real storage (OK, that may be a little stretch, but it's imaginable). The documentation should not imply that any hard disk activity is triggered any time soon.

Well, I imagine that they are calling fsync(), which says it will do the following:

---
Fsync() causes all modified data and attributes of fildes to be moved to a permanent storage device. This normally results in all in-core modified copies of buffers for the associated file to be written to a disk.
---

But point taken that the NFS server might do something other than write it to disk.
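At the POSIX level, the sequence being discussed amounts to a write() followed by fsync(). A minimal sketch (the helper name and file path are illustrative, not part of netCDF):

```c
#include <fcntl.h>
#include <unistd.h>

/* Sketch of the fsync() sequence under discussion: hand the data to
   the kernel with write(), then ask for it to be pushed to stable
   storage with fsync().  On NFS this makes the client send the data
   to the server, but the server itself may still buffer it. */
int write_and_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    if (fsync(fd) != 0) {    /* push dirty data + metadata to storage */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Note that even a successful fsync() here only gets the data as far as the NFS server; it says nothing about what other clients see in their caches.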

This is not a netCDF issue, but an NFS one.

True, this should be no issue for processes on one box (apart from the theoretical safety against a system crash gained by ensuring that the data has actually been written), but it is an issue that comes up in NetCDF usage, and as I see it, I need some help from the NetCDF side to solve it.

You are making a false assumption, which is that netCDF can do something to help solve this. As someone who has been dealing with NFS in the context of parallel computing for over a decade, I can tell you with confidence that this is a losing battle that the netCDF team should not undertake. The correct solution to this problem is for you to run a cluster file system that provides appropriate semantics for this type of coordinated access.

To do my visualization properly, I think I need two provisions:

1. Use file system locking to ensure that I do not try to extract data while it is being partially written. NFS does support locking (at least it tries;-). Actually I should only need to lock at the step looking for the current extent of the time dimension, because it does not hurt to append more data while I am reading the previous complete data set -- NetCDF does not move any existing data around when not redefining header stuff (right?).

The locks are advisory and do not impact caching. Dead end. Sorry, but this is hard. You can have a look at this:
  http://www.mcs.anl.gov/research/projects/romio/doc/users-guide-all/index.html

Under "Using ROMIO on NFS" and "ROMIO, NFS, and Synchronization" you can get a feel for what you can actually do to try to make NFS behave in a helpful way for you. Expect the performance of your NFS volume to drop significantly if you configure it this way.
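For what it's worth, the advisory locking mentioned in point 1 would look roughly like the following fcntl() sketch (helper name and semantics as assumed here, not anything netCDF provides):

```c
#include <fcntl.h>
#include <unistd.h>

/* Sketch: advisory whole-file lock via fcntl().  Over NFS this is
   forwarded to the server's lock daemon, so it can serialize
   cooperating processes -- but it does NOT invalidate pages another
   client has already cached.  type is F_RDLCK, F_WRLCK or F_UNLCK. */
int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type   = type;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                  /* 0 means: to end of file */
    return fcntl(fd, F_SETLKW, &fl);  /* block until granted */
}
```

This only coordinates processes that all take the lock; a reader that ignores it, or a client with stale cached pages, is unaffected.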

2. Some way to tell NetCDF to do fsync() on the underlying file.

I haven't looked; I'm guessing that it does do an fsync() as part of a normal nc_sync()?

You can argue that the normal nc_sync() should not include fsync(), as it is just for synchronizing processes on the same box and is not concerned with data integrity or network shares -- but would it be a bad thing to add an API hook like nc_sync_files() (or _disk, _filesystem ...) that simply calls fsync()? My problem is that I do not see a better way to do it outside of NetCDF. I would need to extract the file descriptor from NetCDF to do it myself in C, which violates the encapsulation of the API. Also, I am at a loss with standard Fortran 90, where I have yet to figure out whether there is a portable/standard way to get fsync() behaviour. A hack might be to close the NetCDF dataset, open the file locally, do whatever triggers fsync(), close, and reopen with NetCDF. Patching NetCDF appears a lot cleaner than _that_.
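On POSIX systems the hack can actually be done without closing the netCDF dataset, since fsync() acts on the file, not on the particular descriptor: open the same path a second time and fsync() that descriptor. A sketch (the helper name is mine; the dataset path is whatever you passed to nf90_open/nc_open):

```c
#include <fcntl.h>
#include <unistd.h>

/* Workaround sketch: after nc_sync() has pushed netCDF's internal
   buffers into the OS page cache, open the same file by path and
   fsync() the new descriptor.  fsync() flushes the dirty pages of
   the underlying file, so this works without disturbing the netCDF
   handle and without breaking the API's encapsulation. */
int fsync_by_path(const char *path)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}
```

From Fortran this would still require a small C wrapper, but no close/reopen cycle on the dataset itself.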

My guess is that the issue isn't with the writer (who would call fsync()) at all, but the reader. You must understand that in NFS readers cache data and hand that data back to processes without bothering to check if the data is up-to-date with respect to the data on the server. Thus I believe this is a reader-side problem, not a writer-side one.
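To illustrate the reader-side half: a polling reader should re-query the file on every attempt rather than trust a long-lived handle. A minimal POSIX sketch (the path and helper name are illustrative):

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Reader-side sketch: stat() the file afresh on every poll instead
   of reusing cached state.  On a local file system this always sees
   the latest size; on NFS the client may still serve a cached, stale
   size unless attribute caching is disabled (e.g. the "noac" mount
   option, at a significant performance cost). */
long current_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

Closing and reopening the netCDF dataset on each read attempt is the analogous move at the netCDF level, but as noted above, NFS gives no guarantee that the reopened view is up to date.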

Well, a patch would be rather trivial (um, for UNIX...), I guess, so I ask: Is there opposition to including an explicit fsync() facility in the NetCDF API? I think it could be wanted for the same reasons other database systems use fsync(), and I presented a (possibly silly ;-) use case here.

In any case, one should clarify the documentation...

How would you propose clarifying it? Something like:

"Note, the view of the file relative to other processes is file system dependent, so this call is not adequate to ensure that the most up-to-date file state is available at all processes."

Rob


