NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi Russ,

> For some light reading last night, I was reading the HDF5 FAQs (I
> know, I've got to get out more :-), and came across a possible :-)
> show-stopper:
>
>   http://hdf.ncsa.uiuc.edu/hdf5-quest.html#grdwt
>
> As background, users of netCDF sometimes have one writer process and
> one or more reader processes opening and accessing the same file
> concurrently, using nc_sync() or NC_SHARE to make sure the readers and
> writer see a consistent version of the file. The way concurrent
> access is handled is explained here in about seven paragraphs:
>
>   http://www.unidata.ucar.edu/packages/netcdf/guidec/guidec-10.html#HEADING10-322
>
> under the nc_sync() description.
>
> Note that there are two different levels of concern for
> synchronization:
>
>  1. data, that is values of variables that are changed and new data
>     added, including new records as the result of the unlimited
>     dimension being increased by the writer process
>
>  2. schema changes, such as adding new dimensions, variables, or
>     attributes, changing the names of things, or even changing the
>     value of an attribute.
>
> NetCDF provides good support for multiple readers and one writer for
> changes of the first type, to the data, by either using nc_sync() or
> (preferred) by using the NC_SHARE flag on open.
>
> NetCDF provides almost no support for concurrent changes of the second
> type, which involve a writer changing the schema (header) information
> for a file, implying that the cached in-memory header information
> would all have to be reread.
>
> So for the fairly uncommon second kind of change (to the schema), we
> recommend that some external form of communication be used to inform
> the readers of a need to close and reopen the file to see the changes
> made by the writer. However the more common first kind of change is
> handled without needing any communication between writer and readers
> and without requiring closing and reopening the file.
>
> If my reading of the HDF5 FAQ answer is right, this common kind of
> data concurrency is not supported in HDF5, so systems that make data
> changes with a concurrent writer and one or more readers won't work
> unless we provide some new communication among the processes doing I/O
> to make sure readers close and then reopen the file after *any* write.
> Is this right, or am I taking the HDF5 FAQ answer too literally?
>
> We're currently not doing all this stuff in our netCDF-4 prototype
> if a file is open with the NC_SHARE flag or on nc_sync() calls. If we
> have to add code on reads to close and then reopen the file if it's
> been modified, this will require some rework and have performance
> implications.
>
> On the other hand, maybe everything is OK and the above is not really
> necessary to assure that the reader gets a consistent, if not
> absolutely up-to-date, view of the file (which is all that the netCDF
> implementation needs).
>
> Comments?

This sort of concurrency is not supported by default, but it should be
possible to achieve it with sufficient tweaking of the caching
parameters. You can use H5Pset_sieve_buf_size() to turn off raw data
caching and you can use H5Pset_cache() to turn off metadata caching
also. Obviously, performance is not great in these scenarios, but I
think it will work.

If we want to recover some of the performance given up by these sort of
tweaks, we could change the internal caches to allow write-through
instead of write-back caching, which would probably recover a
significant chunk of the slowdown.

Quincey
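
The difference between write-back and write-through caching mentioned in the reply can be illustrated with a toy model in plain C. This is only a sketch under stated assumptions: it does not use or reproduce HDF5's internal caches, the "file" is an in-memory array, and every name in it (SharedFile, WriterCache, cache_write, cache_flush, reader_read) is hypothetical and invented for illustration. It shows why a concurrent reader cannot see a writer's change under write-back until a flush happens, while under write-through the change reaches the file immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 8

/* Toy stand-in for a file shared by one writer and one or more readers. */
typedef struct {
    int data[SLOTS];
} SharedFile;

typedef enum { WRITE_BACK, WRITE_THROUGH } CachePolicy;

/* Hypothetical writer-side cache sitting in front of the shared file. */
typedef struct {
    SharedFile *file;
    CachePolicy policy;
    int cached[SLOTS];
    bool dirty[SLOTS];
} WriterCache;

/* Load the current file contents into the cache and mark everything clean. */
void cache_init(WriterCache *c, SharedFile *f, CachePolicy p)
{
    c->file = f;
    c->policy = p;
    for (size_t i = 0; i < SLOTS; i++) {
        c->cached[i] = f->data[i];
        c->dirty[i] = false;
    }
}

/* Write one value through the cache according to the chosen policy. */
void cache_write(WriterCache *c, size_t idx, int value)
{
    assert(idx < SLOTS);
    c->cached[idx] = value;
    if (c->policy == WRITE_THROUGH) {
        /* Write-through: the file is updated at once, so readers see it. */
        c->file->data[idx] = value;
    } else {
        /* Write-back: only the cache changes until cache_flush() runs. */
        c->dirty[idx] = true;
    }
}

/* Push all dirty entries to the file (what a sync call would trigger). */
void cache_flush(WriterCache *c)
{
    for (size_t i = 0; i < SLOTS; i++) {
        if (c->dirty[i]) {
            c->file->data[i] = c->cached[i];
            c->dirty[i] = false;
        }
    }
}

/* A reader bypasses the writer's cache and looks only at the file. */
int reader_read(const SharedFile *f, size_t idx)
{
    assert(idx < SLOTS);
    return f->data[idx];
}
```

In this model, a value written with WRITE_BACK is invisible to reader_read() until cache_flush() is called, whereas a value written with WRITE_THROUGH is visible right away. The writer still keeps its cached copy in both cases, which is the reason write-through might recover some read performance compared with disabling caching altogether.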