NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi,

For some light reading last night, I was reading the HDF5 FAQs (I know, I've got to get out more :-), and came across a possible show-stopper:

http://hdf.ncsa.uiuc.edu/hdf5-quest.html#grdwt

As background, users of netCDF sometimes have one writer process and one or more reader processes opening and accessing the same file concurrently, using nc_sync() or NC_SHARE to make sure the readers and writer see a consistent version of the file. The way concurrent access is handled is explained here in about seven paragraphs:

http://www.unidata.ucar.edu/packages/netcdf/guidec/guidec-10.html#HEADING10-322

under the nc_sync() description. Note that there are two different levels of concern for synchronization:

1. data, that is values of variables that are changed and new data added, including new records as the result of the unlimited dimension being increased by the writer process

2. schema changes, such as adding new dimensions, variables, or attributes, changing the names of things, or even changing the value of an attribute.

NetCDF provides good support for multiple readers and one writer for changes of the first type, to the data, by either using nc_sync() or (preferred) by using the NC_SHARE flag on open. NetCDF provides almost no support for concurrent changes of the second type, which involve a writer changing the schema (header) information for a file, implying that the cached in-memory header information would all have to be reread.

So for the fairly uncommon second kind of change (to the schema), we recommend that some external form of communication be used to inform the readers of a need to close and reopen the file to see the changes made by the writer. However the more common first kind of change is handled without needing any communication between writer and readers and without requiring closing and reopening the file.
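
As a rough illustration of the first kind of change (new records appearing while a reader keeps the file open), here is a toy sketch in C. It does not use the netCDF library or the netCDF file format: the file layout and every `toy_*` name are made up for illustration. The reader caches a record count at open time and refreshes it with `toy_sync()`, which only loosely stands in for the role nc_sync() plays for a reader. The point is that the reader picks up new records without closing and reopening the file.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Toy layout (an assumption for illustration, NOT the netCDF format):
 * bytes 0..3 hold a uint32_t record count, followed by one double per record. */
#define TOY_HDR ((off_t)sizeof(uint32_t))

static int read_at(int fd, off_t off, void *buf, size_t len)
{
    if (lseek(fd, off, SEEK_SET) < 0) return -1;
    return read(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

static int write_at(int fd, off_t off, const void *buf, size_t len)
{
    if (lseek(fd, off, SEEK_SET) < 0) return -1;
    return write(fd, buf, len) == (ssize_t)len ? 0 : -1;
}

/* Writer: create an empty file whose record count is zero. */
int toy_create(const char *path)
{
    uint32_t n = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    int rc = write_at(fd, 0, &n, sizeof n);
    close(fd);
    return rc;
}

/* Writer: store the record first, then publish it by bumping the count,
 * so the count never covers a record that has not been written yet. */
int toy_append(const char *path, double value)
{
    uint32_t n;
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;
    int rc = read_at(fd, 0, &n, sizeof n);
    if (rc == 0)
        rc = write_at(fd, TOY_HDR + (off_t)n * (off_t)sizeof value,
                      &value, sizeof value);
    if (rc == 0) {
        n++;
        rc = write_at(fd, 0, &n, sizeof n);
    }
    close(fd);
    return rc;
}

/* Reader: keeps the file open and caches the record count. */
typedef struct { int fd; uint32_t nrecs; } toy_reader;

/* Re-read only the record count; the file stays open. */
int toy_sync(toy_reader *r)
{
    return read_at(r->fd, 0, &r->nrecs, sizeof r->nrecs);
}

int toy_open(toy_reader *r, const char *path)
{
    r->fd = open(path, O_RDONLY);
    if (r->fd < 0) return -1;
    return toy_sync(r);
}

/* Read record i, refusing indexes beyond the cached count. */
int toy_read(const toy_reader *r, uint32_t i, double *out)
{
    if (i >= r->nrecs) return -1;
    return read_at(r->fd, TOY_HDR + (off_t)i * (off_t)sizeof *out,
                   out, sizeof *out);
}
```

A reader would call `toy_sync()` before looking for new records. Anything resembling a schema change (for example a different record size) is deliberately not handled here, which mirrors the distinction drawn above.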
If my reading of the HDF5 FAQ answer is right, this common kind of data concurrency is not supported in HDF5, so systems that make data changes with a concurrent writer and one or more readers won't work unless we provide some new communication among the processes doing I/O to make sure readers close and then reopen the file after *any* write.

Is this right, or am I taking the HDF5 FAQ answer too literally? We're currently not doing all this stuff in our netCDF-4 prototype if a file is open with the NC_SHARE flag or on nc_sync() calls. If we have to add code on reads to close and then reopen the file if it's been modified, this will require some rework and have performance implications.

On the other hand, maybe everything is OK and the above is not really necessary to assure that the reader gets a consistent, if not absolutely up-to-date, view of the file (which is all that the netCDF implementation needs).

Comments?

--Russ
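
For concreteness, the "close and then reopen the file if it's been modified" fallback could look roughly like the sketch below. This is only a sketch under stated assumptions: it uses plain stdio and stat() as a stand-in for the real HDF5 or netCDF-4 open and close calls, and the `reader` type and `reader_*` names are hypothetical. It detects modification by comparing file size and modification time, so a same-size rewrite within the timestamp granularity would be missed, and it does nothing to stop a reader from reopening in the middle of a write.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical reader handle: an open stream plus the size and
 * modification time observed when the file was (re)opened. */
typedef struct {
    const char *path;
    FILE *fp;
    off_t size;
    time_t mtime;
} reader;

/* Open the file and remember its current size and mtime. */
int reader_open(reader *r, const char *path)
{
    struct stat st;
    r->path = path;
    r->fp = fopen(path, "rb");
    if (r->fp == NULL) return -1;
    if (stat(path, &st) != 0) {
        fclose(r->fp);
        r->fp = NULL;
        return -1;
    }
    r->size = st.st_size;
    r->mtime = st.st_mtime;
    return 0;
}

/* Call before each read: close and reopen if the file looks modified.
 * Returns 1 if reopened, 0 if unchanged, -1 on error. */
int reader_refresh(reader *r)
{
    struct stat st;
    if (stat(r->path, &st) != 0) return -1;
    if (st.st_size == r->size && st.st_mtime == r->mtime) return 0;
    fclose(r->fp);
    return reader_open(r, r->path) == 0 ? 1 : -1;
}
```

Even in this stripped-down form, every read pays for an extra stat() call, and every detected change pays for a full close and reopen, which is where the performance concern comes from.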