NOTE: The netcdf-hdf mailing list is no longer active. The list archives are made available for historical reasons.
Hi Ed,

> Quincey Koziol <koziol@xxxxxxxxxxxxx> writes:
>
> > From HDF5's perspective, you have to use H5Pset_fapl_<foo>(params) to
> > choose to use a particular file driver to access a file. Probably something
> > like this should be exported/translated out to the netCDF4 layer for users to
> > choose which driver to access the file with.
> >
> > Here's the URL for the parallel HDF5 info currently:
> >
> > http://hdf.ncsa.uiuc.edu/HDF5/PHDF5/
>
> I'm seeing three steps to parallel HDF5:
>
> 1 - Initialize MPI
> 2 - When opening/creating the file, set a property in file access
>     properties.
> 3 - Every time reading or writing file, pass a correctly set transfer
>     property.

I'm assuming you mean reading/writing "raw" data.

> Does that seem to sum it up?

That's some of it. You also have to make certain that the functions
listed below are called correctly.

> But I see below that you are also asking that "these properties must
> be set to the same values when they are used in a parallel program,"
>
> What do you mean by that?

You can't have half the processes set a property to one value and the
other half set the same property to a different value. (i.e. everybody
must agree that the userblock is 512 bytes, for example :-)

> In parallel I/O do multiple processes try and create the file? Or does
> one create it, and the rest just open it? Sorry if that seems like a
> dumb question!

In MPI-I/O, file creation is a collective operation, so all the
processes participate in the create (from our perspective at least, I
don't know how it happens internally in the MPI-I/O library). You are
going to have fun learning how to do parallel programming with MPI -
think of it as multi-threaded programs with bad debugging support... :-/

Quincey

> > > For reading, what does this mean to the API, if anything?
> >
> > Well, I've appended a list of HDF5 API functions that are required to be
> > performed collectively to the bottom of this document (I can't find the link
> > on our web-pages).
> > > Everyone gets to open the file read-only, and read from it to their
> > > heart's content, confident that they are getting the most recent data
> > > at that moment. That requires no API changes.
> > >
> > > Is that it for readers? Or do they get some special additional
> > > features, like notification of data arrival, etc?
> >
> > Users would also need the option to choose to use collective or
> > independent I/O when reading or writing data to the file. That reminds me -
> > are y'all planning on adding any wrappers to the H5P* routines in HDF5 which
> > set/get various properties for objects?
>
> This is truly an important question that I will treat in its own
> email thread...
>
> > Quincey
> >
> > ==============================================================
> >
> > Collective functions:
> >
> > H5Aclose (2)
> > H5Acreate
> > H5Adelete
> > H5Aiterate
> > H5Aopen_idx
> > H5Aopen_name
> > H5Aread (6)
> > H5Arename (A)
> > H5Awrite (3)
> >
> > H5Dclose (2)
> > H5Dcreate
> > H5Dfill (6) (A)
> > H5Dopen
> > H5Dextend (5)
> > H5Dset_extent (5) (A)
> >
> > H5Fclose (1)
> > H5Fcreate
> > H5Fflush
> > H5Fmount
> > H5Fopen
> > H5Funmount
> >
> > H5Gclose (2)
> > H5Gcreate
> > H5Giterate
> > H5Glink
> > H5Glink2 (A)
> > H5Gmove
> > H5Gmove2 (A)
> > H5Gopen
> > H5Gset_comment
> > H5Gunlink
> >
> > H5Idec_ref (7) (A)
> > H5Iget_file_id (B)
> > H5Iinc_ref (7) (A)
> >
> > H5Pget_fill_value (6)
> >
> > H5Rcreate
> > H5Rdereference
> >
> > H5Tclose (4)
> > H5Tcommit
> > H5Topen
> >
> > Additionally, these properties must be set to the same values when they
> > are used in a parallel program:
> >
> > File Creation Properties:
> > H5Pset_userblock
> > H5Pset_sizes
> > H5Pset_sym_k
> > H5Pset_istore_k
> >
> > File Access Properties:
> > H5Pset_fapl_mpio
> > H5Pset_meta_block_size
> > H5Pset_small_data_block_size
> > H5Pset_alignment
> > H5Pset_cache
> > H5Pset_gc_references
> >
> > Dataset Creation Properties:
> > H5Pset_layout
> > H5Pset_chunk
> > H5Pset_fill_value
> > H5Pset_deflate
> > H5Pset_shuffle
> >
> > Dataset Access Properties:
> > H5Pset_buffer
> > H5Pset_preserve
> > H5Pset_hyper_cache
> > H5Pset_btree_ratios
> > H5Pset_dxpl_mpio
> >
> > Notes:
> >
> > (1) - All the processes must participate only if this is the last
> >       reference to the file ID.
> > (2) - All the processes must participate only if all the file IDs for
> >       a file have been closed and this is the last outstanding
> >       object ID.
> > (3) - Because the raw data for an attribute is cached locally, all
> >       processes must participate in order to guarantee that future
> >       H5Aread calls return the correct results on all processes.
> > (4) - All processes must participate only if the datatype is for a
> >       committed datatype, all the file IDs for the file have been
> >       closed and this is the last outstanding object ID.
> > (5) - All processes must participate only if the number of chunks in
> >       the dataset actually changes.
> > (6) - All processes must participate only if the datatype of the
> >       attribute is a variable-length datatype (sequence or string).
> > (7) - This function may be called independently if the object ID does
> >       not refer to an object that was collectively opened.
> >
> > (A) - Available only in v1.6 or later versions of the library.
> > (B) - Available only in v1.7 or later versions of the library.
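[Editor's note: the three steps and the "same value on every process" rule discussed in the message above might look roughly like the following C sketch. This is not from the original thread; the file name, dataset name, and 512-byte userblock are illustrative, and it uses the 1.6-era H5Dcreate signature the thread discusses (newer libraries take two additional property-list arguments). It assumes an HDF5 library built with parallel support and an MPI launcher.]

```c
/* Hypothetical sketch: create a file in parallel, one element per rank. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char *argv[])
{
    /* Step 1: initialize MPI. */
    MPI_Init(&argc, &argv);
    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* "Same value on every process": all ranks must agree on file
     * creation properties such as the userblock size. */
    hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);
    H5Pset_userblock(fcpl, 512);

    /* Step 2: select the MPI-I/O driver in the file access property
     * list; H5Fcreate is collective, so every rank makes this call. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("parallel.h5", H5F_ACC_TRUNC, fcpl, fapl);

    /* H5Dcreate is also collective (see the list above). */
    hsize_t dims[1] = { (hsize_t)nprocs };
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate(file, "data", H5T_NATIVE_INT, space, H5P_DEFAULT);

    /* Step 3: each raw-data transfer carries a transfer property list
     * that selects collective (or independent) I/O. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    /* Each rank writes its own element of the dataset. */
    hsize_t start[1] = { (hsize_t)rank }, count[1] = { 1 };
    H5Sselect_hyperslab(space, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);
    int value = rank;
    H5Dwrite(dset, H5T_NATIVE_INT, memspace, space, dxpl, &value);

    /* H5Dclose and H5Fclose are among the collective calls listed above. */
    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(space);
    H5Dclose(dset); H5Fclose(file);
    H5Pclose(fapl); H5Pclose(fcpl);
    MPI_Finalize();
    return 0;
}
```

Built with a parallel HDF5 (e.g. `mpicc prog.c -lhdf5`) and run under an MPI launcher, every rank participates in the collective create/close calls while the hyperslab selection lets each write only its own slice.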