Re: [netcdfgroup] Informations about parallel netCDF4



On 05/07/2014 09:20 AM, Alexis Praga wrote:
On Wed, May 07, 2014 at 09:02:23AM -0500, Rob Latham wrote:
i'm not entirely sure what you're asking here.  Most parallel I/O
libraries carry out I/O to different regions of the file
simultaneously (in parallel), and thereby extract more aggregate
performance out of the storage system.

for any application using any I/O library, the trickiest part is how
to decompose your domain over N parallel processes and how to
describe that decomposition.
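
As an illustration of the decomposition step mentioned above, here is a minimal sketch of a 1-D block decomposition. It is not taken from any of the libraries discussed; the function name block_decompose and its parameters are hypothetical. Each process computes the (start, count) pair describing its own slice of a global dimension, which is the kind of description a parallel I/O library is typically given for each read or write.

```python
def block_decompose(n_global, n_procs, rank):
    """Return (start, count) of the contiguous block owned by `rank`.

    Hypothetical helper: splits `n_global` elements over `n_procs`
    processes as evenly as possible.
    """
    base, extra = divmod(n_global, n_procs)
    # The first `extra` ranks each take one additional element.
    count = base + (1 if rank < extra else 0)
    # Ranks before this one own `base` elements each, plus one extra
    # element for every rank below `extra`.
    start = rank * base + min(rank, extra)
    return start, count


if __name__ == "__main__":
    # Ten elements over four processes.
    for rank in range(4):
        print(rank, block_decompose(10, 4, rank))
```

In a real MPI program the rank and process count would come from the communicator instead of a loop; the loop here only stands in for the four processes.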

To clarify: the way I see it, you can do parallel I/O in three different ways.
The first is to reserve a process that deals only with I/O; the other
processes exchange the data to be read or written with it.
The second is to have each process read/write independently.
The third is to aggregate the I/O of several processes to improve performance.

So my question was: in practice, which approach does parallel netCDF use?

you can use any of the I/O libraries (netcdf4, Parallel-NetCDF, HDF5) in any of those three models, but the third approach you describe is the use case for which all these libraries were designed.
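
To make the third model concrete, here is a toy sketch of the idea behind aggregation: many small per-process requests that touch neighbouring file regions are combined into fewer, larger ones before they reach the storage system. This is only an illustration under simplifying assumptions (1-D offsets, no data movement, no MPI) and not how netcdf4, Parallel-NetCDF or HDF5 implement it; the function name aggregate_requests is hypothetical.

```python
def aggregate_requests(requests):
    """Merge (start, count) requests that are adjacent or overlapping.

    Hypothetical helper: `requests` is a list of (start, count) pairs,
    one or more per process, in any order.
    """
    merged = []
    for start, count in sorted(requests):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # This request begins inside or right after the previous
            # region, so extend that region instead of adding a new one.
            prev_start, prev_count = merged[-1]
            end = max(prev_start + prev_count, start + count)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, count))
    return merged


if __name__ == "__main__":
    # Three processes write neighbouring blocks; a fourth writes far away.
    print(aggregate_requests([(6, 2), (0, 3), (3, 3), (20, 5)]))
```

In the real libraries this kind of combining happens below the application, during collective I/O calls, so the application only has to describe each process's own region.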


in strict performance terms -- which in the end is not really the
be-all and end-all -- Argonne-Northwestern Parallel-NetCDF will be hard
to beat, unless you are working with record variables.
Do you speak from personal experience? I would be very interested in seeing
some data or benchmarks about it.

Second hand: Babak Behzad spent a summer at NCAR working with John Dennis doing I/O workload experiments in support of the CESM climate simulation project. I don't know if the results ended up in some kind of paper or other presentation.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


