Re: [netcdfgroup] Information about parallel netCDF4



On 04/17/2014 03:22 AM, Alexis Praga wrote:
Hi,

I have some questions about parallel netCDF4 (using HDF5, not PnetCDF).
I think it's best to just ask them, so please excuse the long list:

1) What is its strategy for parallel I/O?

I'm not entirely sure what you're asking here. Most parallel I/O libraries carry out I/O to different regions of the file simultaneously (in parallel), and thereby extract more aggregate performance out of the storage system.
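To make "different regions of the file simultaneously" concrete, here is a minimal sketch in plain Python (not the netCDF or MPI-IO API): each simulated "rank" owns a disjoint byte range of one shared file and writes it with a positional write, so no shared file pointer is needed. Real parallel I/O issues these writes concurrently from separate processes; the offset arithmetic is the same.

```python
import os
import tempfile

# Sketch (not the netCDF API): each of NRANKS "ranks" owns a disjoint
# byte range of one shared file. Real parallel I/O does this
# concurrently via MPI-IO; here the ranks run sequentially just to
# show the non-overlapping offsets.

NRANKS = 4
BLOCK = 8  # bytes owned by each rank

fd, path = tempfile.mkstemp()
os.close(fd)

# Pre-size the file, then let each rank touch only its own region.
with open(path, "wb") as f:
    f.truncate(NRANKS * BLOCK)

fd = os.open(path, os.O_WRONLY)
for rank in range(NRANKS):
    data = bytes([rank]) * BLOCK       # each rank's payload
    os.pwrite(fd, data, rank * BLOCK)  # positional write: no seek, no shared pointer
os.close(fd)

with open(path, "rb") as f:
    contents = f.read()
os.remove(path)

print(contents[0], contents[BLOCK], contents[2 * BLOCK])  # 0 1 2
```

Because the ranges are disjoint, no locking or coordination between ranks is needed for the data itself, which is exactly what makes this pattern fast on parallel file systems.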

For any application using any I/O library, the trickiest part is deciding how to decompose your domain over N parallel processes and how to describe that decomposition.
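A common way to describe such a decomposition is a (start, count) hyperslab per rank. Here is a small, self-contained sketch (the function name is mine, not a netCDF call) of a 1D block decomposition that also handles a length not divisible by the number of ranks:

```python
# Sketch: block decomposition of one dimension of length n over
# nprocs ranks, returning the (start, count) pair each rank would
# pass to a parallel read/write call. When n is not divisible by
# nprocs, the first (n % nprocs) ranks get one extra element.

def decompose(n, nprocs, rank):
    base, extra = divmod(n, nprocs)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count

# 10 elements over 4 ranks -> counts 3,3,2,2 covering [0,10) with no overlap
print([decompose(10, 4, r) for r in range(4)])
# [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Multi-dimensional decompositions apply the same idea per dimension, yielding the start/count vectors that the parallel I/O layer turns into file offsets.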

2) How is it related to HDF5? Is it just a wrapper around it?

In one sense, yes. In order to adopt HDF5 as one possible backend, though, the Unidata netCDF folks designed a dispatch system so one might write via the classic netCDF interface, via the Argonne-Northwestern Parallel-NetCDF interface, via HDF5, or via DAP.
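The dispatch idea can be sketched in a few lines. This is a toy illustration of the design, not the actual netCDF internals (all names here are hypothetical): one front-end call routes to whichever backend handles that kind of file.

```python
# Toy sketch of a dispatch layer (names are hypothetical, not the
# real netCDF internals): a single front-end entry point routes to
# the backend registered for the file kind.

backends = {
    "classic": lambda path: f"classic CDF reader opened {path}",
    "netCDF4": lambda path: f"HDF5-based reader opened {path}",
    "pnetcdf": lambda path: f"Parallel-NetCDF reader opened {path}",
}

def nc_open(path, kind):
    # The real library determines the backend from the file's magic
    # bytes and open-mode flags; this toy takes the kind explicitly.
    return backends[kind](path)

print(nc_open("data.nc", "netCDF4"))  # HDF5-based reader opened data.nc
```

The payoff of this design is that application code keeps calling one API while the library swaps storage layers underneath it.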

3) When writing a netCDF4 file, is it really netCDF or is it HDF5 ?
ncdump -k returns "netCDF4" but I am not sure.

The new file format is an HDF5 file that can be examined with the broad ecosystem of HDF5 utilities. This HDF5 file, though, has a particular schema or layout that indicates it is the netCDF-4 kind of HDF5 file.
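A first-order version of what `ncdump -k` does can be shown by looking at the file's magic bytes: classic netCDF files begin with "CDF" plus a version byte, while netCDF-4 files begin with the 8-byte HDF5 signature. This sketch checks only the magic bytes on synthetic headers; the real tool additionally inspects the HDF5 internals to distinguish a netCDF-4 file from an arbitrary HDF5 file.

```python
import os
import tempfile

# Magic bytes: classic netCDF starts with "CDF" + version byte;
# any HDF5 file (including netCDF-4) starts with the HDF5 signature.
CLASSIC_MAGIC = b"CDF\x01"            # CDF-1
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"     # HDF5 superblock signature

def file_kind(path):
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(b"CDF"):
        return "classic"
    if head == HDF5_MAGIC:
        return "hdf5"  # possibly netCDF-4; a schema check is needed to be sure
    return "unknown"

# Demonstrate on synthetic headers (not real datasets):
results = []
for magic in (CLASSIC_MAGIC, HDF5_MAGIC):
    fd, path = tempfile.mkstemp()
    os.write(fd, magic)
    os.close(fd)
    results.append(file_kind(path))
    os.remove(path)

print(results)  # ['classic', 'hdf5']
```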

4) Is there some documentation online? I only found this:
http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-tutorial/Parallel.html
which is very light.

5) Any references (papers or benchmarks) are welcome. At the moment, I only
found the paper by Li et al. (2003) about PnetCDF.

In strict performance terms -- which in the end is not really the be-all and end-all -- Argonne-Northwestern Parallel-NetCDF will be hard to beat, unless you are working with record variables. The classic netCDF (CDF-1, CDF-2, and CDF-5) file formats are incredibly friendly to parallel I/O, but this friendly layout comes at a cost: a file can have only one UNLIMITED dimension, and the layout of record variables is sub-optimal for I/O.
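The record-variable cost comes from the classic format interleaving all record variables per record, so one variable's records are strided across the file rather than stored contiguously. A small sketch of that offset arithmetic (the function is mine, for illustration):

```python
# Sketch of why the classic-format record layout hurts I/O: each
# record of the file holds one record's worth of EVERY record
# variable, so a single variable's records are recsize bytes apart
# instead of back-to-back.

def record_offsets(var_index, var_sizes, nrecords):
    """Offsets (relative to the start of the record section) of each
    record of variable var_index, given the per-record byte size of
    every record variable in the file."""
    recsize = sum(var_sizes)                 # bytes in one full record
    within = sum(var_sizes[:var_index])      # this variable's slot in a record
    return [r * recsize + within for r in range(nrecords)]

# Two record variables of 100 and 50 bytes per record: variable 0's
# records land 150 bytes apart, not contiguously.
print(record_offsets(0, [100, 50], 3))  # [0, 150, 300]
```

Reading N records of one variable therefore means N strided accesses (or a read of the whole interleaved region), which is exactly the access pattern parallel file systems handle worst.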

HDF5's file format allows for greater flexibility but that flexibility comes at a metadata cost. Once you start operating on large enough datasets and large enough levels of parallelism, the underlying file system becomes the limit on performance.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


