On 04/17/2014 03:22 AM, Alexis Praga wrote:
> Hi,
> I have some questions about parallel netCDF4 (using HDF5, not PnetCDF).
> I think it's best to just ask them, so please excuse the long list:
> 1) What is its strategy for parallel I/O?
I'm not entirely sure what you're asking here. Most parallel I/O
libraries carry out I/O to different regions of the file simultaneously
(in parallel), and thereby extract more aggregate performance out of the
storage system.
For any application using any I/O library, the trickiest part is
deciding how to decompose your domain over N parallel processes and how
to describe that decomposition to the library.
> 2) How is it related to HDF5? Is it just a wrapper around it?
In one sense, yes. In order to adopt HDF5 as one possible backend,
though, the Unidata netCDF folks designed a dispatch system so one might
write via the classic netCDF interface, via the Argonne-Northwestern
Parallel-NetCDF interface, via HDF5, or via DAP.
> 3) When writing a netCDF4 file, is it really netCDF or is it HDF5?
> ncdump -k returns "netCDF4" but I am not sure.
The new file format is an HDF5 file that can be examined with the broad
ecosystem of HDF5 utilities. This HDF5 file, though, has a particular
schema or layout indicating that it is the netCDF-4 kind of HDF5 file.
> 4) Is there some documentation online? I only found that:
> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-tutorial/Parallel.html
> which is very light.
> 5) Any references (paper or benchmarks) are welcomed. At the moment, I only
> found the paper by Li et al. (2003) about PnetCDF.
In strict performance terms -- which in the end is not really the be-all
and end-all -- Argonne-Northwestern Parallel-NetCDF will be hard to
beat, unless you are working with record variables. The classic netCDF
(CDF-1, CDF-2, and CDF-5) file formats are incredibly friendly to
parallel I/O, but that friendly layout comes at a cost: a file can have
only one UNLIMITED dimension, and the on-disk layout of record variables
is sub-optimal for I/O.
HDF5's file format allows for greater flexibility but that flexibility
comes at a metadata cost. Once you start operating on large enough
datasets and large enough levels of parallelism, the underlying file
system becomes the limit on performance.
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA