Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue

  • To: Wei Huang <huangwei@xxxxxxxx>
  • Subject: Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue
  • From: Rob Latham <robl@xxxxxxxxxxx>
  • Date: Mon, 19 Sep 2011 12:36:09 -0500
On Mon, Sep 19, 2011 at 11:09:23AM -0600, Wei Huang wrote:
> Jim,
> 
> I am using the GPFS filesystem, but did not set any MPI-IO hints.
> I did not do processor binding, but I guess binding could help if
> fewer processors are used per node.
> I am actually using NC_MPIPOSIX rather than NC_MPIIO, as the latter gives
> even worse timing.
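(For reference, the driver is chosen by a mode flag when the file is opened or
created; a minimal sketch in C, with a placeholder file name and no error
checking:)

    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h>

    /* Open a file with either parallel driver; "file.nc" is a
     * placeholder name, and error checking is omitted. */
    int open_with_driver(int use_mpiio)
    {
        int ncid;
        int mode = NC_NOWRITE | (use_mpiio ? NC_MPIIO : NC_MPIPOSIX);
        nc_open_par("file.nc", mode, MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);
        return ncid;
    }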
> 
> The 5G file has 170 variables, some of which have size
> [ 1 <time | unlimited>, 27 <ilev>, 768 <lat>, 1152 <lon> ]
> and use chunk size (1, 1, 192, 288).
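(With those dimensions, each record of such a variable is split into
27 * 4 * 4 = 432 chunks of 192x288 values. Setting that chunking looks roughly
like this; a sketch, error checking omitted:)

    #include <netcdf.h>

    /* Apply the chunking described above to one (time, ilev, lat, lon)
     * variable; ncid and varid come from nc_create_par/nc_def_var. */
    void set_chunking(int ncid, int varid)
    {
        size_t chunks[4] = {1, 1, 192, 288};
        nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
    }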
> 
> The last part sounds more like a job for the netcdf developers.

Perhaps you can make the netcdf developers' job a bit easier by
providing a test case.  If the dataset contains 170 variables, then it
must be part of some larger program and so might be hard to extract.

I'll be honest: I'm mostly curious how pnetcdf handles this workload
(my guess as a pnetcdf developer is "poorly" because of the record
variable i/o).  Still, the test case will help the netcdf, hdf5, and
MPI-IO developers...
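
(For anyone who wants to try that comparison, the heart of a pnetcdf version is
one collective vara write per record; a minimal sketch assuming a lat/lon
decomposition I made up, error checking omitted:)

    #include <mpi.h>
    #include <pnetcdf.h>

    /* Write one record of a (time, ilev, lat, lon) float variable,
     * each rank owning a lat/lon patch; dimension sizes from the thread. */
    void write_record(int ncid, int varid, int rec, const float *patch,
                      MPI_Offset lat0, MPI_Offset nlat,
                      MPI_Offset lon0, MPI_Offset nlon)
    {
        MPI_Offset start[4] = {rec, 0, lat0, lon0};
        MPI_Offset count[4] = {1, 27, nlat, nlon};
        /* collective write: every rank in the communicator must call this */
        ncmpi_put_vara_float_all(ncid, varid, start, count, patch);
    }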

==rob

> On Sep 19, 2011, at 10:48 AM, Jim Edwards wrote:
> 
> > Hi Wei,
> > 
> > 
> > Are you using the gpfs filesystem and are you setting any MPI-IO hints for 
> > that filesystem?
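(As an illustration only: hints are passed through the MPI_Info argument of
nc_create_par/nc_open_par. The hint names and values below are common
ROMIO/IBM examples to experiment with, not recommendations:)

    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h>

    /* Create a file with MPI-IO hints attached; error checking omitted. */
    int create_with_hints(const char *path)
    {
        int ncid;
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "romio_cb_write", "enable");   /* collective buffering */
        MPI_Info_set(info, "cb_buffer_size", "16777216"); /* 16 MB aggregator buffer */
        MPI_Info_set(info, "IBM_largeblock_io", "true");  /* IBM MPI on GPFS */
        nc_create_par(path, NC_NETCDF4 | NC_MPIIO, MPI_COMM_WORLD, info, &ncid);
        MPI_Info_free(&info);
        return ncid;
    }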
> > 
> > Are you using any processor binding technique?   Have you experimented with 
> > other settings?
> > 
> > You stated that the file is 5G, but what is the size of a single field, and 
> > how is it distributed?  In other words, is it already aggregated into a nice 
> > block size, or are you expecting netcdf/MPI-IO to handle that?
> > 
> > I think that in order to really get a good idea of where the performance 
> > problem might be, you need to start by writing and timing a binary file of 
> > roughly equivalent size, then write an hdf5 file, then write a netcdf4 
> > file.    My guess is that you will find that the performance problem is 
> > lower on the tree...
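(A sketch of the first rung of that ladder: time a raw collective MPI-IO write
of a comparable number of bytes. The file name and decomposition are made up;
error checking omitted:)

    #include <mpi.h>
    #include <stdio.h>

    /* Time a collective binary write: each rank writes n doubles
     * at its own contiguous offset. */
    void time_raw_write(const double *buf, MPI_Offset n)
    {
        int rank;
        MPI_File fh;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_File_open(MPI_COMM_WORLD, "baseline.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        double t0 = MPI_Wtime();
        MPI_File_write_at_all(fh, (MPI_Offset)rank * n * sizeof(double),
                              buf, (int)n, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        double t1 = MPI_Wtime();
        if (rank == 0) printf("write time: %.3f s\n", t1 - t0);
    }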
> > 
> > - Jim
> > 
> > On Mon, Sep 19, 2011 at 10:28 AM, Wei Huang <huangwei@xxxxxxxx> wrote:
> > Hi, netcdfgroup,
> > 
> > Currently, we are trying to use parallel-enabled NetCDF4. We started by
> > reading and writing a 5 GB file with some computation, and got the following
> > wall-clock timings on an IBM Power machine:
> > Number of Processors   Total (s)   Read (s)   Write (s)   Computation (s)
> > seq                       89.137     28.206     48.327         11.717
> > 1                        178.953     44.837    121.17          11.644
> > 2                        167.25      46.571    113.343          5.648
> > 4                        168.138     44.043    118.968          2.729
> > 8                        137.74      25.161    108.986          1.064
> > 16                       113.354     16.359     93.253          0.494
> > 32                       439.481    122.201    311.215          0.274
> > 64                       831.896    277.363    588.653          0.203
> > 
> > The first thing we can see is that running the parallel-enabled code on one
> > processor doubles the total wall-clock time.
> > Beyond that, we see no scaling as more processors are added.
> > 
> > Does anyone want to share their experience?
> > 
> > Thanks,
> > 
> > Wei Huang
> > huangwei@xxxxxxxx
> > VETS/CISL
> > National Center for Atmospheric Research
> > P.O. Box 3000 (1850 Table Mesa Dr.)
> > Boulder, CO 80307-3000 USA
> > (303) 497-8924
> > 
> > 
> > 



-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA


