Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue

Jim,

I am using the GPFS filesystem, but did not set any MPI-IO hints.
I did not do processor binding, but I guess binding could help if
fewer processors are used on a node.
I am actually using NC_MPIPOSIX rather than NC_MPIIO, as the latter gives
even worse timing.
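
For reference, here is a minimal sketch (not the code from our runs) of how
the two access modes and MPI-IO hints can be selected when opening a netCDF-4
file in parallel. The file name and the hint names are placeholders; which
hints are honored depends on the MPI library.

/* Minimal sketch: open the same netCDF-4 file with the MPI-IO driver and
 * with the MPI-POSIX driver, passing MPI-IO hints through an MPI_Info
 * object.  The hint names below are placeholders for GPFS-style hints. */
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>
#include <stdio.h>

static void check(int stat, const char *msg)
{
    if (stat != NC_NOERR) {
        fprintf(stderr, "%s: %s\n", msg, nc_strerror(stat));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    int ncid;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Info_create(&info);

    /* Hypothetical GPFS-oriented hints; check your MPI documentation
     * for the names your implementation actually honors. */
    MPI_Info_set(info, "IBM_largeblock_io", "true");
    MPI_Info_set(info, "cb_buffer_size", "16777216");

    /* Collective MPI-IO driver: hints are passed down to MPI-IO. */
    check(nc_open_par("file.nc", NC_MPIIO, MPI_COMM_WORLD, info, &ncid),
          "open with NC_MPIIO");
    check(nc_close(ncid), "close");

    /* MPI-POSIX driver: HDF5 does POSIX I/O directly, so MPI-IO hints
     * do not apply here. */
    check(nc_open_par("file.nc", NC_MPIPOSIX, MPI_COMM_WORLD,
                      MPI_INFO_NULL, &ncid),
          "open with NC_MPIPOSIX");
    check(nc_close(ncid), "close");

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

Since the MPI-POSIX driver bypasses MPI-IO entirely, hints would only
matter for the NC_MPIIO case.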

The 5G file has 170 variables, some of which have the dimensions
[ 1 <time | unlimited>, 27 <ilev>, 768 <lat>, 1152 <lon> ]
and are written with a chunk size of (1, 1, 192, 288).
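
To illustrate that layout, here is a minimal sketch of how one such variable
and its chunking could be defined; the dimension and variable names are
placeholders, not the ones in the real file.

/* Minimal sketch: define a 4-D float variable with the dimensions quoted
 * above and the chunk size used in the test.  Names are placeholders. */
#include <netcdf.h>

int define_var(int ncid)
{
    int dimids[4], varid, stat;
    size_t chunks[4] = {1, 1, 192, 288};   /* chunk size used in the test */

    stat = nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]);
    if (stat != NC_NOERR) return stat;
    stat = nc_def_dim(ncid, "ilev", 27,   &dimids[1]);
    if (stat != NC_NOERR) return stat;
    stat = nc_def_dim(ncid, "lat",  768,  &dimids[2]);
    if (stat != NC_NOERR) return stat;
    stat = nc_def_dim(ncid, "lon",  1152, &dimids[3]);
    if (stat != NC_NOERR) return stat;

    stat = nc_def_var(ncid, "T", NC_FLOAT, 4, dimids, &varid);
    if (stat != NC_NOERR) return stat;

    /* Each chunk holds 1 x 1 x 192 x 288 values, i.e. 4 x 4 = 16 chunks
     * per lat/lon slice; with 4-byte floats that is about 216 KB per chunk. */
    return nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
}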

The last part sounds more like work for the netCDF developers.

Thanks,

Wei

huangwei@xxxxxxxx
VETS/CISL
National Center for Atmospheric Research
P.O. Box 3000 (1850 Table Mesa Dr.)
Boulder, CO 80307-3000 USA
(303) 497-8924





On Sep 19, 2011, at 10:48 AM, Jim Edwards wrote:

> Hi Wei,
> 
> 
> Are you using the gpfs filesystem and are you setting any MPI-IO hints for 
> that filesystem?
> 
> Are you using any processor binding technique?   Have you experimented with 
> other settings?
> 
> You stated that the file is 5G but what is the size of a single field and how 
> is it distributed?  In other words is it already aggregated into a nice 
> blocksize or are you expecting netcdf/MPI-IO to handle that?
> 
> I think that in order to really get a good idea of where the performance 
> problem might be, you need to start by writing and timing a binary file of 
> roughly equivalent size, then write an hdf5 file, then write a netcdf4 file.  
>   My guess is that you will find that the performance problem is lower on the 
> tree...
> 
> - Jim
> 
> On Mon, Sep 19, 2011 at 10:28 AM, Wei Huang <huangwei@xxxxxxxx> wrote:
> Hi, netcdfgroup,
> 
> Currently, we are trying to use parallel-enabled NetCDF4. We started by 
> reading and writing a 5G file plus some computation, and we got the following 
> timing (in wall-clock seconds) on an IBM Power machine:
> Number of Processors   Total(s)   Read(s)   Write(s)   Computation(s)
> seq                     89.137    28.206     48.327        11.717
> 1                      178.953    44.837    121.17         11.644
> 2                      167.25     46.571    113.343         5.648
> 4                      168.138    44.043    118.968         2.729
> 8                      137.74     25.161    108.986         1.064
> 16                     113.354    16.359     93.253         0.494
> 32                     439.481   122.201    311.215         0.274
> 64                     831.896   277.363    588.653         0.203
> 
> The first thing we can see is that when the parallel-enabled code is run on 
> one processor, the total wall-clock time doubles.
> We also did not see any scaling as more processors were added.
> 
> Anyone wants to share their experience?
> 
> Thanks,
> 
> Wei Huang
> huangwei@xxxxxxxx
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924
> 
> 
> 
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe,  visit: 
> http://www.unidata.ucar.edu/mailing_lists/
> 
