Hi Wei,
Are you using the GPFS filesystem, and if so, are you setting any MPI-IO
hints for it?
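For reference, here is a minimal sketch of passing hints through an
MPI_Info object to nc_create_par. The hint names and values are examples
only and depend on your MPI implementation (IBM_largeblock_io, for
instance, is specific to IBM's MPI on GPFS; MPI implementations silently
ignore hints they don't recognize):

/* Minimal sketch: passing MPI-IO hints into parallel netCDF-4.
 * Hint names/values below are examples only -- check your MPI's docs. */
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int ncid;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Info_create(&info);

    /* Enable collective buffering; IBM_largeblock_io is a GPFS-specific
     * hint understood by IBM's MPI. */
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "IBM_largeblock_io", "true");

    if (nc_create_par("hints_test.nc", NC_NETCDF4 | NC_MPIIO,
                      MPI_COMM_WORLD, info, &ncid) == NC_NOERR)
        nc_close(ncid);
    else
        fprintf(stderr, "nc_create_par failed\n");

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}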
Are you using any processor-binding technique? Have you experimented
with other binding settings?
You stated that the file is 5 GB, but what is the size of a single
field, and how is it distributed across processes? In other words, is
the data already aggregated into a filesystem-friendly blocksize, or are
you expecting netCDF/MPI-IO to handle that (see the chunking sketch
below)?
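If you want netCDF to do the aggregation, you can at least steer it by
chunking the variable yourself. A rough sketch, with made-up dimension
names and sizes (not taken from your setup), targeting ~4 MiB chunks:

#include <netcdf.h>
#include <netcdf_par.h>

/* Sketch: define a 2-D double variable with ~4 MiB chunks so each
 * write maps onto a filesystem-friendly block. All names and sizes
 * here are illustrative. */
static int def_chunked_field(int ncid, int *varidp)
{
    int dimids[2];
    size_t chunks[2] = {64, 8192};   /* 64 * 8192 * 8 B = 4 MiB */

    nc_def_dim(ncid, "y", 4096, &dimids[0]);
    nc_def_dim(ncid, "x", 8192, &dimids[1]);
    nc_def_var(ncid, "field", NC_DOUBLE, 2, dimids, varidp);
    nc_def_var_chunking(ncid, *varidp, NC_CHUNKED, chunks);

    /* Collective access usually beats independent access on GPFS. */
    return nc_var_par_access(ncid, *varidp, NC_COLLECTIVE);
}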
I think that to really pin down where the performance problem lies, you
need to start by writing and timing a raw binary file of roughly
equivalent size with MPI-IO, then write an HDF5 file, then write a
netCDF-4 file. My guess is that you will find the problem lies lower in
the stack...
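For that first step, something along these lines (a sketch; the per-rank
buffer size and output path are placeholders) gives you a raw MPI-IO
baseline to compare the HDF5 and netCDF-4 numbers against:

/* Sketch: time a collective raw MPI-IO write as a performance baseline.
 * Per-rank size and the output path are placeholders. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int rank, nprocs;
    const int count = 1 << 23;          /* 8M doubles = 64 MiB per rank */
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *buf = malloc(count * sizeof(double));
    for (int i = 0; i < count; i++)
        buf[i] = (double)rank;

    MPI_File_open(MPI_COMM_WORLD, "baseline.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    /* Each rank writes one contiguous slab at its own offset. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * count * sizeof(double),
                          buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d ranks wrote %.0f MiB total in %.3f s\n",
               nprocs, nprocs * count * sizeof(double) / 1048576.0,
               t1 - t0);

    free(buf);
    MPI_Finalize();
    return 0;
}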
- Jim
On Mon, Sep 19, 2011 at 10:28 AM, Wei Huang <huangwei@xxxxxxxx> wrote:
> Hi, netcdfgroup,
>
> Currently, we are trying to use parallel-enabled netCDF-4. We started by
> reading/writing a 5 GB file plus some computation, and we got the
> following wall-clock timings on an IBM POWER machine:
> Processors   Total (s)   Read (s)   Write (s)   Computation (s)
> seq             89.137     28.206     48.327         11.717
> 1              178.953     44.837    121.17          11.644
> 2              167.25      46.571    113.343          5.648
> 4              168.138     44.043    118.968          2.729
> 8              137.74      25.161    108.986          1.064
> 16             113.354     16.359     93.253          0.494
> 32             439.481    122.201    311.215          0.274
> 64             831.896    277.363    588.653          0.203
>
> The first thing we can see is that running the parallel-enabled code on
> one processor doubles the total wall-clock time relative to the
> sequential run. Beyond that, we did not see any scaling as more
> processors were added.
>
> Does anyone want to share their experience?
>
> Thanks,
>
> Wei Huang
> huangwei@xxxxxxxx
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924