Rob,
Below are the results using pnetcdf (but through the netcdf4 library).
Thanks,
Wei Huang
huangwei@xxxxxxxx
VETS/CISL
National Center for Atmospheric Research
P.O. Box 3000 (1850 Table Mesa Dr.)
Boulder, CO 80307-3000 USA
(303) 497-8924
Number of Processors   Total (seconds)   Read (seconds)   Write (seconds)   Computation (seconds)
seq                     89.137            28.206            48.327            11.717
1                       89.055            18.190            58.977            11.612
2                      189.892            14.577           168.999             5.729
4                      229.825            24.265           202.11              2.585
8                      263.488            26.528           234.199             1.130
16                     298.131            48.399           247.07              0.625
32                     421.336            63.559           352.373             0.484
64                     549.144            71.947           462.465             0.525
On Sep 19, 2011, at 11:36 AM, Rob Latham wrote:
> On Mon, Sep 19, 2011 at 11:09:23AM -0600, Wei Huang wrote:
>> Jim,
>>
>> I am using the gpfs filesystem, but did not set any MPI-IO hints.
>> I did not do processor binding, but I guess binding could help if
>> fewer processors are used on a node.
>> I am actually using NC_MPIPOSIX rather than NC_MPIIO, as the latter gives
>> even worse timing.
>>
>> The 5 GB file has 170 variables, some of which have size
>> [ 1 <time | unlimited>, 27 <ilev>, 768 <lat>, 1152 <lon> ]
>> and use a chunk size of (1, 1, 192, 288).
>>
>> That last part is really a job for the netcdf developers.
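
(Editor's note: for anyone building the test case Rob asks for below, a hedged sketch of declaring one variable with the shape and chunking Wei describes; the file name, variable name "T", and the rest of the scaffolding are made up for illustration.)

    #include <stdio.h>
    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h>

    #define CHK(s) do { int e_ = (s); if (e_ != NC_NOERR) { \
        fprintf(stderr, "netCDF error: %s\n", nc_strerror(e_)); \
        MPI_Abort(MPI_COMM_WORLD, 1); } } while (0)

    int main(int argc, char **argv)
    {
        int ncid, varid, dimids[4];
        size_t chunks[4] = {1, 1, 192, 288};  /* chunk shape from the thread */

        MPI_Init(&argc, &argv);

        CHK(nc_create_par("testcase.nc", NC_NETCDF4 | NC_MPIIO | NC_CLOBBER,
                          MPI_COMM_WORLD, MPI_INFO_NULL, &ncid));

        /* [ 1 <time|unlimited>, 27 <ilev>, 768 <lat>, 1152 <lon> ] */
        CHK(nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]));
        CHK(nc_def_dim(ncid, "ilev", 27,   &dimids[1]));
        CHK(nc_def_dim(ncid, "lat",  768,  &dimids[2]));
        CHK(nc_def_dim(ncid, "lon",  1152, &dimids[3]));

        /* "T" is a hypothetical variable name; repeat for the other fields. */
        CHK(nc_def_var(ncid, "T", NC_FLOAT, 4, dimids, &varid));
        CHK(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks));
        CHK(nc_enddef(ncid));

        CHK(nc_close(ncid));
        MPI_Finalize();
        return 0;
    }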
>
> Perhaps you can make the netcdf developers' job a bit easier by
> providing a test case. If the dataset contains 170 variables, then it
> must be part of some larger program and so might be hard to extract.
>
> I'll be honest: I'm mostly curious how pnetcdf handles this workload
> (my guess as a pnetcdf developer is "poorly" because of the record-variable
> I/O). Still, the test case will help the netcdf, hdf5, and
> MPI-IO developers...
>
> ==rob
>
>> On Sep 19, 2011, at 10:48 AM, Jim Edwards wrote:
>>
>>> Hi Wei,
>>>
>>>
>>> Are you using the gpfs filesystem and are you setting any MPI-IO hints for
>>> that filesystem?
>>>
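
(Editor's note: to make the hints question concrete, a sketch of passing MPI-IO hints through the parallel open. The hint names and values below are illustrative and implementation-dependent: IBM_largeblock_io applies to IBM MPI on GPFS, and the cb_* / romio_* names are ROMIO conventions.)

    #include <stdio.h>
    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h>

    int main(int argc, char **argv)
    {
        int ncid, status;
        MPI_Info info;

        MPI_Init(&argc, &argv);
        MPI_Info_create(&info);

        /* Illustrative hints only; supported names and values depend on
           the MPI library and filesystem. */
        MPI_Info_set(info, "IBM_largeblock_io", "true");
        MPI_Info_set(info, "romio_cb_write", "enable");
        MPI_Info_set(info, "cb_buffer_size", "16777216");

        status = nc_open_par("data.nc", NC_NOWRITE | NC_MPIIO,
                             MPI_COMM_WORLD, info, &ncid);
        if (status != NC_NOERR)
            fprintf(stderr, "nc_open_par: %s\n", nc_strerror(status));
        else
            nc_close(ncid);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }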
>>> Are you using any processor binding technique? Have you experimented with
>>> other settings?
>>>
>>> You stated that the file is 5G but what is the size of a single field and
>>> how is it distributed? In other words, is it already aggregated into a nice
>>> blocksize or are you expecting netcdf/MPI-IO to handle that?
>>>
>>> I think that in order to really get a good idea of where the performance
>>> problem might be, you need to start by writing and timing a binary file of
>>> roughly equivalent size, then write an hdf5 file, then write a netcdf4
>>> file. My guess is that you will find that the performance problem is
>>> lower in the stack...
>>>
>>> - Jim
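
(Editor's note: a minimal sketch of the first rung of Jim's ladder, timing a raw MPI-IO collective write of a comparably sized block; the path, slab size, and decomposition are placeholders, not a prescribed benchmark.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_File fh;
        double t0, t1, *buf;
        /* Per-rank slab; pick so nprocs * count * 8 bytes approximates
           the 5 GB dataset. 16 Mi doubles = 128 MiB per rank here. */
        const int count = 16 * 1024 * 1024;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        buf = malloc(count * sizeof(double));
        for (int i = 0; i < count; i++)
            buf[i] = (double)rank;

        MPI_File_open(MPI_COMM_WORLD, "baseline.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        /* Each rank writes one contiguous slab at its own offset. */
        MPI_File_write_at_all(fh, (MPI_Offset)rank * count * sizeof(double),
                              buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        MPI_Barrier(MPI_COMM_WORLD);
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("raw MPI-IO write: %.3f seconds\n", t1 - t0);

        free(buf);
        MPI_Finalize();
        return 0;
    }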
>>>
>>> On Mon, Sep 19, 2011 at 10:28 AM, Wei Huang <huangwei@xxxxxxxx> wrote:
>>> Hi, netcdfgroup,
>>>
>>> Currently, we are trying to use parallel-enabled NetCDF4. We started by
>>> reading/writing a 5 GB file plus some computation, and we got the following
>>> timing (wall-clock) on an IBM Power machine:
>>> Number of Processors   Total (seconds)   Read (seconds)   Write (seconds)   Computation (seconds)
>>> seq                     89.137            28.206            48.327            11.717
>>> 1                      178.953            44.837           121.17             11.644
>>> 2                      167.25             46.571           113.343             5.648
>>> 4                      168.138            44.043           118.968             2.729
>>> 8                      137.74             25.161           108.986             1.064
>>> 16                     113.354            16.359            93.253             0.494
>>> 32                     439.481           122.201           311.215             0.274
>>> 64                     831.896           277.363           588.653             0.203
>>>
>>> The first thing we can see is that when running the parallel-enabled
>>> code on one processor, the total wall-clock time doubled.
>>> We also did not see any scaling as more processors were added.
>>>
>>> Anyone wants to share their experience?
>>>
>>> Thanks,
>>>
>>> Wei Huang
>>> huangwei@xxxxxxxx
>>> VETS/CISL
>>> National Center for Atmospheric Research
>>> P.O. Box 3000 (1850 Table Mesa Dr.)
>>> Boulder, CO 80307-3000 USA
>>> (303) 497-8924
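
(Editor's note: one knob worth checking for scaling behavior like the table above is per-variable collective vs independent parallel access; a minimal sketch, assuming a file already opened with nc_open_par and a known varid.)

    #include <netcdf.h>
    #include <netcdf_par.h>   /* nc_var_par_access, NC_COLLECTIVE */

    /* Parallel access is independent by default; collective access lets
       MPI-IO aggregate the many small per-record requests across ranks. */
    int use_collective(int ncid, int varid)
    {
        return nc_var_par_access(ncid, varid, NC_COLLECTIVE);
    }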
>>>
>>>
>>>
>>>
>>
>
>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA