Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue

Hi Sergei,

Could you please elaborate on what you wrote? Do you think the problem is a file (or record) locking issue? If so, would that explain the different results for read and write?

I've tried to test parallel I/O on our Cray XT4 but didn't manage to get even close to the serial I/O performance...

Thanks, Ingo

Quoting "Shibaev, Sergei" <Sergei.Shibaev@xxxxxxxxxx>:

Hi Wei,

Your result is not surprising: the file itself is a serial device, so
"parallel" read/write means serialising the parallel requests. Of
course, it is at least two times slower than serial requests from a
single process.
If you serialise file access in your own program, it could be much
faster than the common parallel-enabled API.

Regards,
Sergei Shibaev

-----Original Message-----
From: netcdfgroup-bounces@xxxxxxxxxxxxxxxx
[mailto:netcdfgroup-bounces@xxxxxxxxxxxxxxxx] On Behalf Of Wei Huang
Sent: 19 September 2011 17:28
To: netcdfgroup@xxxxxxxxxxxxxxxx
Subject: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue


Hi, netcdfgroup,

Currently, we are trying to use parallel-enabled NetCDF4. We started
with reading/writing a 5 GB file plus some computation, and got the
following wall-clock timings on an IBM Power machine:
Number of     Total      Read       Write      Computation
Processors    (seconds)  (seconds)  (seconds)  (seconds)
seq            89.137     28.206     48.327    11.717
1             178.953     44.837    121.17     11.644
2             167.25      46.571    113.343     5.648
4             168.138     44.043    118.968     2.729
8             137.74      25.161    108.986     1.064
16            113.354     16.359     93.253     0.494
32            439.481    122.201    311.215     0.274
64            831.896    277.363    588.653     0.203

The first thing we can see is that when we run the parallel-enabled
code on one processor, the total wall-clock time doubles. Beyond that,
we do not see any scaling as more processors are added.
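
As a quick check of the two observations above, here is a minimal Python
sketch that divides each parallel timing by the sequential one. The
numbers are copied from the table in this message; the names `timings`,
`ratio_to_seq` and `io_fraction` are made up for illustration and are not
part of NetCDF or of our test program.

```python
# Wall-clock timings in seconds, copied from the table in this message.
# Key "seq" is the sequential run; integer keys are processor counts.
timings = {
    "seq": {"total": 89.137, "read": 28.206, "write": 48.327, "compute": 11.717},
    1: {"total": 178.953, "read": 44.837, "write": 121.17, "compute": 11.644},
    2: {"total": 167.25, "read": 46.571, "write": 113.343, "compute": 5.648},
    4: {"total": 168.138, "read": 44.043, "write": 118.968, "compute": 2.729},
    8: {"total": 137.74, "read": 25.161, "write": 108.986, "compute": 1.064},
    16: {"total": 113.354, "read": 16.359, "write": 93.253, "compute": 0.494},
    32: {"total": 439.481, "read": 122.201, "write": 311.215, "compute": 0.274},
    64: {"total": 831.896, "read": 277.363, "write": 588.653, "compute": 0.203},
}


def ratio_to_seq(phase):
    """Return {processors: parallel time / sequential time} for one phase.

    A value above 1 means the parallel run was slower than the sequential run.
    """
    base = timings["seq"][phase]
    return {
        n: round(t[phase] / base, 2)
        for n, t in timings.items()
        if n != "seq"
    }


def io_fraction(n):
    """Return the share of total wall-clock time spent in read + write."""
    t = timings[n]
    return round((t["read"] + t["write"]) / t["total"], 3)


if __name__ == "__main__":
    for phase in ("total", "read", "write", "compute"):
        print(phase, ratio_to_seq(phase))
    for n in timings:
        print(n, "I/O fraction:", io_fraction(n))
```

With these numbers the one-processor total comes out at about twice the
sequential total, and the total ratio stays above 1 at every processor
count, while the computation ratio keeps dropping. Only the I/O phases
fail to scale.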

Would anyone like to share their experience?

Thanks,

Wei Huang
huangwei@xxxxxxxx
VETS/CISL
National Center for Atmospheric Research
P.O. Box 3000 (1850 Table Mesa Dr.)
Boulder, CO 80307-3000 USA
(303) 497-8924



_______________________________________________
netcdfgroup mailing list
netcdfgroup@xxxxxxxxxxxxxxxx
For list information or to unsubscribe,  visit:
http://www.unidata.ucar.edu/mailing_lists/

