Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue

  • To: Ed Hartnett <ed@xxxxxxxxxxxxxxxx>
  • Subject: Re: [netcdfgroup] NetCDF4 Parallel-enabled IO performance issue
  • From: Wei Huang <huangwei@xxxxxxxx>
  • Date: Mon, 26 Sep 2011 10:03:48 -0600
Ed,

I tried running tst_h_par a few times; the timings are below.

Wei Huang
huangwei@xxxxxxxx
VETS/CISL
National Center for Atmospheric Research
P.O. Box 3000 (1850 Table Mesa Dr.)
Boulder, CO 80307-3000 USA
(303) 497-8924





On Sep 20, 2011, at 3:57 PM, Ed Hartnett wrote:

> Wei Huang <huangwei@xxxxxxxx> writes:
> 
>> I have run nc_test4/tst_nc4perf for 1, 2, 4, and 8 processors, results 
>> attached.
>> 
>> To me, the performance decreases as the number of processors increases.
>> Someone may have a better interpretation.
> 
> I have looked at your nc4perf output and agree that there is some
> performance problem here. This seems to indicate that parallel I/O is not
> working well on your system for some reason.
> 
>> I also run tst_parallel4, with result:
>> num_proc   time(s)  write_rate(B/s)
>> 1       9.2015  1.16692e+08
>> 2       12.4557 8.62048e+07
>> 4       6.30644 1.70261e+08
>> 8       5.53761 1.939e+08
>> 16      2.25639 4.75866e+08
>> 32      2.28383 4.7015e+08
>> 64      2.19041 4.90202e+08
> 
> Yet this test is clearly working: the time decreases at every processor
> count above 2 up to about 16, at which point the I/O system is saturated
> and performance levels off. This is what I would expect.
> 
> But these test results are not consistent with your nc4perf results.
> 
>>   We can modify this program to mimic our data size, but we do not
>>   know if this will help us.
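
(For context, at the netCDF layer tst_nc4perf and tst_parallel4 boil down
to a parallel create followed by collective writes of one slab per rank.
A minimal sketch of that pattern is below; the file name, sizes, and the
crude timing are illustrative only and are not taken from either test, and
it assumes a netCDF-4 built against a parallel HDF5. Error checks are
mostly omitted.)

/* Minimal sketch of netCDF-4 parallel I/O (illustrative names/sizes). */
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>
#include <stdio.h>
#include <stdlib.h>

#define NX 1048576   /* values written per rank (illustrative) */

int main(int argc, char **argv)
{
    int rank, nprocs, ncid, dimid, varid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Create a netCDF-4 file for parallel access over MPI-IO. */
    if (nc_create_par("par_sketch.nc", NC_NETCDF4 | NC_MPIIO,
                      MPI_COMM_WORLD, MPI_INFO_NULL, &ncid))
        MPI_Abort(MPI_COMM_WORLD, 1);

    size_t len = (size_t)nprocs * NX;
    nc_def_dim(ncid, "x", len, &dimid);
    nc_def_var(ncid, "data", NC_FLOAT, 1, &dimid, &varid);
    nc_enddef(ncid);

    /* Collective access, so HDF5/MPI-IO can aggregate the writes. */
    nc_var_par_access(ncid, varid, NC_COLLECTIVE);

    float *buf = malloc(NX * sizeof(float));
    for (size_t i = 0; i < NX; i++)
        buf[i] = (float)rank;

    /* Each rank writes its own contiguous slab; close flushes to disk. */
    size_t start = (size_t)rank * NX, count = NX;
    double t0 = MPI_Wtime();
    nc_put_vara_float(ncid, varid, &start, &count, buf);
    nc_close(ncid);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("wrote %zu floats on %d ranks in %g s\n", len, nprocs, t1 - t0);

    free(buf);
    MPI_Finalize();
    return 0;
}
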
> 
>>> 
>>> If the program shows that parallel I/O is not working, take a look at
>>> the netCDF test program h5_test/tst_h_par.c. This is a HDF5-only program
>>> (no netcdf code at all) that does parallel I/O. If this program does not
>>> show that parallel I/O is working, then your problem is not with the
>>> netCDF layer, but somewhere in HDF5 or even lower in the stack.
> 
> Try timing tst_h_par for several different numbers of processors, and
> see if you get a performance improvement there.

*** Creating file for parallel I/O read, and rereading it...
p= 1, write_rate=113.568, read_rate=51.5761
p= 2, write_rate=142.687, read_rate=239.493
p= 4, write_rate=543.575, read_rate=1280.54
p= 8, write_rate=167.021, read_rate=1398.42
p=16, write_rate=204.08,  read_rate=1555.1
p=32, write_rate=72.7069, read_rate=720.396
p=64, write_rate=40.2151, read_rate=358.09


*** Creating file for parallel I/O read, and rereading it...
p= 1, write_rate=117.562, read_rate=733.768
p= 2, write_rate=358.092, read_rate=1457.53
p= 4, write_rate=528.873, read_rate=1439.01
p= 8, write_rate=230.93,  read_rate=1282.31
p=16, write_rate=174.401, read_rate=468.23
p=32, write_rate=98.98,   read_rate=2057.22
p=64, write_rate=103.817, read_rate=794.755


*** Creating file for parallel I/O read, and rereading it...
p=1, write_rate=114.031, read_rate=770.388
p=2, write_rate=425.982, read_rate=1429.43
p=4, write_rate=428.331, read_rate=1393.3
p=8, write_rate=344.846, read_rate=1397.72
p=16, write_rate=288.448, read_rate=1239.88
p=32, write_rate=102.718, read_rate=2751.15
p=64, write_rate=62.3665, read_rate=879.375
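
For reference, the heart of tst_h_par is a timed, collective HDF5 write
(and re-read) of one hyperslab per rank, with no netCDF code involved. A
rough sketch of the write side is below; the file name, dataset size, and
timing are illustrative rather than copied from the test, and it assumes
an HDF5 built with parallel (MPI-IO) support. Running it under mpiexec
with different -n values gives the same kind of scan as the numbers above.

/* Minimal sketch of an HDF5-only collective parallel write (illustrative). */
#include <mpi.h>
#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>

#define NX 1048576   /* values written per rank (illustrative) */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* File access property list: open the file through MPI-IO. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("h5_par_sketch.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One 1-D dataset shared by all ranks. */
    hsize_t dims[1] = { (hsize_t)nprocs * NX };
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_INT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own hyperslab of the file space. */
    hsize_t start[1] = { (hsize_t)rank * NX }, count[1] = { NX };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    /* Transfer property list: collective MPI-IO. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    int *buf = malloc(NX * sizeof(int));
    for (int i = 0; i < NX; i++)
        buf[i] = rank;

    /* Time the collective write; closing the file flushes it to disk. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, buf);
    H5Dclose(dset);
    H5Fclose(file);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d ranks wrote %llu ints in %g s\n",
               nprocs, (unsigned long long)dims[0], t1 - t0);

    H5Pclose(dxpl);
    H5Pclose(fapl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    free(buf);
    MPI_Finalize();
    return 0;
}
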

> 

> Thanks,
> 
> Ed
> 
> -- 
> Ed Hartnett  -- ed@xxxxxxxxxxxxxxxx


