Wei Huang <huangwei@xxxxxxxx> writes:
> Hi, netcdfgroup,
>
> Currently, we are trying to use parallel-enabled NetCDF-4. We started by
> reading/writing a 5 GB file plus some computation, and got the following
> timings (wall-clock) on an IBM Power machine:
> Processors   Total(s)   Read(s)   Write(s)   Computation(s)
> seq            89.137     28.206    48.327          11.717
> 1             178.953     44.837   121.17           11.644
> 2             167.25      46.571   113.343           5.648
> 4             168.138     44.043   118.968           2.729
> 8             137.74      25.161   108.986           1.064
> 16            113.354     16.359    93.253           0.494
> 32            439.481    122.201   311.215           0.274
> 64            831.896    277.363   588.653           0.203
>
> The first thing we can see is that when the parallel-enabled code is run
> on one processor, the total wall-clock time doubles. We also did not see
> scaling as more processors were added.
>
> Does anyone want to share their experience?
>
> Thanks,
>
> Wei Huang
> huangwei@xxxxxxxx
> VETS/CISL
> National Center for Atmospheric Research
> P.O. Box 3000 (1850 Table Mesa Dr.)
> Boulder, CO 80307-3000 USA
> (303) 497-8924
>
>
Howdy Wei and all!
Are you using the 4.1.2 release? Did you configure with
--enable-parallel-tests, and did those tests pass?
I would suggest building netCDF with --enable-parallel-tests and then
running nc_test4/tst_nc4perf. This simple program, based on
user-contributed test code, performs parallel I/O with a wide variety of
options, and prints a table of results.
This will tell you whether parallel I/O is working on your platform, and
at least give some idea of reasonable settings.

Parallel I/O is a very complex topic. However, if everything is working
well, you should see I/O performance that scales reasonably linearly up
to about 8 processors (perhaps more, depending on your system, but not
much more). Beyond that point your parallel application is saturating
the I/O subsystem, and further gains in I/O performance are marginal.

In general, HDF5 I/O will not be faster than netCDF-4 I/O. The netCDF-4
layer is very thin in this area, and simply makes the HDF5 calls that
the user would otherwise make directly.
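
For reference, here is a rough sketch of what a minimal parallel create
looks like through the netCDF-4 C API. This is my own illustration, not
code from the library's tests; the file name, dimension sizes, and error
handling are placeholders, and the NC_MPIIO/NC_MPIPOSIX mode flags should
be checked against your release.

#include <stdio.h>
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

/* Abort the whole job on any netCDF error. */
#define CHECK(e) do { int stat = (e); if (stat != NC_NOERR) { \
   fprintf(stderr, "netCDF error: %s\n", nc_strerror(stat));  \
   MPI_Abort(MPI_COMM_WORLD, 1); } } while (0)

int main(int argc, char **argv)
{
   int ncid, dimids[2], varid;

   MPI_Init(&argc, &argv);

   /* NC_MPIIO selects the MPI-IO driver; NC_MPIPOSIX selects POSIX I/O. */
   CHECK(nc_create_par("example_par.nc", NC_NETCDF4 | NC_MPIIO,
                       MPI_COMM_WORLD, MPI_INFO_NULL, &ncid));

   CHECK(nc_def_dim(ncid, "x", 1024, &dimids[0]));
   CHECK(nc_def_dim(ncid, "y", 1024, &dimids[1]));
   CHECK(nc_def_var(ncid, "data", NC_FLOAT, 2, dimids, &varid));
   CHECK(nc_enddef(ncid));

   /* ... each rank writes its own hyperslab with nc_put_vara_float() ... */

   CHECK(nc_close(ncid));
   MPI_Finalize();
   return 0;
}
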
Key settings are:
* MPI-IO vs. POSIX I/O (which is faster varies from platform to
platform; see the tst_nc4perf results for your machine/compiler).
* Chunking and caching play a big role, as always. The chunk cache is
turned off by default for parallel I/O; otherwise the netCDF caches on
all of the processors would consume too much memory. You should set it
to at least the size of one chunk, and note that this cache is
allocated on every processor involved.
* Collective vs. independent access. It seems (to my naive view) like
independent access should usually be faster, but the opposite seems to
be the case. This is because the I/O subsystems are good at grouping
I/O requests into larger, more efficient units, and collective access
gives the I/O layer the maximum chance to exercise its magic. (A short
sketch of how these settings look in the C API follows this list.)
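
Continuing the sketch above (again my own illustration, with arbitrary
chunk and cache sizes), this is roughly how those settings are applied to
a single variable; the chunking call has to come before nc_enddef(),
while the cache and access-mode calls can come later:

/* Chunk layout for the 2-D variable; sizes here are only an example. */
size_t chunksizes[2] = {256, 256};
CHECK(nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunksizes));

/* Per-variable chunk cache: bytes, number of slots, preemption.
   Remember this much memory is allocated on every participating rank. */
size_t cache_bytes = 256 * 256 * sizeof(float);   /* at least one chunk */
CHECK(nc_set_var_chunk_cache(ncid, varid, cache_bytes, 101, 0.75));

/* Collective vs. independent parallel access for this variable. */
CHECK(nc_var_par_access(ncid, varid, NC_COLLECTIVE));  /* or NC_INDEPENDENT */
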
The best thing to do is to get tst_nc4perf working on your platform,
and then modify it to write data files that match yours (i.e.,
variables of the same sizes). The program will then tell you the best
set of settings to use in your case.

If the program shows that parallel I/O is not working, take a look at
the netCDF test program h5_test/tst_h_par.c. This is an HDF5-only
program (no netCDF code at all) that does parallel I/O. If this program
does not show that parallel I/O is working, then your problem is not in
the netCDF layer, but somewhere in HDF5 or even lower in the stack.
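
If you want an even smaller check, a bare-bones HDF5-only parallel
create looks roughly like this (my sketch in the spirit of tst_h_par.c,
not the actual test code; the file name is made up):

#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
   MPI_Init(&argc, &argv);

   /* File access property list that routes I/O through MPI-IO. */
   hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
   H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

   /* All ranks create the same file collectively. */
   hid_t file = H5Fcreate("check_par.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

   /* ... define a dataset and have each rank write its hyperslab, using
      H5Pset_dxpl_mpio() on a transfer property list to pick collective
      or independent transfers ... */

   H5Fclose(file);
   H5Pclose(fapl);
   MPI_Finalize();
   return 0;
}
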
Thanks!
Ed
--
Ed Hartnett -- ed@xxxxxxxxxxxxxxxx