Re: [netcdfgroup] NetCDF for parallel usage

Hi Rob,

> Please ensure you are running something close to the latest version.
> (Sometimes we find users -- somehow -- running ten year old MPICH code.)
>
> You need a recent-ish HDF5 library to make full use of the MPI-IO library.
>
> You need the very latest netCDF library for assorted bug fixes (and
> compatibility with the latest HDF5 library).

I am using Intel's MPI compiler wrappers (mpiifort from intel-cluster-studio-2013).
I installed HDF5 1.8.13, netCDF 4.3.1, and netCDF-Fortran 4.4.1. My own code is
also fairly new.

I have also found some issues with the nodes --- sometimes a job gets
submitted but stays idle. So it is not worth your effort to debug these
errors via email.

When the code does run, I have managed to get output from each of the
2000-odd procs using the simple 'chunked' record-by-record writes that
Fortran offers:

do k = 1, kmax
   write(10) field(:,:,k)   ! one unformatted record per k-plane; unit 10 is each rank's own output file
end do

I'll look into MPI-IO as well.
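For the MPI-IO route, here is a minimal Fortran sketch of what a shared-file
version of the same kind of dump could look like, with every rank writing its
slab into one file at a rank-dependent offset. The file name, array sizes, and
the assumption that ranks hold consecutive k-slabs are illustrative, not taken
from the actual code:

program write_shared
   use mpi
   implicit none
   integer, parameter :: nx = 64, ny = 64, kmax = 16   ! hypothetical per-rank slab size
   integer :: ierr, rank, fh
   integer(kind=MPI_OFFSET_KIND) :: offset
   double precision :: field(nx, ny, kmax)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   field = dble(rank)                                  ! stand-in for the real data

   ! Every rank opens the same file; nothing is funnelled through rank 0.
   call MPI_File_open(MPI_COMM_WORLD, 'field.dat',            &
                      ior(MPI_MODE_CREATE, MPI_MODE_WRONLY),  &
                      MPI_INFO_NULL, fh, ierr)

   ! Each rank writes its whole slab at a rank-dependent byte offset
   ! (8 bytes per double precision value).
   offset = int(rank, MPI_OFFSET_KIND) * nx * ny * kmax * 8_MPI_OFFSET_KIND
   call MPI_File_write_at_all(fh, offset, field, nx*ny*kmax,  &
                              MPI_DOUBLE_PRECISION, MPI_STATUS_IGNORE, ierr)

   call MPI_File_close(fh, ierr)
   call MPI_Finalize(ierr)
end program write_shared

The collective write_at_all call gives the MPI-IO layer a chance to aggregate
the 2000-odd requests instead of hitting the file system with one stream per
process.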

I am also getting some errors with large file sizes using serial netCDF-4,
for which I have sent another mail.

Rob, it was nice talking to you.

===

Hi Ed,

Thanks for your info, but as I wrote above, I am having issues even with
serial netCDF-4, for which I have opened another ticket. Hopefully that
netCDF issue will become clearer to me.

Thanks,
Samrat.


On Sun, Oct 19, 2014 at 7:44 AM, Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
wrote:

> WRT performance, there is a point at which you max out your file system.
> You might have 10K processors, but you don't have 10K disk drives. So once
> you reach the limits of your disk array and internal bandwidth, more
> parallelization will not help your overall I/O performance.
>
> Although there are limits to what may be achieved, it is still worth
> achieving them, as this may provide an order of magnitude or more overall
> performance improvement. But once you saturate your disk array, you will
> not see further performance improvements when you add more processors.
>
> Another reason to use the parallel I/O interfaces (either netCDF-4 with
> parallel, or pnetcdf) is simplification of code. It is a lot easier to
> write the parallel code using netCDF-4 or pnetcdf than to write the code
> which collects data from all the processes and writes it to a file in a
> serial way. By using the parallel interfaces, you get very simple, natural
> code, where each process directly writes data without passing it to another
> process.
>
> With the parallel interfaces, you have to pay attention to collective vs.
> independent in order to get good performance. See the docs for more.
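As a concrete sketch of the pattern Ed describes above (each rank writing its
own piece of one shared variable directly, with collective access turned on),
here is a minimal netCDF-4 Fortran example. The file name, array sizes, 1-D
decomposition along z, and the error handler are illustrative assumptions, and
it presumes netCDF-Fortran built against an MPI-enabled HDF5:

program ncdf_par_write
   use mpi
   use netcdf
   implicit none
   integer, parameter :: nx = 64, ny = 64, nzloc = 16   ! hypothetical per-rank slab
   integer :: ierr, rank, nprocs, ncid, varid, dimids(3)
   double precision :: field(nx, ny, nzloc)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
   field = dble(rank)                                   ! stand-in for the real data

   ! All ranks create the same file in parallel.
   call check( nf90_create('field.nc', ior(nf90_netcdf4, nf90_mpiio), ncid, &
                           comm=MPI_COMM_WORLD, info=MPI_INFO_NULL) )
   call check( nf90_def_dim(ncid, 'x', nx, dimids(1)) )
   call check( nf90_def_dim(ncid, 'y', ny, dimids(2)) )
   call check( nf90_def_dim(ncid, 'z', nzloc*nprocs, dimids(3)) )
   call check( nf90_def_var(ncid, 'field', nf90_double, dimids, varid) )
   call check( nf90_enddef(ncid) )

   ! Collective access is usually much faster than the independent default.
   call check( nf90_var_par_access(ncid, varid, nf90_collective) )

   ! Each rank writes its own z-slab directly; no gather onto a single writer.
   call check( nf90_put_var(ncid, varid, field,                &
                            start=(/ 1, 1, rank*nzloc + 1 /),  &
                            count=(/ nx, ny, nzloc /)) )
   call check( nf90_close(ncid) )
   call MPI_Finalize(ierr)

contains
   subroutine check(status)
      integer, intent(in) :: status
      if (status /= nf90_noerr) then
         print *, trim(nf90_strerror(status))
         call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
      end if
   end subroutine check
end program ncdf_par_write

Switching nf90_collective to nf90_independent is the one-line change the
collective vs. independent remark refers to; collective is normally the right
choice on a parallel file system.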
>
> Good luck!
> Ed
>
> On Sat, Oct 18, 2014 at 12:05 PM, Rob Latham <robl@xxxxxxxxxxx> wrote:
>
>>
>>
>> On 10/18/2014 04:39 AM, Samrat Rao wrote:
>>
>>>
>>> Hi Rob & Ed,
>>>
>>> I think that the machine I am using is not that bad. It was commissioned
>>> in '12. Some basic info:
>>>
>>> Performance
>>> 360 TFLOPS peak & 304 TFLOPS sustained on LINPACK
>>> Hardware
>>> HP blade system C7000 with BL460c Gen8 blades
>>> 1088 nodes with 300 GB disk/node (319 TB)
>>> 2,176 Intel Xeon E5 2670 processors @ 2.6 GHz
>>> 17,408 processor cores, 68 TB main memory
>>> FDR InfiniBand based fully non-blocking fat-tree topology
>>> 2 PB high-performance storage with Lustre parallel file system
>>>
>>
>> OK, then let's work up the software stack.
>>
>> You've got a Lustre file system, so you're going to need a halfway
>> decent MPI-IO implementation. Good news: OpenMPI, MPICH, and MVAPICH all
>> have good Lustre drivers.  Please ensure you are running something close to
>> the latest version.  (Sometimes we find users -- somehow -- running ten
>> year old MPICH code.)
>>
>> You need a recent-ish HDF5 library to make full use of the MPI-IO library.
>>
>> You need the very latest netCDF library for assorted bug fixes (and
>> compatibility with the latest HDF5 library).
>>
>> Debugging this stack over the mailing list is a bit of a challenge.
>>
>> ==rob
>>
>>
>>> ----
>>>
>>> Using netCDF configured for parallel applications, I did manage to write
>>> data to a single netCDF file using 512 procs --- but this was when I
>>> reduced the grid nodes per proc to about 20-30. When I increased the grid
>>> nodes per proc to about 100, I got this error too:
>>>
>>> NetCDF: HDF error
>>>
>>> ----
>>>
>>> There is another issue I need to share --- while compiling netCDF-4 for
>>> parallel usage, I encountered errors during 'make check' in these
>>> files: run_par_test.sh, run_f77_par_test.sh, and run_f90_par_test.sh.
>>>
>>> These were related to the mpiexec commands --- an mpd.hosts issue. These
>>> errors did not occur when I compiled netCDF for parallel use on my desktop.
>>>
>>> ----
>>>
>>> Dumping output from each processor gave me these errors --- not all such
>>> errors appear together; they are a bit random.
>>>
>>> [proxy:0:13@cn0083] HYDT_bscu_wait_for_completion
>>> (./tools/bootstrap/utils/bscu_wait.c:73): one of the processes
>>> terminated badly; aborting
>>> [proxy:0:13@cn0083] HYDT_bsci_wait_for_completion
>>> (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting
>>> for completion
>>> [proxy:0:13@cn0083] HYD_pmci_wait_for_childs_completion
>>> (./pm/pmiserv/pmip_utils.c:1476): bootstrap server returned error
>>> waiting for completion
>>> [proxy:0:13@cn0083] main (./pm/pmiserv/pmip.c:392): error waiting for
>>> event children completion
>>>
>>> [mpiexec@cn0002] control_cb (./pm/pmiserv/pmiserv_cb.c:674): assert
>>> (!closed) failed
>>> [mpiexec@cn0002] HYDT_dmxu_poll_wait_for_event
>>> (./tools/demux/demux_poll.c:77): callback returned error status
>>> [mpiexec@cn0002] HYD_pmci_wait_for_completion
>>> (./pm/pmiserv/pmiserv_pmci.c:388): error waiting for event
>>> [mpiexec@cn0002] main (./ui/mpich/mpiexec.c:718): process manager error
>>> waiting for completion
>>>
>>> cn0137:b279:beba2700: 132021042 us(132021042 us!!!):  ACCEPT_RTU: rcv
>>> ERR, rcnt=0 op=1 <- 10.1.1.136
>>> cn1068:48c5:4b280700: 132013538 us(132013538 us!!!):  ACCEPT_RTU: rcv
>>> ERR, rcnt=-1 op=1 <- 10.1.5.47
>>> cn1075:dba3:f8d7700: 132099675 us(132099675 us!!!):  CONN_REQUEST:
>>> SOCKOPT ERR Connection refused -> 10.1.1.51 16193 - RETRYING... 5
>>> cn1075:dba3:f8d7700: 132099826 us(151 us):  CONN_REQUEST: SOCKOPT ERR
>>> Connection refused -> 10.1.1.51 16193 - RETRYING...4
>>> cn1075:dba3:f8d7700: 132099942 us(116 us):  CONN_REQUEST: SOCKOPT ERR
>>> Connection refused -> 10.1.1.51 16193 - RETRYING...3
>>> cn1075:dba3:f8d7700: 132100049 us(107 us):  CONN_REQUEST: SOCKOPT ERR
>>> Connection refused -> 10.1.1.51 16193 - RETRYING...2
>>> cn1075:dba3:f8d7700: 132100155 us(106 us):  CONN_REQUEST: SOCKOPT ERR
>>> Connection refused -> 10.1.1.51 16193 - RETRYING...1
>>> cn1075:dba3:f8d7700: 132100172 us(17 us): dapl_evd_conn_cb() unknown
>>> event 0x0
>>>
>>> ----
>>>
>>> Rob, I guess I will need to look into the I/O methods you listed.
>>>
>>> Thanks for your time,
>>> Samrat.
>>>
>>>
>>> On Fri, Oct 17, 2014 at 10:00 PM, Rob Latham <robl@xxxxxxxxxxx> wrote:
>>>
>>>
>>>
>>>     On 10/17/2014 11:25 AM, Ed Hartnett wrote:
>>>
>>>         Unless things have changed since my day, it is possible to read
>>>         pnetcdf
>>>         files with the netCDF library. It must be built with
>>>         --enable-pnetcdf
>>>         and with-pnetcdf=/some/location, IIRC.
>>>
>>>
>>>     Ed!
>>>
>>>     In this case, Samrat Rao was using pnetcdf to create CDF-5 (giant
>>>     variable) formatted files.  To refresh your memory, Argonne and
>>>     Northwestern developed this file format with UCAR's signoff, with
>>>     the understanding that we (ANL and NWU) would never expect UCAR to
>>>     add support for it unless we did the work.  I took a stab at it a
>>>     few years back, and Wei-keng is taking a second crack at it right now.
>>>
>>>     The classic file formats CDF-1 and CDF-2 are fully interoperable
>>>     between pnetcdf and netcdf.
>>>     ==rob
>>>
>>>
>>>
>>>         On Fri, Oct 17, 2014 at 6:33 AM, Samrat Rao
>>>         <samrat.rao@xxxxxxxxx> wrote:
>>>
>>>              Hi,
>>>
>>>              I'm sorry for the late reply.
>>>
>>>              I have no classic/netCDF-3 datasets --- the datasets are yet
>>>              to be generated. All my codes are also new.
>>>
>>>              Initially I tried pnetcdf and wrote a few variables, but
>>>              found that the format was CDF-5, which 'normal' netCDF would
>>>              not read.
>>>
>>>              I also need to read some bits of netCDF data in MATLAB, so I
>>>              thought of sticking to the usual netCDF-4 compiled for
>>>              parallel I/O. It is also likely that I will have to share my
>>>              workload with others in my group and/or leave the code for
>>>              future people to work on.
>>>
>>>              Does MATLAB read CDF-5 files?
>>>
>>>              So I preferred the usual netCDF. Rob, I hope you are not
>>>              annoyed.
>>>
>>>              But most of the above is for another day. Currently I am
>>>              stuck elsewhere.
>>>
>>>              With a smaller number of processors, 216, the single netCDF
>>>              file gets created (I create a single netCDF file for each
>>>              variable), but for anything above that I get this error:
>>>              NetCDF: Bad chunk sizes.
>>>              Not sure where these errors come from.
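One thing that sometimes helps with "NetCDF: Bad chunk sizes" is to set the
chunk sizes explicitly when the variable is defined, rather than relying on
the library's default chunking; whether that is actually the cause here is
not certain. A minimal netCDF-Fortran sketch (the variable name and chunk
extents are purely illustrative):

subroutine def_field_with_chunks(ncid, dimids, varid)
   ! Define a 3-D double variable with explicit chunk sizes.  The chunk
   ! extents below are hypothetical; they must not exceed the corresponding
   ! dimension lengths and should keep each chunk well under the HDF5 limits.
   use netcdf
   implicit none
   integer, intent(in)  :: ncid, dimids(3)
   integer, intent(out) :: varid
   integer, parameter   :: cnx = 64, cny = 64, cnz = 16
   integer :: ierr
   ierr = nf90_def_var(ncid, 'field', nf90_double, dimids, varid, &
                       chunksizes=(/ cnx, cny, cnz /))
   if (ierr /= nf90_noerr) print *, trim(nf90_strerror(ierr))
end subroutine def_field_with_chunks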
>>>
>>>              Then I shifted to dumping output from each processor in
>>>              simple binary --- this works up to about 1500 processors.
>>>              Above that number the code gets stuck and eventually aborts.
>>>
>>>              This issue is not new. My colleague also had problems
>>>              running his code on 1500+ procs.
>>>
>>>              Today I learned that opening a large number of files (each
>>>              proc writes one file) can overwhelm the system --- solving
>>>              this requires more than rudimentary writing techniques, or
>>>              at least an understanding of the system's inherent
>>>              parameters and bottlenecks.
>>>
>>>              So netCDF is probably on hold for now --- I will try again
>>>              once the simple binary write from each processor gets sorted
>>>              out.
>>>
>>>              Does anyone have any suggestions?
>>>
>>>              Thanks,
>>>              Samrat.
>>>
>>>
>>>              On Thu, Oct 2, 2014 at 7:52 PM, Rob Latham
>>>              <robl@xxxxxxxxxxx> wrote:
>>>
>>>
>>>
>>>                  On 10/02/2014 01:24 AM, Samrat Rao wrote:
>>>
>>>                      Thanks for your replies.
>>>
>>>                      I estimate that I will require approximately 4000
>>>                      processors and a total grid resolution of 2.5 billion
>>>                      points for my F90 code. So I need to understand which
>>>                      is better - parallel netCDF or the 'normal' one.
>>>
>>>
>>>                  There are a few specific nifty features in pnetcdf that
>>>         can let
>>>                  you get really good performance, but 'normal' netCDF is
>>>         a fine
>>>                  choice, too.
>>>
>>>                      Right now I do not know how to use parallel-netCDF.
>>>
>>>
>>>                  It's almost as simple as replacing every 'nf' call with
>>>                  'nfmpi', but you will be just fine if you stick with
>>>                  UCAR netCDF-4.
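For reference, a minimal sketch of the 'nf' -> 'nfmpi' correspondence Rob
describes, calling the Fortran 77-style pnetcdf interface from free-form
Fortran. The file name, sizes, and 1-D slab decomposition are illustrative
assumptions, and the include/use mechanics may differ slightly between
pnetcdf versions:

program pnetcdf_slab_write
   use mpi
   implicit none
   include 'pnetcdf.inc'
   integer, parameter :: nx = 64, ny = 64, nzloc = 16   ! hypothetical per-rank slab
   integer :: ierr, rank, nprocs, ncid, varid, dimids(3)
   integer(kind=MPI_OFFSET_KIND) :: gnx, gny, gnz, start(3), count(3)
   double precision :: field(nx, ny, nzloc)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
   field = dble(rank)                                   ! stand-in for the real data

   ! Same shape of calls as serial netCDF, with 'nf' replaced by 'nfmpi'.
   ! Adding NF_64BIT_DATA to the create mode is what requests the CDF-5
   ! format explicitly; plain NF_CLOBBER keeps a classic, netcdf-readable file.
   ierr = nfmpi_create(MPI_COMM_WORLD, 'field.nc', NF_CLOBBER, &
                       MPI_INFO_NULL, ncid)
   gnx = nx; gny = ny; gnz = int(nzloc, MPI_OFFSET_KIND) * nprocs
   ierr = nfmpi_def_dim(ncid, 'x', gnx, dimids(1))
   ierr = nfmpi_def_dim(ncid, 'y', gny, dimids(2))
   ierr = nfmpi_def_dim(ncid, 'z', gnz, dimids(3))
   ierr = nfmpi_def_var(ncid, 'field', NF_DOUBLE, 3, dimids, varid)
   ierr = nfmpi_enddef(ncid)

   ! Collective write: each rank puts its own z-slab of the shared variable.
   start = (/ 1, 1, rank*nzloc + 1 /)
   count = (/ nx, ny, nzloc /)
   ierr = nfmpi_put_vara_double_all(ncid, varid, start, count, field)
   ierr = nfmpi_close(ncid)
   call MPI_Finalize(ierr)
end program pnetcdf_slab_write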
>>>
>>>                      Secondly, I hope that the netCDF-4 files created by
>>>                      either parallel netCDF or the 'normal' one are
>>>                      mutually compatible. For analysis I will be
>>>                      extracting data using the usual netCDF library, so
>>>                      if I use parallel-netCDF there should be no
>>>                      compatibility issues.
>>>
>>>
>>>                  For truly large variables, parallel-netcdf introduced,
>>>         with some
>>>                  consultation from the UCAR folks, a 'CDF-5' file
>>>         format.  You
>>>                  have to request it explicitly, and then in that one
>>>         case you
>>>                  would have a pnetcdf file that netcdf tools would not
>>>         understand.
>>>
>>>                  In all other cases, we work hard to keep pnetcdf and
>>>                  "classic" netcdf compatible.  UCAR NetCDF's HDF5-based
>>>                  backend -- which is in fact not optional if you want
>>>                  parallel I/O with NetCDF-4 -- is not compatible with
>>>                  parallel-netcdf.  By now, your analysis tools surely
>>>                  are updated to understand the new HDF5-based backend?
>>>
>>>                  I suppose it's possible you've got some 6 year old
>>>         analysis tool
>>>                  that does not understand NetCDF-4's HDF5-based file
>>> format.
>>>                  Parallel-netcdf would allow you to simulate with
>>>         parallel i/o
>>>                  and produce a classic netCDF file.  But I would be
>>>         shocked and a
>>>                  little bit angry if that was actually a good reason to
>>> use
>>>                  parallel-netcdf in 2014.
>>>
>>>
>>>                  ==rob
>>>
>>>
>>>                  --
>>>                  Rob Latham
>>>                  Mathematics and Computer Science Division
>>>                  Argonne National Lab, IL USA
>>>
>>>
>>>
>>>
>>>              --
>>>
>>>              Samrat Rao
>>>              Research Associate
>>>              Engineering Mechanics Unit
>>>              Jawaharlal Centre for Advanced Scientific Research
>>>              Bangalore - 560064, India
>>>
>>>              _________________________________________________
>>>              netcdfgroup mailing list
>>>              netcdfgroup@xxxxxxxxxxxxxxxx
>>>              For list information or to unsubscribe, visit:
>>>              http://www.unidata.ucar.edu/mailing_lists/
>>>
>>>
>>>
>>>     --
>>>     Rob Latham
>>>     Mathematics and Computer Science Division
>>>     Argonne National Lab, IL USA
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Samrat Rao
>>> Research Associate
>>> Engineering Mechanics Unit
>>> Jawaharlal Centre for Advanced Scientific Research
>>> Bangalore - 560064, India
>>>
>>
>> --
>> Rob Latham
>> Mathematics and Computer Science Division
>> Argonne National Lab, IL USA
>>
>
>


-- 

Samrat Rao
Research Associate
Engineering Mechanics Unit
Jawaharlal Centre for Advanced Scientific Research
Bangalore - 560064, India