Hi, Heiko
Parallel I/O to the classic netCDF format is supported by the netCDF
library through PnetCDF underneath.
It allows you to write concurrently to a single shared file from multiple MPI
processes.
Of course, you will have to build PnetCDF first and then build netCDF with
the --enable-pnetcdf configure option.
Your netCDF program does not need many changes to make use of this
feature. All you have to do is the following (a short sketch appears
after the list).
1. call nc_create_par() instead of nc_create()
2. add NC_PNETCDF to the create-mode argument of nc_create_par()
3. call nc_var_par_access(ncid, varid, NC_COLLECTIVE) after nc_enddef to enable
collective I/O mode
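
Here is a minimal sketch of those three steps (the file name, dimension
sizes, and variable are made up for illustration; error checking is
omitted):

    #include <mpi.h>
    #include <netcdf.h>
    #include <netcdf_par.h> /* nc_create_par, nc_var_par_access, NC_COLLECTIVE */

    int main(int argc, char **argv)
    {
        int    rank, nprocs, ncid, dimid, varid;
        size_t start, count = 10;
        float  buf[10];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* steps 1 and 2: nc_create_par() with NC_PNETCDF in the create
           mode, so the classic-format file is written through PnetCDF */
        nc_create_par("test.nc", NC_CLOBBER | NC_PNETCDF,
                      MPI_COMM_WORLD, MPI_INFO_NULL, &ncid);

        nc_def_dim(ncid, "x", (size_t)nprocs * 10, &dimid);
        nc_def_var(ncid, "data", NC_FLOAT, 1, &dimid, &varid);
        nc_enddef(ncid);

        /* step 3: enable collective I/O mode after nc_enddef */
        nc_var_par_access(ncid, varid, NC_COLLECTIVE);

        /* each rank writes its own non-overlapping 10-element slice */
        start = (size_t)rank * 10;
        for (int i = 0; i < 10; i++)
            buf[i] = (float)rank;
        nc_put_vara_float(ncid, varid, &start, &count, buf);

        nc_close(ncid);
        MPI_Finalize();
        return 0;
    }

Compile with your MPI compiler wrapper (e.g. mpicc) against a netCDF
built with --enable-pnetcdf, and run it under mpiexec.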
There are a couple of example codes available at this URL:
http://cucis.ece.northwestern.edu/projects/PnetCDF/#InteroperabilityWithNetCDF4
There are instructions in each example file for building netCDF with PnetCDF.
For downloading PnetCDF, please see
http://cucis.ece.northwestern.edu/projects/PnetCDF/download.html
Wei-keng
On Sep 21, 2015, at 9:14 AM, Heiko Klein wrote:
> Hi Nick,
>
> yes, they are all writing to the same file - we want to have one file at
> the end.
>
> I've been scanning through the source code of netcdf3. I guess the
> problem of the partly written sections is caused by the translation of
> the nc_put_vara calls to internal pages, and then from the internal
> pages to disk. And possibly the internal pages are not aligned with my
> nc_put_vara calls, so even when the regions of the nc_put_vara calls
> don't overlap between concurrent calls, the internal pages do? Is there
> a way to enforce proper alignment? I see nc__enddef has several align
> parameters.
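
For reference, nc__enddef (note the double underscore) takes those
alignment parameters directly. A minimal, purely illustrative call in
place of a plain nc_enddef(ncid) - the 4096-byte alignments here are
made-up values, not a recommendation:

    /*  h_minfree - pad (bytes) left at the end of the header
        v_align   - alignment (bytes) of the start of the fixed-size
                    variable data section
        v_minfree - pad (bytes) at the end of the fixed-size data section
        r_align   - alignment (bytes) of the start of the record
                    variable section */
    nc__enddef(ncid, 0, 4096, 0, 4096);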
>
>
> I'm aware that concurrent writes are not officially supported by the
> netCDF library. But IT infrastructure has changed a lot since the start
> of the netCDF library, and systems are nowadays highly parallelized,
> both in CPUs and in I/O and filesystems. I'm trying to find a way to
> allow for simple parallelization. Having many output files from a model
> is risky for data consistency - so I would like to avoid that without
> sacrificing too much speed.
>
> Best regards,
>
> Heiko
>
>
> On 2015-09-21 15:18, Nick Papior wrote:
>> So, are they writing to the same files?
>>
>> I.e. job1 writes a(:,1) to test.nc and job2 writes a(:,2) to test.nc?
>> Because that is not allowed.
>>
>> 2015-09-21 15:13 GMT+02:00 Heiko Klein <Heiko.Klein@xxxxxx>:
>>
>> Hi,
>>
>> I'm trying to convert about 90 GB of NWP data 4 times daily from grib
>> to netCDF. The grib files arrive as fast as the data can be downloaded
>> from the HPC machines. They come as 10 files per forecast timestep.
>>
>> Currently, I manage to convert 1 file per forecast timestep, and I
>> would like to parallelize the conversion into independent jobs (i.e.
>> neither MPI nor OpenMP), with a theoretical performance increase of 10.
>> The underlying I/O system is fast enough to handle 10 jobs, and I have
>> enough CPUs, but the concurrently written netCDF files show data which
>> is only half written to disk, or mixed with other slices.
>>
>> What I do is create a _FillValue 'template' file, containing all
>> definitions, before the NWP job runs. When a new set of files arrives,
>> the data is written to the respective data slices, which don't overlap.
>> There is never a redefine, only functions like nc_put_vara_* with
>> different slices.
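
A minimal sketch of that template approach, as I understand it - the
file name, dimension sizes, and variable name below are made up for
illustration, and NC_64BIT_OFFSET is assumed because of the file sizes:

    #include <netcdf.h>

    /* once, before the NWP jobs run: create a 'template' file whose
       variables are fully defined; with the default fill mode their
       data is prefilled with _FillValue */
    void create_template(void)
    {
        int ncid, dimids[3], varid;
        nc_create("template.nc", NC_CLOBBER | NC_64BIT_OFFSET, &ncid);
        nc_def_dim(ncid, "time", 66, &dimids[0]);
        nc_def_dim(ncid, "y", 1000, &dimids[1]);
        nc_def_dim(ncid, "x", 1000, &dimids[2]);
        nc_def_var(ncid, "air_temperature", NC_FLOAT, 3, dimids, &varid);
        nc_enddef(ncid);
        nc_close(ncid);
    }

    /* per conversion job: open the template and write one
       non-overlapping time slice; no redefine */
    void write_slice(size_t t, const float *slice)
    {
        int ncid, varid;
        size_t start[3] = {t, 0, 0};
        size_t count[3] = {1, 1000, 1000};
        nc_open("template.nc", NC_WRITE, &ncid);
        nc_inq_varid(ncid, "air_temperature", &varid);
        nc_put_vara_float(ncid, varid, start, count, slice);
        nc_close(ncid);
    }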
>>
>> Since the nc_put_vara_* calls are non-overlapping, I hoped that this
>> type of concurrent write would work - but it doesn't. Is my idea of
>> writing data in parallel really so bad (e.g. are there internal buffers
>> which get rewritten)? Any ideas on how to improve the conversion
>> process?
>>
>> Best regards,
>>
>> Heiko
>>
>>
>> --
>> Kind regards Nick
>
> --
> Dr. Heiko Klein          Norwegian Meteorological Institute
> Tel. +47 22 96 32 58     P.O. Box 43 Blindern
> http://www.met.no        0313 Oslo NORWAY
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/