Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?

  • To: Heiko Klein <Heiko.Klein@xxxxxx>
  • Subject: Re: [netcdfgroup] Concurrent writes to netcdf3, what goes wrong?
  • From: Nick Papior <nickpapior@xxxxxxxxx>
  • Date: Mon, 21 Sep 2015 16:47:01 +0200
Again

2015-09-21 16:14 GMT+02:00 Heiko Klein <Heiko.Klein@xxxxxx>:

> Hi Nick,
>
> yes, they are all writing to the same file - we want to have one file at
> the end.
>
> I've been scanning through the source-code of netcdf3. I guess the
> problem of the partly written sections is caused by the translation of
> the nc_put_vara calls to internal pages, and then from the internal pages
> to disk. Perhaps the internal pages are not aligned with my
> nc_put_vara calls, so even when the region of nc_put_vara doesn't
> overlap between concurrent calls, the internal pages do? Is there a way
> to enforce proper alignment? I see nc__enddef has several align parameters.
>
>
> I'm aware that concurrent writes are not officially supported by the
> netcdf-library. But IT-infrastructure has changed a lot since the start
> of the netcdf-library and systems are nowadays highly parallelized, both
> on CPU and also in IO/filesystems. I'm trying to find a way to allow for
> simple parallelization. Having many output-files from a model is risky
> for data-consistency - so I would like to avoid it without sacrificing
> too much speed.
>
The library and the infrastructure are not correlated. To gain performance,
either use MPI, or take each timestep/file separately as an embarrassingly
parallel run.

>
> Best regards,
>
> Heiko
>
>
> On 2015-09-21 15:18, Nick Papior wrote:
> > So, are they writing to the same files?
> >
> > I.e. job1 writes a(:,1) to test.nc and job2 writes
> > a(:,2) to test.nc?
> > Because that is not allowed.
> >
> > 2015-09-21 15:13 GMT+02:00 Heiko Klein <Heiko.Klein@xxxxxx>:
> >
> >     Hi,
> >
> >     I'm trying to convert about 90GB of NWP data 4 times daily from grib to
> >     netcdf. The grib-files arrive as fast as the data can be downloaded from
> >     the HPC machines. They come as 10 files per forecast timestep.
> >
> >     Currently, I manage to convert 1 file/forecast timestep and I would like
> >     to parallelize the conversion into independent jobs (i.e. neither MPI nor
> >     OpenMP), with a theoretical performance increase of 10. The underlying
> >     IO system is fast enough to handle 10 jobs, and I have enough CPUs, but
> >     the concurrently written netcdf-files show data which is only half
> >     written to disk, or mixed with other slices.
> >
> >     What I do is create a _FillValue 'template' file containing all
> >     definitions before the NWP job runs. When a new set of files arrives,
> >     the data is put into the respective data-slices, which don't have any
> >     overlap. There is never a redefine, only calls like nc_put_vara_*
> >     with different slices.
> >
> >     Since the nc_put_vara_* calls are non-overlapping, I hoped that this
> >     type of concurrent write would work - but it doesn't. Is my idea of
> >     writing data in parallel really so bad (e.g. are there internal
> >     buffers which get rewritten)? Any ideas how to improve the conversion
> >     process?
> >
> >     Best regards,
> >
> >     Heiko
> >
> >     _______________________________________________
> >     netcdfgroup mailing list
> >     netcdfgroup@xxxxxxxxxxxxxxxx
> >     For list information or to unsubscribe,  visit:
> >     http://www.unidata.ucar.edu/mailing_lists/
> >
> >
> >
> >
> > --
> > Kind regards Nick
>
> --
> Dr. Heiko Klein                   Norwegian Meteorological Institute
> Tel. + 47 22 96 32 58             P.O. Box 43 Blindern
> http://www.met.no                 0313 Oslo NORWAY
>



-- 
Kind regards Nick