Hi Nick,
yes, they are all writing to the same file - we want to have one file at
the end.
I've been scanning through the source code of netcdf3. I suspect the
problem of the partially written sections is caused by the translation of
the nc_put_vara calls to internal pages, and then from the internal pages
to disk. Possibly the internal pages are not aligned with my nc_put_vara
calls, so even when the regions of concurrent nc_put_vara calls don't
overlap, the internal pages do? Is there a way to enforce proper
alignment? I see that nc__enddef has several align parameters.
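
For illustration, here is roughly what I have in mind when creating the
'template' file - only a sketch, assuming a 4096-byte filesystem block
size (the dimension/variable names and the value 4096 are placeholders,
and I don't know whether this alignment is actually enough to keep
concurrent writers off the same internal page):

#include <stdio.h>
#include <netcdf.h>

#define CHECK(e) do { int s_ = (e); if (s_ != NC_NOERR) { \
    fprintf(stderr, "%s\n", nc_strerror(s_)); return 1; } } while (0)

int main(void)
{
    int ncid, dim_t, dim_x, varid, dims[2];

    CHECK(nc_create("template.nc", NC_CLOBBER | NC_64BIT_OFFSET, &ncid));
    CHECK(nc_def_dim(ncid, "time", NC_UNLIMITED, &dim_t));
    CHECK(nc_def_dim(ncid, "x", 1000, &dim_x));
    dims[0] = dim_t; dims[1] = dim_x;
    CHECK(nc_def_var(ncid, "field", NC_FLOAT, 2, dims, &varid));

    /* nc__enddef(ncid, h_minfree, v_align, v_minfree, r_align):
     * v_align aligns the start of the fixed-size data section,
     * r_align the start of the record-variable data section.
     * Both are set to 4096 here as a guess at the block size. */
    CHECK(nc__enddef(ncid, 0, 4096, 0, 4096));
    CHECK(nc_close(ncid));
    return 0;
}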
I'm aware that concurrent writes are not officially supported by the
netcdf library. But IT infrastructure has changed a lot since the start of
the netcdf library, and systems are nowadays highly parallel, both on the
CPU side and in IO/filesystems. I'm trying to find a way to allow for
simple parallelization. Having many output files from a model is risky for
data consistency - so I would like to avoid that without sacrificing too
much speed.
Best regards,
Heiko
On 2015-09-21 15:18, Nick Papior wrote:
> So, are they writing to the same files?
>
> I.e. job1 writes a(:,1) to test.nc and job2 writes a(:,2) to test.nc?
> Because that is not allowed.
>
> 2015-09-21 15:13 GMT+02:00 Heiko Klein <Heiko.Klein@xxxxxx>:
>
> Hi,
>
> I'm trying to convert about 90 GB of NWP data from grib to netcdf 4
> times daily. The grib files arrive as fast as the data can be downloaded
> from the HPC machines, 10 files per forecast timestep.
>
> Currently I manage to convert 1 file per forecast timestep, and I would
> like to parallelize the conversion into independent jobs (i.e. neither
> MPI nor OpenMP), for a theoretical speed-up of 10. The underlying IO
> system is fast enough to handle 10 jobs, and I have enough CPUs, but the
> concurrently written netcdf files contain data that is only half written
> to disk, or mixed with other slices.
>
> What I do is create a _FillValue 'template' file containing all
> definitions before the NWP job runs. When a new set of files arrives,
> the data is written to the respective data slices, which don't overlap;
> there is never a redefine, only calls like nc_put_vara_* with
> different slices.
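>
> To illustrate, a minimal sketch of what each independent job does
> (hypothetical file/variable names; error checking omitted for brevity):
>
>     #include <netcdf.h>
>
>     void write_slice(size_t timestep, size_t nx, const float *slice_data)
>     {
>         int ncid, varid;
>         size_t start[2] = { timestep, 0 };  /* this job's forecast timestep */
>         size_t count[2] = { 1, nx };        /* one whole slice, no overlap  */
>
>         nc_open("forecast.nc", NC_WRITE, &ncid);  /* pre-defined template */
>         nc_inq_varid(ncid, "field", &varid);
>         nc_put_vara_float(ncid, varid, start, count, slice_data);
>         nc_close(ncid);
>     }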
>
> Since the nc_put_vara_* calls are non-overlapping, I had hoped that this
> kind of concurrent write would work - but it doesn't. Is writing data in
> parallel like this really such a bad idea (e.g. are there internal
> buffers that get rewritten)? Any ideas how to improve the conversion
> process?
>
> Best regards,
>
> Heiko
>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>
> --
> Kind regards Nick
--
Dr. Heiko Klein            Norwegian Meteorological Institute
Tel. +47 22 96 32 58       P.O. Box 43 Blindern
http://www.met.no          0313 Oslo NORWAY