Thank you for the advice! Your English is just fine (much better than my
French). As you suggest, I think I will open up the file with a single
processor, define any missing variables, and then close the file (flushing
all data to disk). Then all of the processors can open the file, read it,
and proceed as normal. I think that will work best.
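
For anyone who finds this thread later, here is a rough sketch of that plan
in C. It is only an outline under some assumptions: the file name "data.nc",
the dimension name "n", the variable names "var_a" and "var_b", and the
NC_DOUBLE type are all made up for illustration, and check() is a small
helper of my own. It assumes the classic netCDF C interface together with
MPI, and that every rank sees the same file system.

```c
#include <mpi.h>
#include <netcdf.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names, for illustration only. */
#define FILE_NAME "data.nc"
#define DIM_NAME  "n"
static const char *const WANTED[] = { "var_a", "var_b" };
#define NWANTED (sizeof WANTED / sizeof WANTED[0])

/* Abort all ranks if a netCDF call fails. */
static void check(int status)
{
    if (status != NC_NOERR) {
        fprintf(stderr, "netCDF error: %s\n", nc_strerror(status));
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    int rank, ncid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int dimid, varid, old_fill, in_define = 0;

        check(nc_open(FILE_NAME, NC_WRITE, &ncid));
        check(nc_set_fill(ncid, NC_NOFILL, &old_fill)); /* do not prefill */
        check(nc_inq_dimid(ncid, DIM_NAME, &dimid));

        for (size_t i = 0; i < NWANTED; i++) {
            int status = nc_inq_varid(ncid, WANTED[i], &varid);
            if (status == NC_ENOTVAR) {
                /* Enter define mode once, and only if something is missing. */
                if (!in_define) {
                    check(nc_redef(ncid));
                    in_define = 1;
                }
                check(nc_def_var(ncid, WANTED[i], NC_DOUBLE, 1, &dimid, &varid));
            } else {
                check(status);
            }
        }
        if (in_define)
            check(nc_enddef(ncid)); /* the header is rewritten once, here */
        check(nc_close(ncid));      /* flush everything to disk */
    }

    /* Nobody opens the file until rank 0 has closed it. */
    MPI_Barrier(MPI_COMM_WORLD);

    check(nc_open(FILE_NAME, NC_NOWRITE, &ncid));
    /* ... each rank reads its variables here ... */
    check(nc_close(ncid));

    MPI_Finalize();
    return 0;
}
```

The point of the barrier is that no processor holds the file open while its
structure changes, so every processor reads the final metadata when it opens
the file.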
Thank you,
Howard Salis
On Mon, 7 Feb 2005, Philippe Poilbarbe wrote:
> Howard Salis wrote:
>
> > Hello all,
> >
> > Does anyone have any experience in reading and defining new
> > NetCDF variables midway through a multi-processor MPI program? I've
> > experienced a very strange bug:
> > I have an MPI program where each processor reads in data from multiple
> > NetCDF variables. If a variable is missing, one of the processors then
> > defines the new variable while the other processors halt. Once the
> > variable is defined, all of the processors move forward in the
> > program. If the newly defined variable is small, this works with no
> > problems. However, if the definition of the new variable takes a
> > considerable amount of time, then the other processors actually
> > _forget_ the data they previously read in. How weird is that?
> > I am using the no-fill mode (NC_NOFILL) to define the variables, but it
> > still takes a long time. The variable could be huge, though (500+ MB).
> >
> > There's an easy-to-implement solution (define first, then read),
> > but I am curious whether anyone has had a similar problem and what
> > causes it.
> >
> > ...
>
> I don't know if the netCDF library is MPI safe (each processor may have
> its own cache of the metadata and there is no 'lock while redefining').
> As for the long time it takes to define a variable even when it is not
> filled, that may depend on how your data are defined and organized.
>
> When a new variable is added, its description has to be put in the
> metadata, which resides at the beginning of the file. If the metadata has
> to grow, then during the call to nc_enddef (or nf_enddef or any language
> variant), all the bytes of the data themselves have to move in order to
> make more space at the beginning of the file.
>
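
To make that cost concrete, here is a toy model in C. It is not the real
netCDF file layout, only an illustration: the "file" is a memory buffer with
a header followed by the data, and grow_header() is a made-up function that
does what the paragraph above describes, shifting every data byte to make
room for a larger header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model, not the real netCDF format: `file` holds `header_len` header
 * bytes followed by `data_len` data bytes. Growing the header by `extra`
 * bytes shifts the whole data section towards the end of the buffer.
 * `capacity` is the size of the buffer and must be large enough.
 * Returns the number of data bytes that had to be moved.
 */
size_t grow_header(unsigned char *file, size_t capacity,
                   size_t header_len, size_t data_len, size_t extra)
{
    assert(header_len + extra + data_len <= capacity);

    /* Move all of the data, however small `extra` is. */
    memmove(file + header_len + extra, file + header_len, data_len);

    /* The freed gap is where the new variable's description would go. */
    memset(file + header_len, 0, extra);

    return extra > 0 ? data_len : 0;
}
```

In this model the work is proportional to data_len, not to extra, so adding
even a tiny description in front of several hundred megabytes of data costs
a copy of all of that data.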
> So as you add variables, it takes longer and longer.
> The way we have found to avoid this is to define all variables and
> attributes (they also take some space) in one nc_redef/nc_enddef
> session, and to do so whenever we can (i.e. when we know them at the
> beginning and not 'on the fly'). Some programs were modified so that
> they know everything they need in a netCDF file before they create it
> (or do all structure modifications in one pass). They then cache some
> data (or recompute them) before writing. That was faster than adding
> one variable after another to big files (400 MB to 1.8 GB).
>
> Normally, if your variable is the first one defined in the file, defining
> it should take almost no time (if not filled), but if the file already
> holds one big variable (or many small ones), that can explain the long time.
>
> For the MPI access, if each processor knows that another one has created
> a variable (since you say they are waiting for this), they could
> close and reopen the file to reload all the metadata (but only once the
> variable creation has completed, i.e., when nc_enddef has finished its
> job). It is the same as concurrent access to a file from many programs.
>
> Hoping this helps.
>
> Ph.P.
>
> PS: My English is probably not as good as I would like :-)
>
> --
> Philippe Poilbarbe CLS Space Oceanography Group
>
> mailto:Philippe.Poilbarbe@xxxxxx
> phoneto:+33(5)61394727
>
> Parc technologique du canal
> 8-10, Rue Hermes
> 31520 Ramonville St-Agne
> France
>
>