I have a problem: my netCDF write performance degrades as my file size
increases. I have several variables defined along the unlimited dimension,
and at each timestep of my simulation I write out those variables,
presumably appending to the end of the netCDF file. I am not manipulating
any dimension attributes or accessing the file anywhere but at the end (in
theory). At the beginning of the simulation, the netCDF writes are fast
and the code runs normally. As the simulation proceeds, however, each
netCDF write call takes longer and longer, eventually dominating the
processor time and overwhelming the simulation itself. It feels as though
the whole file is being read or rewritten on each timestep, even though it
shouldn't be.
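
In case it helps, the write pattern looks roughly like this. This is a
simplified sketch, not my actual code: the names, sizes, and the per-step
sync() call are illustrative.

    # Simplified sketch of the write pattern (names and sizes made up;
    # the real code writes several variables per step).
    from Scientific.IO.NetCDF import NetCDFFile
    import Numeric

    nsteps = 1000
    f = NetCDFFile('output.nc', 'w')
    f.createDimension('time', None)      # the unlimited record dimension
    f.createDimension('x', 100)
    temp = f.createVariable('temp', 'd', ('time', 'x'))

    for t in range(nsteps):
        data = Numeric.ones(100, 'd') * t   # stand-in for one step's output
        temp[t, :] = data                   # append one record at the end
        f.sync()                            # flush to disk each step
    f.close()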
I am using Konrad Hinsen's Python wrappers for netCDF on a Pentium III
cluster running Red Hat Linux 7.1 (and LAM 6.5.2 for MPI, though that
shouldn't matter here), with netCDF 3.5-beta6. I have already corresponded
with Konrad about this; he has not seen the problem before and thinks the
Python wrapper should be consistently fast (and so do I, looking at the
code). The call he uses to write the data is ncvarputg() (from the old
netCDF-2 API, right?).
A simple way for me to demonstrate the problem is to write the data at a
constant value of the unlimited dimension instead of incrementing it by
one each time. If I always write to, for example, time = 5 (time is the
unlimited dimension), performance is consistent and fast. If I always
write to frame 5000, performance is consistent and slow.
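
In code, that test looks roughly like this (again a simplified sketch;
the file setup and timing are illustrative, and I call sync() to force
each write out):

    # Time repeated writes at a fixed record index.
    # Index 5 stays fast; index 5000 is consistently slow.
    import time
    from Scientific.IO.NetCDF import NetCDFFile
    import Numeric

    f = NetCDFFile('test.nc', 'w')
    f.createDimension('time', None)
    f.createDimension('x', 100)
    temp = f.createVariable('temp', 'd', ('time', 'x'))
    data = Numeric.ones(100, 'd')

    for index in (5, 5000):
        t0 = time.time()
        for i in range(100):
            temp[index, :] = data       # always the same record index
            f.sync()
        print 'record %d: %.3f s for 100 writes' % (index, time.time() - t0)
    f.close()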
Thanks for any insight,
John
--
John Galbraith                     email: jgalb@xxxxxxxx
Los Alamos National Laboratory     home phone: (505) 662-3849
                                   work phone: (505) 665-6301