I've seen this performance problem as well when I tried to convert an
ASCII file with over 1.7 million lines of data to NetCDF
(to a variable with an unlimited dimension). First I tried to write the samples
one at a time; the writes became very slow towards the end of the file.
Then I tried to read blocks of data and write blocks of data. That works
very well. It appears to me that the size of the block you write at once
does not matter too much (at this very moment I'm trying it with 100 samples
at once and it writes just as quickly at the end as at the beginning of the
file).
Reading and writing in blocks does give you some extra bookkeeping to track
where you are in the output array, but that shouldn't be too difficult.
By the way, I did this on an HP9000. I haven't tried it on Linux (it might be
system-specific, or it may have something to do with buffering in the file system).
Regards,
Arnold Moene
PS: so my write statement looks something like:
outfile.variables['time'][n:n+nlines] = Atime[:nlines]
where Atime is the temporary container for the data.
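In case it is of use, here is a rough sketch of the whole loop. Only the slice
assignment is taken from my actual code; the file names, the read_block()
helper, the block size of 100 and the setup calls (NetCDFFile, createDimension
with None for the unlimited dimension, createVariable) are just how I would
write it from memory of Konrad's Scientific.IO.NetCDF interface, so treat it
as an illustration rather than working code:

from Scientific.IO.NetCDF import NetCDFFile
import Numeric

BLOCK = 100    # samples written per slice; the exact size didn't seem to matter much

def read_block(f, blocksize):
    # hypothetical helper: read up to blocksize time values from the ASCII file,
    # one value per line (adjust the parsing to your own file layout)
    values = []
    for i in range(blocksize):
        line = f.readline()
        if not line:
            break
        values.append(float(line.split()[0]))
    return values

infile = open('data.txt', 'r')
outfile = NetCDFFile('out.nc', 'w')
outfile.createDimension('time', None)            # the unlimited dimension
outfile.createVariable('time', 'd', ('time',))

n = 0                                            # bookkeeping: next free index along 'time'
while 1:
    Atime = read_block(infile, BLOCK)            # temporary container, as above
    nlines = len(Atime)
    if nlines == 0:
        break
    # one slice assignment per block instead of one call per sample
    outfile.variables['time'][n:n+nlines] = Numeric.array(Atime, 'd')
    n = n + nlines

outfile.close()
infile.close()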
On Monday 10 September 2001 19:23, you wrote:
> I have a problem that my netcdf write performance degrades as my file size
> increases. I have several variables with the unlimited dimension, and each
> timestep of my simulation I write out those variables, presumably to the
> end of the netcdf file. I am not manipulating any dimension attributes or
> accessing the file anywhere but at the end (in theory). At the beginning
> of my simulation, the netcdf writes are fast and the code runs normally.
> As the simulation proceeds, however, the netcdf write call takes longer
> and longer, eventually dominating the processor time of the simulation.
> It feels like the whole file is read/manipulated on each
> timestep, even though it shouldn't be.
>
> I am using Konrad Hinsen's Python wrappers for netcdf on a Pentium III
> cluster running Redhat Linux 7.1 (and lam 6.5.2 for MPI, but that shouldn't
> matter here), and netcdf 3.5-beta6. I have corresponded with Konrad about
> this already, and he has not seen this problem before and thinks that the
> Python wrapper should be consistently fast (and so do I, looking at the
> code). The call he uses to write the data is ncvarputg() (from the old
> NetCDF API, right?).
>
> A simple way for me to demonstrate the problem is to write out the data at
> a constant value of the unlimited dimension, instead of incrementing it by
> one each time. If I always write, for example, to time = 5 (time is the
> unlimited dimension), then performance is consistent and fast. If I write
> to frame 5000, for instance, performance is consistent and slow.
>
> Thanks for any insight,
> John
--
------------------------------------------------------------------------
Arnold F. Moene tel: +31 (0)317 482109
Meteorology and Air Quality Group fax: +31 (0)317 482811
Wageningen Agricultural University e-mail: afmoene@xxxxxxxxxxxxxx
Duivendaal 2 url: http://www.met.wau.nl
6701 AP Wageningen
The Netherlands
------------------------------------------------------------------------