Konrad,

> I think I found what slows it down. In my Python interface, every
> write operation to an array with an unlimited dimension is followed by
> a call to nc_inq_dimlen() in order to keep the internal dimension
> count up to date. I had assumed that this call would be cheap and
> independent of the file size, since it ought to be no more than an
> access to some data structure that the netCDF library should keep in
> memory.

The implementation of nc_inq_dimlen() is a bit more complicated, since
changes were added in September 1999 to support multiprocessing on Cray
T3E systems. However, it should still take constant time, independent
of the file size, so I was puzzled by your findings and tried to
reproduce them.

Since the Python, C++, and Fortran netCDF interfaces all use the C
interface to do the I/O, I tried to duplicate the reported performance
problem in the C interface. Translating the Python example to C and
running it still shows no apparent performance problem, even when I
add a call to nc_inq_dimlen() after each write operation.

Specifically, the following C program:
http://www.unidata.ucar.edu/packages/netcdf/jg1.c
accepts a single command-line argument for how many records to write,
creates a netCDF file with the same structure as John Galbraith's
Python script, and prints the time required to append each additional
batch of 5000 records to the initially empty file.
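jg1.c itself is only linked above, but the shape of its measurement can
be sketched with nothing beyond the C standard library. In the sketch
below, a plain fwrite() of a fixed-size dummy record stands in for the
netCDF write call (and the optional nc_inq_dimlen() call after it), and
both the function name append_batches and the record size REC_BYTES are
invented for illustration; only the batch-timing structure is meant to
mirror what jg1.c does:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Invented placeholder record size, not the variable layout in
 * John Galbraith's script. */
#define REC_BYTES 64

/* Append `nbatches` batches of `batch` fixed-size records to `path`,
 * printing the CPU time spent on each batch. Plain fwrite() stands in
 * for the netCDF write (and nc_inq_dimlen()) calls that jg1.c makes.
 * Returns the total number of records written, or -1 on error. */
long append_batches(const char *path, int nbatches, int batch)
{
    unsigned char rec[REC_BYTES] = {0};
    long total = 0;
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    for (int b = 0; b < nbatches; b++) {
        clock_t t0 = clock();
        for (int i = 0; i < batch; i++) {
            if (fwrite(rec, 1, REC_BYTES, fp) != REC_BYTES) {
                fclose(fp);
                return -1;      /* e.g. no space left on device */
            }
            total++;
        }
        printf("record %d: %.3f secs\n", (b + 1) * batch,
               (double)(clock() - t0) / CLOCKS_PER_SEC);
    }
    fclose(fp);
    return total;
}
```

A real reproduction would of course replace the fwrite() with the
corresponding netCDF calls, as jg1.c does; the point of the sketch is
only the per-batch timing, which should stay flat as the file grows.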

The times are nearly constant even when the file has grown to 1.2
Gbytes:

$ ./jg1 300000
record 5000: 1.010 secs
record 10000: 0.950 secs
record 15000: 0.940 secs
record 20000: 1.020 secs
record 25000: 0.930 secs
record 30000: 1.030 secs
record 35000: 1.080 secs
record 40000: 1.060 secs
record 45000: 1.040 secs
record 50000: 0.940 secs
...
record 240000: 1.160 secs
record 245000: 1.150 secs
record 250000: 1.130 secs
record 255000: 1.120 secs
record 260000: 1.180 secs
record 265000: 1.160 secs
record 270000: 1.140 secs
line 107 of jg1.c: No space left on device

If those reporting the problem could compile and run this C program on
their systems and report the circumstances under which they see
performance degrade with file size, or modify the program to
demonstrate the problem, that would help us track down the cause and
fix it.

--Russ

P.S. I'm trying to continue doing my job in the face of the terrible
tragedies that occurred earlier today. It has been very difficult to
focus on work, but not working feels like giving in to those who would
demand that we instead focus on their acts of terror.