Re: performance degrades with filesize

I've followed the discussion on this subject today.  I've not looked at
this issue in eons.  I recall that this certainly was a problem with early
netCDF releases, apparently caused by forcing long seeks in big files (if
I remember correctly).  At that time, I tended to work around the problem
by keeping relatively few time steps mapped to the unlimited dimension in
each file, and creating multiple files.  Each file would have multiple
(static) dimensions and variables, but would have fast access.  The
applications then took care of the bookkeeping needed to treat the set of
files as a single data set.  I still use that approach today, having not
revisited whether more recent versions of netCDF allow it to be done
better.
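
Roughly, the bookkeeping looked like the sketch below, using the
netCDF-2 C calls of that era.  This is from memory, so take it as a
sketch only: the variable name, file-naming scheme, and record counts
are all made up for illustration.

    #include <stdio.h>
    #include "netcdf.h"     /* netCDF-2 interface of that era */

    #define NRECS    1000   /* total time steps -- made-up number */
    #define MAX_RECS  100   /* time steps kept per file -- made-up cap */

    int main(void)
    {
        int ncid = -1, varid = 0, dims[1];
        long rec, start[1], count[1];
        float value;
        char path[64];

        count[0] = 1;
        for (rec = 0; rec < NRECS; rec++) {
            if (rec % MAX_RECS == 0) {          /* roll over to a new file */
                if (ncid != -1)
                    ncclose(ncid);
                sprintf(path, "data.%03ld.nc", rec / MAX_RECS);
                ncid = nccreate(path, NC_CLOBBER);
                dims[0] = ncdimdef(ncid, "time", NC_UNLIMITED);
                varid = ncvardef(ncid, "temperature", NC_FLOAT, 1, dims);
                ncendef(ncid);
            }
            value = (float)rec;                 /* stand-in for real data */
            start[0] = rec % MAX_RECS;          /* record index within file */
            ncvarput(ncid, varid, start, count, &value);
        }
        ncclose(ncid);
        return 0;
    }

Each file stays small, so every write is a short append near the start of
a file rather than a long seek into a huge one.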

--------------------------
Lloyd A. Treinish
Deep Computing Institute
IBM Thomas J. Watson Research Center
P. O. Box 218
Yorktown Heights, NY 10598
914-945-2770 (voice)
914-945-3434 (facsimile)
lloydt@xxxxxxxxxx
http://www.research.ibm.com/people/l/lloydt/
http://www.research.ibm.com/weather


John Galbraith <john@xxxxxxxxxxxxxxx>@unidata.ucar.edu on 09/10/2001 03:26:23 PM

Please respond to John Galbraith <john@xxxxxxxxxxxxxxx>

Sent by:  owner-netcdfgroup@xxxxxxxxxxxxxxxx

cc:   Ethan Alpert <ethan@xxxxxxxxxxxx>, john@xxxxxxxxxxxxxxx,
      netcdfgroup@xxxxxxxxxxxxxxxx

>>>>> "Steve" == Steve Emmerson <steve@xxxxxxxxxxxxxxxx> writes:

    >> ... I can't be certain but it seems like the entire file is
    >> rewritten when the unlimited dimension increases.

    Steve> The C implementation from the Unidata Program Center of the
    Steve> netCDF API *does not* rewrite the entire netCDF file when the
    Steve> unlimited dimension is increased -- effectively, the file is
    Steve> simply appended to.

That is why I said "seems": the write time *seems* to be proportional to
the size of the file.  I have no evidence that the file is actually being
copied.  In fact, I would be surprised if it were, and I suspect instead
that I am calling the netCDF library incorrectly, given the trouble I am
having; I just don't know what my mistake is yet.

    Steve> I don't know about the Python interface.

The Python interface basically just converts the Python array slices to
netCDF arguments and calls ncvarputg().  (Python arrays are contiguous
in memory.)  The Python module never touches the netCDF file except
through that call to ncvarputg().  Even if the Python wrapper were
deathly slow, it would add the same deathly-slow interval to every
write; it wouldn't depend on the file size.
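
For concreteness, here is roughly what I believe each slice assignment
reduces to, sketched in C for a 2-D variable with the unlimited
dimension first.  The shape and names are my assumptions, just for
illustration:

    #include "netcdf.h"   /* netCDF-2 interface */

    /* Sketch: append one record of a var[time][n] variable along the
     * unlimited dimension.  `values` points at n contiguous floats,
     * the way a Python array is laid out. */
    int append_record(int ncid, int varid, long rec, long n,
                      const float *values)
    {
        long start[2], count[2];

        start[0] = rec;  count[0] = 1;   /* one new record at index rec */
        start[1] = 0;    count[1] = n;   /* the whole row of that record */

        /* NULL stride means unit strides; NULL imap means the values
         * are contiguous in memory. */
        return ncvarputg(ncid, varid, start, count, NULL, NULL, values);
    }

A call like that should be a simple append, independent of how many
records the file already holds.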

Maybe there is some issue with calling the old netCDF API?
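
For comparison, the same one-record append through the netCDF-3
interface would look something like this (again just a sketch with an
assumed shape):

    #include <netcdf.h>   /* netCDF-3 interface */

    /* Sketch: the same one-record append using the netCDF-3 call. */
    int append_record_v3(int ncid, int varid, size_t rec, size_t n,
                         const float *values)
    {
        size_t start[2], count[2];

        start[0] = rec;  count[0] = 1;
        start[1] = 0;    count[1] = n;

        return nc_put_vara_float(ncid, varid, start, count, values);
    }

If the old-API entry points add overhead somewhere, switching the
wrapper to the nc_ calls would be one way to test that.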

Thanks,
     John


--
John Galbraith                  email: john@xxxxxxxxxxxxxxx
Los Alamos National Laboratory,   home phone: (505) 662-3849
                                  work phone: (505) 665-6301



