Re: performance degrades with filesize


I don't know anything about the underlying format and its implementation,
but I have experienced the performance degradation you are describing.
Growing the unlimited dimension is the cause. I can't be certain, but it
seems as if the entire file is rewritten each time the unlimited dimension
increases. Also, if I have five variables that all use the unlimited
dimension and I increase the unlimited dimension, the file size increases
five times over, which suggests that something goes through the *entire*
file on each increase to make more space for each of the variables.
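As a concrete illustration, here is a minimal sketch of the write pattern
I mean, in the netCDF-3 C API (you are going through the Python wrappers,
but the pattern is the same). The file and variable names are invented,
and error checking is skipped for brevity:

    #include <netcdf.h>
    #include <stdio.h>

    #define NVARS 5
    #define NX    100

    int main(void) {
        int ncid, timedim, xdim, varids[NVARS], dims[2];
        float slab[NX] = {0};
        size_t start[2] = {0, 0}, count[2] = {1, NX};
        char name[16];
        int i, t;

        nc_create("records.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "time", NC_UNLIMITED, &timedim); /* record dim */
        nc_def_dim(ncid, "x", NX, &xdim);
        dims[0] = timedim;
        dims[1] = xdim;
        for (i = 0; i < NVARS; i++) {
            sprintf(name, "var%d", i);
            nc_def_var(ncid, name, NC_FLOAT, 2, dims, &varids[i]);
        }
        nc_enddef(ncid);

        /* every pass grows the unlimited dimension by one record, and
           all NVARS record variables get a new slab in that record */
        for (t = 0; t < 1000; t++) {
            start[0] = t;
            for (i = 0; i < NVARS; i++)
                nc_put_vara_float(ncid, varids[i], start, count, slab);
        }
        nc_close(ncid);
        return 0;
    }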

My suggestion is not to use the unlimited dimension when you create your
files. If at all possible, predefine all variables and attributes in one
define mode. If you do this, you won't incur the penalty of rewriting the
file every time you grow a dimension.
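Here is the same kind of sketch with the unlimited dimension replaced by
a fixed "time" dimension of known length, so everything is laid out in a
single define mode (again, names are invented and error checking omitted):

    #include <netcdf.h>

    #define NT 5000   /* number of timesteps, known in advance */
    #define NX 100

    int main(void) {
        int ncid, timedim, xdim, varid, dims[2];
        float slab[NX] = {0};
        size_t start[2] = {0, 0}, count[2] = {1, NX};
        int t;

        nc_create("fixed.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "time", NT, &timedim); /* fixed, not NC_UNLIMITED */
        nc_def_dim(ncid, "x", NX, &xdim);
        dims[0] = timedim;
        dims[1] = xdim;
        nc_def_var(ncid, "u", NC_FLOAT, 2, dims, &varid);
        nc_enddef(ncid); /* one define mode: the layout is fixed here */

        /* same write loop as before, but the file never has to grow a
           record dimension, only fill in space allocated up front */
        for (t = 0; t < NT; t++) {
            start[0] = t;
            nc_put_vara_float(ncid, varid, start, count, slab);
        }
        nc_close(ncid);
        return 0;
    }

With a fixed time dimension, the space for every record is laid out by the
time nc_enddef() returns, so as far as I can tell a later write never has
to restructure the file.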

        -ethan
> 
> I have a problem that my netcdf write performance degrades as my file size
> increases.  I have several variables with the unlimited dimension, and each
> timestep of my simulation I write out those variables, presumably to the
> end of the netcdf file.  I am not manipulating any dimension attributes or
> accessing the file anywhere but at the end (in theory).  At the beginning
> of my simulation, the netcdf writes are fast and the code runs normally.
> As the simulation proceeds, however, the netcdf write call takes longer
> and longer, eventually overwhelming the simulation and dominating the
> processor.  It feels like the whole file is read/manipulated on each
> timestep, even though it shouldn't be.
> 
> I am using Konrad Hinsen's Python wrappers for netcdf on a Pentium III
> cluster running Red Hat Linux 7.1 (and lam 6.5.2 for MPI, but that shouldn't
> matter here), and netcdf 3.5-beta6.  I have corresponded with Konrad about
> this already, and he has not seen this problem before and thinks that the
> Python wrapper should be consistently fast (and so do I, looking at the
> code).  The call he uses to write the data is ncvarputg() (from the old
> NetCDF API, right?).
> 
> A simple way for me to demonstrate the problem is to write out the data at
> a constant value of the unlimited dimension, instead of incrementing it by
> one each time.  If I always write, for example, to time = 5 (time is the
> unlimited dimension), then performance is consistent and fast.  If I write
> to frame 5000, for instance, performance is consistent and slow.
> 
> Thanks for any insight,
>        John
> 
> -- 
> John Galbraith                  email: jgalb@xxxxxxxx
> Los Alamos National Laboratory,   home phone: (505) 662-3849
>                                   work phone: (505) 665-6301
> 

