Re: Performance problem with large files

Martin Dix <martin.dix@xxxxxxxxxxxx> wrote:

> For simplicity, call the unlimited dimension t. A netcdf file stores
> all the data for t=1, then for t=2, etc. Your description of the
> array indices means that each subarray is scattered through the
> entire file and requires accessing almost every file block. Things
> should be a lot better if you write subarrays of 8000 x 3 x 1 or,
> if you can't do this, rearrange the file so that the 8000 dimension
> is unlimited rather than the 16000 dimension.
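
To make the cost concrete, here is a rough sketch of the offset
arithmetic that the layout described above implies. I am assuming a
16000 x 8000 x 3 variable of 4-byte values, based on the sizes quoted,
and ignoring the file header and any padding:

#include <stdio.h>

int main(void)
{
    const long ni = 8000, nj = 3;
    const long valsize = 4;                 /* bytes per value */
    const long recsize = ni * nj * valsize; /* one record = one t slice */
    const long i = 1234;                    /* some fixed second index */
    long t;

    /* Reading all t at a fixed i touches every record in the file: */
    for (t = 0; t < 3; t++)
        printf("t=%ld starts at byte %ld\n",
               t, t * recsize + i * nj * valsize);
    return 0;
}

Consecutive t values are 96000 bytes apart here, so a scan over all
16000 of them seeks across roughly 1.5 GB of file while reading only
about 190 KB of data.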

But I must read along both major dimensions, depending on the type of
analysis I am doing. From your explanation it seems that one of the
two access types will always be very slow. Shouldn't it be possible
for the netCDF library to organize the data in such a way that a scan
along any dimension can be done with acceptable efficiency? For
example, each contiguous file block could correspond to a subarray of
approximately equal extent along each dimension.
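
What I have in mind is something like the following sketch of a
square-tiled layout (purely illustrative: the tile size B and the
index mapping are my own invention, not anything netCDF does):

#include <stdio.h>

#define B   64                    /* tile edge, chosen arbitrarily */
#define NI  16000
#define NJ  8000
#define NBJ ((NJ + B - 1) / B)    /* tiles per row of tiles */

/* Linear index of the contiguous file block holding element (i, j). */
long block_of(long i, long j)
{
    return (i / B) * NBJ + (j / B);
}

/* Position of (i, j) inside its block. */
long offset_in_block(long i, long j)
{
    return (i % B) * B + (j % B);
}

int main(void)
{
    printf("(500,500) -> block %ld, offset %ld\n",
           block_of(500, 500), offset_in_block(500, 500));
    return 0;
}

With such a layout a scan along either dimension touches only about
NI/B or NJ/B blocks per line instead of every record in the file.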

Could I gain anything from not using an unlimited dimension? In some
cases I know the final size before creating the file, and in others
it might be worth making a fixed-size copy before some lengthy
analysis.
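
For the fixed-size copy, the creation step would look roughly like the
sketch below, assuming 16000 x 8000 x 3 floats and the standard
netCDF-3 C interface (file and variable names are invented; error
checks omitted for brevity):

#include <netcdf.h>
#include <stdlib.h>

int main(void)
{
    int ncid, varid, dims[3];
    size_t start[3] = {0, 0, 0};
    size_t count[3] = {1, 8000, 3};     /* one record per write */
    float *record = calloc(8000 * 3, sizeof(float));

    nc_create("fixed.nc", NC_CLOBBER, &ncid);
    nc_def_dim(ncid, "t", 16000, &dims[0]);  /* fixed, not NC_UNLIMITED */
    nc_def_dim(ncid, "x",  8000, &dims[1]);
    nc_def_dim(ncid, "z",     3, &dims[2]);
    nc_def_var(ncid, "data", NC_FLOAT, 3, dims, &varid);
    nc_enddef(ncid);

    for (start[0] = 0; start[0] < 16000; start[0]++) {
        /* ... read record start[0] from the old file into 'record',
           then write it to the new one ... */
        nc_put_vara_float(ncid, varid, start, count, record);
    }

    nc_close(ncid);
    free(record);
    return 0;
}

(As far as I understand the classic format, a fixed first dimension
mainly helps when several record variables would otherwise interleave;
with a single record variable the byte layout should be nearly the
same.)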
-- 
-------------------------------------------------------------------------------
Konrad Hinsen                            | E-Mail: hinsen@xxxxxxxxxxxxxxx
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron                       | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                    | Deutsch/Esperanto/English/
France                                   | Nederlands/Francais
-------------------------------------------------------------------------------
