RE: Performance problem with large files


> -----Original Message-----
> From: owner-netcdfgroup@xxxxxxxxxxxxxxxx
> [mailto:owner-netcdfgroup@xxxxxxxxxxxxxxxx]On Behalf Of Konrad Hinsen
> Sent: Monday, June 28, 1999 1:55 PM
> To: netcdfgroup@xxxxxxxxxxxxxxxx
> Subject: Re: Performance problem with large files
>
>
> Martin Dix <martin.dix@xxxxxxxxxxxx> wrote:
>
> > For simplicity call the unlimited dimension t. A netcdf file stores
> > all the data for t=1, then for t=2 etc. Your description of the
> > array indices means that each subarray is scattered through the
> > entire file and requires accessing almost every file block. Things
> > should be a lot better if you write subarrays of 8000 x 3 x 1 or if
> > you can't do this, rearrange the file so that the 8000 dimension is
> > unlimited rather than the 16000 dimension.
>
> But I must read along both major dimensions, depending on the type of
> analysis I am doing. From your explanation it seems that one of the two
> access types will always be very slow. Shouldn't it be possible for
> the netCDF library to organize the data in such a way that a scan
> along any dimension is doable with acceptable efficiency? For example,
> each contiguous file block could correspond to a subarray of
> approximately equal extent along each dimension.


NetCDF can't alter the physics of disk drives and memory chips
(although the OS caching strategy can sometimes be tweaked), and you
can't optimize for more than one access pattern at a time: efficient
reading of large files on today's computers depends on data locality.
It's possible to store an array in blocks as you suggest (netCDF does
not, although I think HDF can), but that is an optimization for
block-shaped access, not for the access pattern that reads all
elements along one dimension.
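To make the trade-off concrete, here is a minimal sketch (not from the
thread) that counts how many distinct file blocks a full-row read and
a full-column read touch, for a plain row-major layout and for a
generic tiled ("blocked") layout. The element size (4-byte floats),
block size (4096 bytes), the 16000 x 8000 shape and the 32 x 32 tile
are all assumptions for illustration; the tiled layout is a simple
model, not HDF's actual chunk format.

```python
# Minimal sketch: distinct file blocks touched when reading one full row
# versus one full column of a 2-D array, row-major versus tiled storage.
# Assumptions (not from the thread): float32 elements, 4096-byte blocks,
# a hypothetical 16000 x 8000 array, hypothetical 32 x 32 tiles.

ELEM = 4                    # bytes per element (assumed float32)
BLOCK = 4096                # bytes per file-system block (assumed)
NROWS, NCOLS = 16000, 8000  # hypothetical array shape
CI, CJ = 32, 32             # hypothetical tile shape (32*32*4 = 4096 bytes)


def row_major_offset(i, j):
    """Byte offset of element (i, j) in a contiguous row-major array."""
    return (i * NCOLS + j) * ELEM


def tiled_offset(i, j):
    """Byte offset of (i, j) when the array is stored as CI x CJ tiles,
    row-major inside each tile and row-major across tiles.
    Assumes NROWS and NCOLS are multiples of the tile shape."""
    tiles_per_row = NCOLS // CJ
    tile = (i // CI) * tiles_per_row + (j // CJ)
    within = (i % CI) * CJ + (j % CJ)
    return (tile * CI * CJ + within) * ELEM


def blocks_touched(offset_fn, along):
    """Distinct blocks read for row 0 (along='row') or column 0 (along='col')."""
    if along == "row":
        cells = ((0, j) for j in range(NCOLS))
    else:
        cells = ((i, 0) for i in range(NROWS))
    return len({offset_fn(i, j) // BLOCK for i, j in cells})


if __name__ == "__main__":
    for name, fn in (("row-major", row_major_offset), ("tiled", tiled_offset)):
        print(name, "row:", blocks_touched(fn, "row"),
              "column:", blocks_touched(fn, "col"))
```

With these assumed numbers the row-major layout touches 8 blocks for
a row but 16000 for a column, while the tiled layout touches 250 and
500: tiling evens out the two access patterns, but the one that was
cheap gets noticeably more expensive.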

> Could I gain anything from not using an unlimited dimension? In some
> cases I know the final size before creating the file, and in others
> it might be worth making a fixed-size copy before some
> lengthy analysis.

I believe it won't matter whether the dimension is fixed or
unlimited. It's the ordering of the dimensions that determines how the
data is stored.
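A rough sketch of why, assuming the classic netCDF file layout as I
understand it: fixed-size variables are stored contiguously in
row-major order, while record variables are interleaved record by
record, each record holding one slab of every record variable. The
function names, begin offsets and 4-byte element size below are
hypothetical; padding is ignored, which does not arise for 4-byte
elements.

```python
# Minimal sketch, assuming the classic netCDF layout: fixed-size variables
# are contiguous and row-major; record r holds one slab of every record
# variable in turn. Names, offsets and element size are hypothetical.

ELEM = 4  # bytes per element (assumed float32)


def fixed_offset(begin, i, j, ncols):
    """Byte offset of (i, j) in a fixed-size 2-D variable (row-major)."""
    return begin + (i * ncols + j) * ELEM


def record_offset(begin, i, j, ncols, recsize):
    """Byte offset of (i, j) when i runs along the unlimited dimension.

    recsize is the byte size of one whole record, summed over all record
    variables in the file."""
    return begin + i * recsize + j * ELEM


if __name__ == "__main__":
    ncols = 8000
    one_var = ncols * ELEM       # only this variable uses the record dimension
    two_vars = 2 * ncols * ELEM  # a second record variable of the same shape
    same = all(
        fixed_offset(0, i, j, ncols) == record_offset(0, i, j, ncols, one_var)
        for i in range(3) for j in range(0, ncols, 997)
    )
    print("identical layout with one record variable:", same)
    # With two record variables a column read strides twice as far, but a
    # row read is still one contiguous run of ncols elements.
    print("column stride:", record_offset(0, 1, 0, ncols, two_vars)
          - record_offset(0, 0, 0, ncols, two_vars))
```

With a single record variable the offsets match the fixed layout
exactly; extra record variables only lengthen the stride along the
unlimited dimension, so the last dimension stays the contiguous one
either way.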

