Re: [netcdfgroup] Alternate chunking specification

On Tue, May 23, 2017 at 2:30 PM, Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
wrote:

> On a related note, many users have complained of very poor performance on
> files with a chunksize of 1 in the record dimension, when they are using
> the data in other ways than reading one lat-lon grid at a time. Naturally,
> this is understandable. To get even one value at a given level, the entire
> lat-lon grid must be read.
>

This is the inherent problem with chunking -- a good chunking strategy
completely depends on the access pattern.


> So perhaps having all the non-1 dimensions use a chunksize of their
> fullest extent is not such a good idea.
>

Exactly -- for defaults, I think it's better that full-extent chunks NOT be
used.

I did some experiments a while back, and wildly too-small or too-large chunks
had a big impact on performance, but performance was not that sensitive to
mid-size chunks.
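
As a rough illustration of the kind of comparison involved, here is a sketch
using the netCDF4-python bindings (the file and variable names are just
placeholders, not the actual test setup):

    import time
    from netCDF4 import Dataset

    nc = Dataset("test_chunked.nc")   # a file written with some chunking choice
    var = nc.variables["temperature"]

    t0 = time.perf_counter()
    _ = var[0, :, :]                  # one full lat-lon grid at one time step
    t1 = time.perf_counter()
    _ = var[:, 500, 500]              # a time series at a single point
    t2 = time.perf_counter()

    print("full grid read:   %.3f s" % (t1 - t0))
    print("time series read: %.3f s" % (t2 - t1))
    nc.close()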

So if, for example, you have a 10k x 10k lat-lon grid, you probably don't
want to use (1, 10k, 10k) chunks.

Better to use (1, 1k, 1k) chunks. I'd bet that would be almost as fast
when accessing the full grid at a given time, but much faster when
accessing only a small part of the grid.

Or maybe (10, 100, 100) would be best -- much better for a time series at a
single point, and still probably not too slow for the whole grid (I found
1k chunks not too bad on that particular machine, anyway...).
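
For anyone who wants to try it, here is a rough sketch of how you might
specify those chunk sizes when creating the variable, again using the
netCDF4-python bindings (the file, dimension, and variable names are just
placeholders):

    from netCDF4 import Dataset

    nc = Dataset("example.nc", "w")
    nc.createDimension("time", None)        # unlimited record dimension
    nc.createDimension("lat", 10000)
    nc.createDimension("lon", 10000)

    # mid-size spatial chunks: (1, 1k, 1k); swap in (10, 100, 100) to favor
    # single-point time series instead
    var = nc.createVariable("temperature", "f4", ("time", "lat", "lon"),
                            chunksizes=(1, 1000, 1000))
    nc.close()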

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx