Re: [netcdfgroup] Alternate chunking specification

On Tue, May 23, 2017 at 2:30 PM, Ed Hartnett <edwardjameshartnett@xxxxxxxxx> wrote:

> On a related note, many users have complained of very poor performance on
> files with a chunksize of 1 in the record dimension, when they are using
> the data in other ways than reading one lat-lon grid at a time. Naturally,
> this is understandable. To even get one value in the level, the entire
> lat-lon grid must be read.

This is the inherent problem with chunking -- a good chunking strategy
completely depends on the access pattern.
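
To put a rough number on Ed's point about reading a whole grid to get one
value, here is a back-of-the-envelope sketch in Python. The 720 x 1440 grid,
the 4-byte values, and the function name are my own assumptions for
illustration, and it treats a whole chunk as the unit of I/O (which is the
case for compressed data; caching may change the picture):

```python
from math import prod

def bytes_read_for_one_value(chunk_shape, itemsize=4):
    """Bytes pulled in to fetch a single value, assuming the whole
    chunk containing it must be read (no chunk cache hits)."""
    return prod(chunk_shape) * itemsize

# Hypothetical variable chunked as one full lat-lon grid per record.
full_grid_chunk = (1, 720, 1440)

cost = bytes_read_for_one_value(full_grid_chunk)
print("bytes read for one value:", cost)
print("values read per value wanted:", cost // 4)
```

With one record per chunk, the cost of a single-point read scales with the
size of the whole grid, not with the amount of data actually wanted.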

> So perhaps having all the non-1 dimensions use a chunksize of their
> fullest extent is not such a good idea.

Exactly -- for defaults, I think it's better that full-extent chunks NOT be

I did some experiments a while back, and wildly too small or too large chunks
had a big impact on performance, but it was not that sensitive to mid-size

So if, for example, you have a 10k x 10k lat-lon grid, you probably don't
want to use (1, 10k, 10k) chunks.

Better to use (1, 1k, 1k) chunks. I'd bet that it would be almost as fast
when accessing the full grid at a given time, but much faster when
accessing only a small part of the grid.
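
A quick way to see why is to count the chunks a read touches. This is only a
sketch of the counting, not a benchmark: it takes "10k" to mean 10,000, the
200 x 200 window and the function names are made up for illustration, and it
assumes every touched chunk is read in full:

```python
from math import prod

def chunks_touched(start, count, chunk_shape):
    """Number of chunks intersected by the hyperslab that begins at
    `start` and spans `count` elements along each dimension."""
    n = 1
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch              # index of first chunk hit in this dim
        last = (s + c - 1) // ch     # index of last chunk hit in this dim
        n *= last - first + 1
    return n

def values_read(start, count, chunk_shape):
    """Values read if each touched chunk is read in full."""
    return chunks_touched(start, count, chunk_shape) * prod(chunk_shape)

K = 1000
full_grid = ((0, 0, 0), (1, 10 * K, 10 * K))
small_window = ((0, 4500, 4500), (1, 200, 200))  # hypothetical subset

for chunk_shape in [(1, 10 * K, 10 * K), (1, K, K)]:
    for name, (start, count) in [("full grid", full_grid),
                                 ("small window", small_window)]:
        print(chunk_shape, name,
              chunks_touched(start, count, chunk_shape),
              values_read(start, count, chunk_shape))
```

By this count, both layouts read the same number of values for the full grid
(in 1 chunk versus 100), while the small window reads 100 times fewer values
with the (1, 1k, 1k) chunks.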

Or maybe (10, 100, 100) would be best -- much better for a time series at a
single point, and still probably not too slow for the whole grid (I found
1k chunks not too bad on that particular machine anyway...)
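
The same kind of counting for a time series at a single point, as a rough
sketch. The 1,000 time steps and the function name are assumptions of mine,
"10k" is taken as 10,000, and each touched chunk is assumed to be read in
full:

```python
from math import ceil, prod

def point_series_cost(n_times, chunk_shape):
    """Chunks touched and values read when extracting n_times
    consecutive records at one fixed (lat, lon) point."""
    # Only the time dimension is spanned; one chunk per time block.
    n_chunks = ceil(n_times / chunk_shape[0])
    return n_chunks, n_chunks * prod(chunk_shape)

n_times = 1_000  # hypothetical length of the time series
for chunk_shape in [(1, 10_000, 10_000), (1, 1_000, 1_000), (10, 100, 100)]:
    n_chunks, n_values = point_series_cost(n_times, chunk_shape)
    print(chunk_shape, n_chunks, n_values)
```

Under these assumptions the (10, 100, 100) layout reads about 10^7 values for
the series, against 10^9 for (1, 1k, 1k) and 10^11 for (1, 10k, 10k). The
price is on the other access pattern: one full 10k x 10k grid then touches
10,000 chunks and reads 10 times more values than it needs.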



