On Tue, May 23, 2017 at 2:30 PM, Ed Hartnett <edwardjameshartnett@xxxxxxxxx>
wrote:
> On a related note, many users have complained of very poor performance on
> files with a chunksize of 1 in the record dimension, when they are using
> the data in ways other than reading one lat-lon grid at a time. Naturally,
> this is understandable: to get even one value at a given time level, the
> entire lat-lon grid must be read.
>
This is the inherent problem with chunking -- a good chunking strategy
completely depends on the access pattern.
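For what it's worth, it's easy to see this for yourself with netCDF4-python.
This is just a rough sketch -- the file name "example.nc", the variable name
"tas", and the point index (500, 500) are all made up; substitute a real file
chunked as (1, nlat, nlon):

# Minimal sketch: compare two access patterns against whatever chunking the
# file actually has. File and variable names below are hypothetical.
import time
from netCDF4 import Dataset

nc = Dataset("example.nc", "r")
tas = nc.variables["tas"]              # assumed shape: (time, lat, lon)

# Report the chunk shape HDF5 is using (or 'contiguous' if unchunked).
print("chunking:", tas.chunking())

# Access pattern 1: one full lat-lon grid at a single time step.
t0 = time.perf_counter()
grid = tas[0, :, :]
print("full grid read:    %.3f s" % (time.perf_counter() - t0))

# Access pattern 2: a time series at a single point. With (1, nlat, nlon)
# chunks, every full-grid chunk must be read to get one value per step.
t0 = time.perf_counter()
series = tas[:, 500, 500]
print("point series read: %.3f s" % (time.perf_counter() - t0))

nc.close()

The same chunking that makes the first read cheap makes the second one
painfully expensive -- which is the whole problem with picking defaults.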
> So perhaps having all the non-1 dimensions use a chunksize of their
> fullest extent is not such a good idea.
>
Exactly -- for defaults, I think it's better that full-extent chunks NOT be
used.
I did some experiments a while back, and wildly too-small or too-large chunks
had a big impact on performance, but performance was not that sensitive to
mid-size chunks.
So if, for example, you have a 10k x 10k lat-lon grid, you probably don't
want to use (1, 10k, 10k) chunks.
Better to use (1, 1k, 1k) chunks: I'd bet that would be almost as fast when
accessing the full grid at a given time, but much faster when accessing only
a small part of the grid.
Or maybe (10, 100, 100) would be best -- much better for a time series at a
single point, and still probably not too slow for the whole grid (I found
1k chunks not too bad on that particular machine, anyway...).
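For concreteness, here's roughly how you'd ask for that when creating the
variable with netCDF4-python. A minimal sketch -- the file name, variable
name "tas", and dimension sizes are made up for illustration:

# Minimal sketch: create a (time, lat, lon) variable with (10, 100, 100)
# chunks instead of letting the library default to (1, nlat, nlon).
from netCDF4 import Dataset

nc = Dataset("chunked_example.nc", "w")
nc.createDimension("time", None)       # unlimited record dimension
nc.createDimension("lat", 10000)
nc.createDimension("lon", 10000)

tas = nc.createVariable(
    "tas", "f4", ("time", "lat", "lon"),
    chunksizes=(10, 100, 100),         # balanced for maps and time series
    zlib=True,                         # compression requires chunked storage
)
nc.close()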
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@xxxxxxxx