Hi Russ,
On Thu, Feb 27, 2014 at 2:38 PM, Russ Rew <russ@xxxxxxxxxxxxxxxx> wrote:
> #define CHUNK_THRESHOLD (8192) /* variables with fewer bytes don't get
> chunked */
>
> The intent of the CHUNK_THRESHOLD minimum is to not create chunks
> smaller than a physical disk block, as an I/O optimization, because
> attempting to read a smaller chunk will still cause a whole disk block
> to be read.
So I take it 8k is a reasonable expectation for the disk block size these days?
But this is a great tidbit -- I'm working on code to write data in the
"new" UGRID standard:
https://github.com/ugrid-conventions/ugrid-conventions
And the code:
https://github.com/pyugrid/pyugrid
And I wanted to set some reasonable defaults for chunking. In this case,
you tend to have a lot of large 1-d arrays, and most of the discussions
I've seen are about multi-dimensional arrays. It sounds like I should set a
minimum chunk size of 8k bytes then.
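As a rough sketch of what I have in mind for 1-d variables (the function name and the "leave it contiguous" rule are just my assumptions for illustration, not anything pyugrid or the netCDF library does today):

```python
import math

# Assumed to mirror the CHUNK_THRESHOLD value quoted above (8192 bytes).
MIN_CHUNK_BYTES = 8192


def chunk_len_1d(n_elements, itemsize, min_bytes=MIN_CHUNK_BYTES):
    """Pick a chunk length (in elements) for a 1-d variable.

    Returns None when the whole variable is smaller than min_bytes,
    meaning "don't chunk it at all". Otherwise returns the smallest
    element count whose byte size is at least min_bytes, capped at
    the length of the variable.
    """
    if n_elements * itemsize < min_bytes:
        return None
    return min(n_elements, math.ceil(min_bytes / itemsize))


# Example: a million float64 node coordinates (8 bytes each).
print(chunk_len_1d(1_000_000, 8))
# A short array of 100 float64 values stays unchunked.
print(chunk_len_1d(100, 8))

# With netCDF4-python the result could then be passed along, e.g.
#   ds.createVariable("node_x", "f8", ("node",), chunksizes=(n,))
# (not run here; the variable and dimension names are made up).
```

This only enforces the minimum. Whether chunks should be a good deal larger than 8k for big arrays is a separate question.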
> However, I think for the next
> release, we should lower the default threshold to 512 bytes, and
> document the behavior.
>
Document -- of course, but why lower the threshold?
The thresholds may well be good defaults, but if a user explicitly asks for
smaller-than-optimum chunk sizes, maybe that's what they should get.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@xxxxxxxx