Re: [netcdfgroup] nccopy -c does not rechunk properly (4.3.1.1)

Hi Russ,

On Thu, Feb 27, 2014 at 2:38 PM, Russ Rew <russ@xxxxxxxxxxxxxxxx> wrote:

>   #define CHUNK_THRESHOLD (8192)   /* variables with fewer bytes don't get chunked */
>
> The intent of the CHUNK_THRESHOLD minimum is to not create chunks
> smaller than a physical disk block, as an I/O optimization, because
> attempting to read a smaller chunk will still cause a whole disk block
> to be read.


So I take it 8 KB is a reasonable expectation for a physical disk block these days?

But this is a great tidbit -- I'm working on code to write data in the
"new" UGRID standard:

https://github.com/ugrid-conventions/ugrid-conventions

And the code:
https://github.com/pyugrid/pyugrid

And I wanted to set some reasonable defaults for chunking. In this case you
tend to have a lot of large 1-D arrays, whereas most of the chunking
discussions I've seen are about multi-dimensional arrays. It sounds like I
should set a minimum chunk size of 8 KB, then.
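
For what it's worth, here's a rough sketch of the kind of default I have in
mind, using the netCDF4-python API. The helper name, the 1 MB target, and the
8 KB floor are my own assumptions for pyugrid, not anything from the netCDF
library itself:

    import numpy as np
    from netCDF4 import Dataset

    # Assumed minimum chunk size in bytes, following the 8 KB disk-block
    # reasoning above -- my own default, not a netCDF library constant.
    MIN_CHUNK_BYTES = 8192

    def default_1d_chunk(n_elements, dtype, target_bytes=1024 * 1024):
        """Pick a chunk length for a 1-D variable with n_elements items.

        Aims for roughly target_bytes per chunk, but never smaller than
        MIN_CHUNK_BYTES (unless the whole variable is smaller than that).
        """
        itemsize = np.dtype(dtype).itemsize
        target_len = max(target_bytes // itemsize, MIN_CHUNK_BYTES // itemsize)
        return min(n_elements, max(1, target_len))

    # Example: a 10-million-node 1-D coordinate variable
    with Dataset("ugrid_example.nc", "w") as nc:
        n_nodes = 10_000_000
        nc.createDimension("node", n_nodes)
        chunk_len = default_1d_chunk(n_nodes, "f8")
        nc.createVariable("node_x", "f8", ("node",), chunksizes=(chunk_len,))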


>   However, I think for the next
> release, we should lower the default threshold to 512 bytes, and
> document the behavior.
>

Document -- of course, but why lower the threshold?

Maybe the threshold makes a good default, but if a user explicitly asks for
smaller-than-optimal chunk sizes, perhaps that's what they should get.
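
In other words, something like this policy (my own pseudocode in Python, just
to illustrate the behavior I'm suggesting, not a patch to nccopy):

    # Sketch of the policy I'm suggesting: an explicit user request wins,
    # the threshold only applies when falling back to a default.
    def effective_chunk_bytes(user_requested_bytes, default_bytes,
                              threshold_bytes=8192):
        if user_requested_bytes is not None:
            # The user asked for a specific size -- give them exactly that,
            # even if it is smaller than the "optimal" threshold.
            return user_requested_bytes
        # Otherwise use the default, but never go below the threshold.
        return max(default_bytes, threshold_bytes)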

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@xxxxxxxx