On Thu, Jan 9, 2014 at 3:28 PM, Charlie Zender <zender@xxxxxxx> wrote:
> I read with interest your discussion on chunking.
> I added Chris's suggestion to NCO's supported chunking options.
>
> NCO 4.4.0 now implements six "chunking maps"
> http://nco.sf.net/nco.html#cnk
>
nice!
One small doc comment:
"""
Unchunking
Definition: Unchunk all variables possible. The HDF5 storage layer requires
that record variables (i.e., variables that contain at least one record
dimension) must be chunked. Also variables that are compressed or use
checksums must be chunked.
"""
Variables with unlimited dimensions must be chunked as well. Not sure if NCO
preserves those.
And some thoughts:
I may be mis-interpreting some of this (and not totally sure what a "record
dimension" is), but
"""
Chunksize Equals Dimension Size except Record Dimension
Definition: Chunksize equals dimension size except record dimension has
size one. Explicitly specify chunksizes for particular dimensions with
‘--cnk_dmn’ option.
cnk_map key values: ‘rd1’, ‘cnk_rd1’, ‘map_rd1’
Mnemonic: Record Dimension size 1
"""
If you had a 1-d variable of records, would that mean chunks equal the
record size? 'cause that would be way too small in the common case.
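
To make the concern concrete, here is a minimal sketch of what the quoted
definition would give if read literally. This is not NCO's code: the function
names rd1_chunks and chunk_bytes are hypothetical, the 8-byte element size
assumes doubles, and None is used here to stand for an unlimited dimension.

```python
from math import prod

def rd1_chunks(shape, rec_idx):
    """Literal reading of the quoted 'rd1' definition (an assumption,
    not NCO's implementation): every chunksize equals the dimension
    size, except the record dimension, which gets chunksize 1."""
    return tuple(1 if i == rec_idx else n for i, n in enumerate(shape))

def chunk_bytes(chunks, itemsize=8):
    """Bytes per chunk, assuming 8-byte (double) elements."""
    return prod(chunks) * itemsize

# 1-d record variable: None marks the unlimited (record) dimension.
one_d = rd1_chunks((None,), rec_idx=0)
# Nx3 variable with the record dimension first.
n_by_3 = rd1_chunks((None, 3), rec_idx=0)

print(one_d, chunk_bytes(one_d))
print(n_by_3, chunk_bytes(n_by_3))
```

Under that reading, a 1-d record variable ends up with one element per chunk,
which is the "way too small" case.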
"""
Chunksize Lefter Product Matches Scalar Size Specified
Definition: The product of the chunksizes for each variable (approximately)
equals the size specified with the ‘--cnk_scl’ option. This is accomplished
by using dimension sizes as chunksizes for the rightmost (most rapidly
varying) dimensions, and then “flexing” the chunksize of the leftmost
(least rapidly varying) dimensions such that the product of all chunksizes
matches the specified size. All dimensions to the left of and including the
first record dimension define the left-hand side. This map was first
proposed by Chris Barker.
cnk_map key values: ‘lfp’, ‘cnk_lfp’, ‘map_lfp’
Mnemonic: LeFter Product
"""
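
For reference, a rough sketch of how I read that definition. It is only an
illustration under stated assumptions, not NCO's algorithm: lfp_chunks is a
hypothetical name, the target is counted in elements, None stands for an
unlimited dimension, and how the "flex" is shared among several left-hand
dimensions is my guess (the rightmost left-hand dimension absorbs as much as
it can first).

```python
from math import prod

def lfp_chunks(shape, rec_idx, target):
    """Sketch of a 'lefter product' chunking rule (assumed behavior).

    shape   -- dimension sizes; None marks an unlimited dimension
    rec_idx -- index of the first record dimension
    target  -- desired number of elements per chunk
    """
    chunks = list(shape)
    # Dimensions to the right of the record dimension keep their full size.
    right = prod(shape[rec_idx + 1:])
    # Elements left over for the left-hand side (at least 1).
    remaining = max(1, target // right)
    # Flex the left-hand dimensions, starting with the record dimension.
    for i in range(rec_idx, -1, -1):
        limit = shape[i] if shape[i] is not None else remaining
        chunks[i] = max(1, min(limit, remaining))
        remaining = max(1, remaining // chunks[i])
    return tuple(chunks)

print(lfp_chunks((None,), 0, 8192))          # 1-d unlimited variable
print(lfp_chunks((None, 3), 0, 8192))        # Nx3 variable
print(lfp_chunks((None, 100, 200), 0, 8192)) # right side already exceeds target
```

With these assumptions the 1-d and Nx3 cases get chunks of roughly the target
size instead of one record each; when the right-hand dimensions alone exceed
the target, the record dimension falls back to chunksize 1.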
That sounds good -- and thanks for the credit!
It's not so clear to me from the time I've spent reading that, but what
would be the default chunking for a 1-d unlimited variable? Or for a 2-d
variable with one dimension very small (Nx3, for instance)?
Those were the use cases where the default chunking in netcdf4 killed us.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@xxxxxxxx