Hi Dennis,
I agree with you that your proposed slicing strategy is the one we most
often use. Since the input data is often weather data and therefore
GRIB-related, the 'grib strategy' with c=m-1, i.e. 2-D arrays, is our
default.
We have a few exceptions to this strategy.
a) global high-resolution datasets, tiling strategy
Since most of our reads are only interested in our region, we chunk the
world into a few (4x3 or 5x2) tiles, which usually gives us 4x faster
I/O on gzipped chunks.
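As a rough sketch of this tiling arithmetic (the grid size and tile
counts below are made-up examples, not our actual configuration):

```python
def tile_chunks(ny, nx, tiles_y, tiles_x):
    """Chunk sizes that split a global ny x nx grid into
    tiles_y x tiles_x tiles (rounding up so the tiles cover the grid)."""
    chunk_y = -(-ny // tiles_y)  # ceiling division
    chunk_x = -(-nx // tiles_x)
    return chunk_y, chunk_x

# A hypothetical 0.1-degree global grid split into 3x4 tiles:
print(tile_chunks(1800, 3600, 3, 4))  # -> (600, 900)
```

A regional read then touches only the one or two tiles that overlap the
region, instead of decompressing the whole global field.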
b) timeseries strategy (this is not operational, just testing)
For serving point timeseries of weather data to the public, we rechunk
the files into 2x2 or 4x4 tiles in the x/y directions and make the time
chunk as large as possible.
In most cases, we don't chunk per variable but per dimension.
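A minimal sketch of how such a timeseries rechunking might be computed
(the element budget, function name, and shapes are illustrative
assumptions, not what we actually run):

```python
def timeseries_chunks(shape, spatial_tile=4, max_chunk_elems=1_000_000):
    """For a (time, y, x) variable, use a small spatial tile and make
    the time chunk as large as the element budget allows."""
    nt, ny, nx = shape
    t_chunk = min(nt, max(1, max_chunk_elems // (spatial_tile * spatial_tile)))
    return (t_chunk, min(spatial_tile, ny), min(spatial_tile, nx))

# Ten years of hourly data on a hypothetical 1000x1000 grid:
print(timeseries_chunks((87600, 1000, 1000)))  # -> (62500, 4, 4)
```

A point extraction then reads a handful of long, thin chunks instead of
one chunk per timestep.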
About the usefulness of your approach: it is not as flexible as the old
approach, so points a) and b) aren't covered. It would be a nice
simplification if one could easily set a chunking strategy as in
netcdf-java, e.g. GRIB_CHUNK_STRATEGY or
"COMPLETE_RIGHT_DIMENSIONS_CHUNK_STRATEGY, 2". I prefer to set c from
the right rather than the left, since I often have (time,y,x),
(time,z,y,x), and (time,ensemble,z,y,x) variables in the same file, and
it is the rightmost part that is the same.
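The split-at-c scheme, counted from the right so that it applies
uniformly to variables of different rank, can be sketched like this
(the function name and shapes are illustrative):

```python
from math import prod

def split_chunking(shape, c_from_right):
    """Split the dimensions so that the rightmost c_from_right dims are
    spanned completely by each chunk; the remaining left dims each
    contribute a factor to the number of chunks."""
    split = len(shape) - c_from_right
    chunk_shape = (1,) * split + tuple(shape[split:])
    n_chunks = prod(shape[:split])
    return chunk_shape, n_chunks

# The same "2 from the right" setting works for (time,y,x)
# and (time,ensemble,z,y,x) variables alike:
print(split_chunking((365, 500, 600), 2))          # -> ((1, 500, 600), 365)
print(split_chunking((365, 10, 40, 500, 600), 2))  # -> ((1, 1, 1, 500, 600), 146000)
```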
Best regards,
Heiko
On 2017-05-15 21:29, dmh@xxxxxxxx wrote:
> I am soliciting opinions about an alternate way to specify chunking
> for netcdf files. If you are not familiar with chunking, then
> you probably can ignore this message.
>
> Currently, one specifies a per-dimension decomposition; together these
> determine how the data for a variable is decomposed
> into chunks. So e.g. if I have a variable (pardon the shorthand notation)
> x[d1=8,d2=12]
> and I say d1 is chunked 4 and d2 is chunked 4, then x will be decomposed
> into 6 chunks (8/4 * 12/4).
>
> I am proposing this alternative. Suppose we have
> x[d1,d2,...dm]
> And we specify a position 1<=c<m
> Then the idea is that we create chunks of size
> d(c+1) * d(c+2) *...dm
> There will be d1*d2*...dc such chunks.
> In other words, we split the set of dimensions at some point (c)
> and create the chunks based on that split.
>
> The claim is that for many situations, the leftmost dimensions
> are what we want to iterate over: e.g. time; and we then want
> to read all of the rest of the data associated with that time.
>
> So, my question is: is such a style of chunking useful?
>
> If this is not clear, let me know and I will try to clarify.
> =Dennis Heimbigner
> Unidata
>
>
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
--
Dr. Heiko Klein
Norwegian Meteorological Institute
P.O. Box 43 Blindern, 0313 Oslo, NORWAY
Tel. +47 22 96 32 58
http://www.met.no