Howdy Dennis and All!
I applaud the effort to improve user-specification of chunking. It is a
topic that causes confusion to many users, in my experience.
However, I'm not sure I understand your algorithm.
If I have dimids 0, 1, 2, which have dimlens NC_UNLIMITED, 30, and 50, and
I indicate that I want c = 1 (0-based index, as God and K&R intended), then
would I get chunksizes 1, 30, 50?
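
To make the question concrete, here is a minimal sketch of how I read the
proposal, assuming c is 1-based with 1 <= c < m as in the original message,
and assuming the leading dimensions 1..c each get a chunk length of 1 (which
is what "there will be d1*d2*...dc such chunks" seems to imply). The function
names are hypothetical and not part of any netCDF API; the constant simply
mirrors the C library's NC_UNLIMITED value for illustration.

```python
from math import prod

# Mirrors NC_UNLIMITED from the C library's netcdf.h (illustration only).
NC_UNLIMITED = 0


def split_chunk_shape(dimlens, c):
    """Chunk shape under the proposed split, reading c as 1-based.

    Dimensions 1..c get chunk length 1; dimensions c+1..m keep their
    full length. This is one reading of the proposal, not a confirmed one.
    """
    m = len(dimlens)
    if not 1 <= c < m:
        raise ValueError("c must satisfy 1 <= c < m")
    return (1,) * c + tuple(dimlens[c:])


def split_chunk_count(dimlens, c):
    """Number of chunks under the same reading: d1 * d2 * ... * dc."""
    return prod(dimlens[:c])


dims = (NC_UNLIMITED, 30, 50)
print(split_chunk_shape(dims, 1))  # split after the first dimension
print(split_chunk_shape(dims, 2))  # split after the second dimension
```

Under this reading, a split after the first dimension gives 1, 30, 50 and a
split after the second gives 1, 1, 50, so the answer to my question depends
on whether c = 1 names the first dimension or the second.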
Thanks,
Ed Hartnett
On Tue, May 16, 2017 at 4:31 PM, Dave Allured - NOAA Affiliate <
dave.allured@xxxxxxxx> wrote:
> Okay, it sounds like you are NOT proposing any changes to the netcdf-4
> file format, or to the existing API functions. Good.
>
> You just asked for use cases DIFFERENT THAN 1,1,...1,di,dj,...dm. Here is
> one.
>
> My local agency's data portal has gridded data sets that are normally
> dimensioned (time, lat, lon) or (time, level, lat, lon). These are chunked
> 1,d2,d3 or 1,1,d3,d4 for normal access, which is very popular. These use
> cases could be served by your proposed alternate chunk spec method.
>
> However, some of our on-line applications serve long time series for
> single grid points. The normal chunking schemes like 1,d2,d3 prove to be
> unacceptably slow for grid point time series. "Compromise" chunking
> schemes were tested, and they did not seem to perform well enough.
>
> So we created mirror data sets which are chunked optimally for reading
> single grid points, e.g. (d1,1,1). These perform very well in live
> operation, and we think that the double storage is worthwhile.
>
> This is almost the same use case as Heiko Klein's second one, "b)
> timeseries strategy".
>
> --Dave A.
> NOAA/OAR/ESRL/PSD/CIRES
> Boulder, Colorado
>
>
> On Tue, May 16, 2017 at 1:56 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
>
>> Note that I am proposing a second way to specify chunking on a variable.
>> I am not proposing to remove any existing functionality.
>>
>> But let me restate my question.
>> Question: what are some good use cases for having a chunking spec
>> that is different than
>> 1,1,...1,di,dj,...dm
>> where di is the full size of the ith dimension of the variable?
>> Heiko Klein has given a couple of good use cases, and I am looking for
>> more.
>> =Dennis
>>
>>
>> On 5/16/2017 1:30 PM, Dave Allured - NOAA Affiliate wrote:
>>
>>> Dennis,
>>>
>>> Are you saying that the original function nc_def_var_chunking will be
>>> kept intact, and there will be a new function that will simplify chunk
>>> setting for some data scenarios? You are not proposing any changes in the
>>> netcdf-4 file format?
>>>
>>> --Dave
>>>
>>>
>>> On Mon, May 15, 2017 at 1:29 PM, dmh@xxxxxxxx wrote:
>>>
>>> I am soliciting opinions about an alternate way to specify chunking
>>> for netcdf files. If you are not familiar with chunking, then
>>> you probably can ignore this message.
>>>
>>> Currently, one specifies per-dimension chunk sizes that
>>> together determine how the data for a variable is decomposed
>>> into chunks. So e.g. if I have variable (pardon the shorthand
>>> notation)
>>> x[d1=8,d2=12]
>>> and I say d1 is chunked 4 and d2 is chunked 4, then x will be
>>> decomposed
>>> into 6 chunks (8/4 * 12/4).
>>>
>>> I am proposing this alternate. Suppose we have
>>> x[d1,d2,...dm]
>>> And we specify a position 1<=c<m
>>> Then the idea is that we create chunks of size
>>> d(c+1) * d(c+2) *...dm
>>> There will be d1*d2*...dc such chunks.
>>> In other words, we split the set of dimensions at some point (c)
>>> and create the chunks based on that split.
>>>
>>> The claim is that for many situations, the leftmost dimensions
>>> are what we want to iterate over: e.g. time; and we then want
>>> to read all of the rest of the data associated with that time.
>>>
>>> So, my question is: is such a style of chunking useful?
>>>
>>> If this is not clear, let me know and I will try to clarify.
>>> =Dennis Heimbigner
>>> Unidata
>>>
>>
> _______________________________________________
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx