Howdy Dennis!
I think that what you propose is a very natural extension of the default
chunksizes used when there is a record dimension. In that case, the record
dimension gets a chunksize of 1, and the other dimensions get chunksizes of
their full extent. So for dimensions time, lat, lon, one timestep, which is
a full lat-lon grid, becomes one chunk.
What you propose is that, in addition to the record dimension, other
dimensions could also get a chunksize of 1. So an array of time, level,
lat, and lon could still get a chunk of one lat-lon grid by specifying 1
for the time and level chunksizes.
I think that is a good idea.
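To make the idea concrete, here is a minimal sketch in Python (not netCDF
library code). The function names, the `nsplit` parameter (the number of
leading dimensions that get a chunksize of 1), and the use of `None` to
stand for an unlimited dimension are all assumptions made for illustration;
nothing in this thread fixes an API. The level count of 10 is also made up.

```python
from math import prod

def split_chunksizes(dimlens, nsplit):
    """Chunksize 1 for the first `nsplit` dimensions, full extent for the rest.

    `dimlens` lists the dimension lengths, outermost first. `None` stands
    for an unlimited dimension (an illustrative convention, not netCDF's).
    """
    if not 0 < nsplit < len(dimlens):
        raise ValueError("nsplit must satisfy 0 < nsplit < number of dimensions")
    if any(n is None for n in dimlens[nsplit:]):
        raise ValueError("an unlimited dimension has no full extent to chunk over")
    return [1] * nsplit + list(dimlens[nsplit:])

def chunk_count(dimlens, chunksizes):
    """Number of chunks needed to cover fixed-length dimensions."""
    # Ceiling division, so a partial chunk at the edge still counts.
    return prod(-(-n // c) for n, c in zip(dimlens, chunksizes))

# (time, lat, lon) with an unlimited time dimension: one lat-lon grid per chunk.
print(split_chunksizes([None, 30, 50], 1))      # [1, 30, 50]
# (time, level, lat, lon): still one lat-lon grid per chunk.
print(split_chunksizes([None, 10, 30, 50], 2))  # [1, 1, 30, 50]
# The existing per-dimension style: x[d1=8,d2=12] chunked 4 by 4.
print(chunk_count([8, 12], [4, 4]))             # 6
```

The last line reproduces the 6 chunks from the 8 by 12 example in the
original proposal quoted below.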
On a related note, many users have complained of very poor performance on
files with a chunksize of 1 in the record dimension when they are using
the data in ways other than reading one lat-lon grid at a time. This is
understandable: to get even one value at a given level, the entire lat-lon
grid must be read. So perhaps having all the non-1 dimensions use a
chunksize of their full extent is not such a good idea.
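A rough way to see the cost is to count how many chunks a read touches.
The sketch below is only an illustration: the function names are made up,
the time length of 1000 is an assumed figure, and it assumes that whole
chunks are always fetched, ignoring compression and any chunk cache.

```python
from math import prod

def chunks_touched(start, count, chunksizes):
    """Count the chunks that a hyperslab read (start, count) intersects."""
    per_dim = []
    for s, n, c in zip(start, count, chunksizes):
        first = s // c              # index of the first chunk touched
        last = (s + n - 1) // c     # index of the last chunk touched
        per_dim.append(last - first + 1)
    return prod(per_dim)

def values_read(start, count, chunksizes):
    """Values that must be read, assuming whole chunks are always fetched."""
    return chunks_touched(start, count, chunksizes) * prod(chunksizes)

# Hypothetical variable (time=1000, lat=30, lon=50).
grid_chunks = (1, 30, 50)      # one lat-lon grid per chunk
series_chunks = (1000, 1, 1)   # one grid-point time series per chunk

# Time series at a single grid point: 1000 values wanted.
start, count = (0, 10, 20), (1000, 1, 1)
print(values_read(start, count, grid_chunks))    # 1500000
print(values_read(start, count, series_chunks))  # 1000

# One full lat-lon grid at a single time: 1500 values wanted.
start, count = (0, 0, 0), (1, 30, 50)
print(values_read(start, count, grid_chunks))    # 1500
print(values_read(start, count, series_chunks))  # 1500000
```

Under these assumptions each layout is ideal for one access pattern and
costs a factor of 1000 or more in values read for the other.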
Keep on NetCDFing!!
Ed
On Tue, May 23, 2017 at 3:22 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
> yes
>
> On 5/22/2017 9:08 AM, Ed Hartnett wrote:
>
>> Howdy Dennis and All!
>>
>> I applaud the effort to improve user-specification of chunking. It is a
>> topic that causes confusion to many users, in my experience.
>>
>> However I'm not sure I understand your algorithm.
>>
>> If I have dimids 0, 1, 2, which have dimlens NC_UNLIMITED, 30, and 50,
>> and I indicate that I want c = 1 (0-based index, as God and K&R intended),
>> then would I get chunksizes 1, 30, 50?
>>
>> Thanks,
>> Ed Hartnett
>>
>>
>>
>> On Tue, May 16, 2017 at 4:31 PM, Dave Allured - NOAA Affiliate
>> <dave.allured@xxxxxxxx> wrote:
>>
>> Okay, it sounds like you are NOT proposing any changes to the
>> netcdf-4 file format, or to the existing API functions. Good.
>>
>> You just asked for use cases DIFFERENT THAN 1,1,...1,di,dj,...dm.
>> Here is one.
>>
>> My local agency's data portal has gridded data sets that are
>> normally dimensioned (time, lat, lon) or (time, level, lat, lon).
>> These are chunked 1,d2,d3 or 1,1,d3,d4 for normal access, which is
>> very popular. These use cases could be served by your proposed
>> alternate chunk spec method.
>>
>> However, some of our on-line applications serve long time series for
>> single grid points. The normal chunking schemes like 1,d2,d3 prove
>> to be unacceptably slow for grid point time series. "Compromise"
>> chunking schemes were tested, and they did not seem to perform well
>> enough.
>>
>> So we created mirror data sets which are chunked optimally for
>> reading single grid points, e.g. (d1,1,1). These perform very well
>> in live operation, and we think that the double storage is worthwhile.
>>
>> This is almost the same use case as Heiko Klein's second one, "b)
>> timeseries strategy".
>>
>> --Dave A.
>> NOAA/OAR/ESRL/PSD/CIRES
>> Boulder, Colorado
>>
>>
>> On Tue, May 16, 2017 at 1:56 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
>>
>> Note that I am proposing a second way to specify chunking on a
>> variable. I am not proposing to remove any existing functionality.
>>
>> But let me restate my question.
>> Question: what are some good use cases for having a chunking spec
>> that is different than
>> 1,1,...1,di,dj,...dm
>> where di is the full size of the ith dimension of the variable.
>> Heiko Klein has given a couple of good use cases, and I am looking
>> for more.
>> =Dennis
>>
>>
>> On 5/16/2017 1:30 PM, Dave Allured - NOAA Affiliate wrote:
>>
>> Dennis,
>>
>> Are you saying that the original function
>> nc_def_var_chunking will be kept intact, and there will be a
>> new function that will simplify chunk setting for some data
>> scenarios? You are not proposing any changes in the
>> netcdf-4 file format?
>>
>> --Dave
>>
>>
>> On Mon, May 15, 2017 at 1:29 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
>>
>> I am soliciting opinions about an alternate way to specify chunking
>> for netcdf files. If you are not familiar with chunking, then you
>> probably can ignore this message.
>>
>> Currently, one specifies a per-dimension decomposition that
>> determines how the data for a variable is decomposed into chunks.
>> So, e.g., if I have the variable (pardon the shorthand notation)
>> x[d1=8,d2=12]
>> and I say d1 is chunked 4 and d2 is chunked 4, then x will be
>> decomposed into 6 chunks (8/4 * 12/4).
>>
>> I am proposing this alternate. Suppose we have
>> x[d1,d2,...dm]
>> And we specify a position 1<=c<m
>> Then the idea is that we create chunks of size
>> d(c+1) * d(c+2) *...dm
>> There will be d1*d2*...dc such chunks.
>> In other words, we split the set of dimensions at some point (c)
>> and create the chunks based on that split.
>>
>> The claim is that for many situations, the leftmost dimensions
>> are what we want to iterate over: e.g. time; and we then want
>> to read all of the rest of the data associated with that time.
>>
>> So, my question is: is such a style of chunking useful?
>>
>> If this is not clear, let me know and I will try to clarify.
>> =Dennis Heimbigner
>> Unidata
>>
>>
>> _______________________________________________
>> NOTE: All exchanges posted to Unidata maintained email lists are
>> recorded in the Unidata inquiry tracking system and made publicly
>> available through the web. Users who post to any of the lists we
>> maintain are reminded to remove any personal information that they
>> do not want to be made public.
>>
>>
>> netcdfgroup mailing list
>> netcdfgroup@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe, visit:
>> http://www.unidata.ucar.edu/mailing_lists/