Correction, inserted below. Where I said "increasing the per-variable chunk
size", I meant to say "per-variable CACHE size".
On Thu, Dec 15, 2016 at 6:03 PM, Dave Allured - NOAA Affiliate
<dave.allured@xxxxxxxx> wrote:
> On Thu, Dec 15, 2016 at 4:46 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:
>
>> On Thu, Dec 15, 2016 at 1:00 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
>>
>>> 1. Adding this feature to ncdump also requires adding
>>> it to the netcdf-c library API. But providing some means
>>> for client programs to pass parameter settings through to the HDF5
>>> library seems like a good idea.
>>>
>>
>> Absolutely! That would be very helpful.
>>
>> -CHB
>>
>
> This may be premature. The netcdf API already has its own chunk cache,
> with at least two functions to adjust its tuning parameters:
> nc_set_chunk_cache sets the default for files opened afterwards, and
> nc_set_var_chunk_cache sets the cache for a single variable. It seems to
> me that this netcdf facility would probably handle the current ncdump and
> gdal cases nicely, though I have not tested it. Please see this relevant
> documentation:
>
> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf_perf_chunking.html
>
> Simon, you might want to ask your gdal maintainer to give this a try. If
> it works, it should be simple and robust. I would suggest increasing the
> per-variable CACHE size to at least 5 uncompressed chunks of
> qualityFlags.nc, and probably more. 5 is the number of chunks that span a
> single row of this particular file (4865 columns / 1177 columns per chunk,
> rounded up). This advice presumes that your typical read pattern is
> similar to ncdump's, which, as I said earlier, I speculate proceeds across
> single whole rows first.
>
> columns = 4865 ;
> rows = 3682 ;
> uint quality_flags(rows, columns) ;
> quality_flags:_ChunkSizes = 891, 1177 ;
>
> 5 x 891 x 1177 x 4 bytes per uint (uncompressed) = 20,974,140 bytes
> ~= 21 Mbytes
>
> Note that this is likely to be a little larger than the default cache
> size in the current netcdf-C library, which would explain some of the
> slow read behavior.
>
> You might also consider rechunking such data sets to a smaller chunk
> size. nccopy and ncks can both do that. The best chunk shape may depend
> on your anticipated spatial read patterns, so give that a little thought.
>
> You might also consider reading the entire grid in a single get_vara call
> to the netcdf API. That is what my fast Fortran test program did. A naive
> reader that, for example, loops over single rows may thrash the chunk
> cache, re-reading and decompressing the same chunks over and over, which
> a single large read could avoid.
>
> --Dave
>