Re: [netcdfgroup] slow reads in 4.4.1.1 vs 4.1.3 for some files

Correction, inserted below.  I said "increasing the per-variable chunk
size"; I meant to say "per-variable CACHE size".

On Thu, Dec 15, 2016 at 6:03 PM, Dave Allured - NOAA Affiliate <
dave.allured@xxxxxxxx> wrote:

> On Thu, Dec 15, 2016 at 4:46 PM, Chris Barker <chris.barker@xxxxxxxx>
> wrote:
>
>> On Thu, Dec 15, 2016 at 1:00 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
>>
>>> 1. Adding this feature to ncdump also requires adding
>>>    it to the netcdf-c library API. But providing some means
>>>    for client programs to pass thru parameter settings to the hdf5 lib
>>>    seems like a good idea.
>>>
>>
>> absolutely! that would be very helpful.
>>
>> -CHB
>>
>
> This may be premature.  The netcdf API already has its own chunk cache
> with at least two functions to adjust tuning parameters.  It seems to me
> that the netcdf facility would probably handle the current ncdump and GDAL
> cases nicely, though I have not tested it.  Please see this relevant
> documentation:
>
> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf_perf_chunking.html
>
> Simon, you might want to ask your GDAL maintainer to give this a try.  If
> it works, it should be simple and robust.  I would suggest increasing the
> per-variable CACHE size to at least 5 qualityFlags.nc uncompressed chunks,
> and probably more.  5 is the number of chunks that span a single row for
> this particular file.  This advice presumes that your typical read pattern
> is similar to that of ncdump, which I speculate reads across single whole
> rows first, as I said earlier.
>
>   columns = 4865 ;
>   rows = 3682 ;
>   uint quality_flags(rows, columns) ;
>     quality_flags:_ChunkSizes = 891, 1177 ;
>
> 5 x 891 x 1177 x 4 bytes per uint uncompressed ~= 21 Mbytes
>
> Note this is likely to be a little larger than the default cache size in
> the current netcdf-C library, thus explaining some of the slow read
> behavior.
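
The arithmetic above can be checked with a short sketch.  It assumes only
the dimension and chunk sizes quoted from the file header and a 4-byte
uint; the variable names are illustrative.  The resulting byte count is
the sort of value one might pass as the size argument to
nc_set_var_chunk_cache in netcdf-C, though I have not tested that here.

```python
import math

# Sizes quoted from the qualityFlags.nc header above.
columns, rows = 4865, 3682
chunk_rows, chunk_cols = 891, 1177   # quality_flags:_ChunkSizes
bytes_per_value = 4                  # uint is a 4-byte unsigned integer

# Number of chunks that one whole row of the grid passes through.
chunks_per_row = math.ceil(columns / chunk_cols)

# Uncompressed size of one chunk, and of all chunks spanning one row.
chunk_bytes = chunk_rows * chunk_cols * bytes_per_value
row_span_bytes = chunks_per_row * chunk_bytes

print(chunks_per_row)              # 5
print(chunk_bytes)                 # 4194828
print(row_span_bytes)              # 20974140
print(round(row_span_bytes / 1e6)) # 21 (Mbytes, decimal)
```
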
>
> You might also consider rechunking such data sets to a smaller chunk size.
> Both nccopy and ncks can do that.  The best chunk shape depends on your
> anticipated spatial read patterns, so give that a little thought.
>
> You might also consider reading the entire grid in a single get_vara call
> to the netcdf API.  That is what my fast Fortran test program did.  A naive
> reader that, for example, loops over single rows may incur bad cache
> activity that could be avoided.
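
To illustrate why a row-by-row reader can behave badly, here is a toy
model, not the real library.  It assumes a plain least-recently-used cache
that holds a fixed number of whole chunks, and a reader that fetches one
full row at a time.  The actual HDF5 chunk cache uses a different
replacement scheme, and the function name count_chunk_loads is made up for
this sketch, so treat the numbers as indicative only.

```python
import math
from collections import OrderedDict

def count_chunk_loads(rows, columns, chunk_rows, chunk_cols, cache_slots):
    """Count chunk loads for a reader that fetches one whole row at a time,
    using a simplified LRU cache that holds `cache_slots` chunks."""
    cache = OrderedDict()
    loads = 0
    chunks_across = math.ceil(columns / chunk_cols)
    for r in range(rows):
        for c in range(chunks_across):
            key = (r // chunk_rows, c)       # chunk containing this part of the row
            if key in cache:
                cache.move_to_end(key)       # hit: mark as most recently used
            else:
                loads += 1                   # miss: chunk must be read and decompressed
                cache[key] = True
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict the least recently used chunk
    return loads

# Sizes quoted from the qualityFlags.nc header.
enough = count_chunk_loads(3682, 4865, 891, 1177, cache_slots=5)
too_small = count_chunk_loads(3682, 4865, 891, 1177, cache_slots=4)
print(enough)     # 25
print(too_small)  # 18410
```

In this model, a cache that holds the 5 chunks spanning a row loads each of
the 25 chunks once, while a cache one chunk too small reloads a chunk on
every access.  Reading the whole grid in one call touches each chunk once
regardless of cache size.
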
>
> --Dave
>