Re: [netcdfgroup] slow reads in 4.4.1.1 vs 4.1.3 for some files

I looked at the GDAL driver for netcdf. Adding
nc_set_chunk_cache(100000000, 10000, 0.75)
in GDALRegister_netCDF() helps, but I'm not sure whether it's a good
general solution.

The code on lines 482-500
<https://github.com/OSGeo/gdal/blob/trunk/gdal/frmts/netcdf/netcdfdataset.cpp#L482>
sets some chunking-related properties if NETCDF_HAS_NC4 is defined. My problem
persists whether or not this is defined, though (unless I add the caching
call). Note that later, on line 511, nBlockSize is set to 1 regardless of the
define for bottom-up datasets (which mine is). However, nBlockSize is also set
to 1 for my other datasets, and those read fast.

Does anything else in netcdfdataset.cpp ring any bells? I can bring this up
with gdal folks, but wanted to check here first for a possible
recommendation.

On Thu, Dec 15, 2016 at 5:24 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:

> The interactions between two independent caches can cause
> problems. I should look at the netcdf cache and see how it interacts
> with the hdf5 cache.
> =Dennis Heimbigner
>  Unidata
>
> On 12/15/2016 6:03 PM, Dave Allured - NOAA Affiliate wrote:
>
>> On Thu, Dec 15, 2016 at 4:46 PM, Chris Barker <chris.barker@xxxxxxxx> wrote:
>>
>>     On Thu, Dec 15, 2016 at 1:00 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
>>
>>         1. Adding this feature to ncdump also requires adding
>>            it to the netcdf-c library API. But providing some means
>>            for client programs to pass thru parameter settings to the
>>            hdf5 lib seems like a good idea.
>>
>>
>>     Absolutely! That would be very helpful.
>>
>>     -CHB
>>
>>
>> This may be premature.  The netcdf API already has its own chunk cache
>> with at least two functions to adjust tuning parameters
>> (nc_set_chunk_cache for the default, nc_set_var_chunk_cache per
>> variable).  It seems to me that the netcdf facility would probably
>> handle the current ncdump and gdal cases nicely, though I have not
>> tested it.  Please see this relevant documentation:
>>
>> http://www.unidata.ucar.edu/software/netcdf/docs/netcdf_perf_chunking.html
>>
>> Simon, you might want to ask your gdal maintainer to give this a try.
>> If it works, it should be simple and robust.  I would suggest increasing
>> the per-variable chunk cache size to hold at least 5 uncompressed
>> qualityFlags.nc chunks, and probably more.  5 is the number of chunks
>> that span a single row for this particular file.  This advice presumes
>> that your typical read pattern is similar to ncdump's, which I speculate
>> is first across single whole rows, as I said earlier.
>>
>>   columns = 4865 ;
>>   rows = 3682 ;
>>   uint quality_flags(rows, columns) ;
>>     quality_flags:_ChunkSizes = 891, 1177 ;
>>
>> 5 x 891 x 1177 x 4 bytes per uint uncompressed ~= 21 Mbytes
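The same arithmetic fits in a small C sketch. The function names here are hypothetical. The number of chunks a full row touches is the column count divided by the chunk width, rounded up. The cache has to hold that many uncompressed chunks.

```c
#include <stddef.h>

/* Chunks needed to cover one full row: ceil(columns / chunk_cols). */
size_t chunks_per_row(size_t columns, size_t chunk_cols)
{
    return (columns + chunk_cols - 1) / chunk_cols;
}

/* Uncompressed bytes needed to keep every chunk touched by one row cached. */
size_t row_cache_bytes(size_t columns, size_t chunk_rows, size_t chunk_cols,
                       size_t elem_size)
{
    return chunks_per_row(columns, chunk_cols) * chunk_rows * chunk_cols * elem_size;
}
```

For the file above, `chunks_per_row(4865, 1177)` is 5 and `row_cache_bytes(4865, 891, 1177, 4)` is 20974140 bytes, i.e. the ~21 Mbytes quoted.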
>>
>> Note this is likely to be a little larger than the default cache size in
>> the current netcdf-C library, thus explaining some of the slow read
>> behavior.
>>
>> You might also consider rechunking such data sets to a smaller chunk
>> size.  Nccopy and ncks can do that.  The best chunk shape depends on your
>> anticipated spatial read patterns, so give that a little thought.
>>
>> You might also consider reading the entire grid in a single get_vara
>> call to the netcdf API.  That is what my fast Fortran test program did.
>> A naive reader that, for example, loops over single rows may incur poor
>> cache behavior (the same chunks evicted and re-read) that could be avoided.
>>
>> --Dave
>>
>>
>> _______________________________________________
>> NOTE: All exchanges posted to Unidata maintained email lists are
>> recorded in the Unidata inquiry tracking system and made publicly
>> available through the web.  Users who post to any of the lists we
>> maintain are reminded to remove any personal information that they
>> do not want to be made public.
>>
>>
>> netcdfgroup mailing list
>> netcdfgroup@xxxxxxxxxxxxxxxx
>> For list information or to unsubscribe,  visit:
>> http://www.unidata.ucar.edu/mailing_lists/
>>
>>