nccopy is fast in both cases.
On Thu, Dec 15, 2016 at 1:00 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
> Two notes:
> 1. Adding this feature to ncdump also requires adding
> it to the netcdf-c library API. But providing some means
> for client programs to pass thru parameter settings to the hdf5 lib
> seems like a good idea.
> 2. Have you tried using nccopy to see if that is slow as well?
> =Dennis Heimbigner
> Unidata
>
>
> On 12/15/2016 1:03 PM, Chris Barker wrote:
>
>> On Thu, Dec 15, 2016 at 11:30 AM, Julian Kunkel
>> <juliankunkel@xxxxxxxxxxxxxx> wrote:
>>
>> HDF5 does provide a chunk cache, but I presume in this case it is
>> simply too small for the chunks to fit.
>> You can imagine that if you increase the number of columns further, at
>> some point the 2D chunks will exceed any cache size.
>> This is, however, suboptimal in that case, as ncdump outputs row after
>> row.
>>
>> So, to see if I have this right -- libhdf5 manages chunk caching for
>> you, but it's up to the client application to set an appropriate size
>> for the chunk cache.
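>>
>> At the HDF5 level, the knob seems to be the dataset access property
>> list -- something like this untested sketch (the 64 MB figure and the
>> slot count are arbitrary illustrations, not recommendations):
>>
>>     #include <hdf5.h>
>>
>>     /* Open a dataset with an enlarged raw-data chunk cache. */
>>     hid_t open_with_big_cache(hid_t file_id, const char *dset_name)
>>     {
>>         hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
>>         /* hash slots, cache size in bytes, preemption policy */
>>         H5Pset_chunk_cache(dapl, 12421, 64 * 1024 * 1024, 0.75);
>>         hid_t dset = H5Dopen2(file_id, dset_name, dapl);
>>         H5Pclose(dapl);
>>         return dset;
>>     }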
>>
>> I suspect that the defaults in HDF are pretty small, to be conservative.
>>
>> One way to fix this would be to adjust ncdump to increase the chunk
>> cache to a reasonable amount of main memory,
>>
>>
>> I think this is a good idea -- ncdump is usually (always?) used as a
>> one-off -- it will read the file and then the program closes -- so
>> there's no real worry about it hanging on to a lot of memory.
>>
>>
>> potentially offering
>> users a way to influence this value from the command line.
>>
>>
>> Sure -- though that's kind of a low-level feature -- not sure if
>> anyone using ncdump is likely to use it.
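>>
>> For what it's worth, the chunk-cache piece at least looks reachable
>> from the existing netcdf-c API already -- a rough, untested sketch
>> (error checks omitted; the file name, variable name, and sizes are
>> all made up):
>>
>>     #include <netcdf.h>
>>
>>     int main(void)
>>     {
>>         int ncid, varid;
>>
>>         /* Default cache for files opened after this call:
>>            64 MB, 1009 hash slots, preemption 0.75. */
>>         nc_set_chunk_cache(64 * 1024 * 1024, 1009, 0.75f);
>>
>>         nc_open("example.nc", NC_NOWRITE, &ncid);
>>         nc_inq_varid(ncid, "tas", &varid);
>>
>>         /* ...or tune it per variable once the file is open. */
>>         nc_set_var_chunk_cache(ncid, varid, 64 * 1024 * 1024, 1009, 0.75f);
>>
>>         /* ... read the variable as usual ... */
>>         nc_close(ncid);
>>         return 0;
>>     }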
>>
>> But there is still the question of the regression in performance:
>> what changed in netcdf or ncdump in this regard?
>>
>> Also -- the OP states they first noticed this with GDAL, not ncdump --
>> so maybe GDAL needs to be smarter about setting the cache size -- but
>> I'm thinking that netcdf, or even hdf, may want to increase the default
>> chunk cache size.
>>
>> I see this in the HDF docs: "The default size is 1 MB" -- 1 MB was a
>> lot not that long ago, but with laptops sporting many GB of memory, it
>> would make sense to make the default a lot bigger (10% of system
>> memory? Even 1% of system memory would be a lot more than 1 MB these
>> days).
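>>
>> Picking it relative to physical memory isn't hard, either -- a rough
>> Linux/glibc-only sketch (the 1% policy and the 64 MB fallback are just
>> illustrations):
>>
>>     #include <stdio.h>
>>     #include <unistd.h>
>>
>>     /* ~1% of physical RAM, falling back to 64 MB if sysconf
>>        can't tell us. */
>>     static size_t one_percent_of_ram(void)
>>     {
>>         long pages = sysconf(_SC_PHYS_PAGES);
>>         long page  = sysconf(_SC_PAGE_SIZE);
>>         if (pages <= 0 || page <= 0)
>>             return (size_t)64 * 1024 * 1024;
>>         return (size_t)pages * (size_t)page / 100;
>>     }
>>
>>     int main(void)
>>     {
>>         printf("chunk cache candidate: %zu bytes\n", one_percent_of_ram());
>>         /* ...then e.g. nc_set_chunk_cache(one_percent_of_ram(), 1009, 0.75f); */
>>         return 0;
>>     }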
>>
>> Of course, another reminder that you really don't want to use large (or
>> highly non-square) chunk sizes unless you really know your access
>> patterns.
>>
>> I did some experiments with this a while back, because netcdf's
>> default chunk sizes had a bug of sorts (1-element chunk sizes for 1-d
>> arrays!) -- in that one use case, on that particular system, the
>> writing performance was about the same once chunk sizes got above
>> about 1k or so. I can't recall if I tested read performance.
>>
>> The point being -- really large chunks are not helpful, and can kill
>> you if you are using an incompatible access pattern.
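>>
>> On the writing side, setting the chunk shape explicitly is cheap
>> insurance -- untested sketch, with made-up dimensions and a made-up
>> 256x256 chunk shape, not a recommendation:
>>
>>     #include <netcdf.h>
>>
>>     int main(void)
>>     {
>>         int ncid, dimids[2], varid;
>>         size_t chunks[2] = {256, 256};   /* modest, roughly square chunks */
>>
>>         nc_create("chunked_example.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
>>         nc_def_dim(ncid, "row", 100000, &dimids[0]);
>>         nc_def_dim(ncid, "col", 5000, &dimids[1]);
>>         nc_def_var(ncid, "data", NC_FLOAT, 2, dimids, &varid);
>>
>>         /* Pick the chunk shape rather than trusting the defaults. */
>>         nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
>>         nc_def_var_deflate(ncid, varid, 0, 1, 4);  /* no shuffle, deflate 4 */
>>
>>         nc_close(ncid);
>>         return 0;
>>     }
>>
>> (Error checking omitted, of course.)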
>>
>> -CHB
>>
>>
>> See https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache
>>
>>
>>
>> There is quite good documentation here about general aspects of
>> chunking: https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/
>>
>> Julian
>>
>> 2016-12-15 20:17 GMT+01:00 Chris Barker <chris.barker@xxxxxxxx>:
>> > On Wed, Dec 14, 2016 at 9:13 PM, Dave Allured - NOAA Affiliate
>> > <dave.allured@xxxxxxxx> wrote:
>> >
>> >>
>> >> So I think you have a read caching failure, due to interaction
>> >> between the ncdump read pattern and your chunking scheme.
>> >
>> > ...
>> >>
>> >> A sampling tool found that ncdump was spending more than 96% of its
>> >> time inside an HDF5 chunk reader with decompression. Every time an
>> >> HDF5 chunk is physically read from disk, the *entire* chunk must be
>> >> decompressed, even to access a single value. You see why chunk
>> >> caching is important.
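>> >>
>> >> (Made-up numbers, just for scale: a chunk of 1000 rows x 1000 float
>> >> columns is ~4 MB uncompressed -- four times the 1 MB default cache
>> >> -- so a reader that walks the file row by row can end up re-reading
>> >> and re-decompressing that same chunk as many as 1000 times.)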
>> >
>> >
>> > Does HDF5 not do any chunk caching itself? Or, for that matter,
>> > netcdf4? Is it really up to the application level to manage the
>> > caching? That seems like handling it at the wrong level to me.
>> >
>> > -CHB
>> >
>> >
>> >
>> > --
>> >
>> > Christopher Barker, Ph.D.
>> > Oceanographer
>> >
>> > Emergency Response Division
>> > NOAA/NOS/OR&R            (206) 526-6959   voice
>> > 7600 Sand Point Way NE   (206) 526-6329   fax
>> > Seattle, WA  98115       (206) 526-6317   main reception
>> >
>> > Chris.Barker@xxxxxxxx
>> >
>>
>>
>>
>> --
>> http://wr.informatik.uni-hamburg.de/people/julian_kunkel
>>
>>
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R (206) 526-6959 voice
>> 7600 Sand Point Way NE (206) 526-6329 fax
>> Seattle, WA 98115 (206) 526-6317 main reception
>>
>> Chris.Barker@xxxxxxxx
>>
>>
>>
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web. Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
>
>
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>