Two notes:
1. Adding this feature to ncdump also requires adding it to the netcdf-c
library API. But providing some means for client programs to pass
parameter settings through to the HDF5 library seems like a good idea
(see the sketch after these notes).
2. Have you tried using nccopy to see if it is slow as well?
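For what it's worth, the netcdf-4 C API already has per-variable chunk-cache
controls that a client program (or ncdump itself) could call before reading; a
minimal sketch, with the file and variable names made up for illustration:

    #include <netcdf.h>

    int main(void) {
        int ncid, varid;

        /* open the file and look up a variable (names are illustrative) */
        if (nc_open("data.nc", NC_NOWRITE, &ncid)) return 1;
        if (nc_inq_varid(ncid, "temperature", &varid)) return 1;

        /* raise this variable's chunk cache to 64 MiB, with 1009 hash
           slots and a preemption policy of 0.75 (the HDF5-style knobs) */
        if (nc_set_var_chunk_cache(ncid, varid, 64 * 1024 * 1024, 1009, 0.75f))
            return 1;

        /* ... read the variable as usual ... */

        return nc_close(ncid);
    }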
=Dennis Heimbigner
Unidata
On 12/15/2016 1:03 PM, Chris Barker wrote:
On Thu, Dec 15, 2016 at 11:30 AM, Julian Kunkel
<juliankunkel@xxxxxxxxxxxxxx> wrote:
HDF5 does provide a chunk cache, but I presume in this case it is
simply too small for the chunks to fit.
You can imagine that if you increase the number of columns further, at
some point the 2D chunks will exceed any cache size.
This is, however, suboptimal in that case, as ncdump outputs row after
row.
So, to see if I have this right -- libhdf5 manages chunk caching for you,
but it's up to the client application to set an appropriate size for that
cache.
I suspect that the defaults in HDF5 are pretty small, to be conservative.
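At the HDF5 level this knob lives on a dataset access property list; a rough
sketch of what a client application would do (the dataset name and cache
numbers are just illustrative, error checking omitted):

    #include <hdf5.h>

    /* open a dataset with a 64 MiB raw-data chunk cache instead of the
       1 MiB default; 12421 hash slots, preemption policy 0.75 */
    hid_t open_with_big_cache(hid_t file_id)
    {
        hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
        H5Pset_chunk_cache(dapl, 12421, 64 * 1024 * 1024, 0.75);
        hid_t dset = H5Dopen2(file_id, "/temperature", dapl);
        H5Pclose(dapl);
        return dset;
    }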
One way to fix this would be to adjust ncdump to increase the chunk
cache to a reasonable amount of main memory,
I think this is a good idea -- ncdump is usually (always?) used as a
one-off -- it will read the file and then the program closes -- so no real
worry about it hanging on to a lot of memory.
potentially offering
users a way to influence this value from the command line.
Sure -- though that's kind of a low-level feature -- not sure if anyone
using ncdump is likely to use it.
But there is still the question of the performance regression -- what
changed in netcdf or ncdump in this regard?
Also -- the OP states they first noticed this with GDAL, not ncdump --
so maybe GDAL needs to be smarter about setting the cache size -- but
I'm thinking that netcdf, or even hdf, may want to increase the default
chunk cache size.
I see this in the HDF5 docs: "The default size is 1 MB" -- 1 MB was a lot
not that long ago, but with laptops sporting many GB of memory, it would
make sense to make the default a lot bigger (10% of system memory? even
1% of system memory would be more than that these days).
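Something along those lines could be computed at startup; a rough sketch,
assuming Linux/glibc sysconf for the memory size and using netcdf-4's
nc_set_chunk_cache, which sets the default cache that subsequently opened
files inherit:

    #include <netcdf.h>
    #include <unistd.h>

    /* set the default chunk cache to ~1% of physical memory (Linux/glibc) */
    void set_default_cache_from_ram(void)
    {
        long pages = sysconf(_SC_PHYS_PAGES);
        long page_size = sysconf(_SC_PAGE_SIZE);
        size_t one_percent = (size_t)pages * (size_t)page_size / 100;

        /* bytes, number of hash slots, preemption policy */
        nc_set_chunk_cache(one_percent, 1009, 0.75f);
    }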
Of course, another reminder that you really don't want to use large (or
highly non-square) chunk sizes unless you really know your access patterns.
I did some experiments with this a while back, because netcdf's default
chunk sizes had a bug of sorts (1-element chunk sizes for 1-d arrays!)
-- in that one use case, on that particular system, the writing
performance was about the same once chunk sizes got to about 1k or
so. I can't recall if I tested read performance.
The point being -- really large chunks are not helpful, and can kill you
if you are using an incompatible access pattern.
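For reference, chunk sizes are something the writer picks at variable
definition time; a minimal sketch of setting explicit, modest chunks through
the netcdf-4 API (the variable name, dimensions, and 256 x 256 chunk shape
are made up for illustration):

    #include <netcdf.h>

    /* define a 2D float variable with explicit 256 x 256 chunks rather
       than whatever defaults the library would pick */
    int define_chunked_var(int ncid, int dimids[2], int *varid)
    {
        size_t chunks[2] = {256, 256};
        int status = nc_def_var(ncid, "temperature", NC_FLOAT, 2, dimids, varid);
        if (status) return status;
        return nc_def_var_chunking(ncid, *varid, NC_CHUNKED, chunks);
    }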
-CHB
See
https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache
There is quite good documentation here about general aspects of chunking:
https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/
Julian
2016-12-15 20:17 GMT+01:00 Chris Barker <chris.barker@xxxxxxxx>:
> On Wed, Dec 14, 2016 at 9:13 PM, Dave Allured - NOAA Affiliate
> <dave.allured@xxxxxxxx> wrote:
>
>>
>> So I think you have a read caching failure, due to interaction between
>> the ncdump read pattern and your chunking scheme.
>
> ...
>>
>> A sampling tool found that ncdump was spending more than 96% of its time
>> inside an HDF5 chunk reader with decompression. Every time an HDF5 chunk is
>> physically read from disk, the *entire* chunk must be decompressed, even to
>> access a single value. You see why chunk caching is important.
>
>
> Does HDF5 not do any chunk caching itself? Or, for that matter, netcdf4? Is
> it really up to the application level to manage the caching? That seems like
> handling it at the wrong level to me.
>
> -CHB
>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker@xxxxxxxx
>
--
http://wr.informatik.uni-hamburg.de/people/julian_kunkel
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@xxxxxxxx
_______________________________________________
NOTE: All exchanges posted to Unidata maintained email lists are
recorded in the Unidata inquiry tracking system and made publicly
available through the web. Users who post to any of the lists we
maintain are reminded to remove any personal information that they
do not want to be made public.
netcdfgroup mailing list
netcdfgroup@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/