Re: [netcdfgroup] slow reads in 4.4.1.1 vs 4.1.3 for some files

On Thu, Dec 15, 2016 at 11:30 AM, Julian Kunkel <juliankunkel@xxxxxxxxxxxxxx> wrote:

> HDF5 does provide a chunk cache, but I presume in this case it is
> simply too small to fit.
> You can imagine if you increase the number of columns further at some
> point the 2D chunks will exceed (any) sized cache.
> This is, however, suboptimal in that case, as ncdump outputs row after row.
>

So, to see if I have this right -- libhdf5 manages chunk caching for you,
but it's up to the client application to set an appropriate size for the
chunk cache.

I suspect that the defaults in HDF are pretty small, to be conservative.
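
To make the failure mode concrete, here is a rough sketch of a row-by-row scan (the ncdump-style pattern) against a chunk cache. It is only a toy model: the cache is a plain LRU that holds a fixed number of whole chunks, which is simpler than what HDF5 really does, and the function name and all the dimensions are made up for illustration.

```python
from collections import OrderedDict


def count_chunk_loads(n_rows, n_cols, chunk_rows, chunk_cols, cache_chunks):
    """Count chunk loads (read + decompress) for a row-by-row scan.

    Toy model: a plain LRU cache holding at most `cache_chunks` whole
    chunks. HDF5's real chunk cache works differently in detail.
    """
    cache = OrderedDict()  # chunk id -> None, least recently used first
    loads = 0
    chunks_across = -(-n_cols // chunk_cols)  # ceiling division
    for row in range(n_rows):
        chunk_row = row // chunk_rows
        for chunk_col in range(chunks_across):
            key = (chunk_row, chunk_col)
            if key in cache:
                cache.move_to_end(key)  # cache hit: mark as recently used
            else:
                loads += 1  # cache miss: the whole chunk is loaded
                cache[key] = None
                if len(cache) > cache_chunks:
                    cache.popitem(last=False)  # evict least recently used
    return loads


# Hypothetical 100 x 1000 variable stored as 10 x 100 chunks (100 chunks).
fits = count_chunk_loads(100, 1000, 10, 100, cache_chunks=10)
too_small = count_chunk_loads(100, 1000, 10, 100, cache_chunks=9)
print(fits, too_small)
```

With those made-up numbers, a cache that holds one full row of chunks loads each chunk once (100 loads), while a cache just one chunk smaller reloads every chunk for every row (1000 loads). That cliff is the kind of behaviour described in this thread.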

> One way to fix this would be to adjust ncdump to increase the chunk
> cache to a reasonable amount of main memory,


I think this is a good idea -- ncdump is usually (always?) used as a
one-off -- it will read the file and then the program closes -- so no real
worry about it hanging on to a lot of memory.


> potentially offering
> users a way to influence this value from command line.
>

Sure -- though that's kind of a low-level feature -- not sure if anyone
using ncdump is likely to use it.

But there is still the question of the regression in performance: what
changed in netcdf or ncdump in this regard?

Also -- the OP states they first noticed this with GDAL, not ncdump -- so
maybe GDAL needs to be smarter about setting the cache size -- but I'm
thinking that netcdf, or even hdf, may want to increase the default chunk
cache size.

I see this in the HDF docs: "The default size is 1 MB" -- 1 MB was a lot not
that long ago, but with laptops sporting many GB of memory, it would make
sense to make the default a lot bigger (10% of system memory? Even 1% of
system memory would be more these days).
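
As a back-of-the-envelope check on that 1 MB figure, here is a small sketch of how much cache a row-wise read needs in order to keep every chunk it touches resident. The helper names and the example variable (20000 columns of 4-byte values in 500 x 500 chunks) are assumptions for illustration, not taken from the OP's file; the sizes are uncompressed, since the cache holds decompressed chunks.

```python
ONE_MB = 1024 * 1024  # the default chunk cache size quoted from the HDF docs


def chunk_bytes(chunk_shape, itemsize):
    """Uncompressed size of one chunk in bytes."""
    n_elements = 1
    for dim in chunk_shape:
        n_elements *= dim
    return n_elements * itemsize


def cache_bytes_for_row_scan(n_cols, chunk_shape, itemsize):
    """Bytes needed to hold every chunk that one row of a 2-D variable touches."""
    _, chunk_cols = chunk_shape
    chunks_across = -(-n_cols // chunk_cols)  # ceiling division
    return chunks_across * chunk_bytes(chunk_shape, itemsize)


# Hypothetical variable: 20000 columns, 4-byte values, 500 x 500 chunks.
one_chunk = chunk_bytes((500, 500), 4)
needed = cache_bytes_for_row_scan(20000, (500, 500), 4)
print(one_chunk, needed, needed / ONE_MB)
```

In this made-up case a single chunk is already close to 1 MB, and a row-wise read would want roughly 38 MB of cache, so the default would hold about one chunk out of the forty that each row touches.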

Of course, another reminder that you really don't want to use large (or
highly non-square) chunk sizes unless you really know your access patterns.

I did some experiments with this a while back, because netcdf's default
chunk sizes had a bug of sorts (1 element chunk sizes for 1-d arrays!) --
in that one use case, with that particular system, the writing performance
was about the same once chunk sizes got above about 1k or so. I can't
recall if I tested read performance.

The point being -- really large chunks are not helpful, and can kill you if
you are using an incompatible access pattern.
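
One way to put a number on "incompatible" is the ratio of elements that must be decompressed to elements actually requested when nothing is cached. The sketch below assumes the request starts on a chunk boundary; the function name and the shapes are hypothetical.

```python
import math


def read_amplification(chunk_shape, request_shape):
    """Elements decompressed per element requested, with no caching.

    Assumes the request starts on a chunk boundary, so the number of
    chunks touched along each dimension is ceil(request / chunk).
    """
    decompressed = 1
    requested = 1
    for chunk_dim, req_dim in zip(chunk_shape, request_shape):
        chunks_touched = math.ceil(req_dim / chunk_dim)
        decompressed *= chunks_touched * chunk_dim
        requested *= req_dim
    return decompressed / requested


# One row of 1000 values from a 1000 x 1000 chunk vs. from a 1 x 1000 chunk.
print(read_amplification((1000, 1000), (1, 1000)))
print(read_amplification((1, 1000), (1, 1000)))
```

Reading one row out of a square 1000 x 1000 chunk decompresses a thousand times more data than was asked for, while a chunk shaped like the request has no overhead. A big enough cache hides that cost on repeated row reads; without it, the cost is paid on every row.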

-CHB




> See https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache



> There is quite good documentation here about general aspects of chunking:
> https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/
>
> Julian
>
> 2016-12-15 20:17 GMT+01:00 Chris Barker <chris.barker@xxxxxxxx>:
> > On Wed, Dec 14, 2016 at 9:13 PM, Dave Allured - NOAA Affiliate
> > <dave.allured@xxxxxxxx> wrote:
> >
> >>
> >> So I think you have a read cacheing failure, due to interaction between
> >> the ncdump read pattern, and your chunking scheme.
> >
> > ...
> >>
> >> A sampling tool found that ncdump was spending more than 96% of its time
> >> inside an HDF5 chunk reader with decompression.  Every time an HDF5 chunk
> >> is physically read from disk, the *entire* chunk must be decompressed, even
> >> to access a single value.  You see why chunk cacheing is important.
> >
> >
> > Does HDF5 not do any chunk caching itself? Or, for that matter, netcdf4? Is
> > it really up to the application level to manage the caching? That seems like
> > handling it at the wrong level to me.
> >
> > -CHB
> >
> >
> >
> > --
> >
> > Christopher Barker, Ph.D.
> > Oceanographer
> >
> > Emergency Response Division
> > NOAA/NOS/OR&R            (206) 526-6959   voice
> > 7600 Sand Point Way NE   (206) 526-6329   fax
> > Seattle, WA  98115       (206) 526-6317   main reception
> >
> > Chris.Barker@xxxxxxxx
> >
> > _______________________________________________
> > NOTE: All exchanges posted to Unidata maintained email lists are
> > recorded in the Unidata inquiry tracking system and made publicly
> > available through the web.  Users who post to any of the lists we
> > maintain are reminded to remove any personal information that they
> > do not want to be made public.
> >
> >
> > netcdfgroup mailing list
> > netcdfgroup@xxxxxxxxxxxxxxxx
> > For list information or to unsubscribe,  visit:
> > http://www.unidata.ucar.edu/mailing_lists/
>
>
>
> --
> http://wr.informatik.uni-hamburg.de/people/julian_kunkel
>


