nccopy is fast in both cases.
On Thu, Dec 15, 2016 at 1:00 PM, dmh@xxxxxxxx <dmh@xxxxxxxx> wrote:
> Two notes:
> 1. Adding this feature to ncdump also requires adding
> it to the netcdf-c library API. But providing some means
> for client programs to pass thru parameter settings to the hdf5 lib
> seems like a good idea.
> 2. Have you tried using nccopy to see if that is slow as well?
> =Dennis Heimbigner
> Unidata
>
>
> On 12/15/2016 1:03 PM, Chris Barker wrote:
>
>> On Thu, Dec 15, 2016 at 11:30 AM, Julian Kunkel
>> <juliankunkel@xxxxxxxxxxxxxx> wrote:
>>
>> HDF5 does provide a chunk cache, but I presume in this case it is
>> simply too small for the chunks to fit.
>> You can imagine that if you increase the number of columns further, at
>> some point the 2D chunks will exceed any cache size.
>> This is, however, suboptimal in that case, as ncdump outputs row after
>> row.
>>
>> So, to see if I have this right -- libhdf5 manages chunk caching for
>> you, but it's up to the client application to set an appropriate size
>> for the chunk cache.
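>>
>> At the HDF5 level, the knob seems to be the dataset access property
>> list -- something like this untested sketch (the 64 MB figure and the
>> slot count are arbitrary illustrations, not recommendations):
>>
>>     #include <hdf5.h>
>>
>>     /* Open a dataset with an enlarged raw-data chunk cache. */
>>     hid_t open_with_big_cache(hid_t file_id, const char *dset_name)
>>     {
>>         hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
>>         /* hash slots, cache size in bytes, preemption policy */
>>         H5Pset_chunk_cache(dapl, 12421, 64 * 1024 * 1024, 0.75);
>>         hid_t dset = H5Dopen2(file_id, dset_name, dapl);
>>         H5Pclose(dapl);
>>         return dset;
>>     }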
>>
>> I suspect that the defaults in HDF are pretty small, to be conservative.
>>
>> One way to fix this would be to adjust ncdump to increase the chunk
>> cache to a reasonable amount of main memory,
>>
>>
>> I think this is a good idea -- ncdump is usually (always?) used as a
>> one-off -- it will read the file and then the program closes -- so
>> there's no real worry about it hanging on to a lot of memory.
>>
>>
>> potentially offering
>> users a way to influence this value from the command line.
>>
>>
>> Sure -- though that's kind of a low-level feature -- not sure if
>> anyone using ncdump is likely to use it.
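>>
>> For what it's worth, the chunk-cache piece at least looks reachable
>> from the existing netcdf-c API already -- a rough, untested sketch
>> (error checks omitted; the file name, variable name, and sizes are
>> all made up):
>>
>>     #include <netcdf.h>
>>
>>     int main(void)
>>     {
>>         int ncid, varid;
>>
>>         /* Default cache for files opened after this call:
>>            64 MB, 1009 hash slots, preemption 0.75. */
>>         nc_set_chunk_cache(64 * 1024 * 1024, 1009, 0.75f);
>>
>>         nc_open("example.nc", NC_NOWRITE, &ncid);
>>         nc_inq_varid(ncid, "tas", &varid);
>>
>>         /* ...or tune it per variable once the file is open. */
>>         nc_set_var_chunk_cache(ncid, varid, 64 * 1024 * 1024, 1009, 0.75f);
>>
>>         /* ... read the variable as usual ... */
>>         nc_close(ncid);
>>         return 0;
>>     }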
>>
>> But there is still the question of the regression in performance:
>> what changed in netcdf or ncdump in this regard?
>>
>> Also -- the OP states they first noticed this with GDAL, not ncdump --
>> so maybe GDAL needs to be smarter about setting the cache size -- but
>> I'm thinking that netcdf, or even hdf, may want to increase the default
>> chunk cache size.
>>
>> I see this in the HDF docs: "The default size is 1 MB" -- 1 MB was a
>> lot not that long ago, but with laptops sporting many GB of memory, it
>> would make sense to make the default a lot bigger (10% of system
>> memory? Even 1% of system memory would be a lot more than 1 MB these
>> days).
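>>
>> Picking it relative to physical memory isn't hard, either -- a rough
>> Linux/glibc-only sketch (the 1% policy and the 64 MB fallback are just
>> illustrations):
>>
>>     #include <stdio.h>
>>     #include <unistd.h>
>>
>>     /* ~1% of physical RAM, falling back to 64 MB if sysconf
>>        can't tell us. */
>>     static size_t one_percent_of_ram(void)
>>     {
>>         long pages = sysconf(_SC_PHYS_PAGES);
>>         long page  = sysconf(_SC_PAGE_SIZE);
>>         if (pages <= 0 || page <= 0)
>>             return (size_t)64 * 1024 * 1024;
>>         return (size_t)pages * (size_t)page / 100;
>>     }
>>
>>     int main(void)
>>     {
>>         printf("chunk cache candidate: %zu bytes\n", one_percent_of_ram());
>>         /* ...then e.g. nc_set_chunk_cache(one_percent_of_ram(), 1009, 0.75f); */
>>         return 0;
>>     }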
>>
>> Of course, another reminder that you really don't want to use large (or
>> highly non-square) chunk sizes unless you really know your access
>> patterns.
>>
>> I did some experiments with this a while back, because netcdf's
>> default chunk sizes had a bug of sorts (1-element chunk sizes for 1-d
>> arrays!) -- in that one use case, on that particular system, the
>> writing performance was about the same once chunk sizes got above
>> about 1k or so. I can't recall if I tested read performance.
>>
>> The point being -- really large chunks are not helpful, and can kill
>> you if you are using an incompatible access pattern.
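>>
>> On the writing side, setting the chunk shape explicitly is cheap
>> insurance -- untested sketch, with made-up dimensions and a made-up
>> 256x256 chunk shape, not a recommendation:
>>
>>     #include <netcdf.h>
>>
>>     int main(void)
>>     {
>>         int ncid, dimids[2], varid;
>>         size_t chunks[2] = {256, 256};   /* modest, roughly square chunks */
>>
>>         nc_create("chunked_example.nc", NC_NETCDF4 | NC_CLOBBER, &ncid);
>>         nc_def_dim(ncid, "row", 100000, &dimids[0]);
>>         nc_def_dim(ncid, "col", 5000, &dimids[1]);
>>         nc_def_var(ncid, "data", NC_FLOAT, 2, dimids, &varid);
>>
>>         /* Pick the chunk shape rather than trusting the defaults. */
>>         nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
>>         nc_def_var_deflate(ncid, varid, 0, 1, 4);  /* no shuffle, deflate 4 */
>>
>>         nc_close(ncid);
>>         return 0;
>>     }
>>
>> (Error checking omitted, of course.)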
>>
>> -CHB
>>
>>
>> See https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache
>>
>>
>>
>> There is quite good documentation here about general aspects of
>> chunking: https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/
>>
>> Julian
>>
>> 2016-12-15 20:17 GMT+01:00 Chris Barker <chris.barker@xxxxxxxx>:
>> > On Wed, Dec 14, 2016 at 9:13 PM, Dave Allured - NOAA Affiliate
>> > <dave.allured@xxxxxxxx> wrote:
>> >
>> >>
>> >> So I think you have a read caching failure, due to interaction
>> >> between the ncdump read pattern and your chunking scheme.
>> >
>> > ...
>> >>
>> >> A sampling tool found that ncdump was spending more than 96% of its
>> >> time inside an HDF5 chunk reader with decompression. Every time an
>> >> HDF5 chunk is physically read from disk, the *entire* chunk must be
>> >> decompressed, even to access a single value. You see why chunk
>> >> caching is important.
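>> >>
>> >> (Made-up numbers, just for scale: a chunk of 1000 rows x 1000 float
>> >> columns is ~4 MB uncompressed -- four times the 1 MB default cache
>> >> -- so a reader that walks the file row by row can end up re-reading
>> >> and re-decompressing that same chunk as many as 1000 times.)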
>> >
>> >
>> > Does HDF5 not do any chunk caching itself? Or, for that matter,
>> > netcdf4? Is it really up to the application level to manage the
>> > caching? That seems like handling it at the wrong level to me.
>> >
>> > -CHB
>> >
>> >
>> >
>> > --
>> >
>> > Christopher Barker, Ph.D.
>> > Oceanographer
>> >
>> > Emergency Response Division
>> > NOAA/NOS/OR&R            (206) 526-6959   voice
>> > 7600 Sand Point Way NE   (206) 526-6329   fax
>> > Seattle, WA  98115       (206) 526-6317   main reception
>> >
>> > Chris.Barker@xxxxxxxx
>> >
>>
>>
>>
>> --
>> http://wr.informatik.uni-hamburg.de/people/julian_kunkel
>>
>>
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R (206) 526-6959 voice
>> 7600 Sand Point Way NE (206) 526-6329 fax
>> Seattle, WA 98115 (206) 526-6317 main reception
>>
>> Chris.Barker@xxxxxxxx
>>
>>
>>
> _______________________________________________
> NOTE: All exchanges posted to Unidata maintained email lists are
> recorded in the Unidata inquiry tracking system and made publicly
> available through the web. Users who post to any of the lists we
> maintain are reminded to remove any personal information that they
> do not want to be made public.
>
>
> netcdfgroup mailing list
> netcdfgroup@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
>