[netcdfgroup] Strided reads slow

To: "'netcdfgroup@xxxxxxxxxxxxxxxx'" <netcdfgroup@xxxxxxxxxxxxxxxx>
Subject: [netcdfgroup] Strided reads slow
From: "Peglar, Patrick" <patrick.peglar@xxxxxxxxxxxxxxxx>
Date: Mon, 12 Aug 2013 13:55:51 +0000

Hi

I just thought I'd ask the world in general whether other people are having 
trouble with this.

I was contacted about an internal support issue by someone getting very slow
read performance from large netCDF-4 files.
He was doing "strided" access to a variable (i.e. reading 1 of every N points).
I produced a simple C API test case, which reads all of a 1M float array in
about 2 ms, but takes nearly 4 seconds to read every other point (stride=2),
i.e. roughly 2000 times longer for half as much data.

This has already been discussed with the dev team, who replied variously...
   -----Original Message-----
   From: Unidata netCDF Support [mailto:support-netcdf@xxxxxxxxxxxxxxxx]
   Sent: 09 August 2013 21:57
   To: Peglar, Patrick
   Cc: support-netcdf@xxxxxxxxxxxxxxxx
   Subject: [netCDF #ZFB-587742]: Reading variable with strides very slow

   Patrick,

   This turns out to be a known problem with HDF5 performance:

     http://mail.lists.hdfgroup.org/pipermail/hdf-forum_lists.hdfgroup.org/2012-November/006195.html

   --Russ

(from older discussions ..)
   > > > Patrick-
   > > >
   > > > Vars in netcdf is inherently slow
   > > > (when stride > 1) because it cannot
   > > > easily make use of bulk read operations.
   > > > So the library must read element by element
   > > > from the underlying disk storage. This has
   > > > a noticeable effect on performance. This is not
   > > > easy to fix because it must do the read using only
   > > > the memory that is passed to it by the client.
   > > >
   > > > For netcdf versions before 4.3.0 (including 4.1.3)
   > > > there was an additional factor. For historical
   > > > reasons, vars was implemented in terms of varm
   > > > so there was some additional overhead.
   > > >
   > > > If you upgrade to 4.3.0, you will see some performance
   > > > improvement but not, probably, enough to solve your problem.
   > > >
   > > > Sorry I do not have better news.
   > > > =Dennis Heimbigner
   > > >  Unidata
   > >
   > > On the netcdf-3 vs netcdf-4 issue I can at the moment
   > > only speculate. As a rule, reading small quantities of data
   > > with netcdf-4 is always slower than netcdf-3 because the
   > > underlying HDF5 file format is based on b-trees rather than the
   > > linear disk layout of netcdf-3. Since vars reads a single
   > > element at a time, that overhead can, I suspect, be significant.
   > > I am, however surprised that it is as large as you show.
   > >
   > > =Dennis Heimbigner
   > >  Unidata
   > >
   > In this case, no b-trees are involved, because the data storage is
   > contiguous, not chunked (according to ncdump -h -s).  So I'm
   > surprised how slow the strided netCDF access is, and suspect there
   > might be a performance bug in how netCDF-4 uses the HDF5 API for
   > strided access.

   Russ Rew                                         UCAR Unidata Program
   russ@xxxxxxxxxxxxxxxx                      http://www.unidata.ucar.edu


Our original use case is constrained by memory limits, so reading the whole
variable and subsampling it in memory is not an option.
Workarounds are possible, but they are all a bit awkward.
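
For illustration, one such workaround could look roughly like the sketch
below: read the variable in contiguous blocks through a small, fixed-size
scratch buffer and pick out every N-th point in memory. This is only a sketch
under stated assumptions, not the test case mentioned above. The names
block_reader, strided_read_blocked, mem_src and mem_read are hypothetical; in
real use the callback would presumably wrap a contiguous read such as
nc_get_vara_float(), and here an in-memory array stands in for the file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: read `count` contiguous floats, starting at element
 * `start`, into `buf`. Returns 0 on success. With netCDF this could wrap a
 * contiguous read such as nc_get_vara_float() (assumption, not shown). */
typedef int (*block_reader)(void *ctx, size_t start, size_t count, float *buf);

/* Copy every `stride`-th element of a 1-D variable of length `n` into `out`,
 * using only contiguous block reads through a scratch buffer of `buf_len`
 * floats. Returns the number of elements written, or -1 on error. */
long strided_read_blocked(block_reader read_block, void *ctx,
                          size_t n, size_t stride,
                          float *buf, size_t buf_len, float *out)
{
    size_t n_out = 0;
    size_t next = 0;   /* index of the next wanted element */
    size_t start;

    if (stride == 0 || buf_len == 0)
        return -1;

    for (start = 0; start < n; start += buf_len) {
        size_t count = (n - start < buf_len) ? n - start : buf_len;

        /* Skip blocks that contain no wanted element (stride > buf_len). */
        if (next >= start + count)
            continue;

        if (read_block(ctx, start, count, buf) != 0)
            return -1;

        /* Subsample the block in memory. */
        for (; next < start + count; next += stride)
            out[n_out++] = buf[next - start];
    }
    return (long)n_out;
}

/* Demo reader over an in-memory array, standing in for the file. */
struct mem_src { const float *data; size_t len; };

int mem_read(void *ctx, size_t start, size_t count, float *buf)
{
    const struct mem_src *s = (const struct mem_src *)ctx;
    if (start + count > s->len)
        return -1;
    memcpy(buf, s->data + start, count * sizeof *buf);
    return 0;
}
```

The extra memory is bounded by buf_len floats regardless of the variable
size, at the cost of reading data that is then thrown away. Whether that is
actually faster than a strided read would have to be measured.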

It is not yet clear that the known HDF5 issue alone can explain the size of
the slowdown, so I think there may still be more to learn about this.

The question is: does this really need addressing?
That is, is anyone else having serious problems with this?

Regards
Patrick
--
Patrick Peglar  AVD Team Software Engineer
Analysis, Visualisation and Data Team  http://www-avd/
Tel: +44 (0)1392 88 5748
Email: patrick.peglar@xxxxxxxxxxxxxxxx
Met Office  Fitzroy Road  Exeter  EX1 3PB  
Web: http://www.metoffice.gov.uk

Follow-Ups:
- Re: [netcdfgroup] Strided reads slow
  - From: Dennis Heimbigner
