Hi,
We've noticed a large difference in single-variable read times between
netCDF-4/HDF5 reads made with the netCDF-4 API (slower) and reads made with
the HDF5 API (faster). Single-variable reads are a common access pattern in
fusion research, where thousands of files are often scanned to analyse
experimental data for a particular variable.
This came to light when benchmarking reads of a netCDF-4/HDF5 file with 3000
variables, where a single-variable read (open, read, close) took 11 ms using
the HDF5 API and 1300 ms using the netCDF-4 API. In contrast, subsequent
variable reads from the already open file with the netCDF-4 API took 0.7 ms each.
Evidently the netCDF-4 API builds an internal data structure upfront to
assist possible later access, while the HDF5 API postpones that until access
is actually needed.
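
To put those timings in perspective, here is a rough cost model in Python. It
uses only the three figures quoted above. The function names are hypothetical,
and the HDF5 estimate assumes, pessimistically, a full open/read/close cycle
for every variable, which we did not measure. It is a back-of-the-envelope
sketch, not a benchmark.

```python
# Timings quoted above, in milliseconds.
NC4_FIRST_READ_MS = 1300.0   # netCDF-4 API: open + read one variable + close
NC4_EXTRA_READ_MS = 0.7      # netCDF-4 API: each further read from the open file
HDF5_SINGLE_READ_MS = 11.0   # HDF5 API: open + read one variable + close


def nc4_cost_ms(n_vars):
    """Estimated netCDF-4 cost to read n_vars variables from one file opened once."""
    return NC4_FIRST_READ_MS + NC4_EXTRA_READ_MS * (n_vars - 1)


def hdf5_cost_ms(n_vars):
    """Pessimistic HDF5 estimate: ASSUMES a full open/read/close per variable."""
    return HDF5_SINGLE_READ_MS * n_vars


def break_even_vars():
    """Smallest number of variables per file at which netCDF-4 is no slower."""
    n = 1
    while nc4_cost_ms(n) > hdf5_cost_ms(n):
        n += 1
    return n


if __name__ == "__main__":
    # Columns: variables per file, netCDF-4 estimate (ms), HDF5 estimate (ms).
    for n in (1, 10, 100, break_even_vars()):
        print(n, round(nc4_cost_ms(n), 1), round(hdf5_cost_ms(n), 1))
```

Under these assumptions the netCDF-4 API only catches up at 127 variables
per file, so for the scan-one-variable-per-file pattern the open overhead
dominates completely.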
We are considering using the HDF5 API for fast single variable access. Is
there any other way to avoid the netCDF-4 file open overhead?
Very interesting! We recently implemented parallel I/O in ROMS using the
netCDF-4/HDF5 libraries and found, to our surprise, that it is extremely
inefficient and much slower than our serial I/O implementation. I tried to
follow this in the debugger, but it was too cumbersome: both libraries make
a great many calls just to read a single variable (scalar or array), and some
of those calls are recursive. This may explain why it is so slow.
Best, H
-----------------------------------------------------------------------
Hernan G. Arango
Institute of Marine and Coastal Sciences
Rutgers University
71 Dudley Road
New Brunswick, NJ 08901-8521, USA
arango@xxxxxxxxxxxxxxxxxx
off: (732) 932-6555 x266
FAX: (732) 932-6520
http://marine.rutgers.edu/po/arango
http://marine.rutgers.edu/po/arango/rocco
http://marine.rutgers.edu/roms
http://www.myroms.org
http://www.ocean-modeling.org