There have been many performance improvements in the upcoming netCDF-4.1.2 release.
One improvement is a complete refactor of all netCDF-4 memory structures. Now the metadata of a netCDF file occupies the smallest possible amount of memory. I have added many more Valgrind tests, and the HDF5 team has worked hard to track down memory issues in HDF5. (Most were not really bugs, just code doing things that Valgrind doesn't like.)
Minimizing memory use is particularly important on high-performance platforms. If you run a program on 10,000 processors, and each of them uses too much memory for metadata, that adds up to a lot of wasted memory. And in HPC they have better uses for that memory.
The biggest improvement in performance came from a rewrite of the way that netCDF-4 reads the HDF5 file. The code has been rewritten in terms of the H5Literate() function, and this has resulted in a huge performance gain. Here's an email from Russ quantifying this gain:
From: Russ Rew <russ-AT-unidata.ucar-DOT-edu>
Subject: timings of nc_open speedup
To: ed-AT-unidata.ucar-DOT-edu
Date: Thu, 23 Sep 2010 15:23:12 -0600
Organization: UCAR Unidata Program
Reply-to: russ-AT-unidata.ucar-DOT-edu
Ed,
On Jennifer Adam's file, here's the before and after timings on buddy (on the file and a separate copy, to defeat caching):
Before:
real 0m32.60s
user 0m0.15s
sys 0m0.46s

After:
real 0m0.14s
user 0m0.01s
sys 0m0.02s
which is a 233x speedup.
Here's before and after for test files I created that have twice as many levels as Jennifer Adam's and much better compression:
Before:
real 0m23.78s
user 0m0.24s
sys 0m0.60s

After:
real 0m0.05s
user 0m0.01s
sys 0m0.01s
which is a 475x speedup. By using even more levels, the speedup becomes arbitrarily large, because now nc_open takes a fixed amount of time that depends on the amount of metadata, not the amount of data.
--Russ
As Russ notes, the speedup can be made arbitrarily large if we tailor the input file correctly. But Jennifer's file is a real one, and at 18.4 gigabytes (name: T159_1978110112.nc4) it is a real disk-buster. Yet it has a simple metadata structure, and even so we get a more than 200x speedup, which is nice. We had been talking about a new file open mode which would skip reading the metadata, all because opening was taking so long. I guess I don't have to code that up now, so that's at least a couple of weeks' work saved by this fix! (Not to mention that netCDF-4 will now work much better for these really big files, which are becoming more and more common.)
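For the curious, the core of the fix is to let HDF5 drive the traversal of the file's objects in a single pass, instead of probing for each object by name. Here's a minimal sketch of the H5Literate() pattern. This is my simplified illustration, not the actual netCDF-4 code; it assumes an existing HDF5 file named file.h5, and it must be compiled and linked against the HDF5 library (-lhdf5):

```c
#include <stdio.h>
#include <hdf5.h>

/* Callback invoked once per link in the group. HDF5 drives the
 * iteration, so the group's metadata is walked in one pass. */
static herr_t
visit_link(hid_t group, const char *name, const H5L_info_t *info,
           void *op_data)
{
    int *count = (int *)op_data;
    (*count)++;
    printf("found object: %s\n", name);
    return 0;   /* zero means: keep iterating */
}

int
main(void)
{
    hid_t file = H5Fopen("file.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file < 0) return 1;

    int count = 0;
    /* Visit every link in the root group, in a single pass. */
    H5Literate(file, H5_INDEX_NAME, H5_ITER_NATIVE, NULL,
               visit_link, &count);

    printf("%d objects visited\n", count);
    H5Fclose(file);
    return 0;
}
```

The important property is that open time now scales with the amount of metadata, not the amount of data, which is exactly what Russ's timings show.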
Here's the ncdump -h of this lovely test file:
netcdf T159_1978110112 {
dimensions:
    lon = 320 ;
    lat = 160 ;
    lev = 11 ;
    time = 1581 ;
variables:
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:long_name = "Longitude" ;
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:long_name = "Latitude" ;
    double lev(lev) ;
        lev:units = "millibar" ;
        lev:long_name = "Level" ;
    double time(time) ;
        time:long_name = "Time" ;
        time:units = "minutes since 1978-11-01 12:00" ;
    float temp(time, lev, lat, lon) ;
        temp:missing_value = -9.99e+08f ;
        temp:longname = "Temperature [K]" ;
        temp:units = "K" ;
    float geop(time, lev, lat, lon) ;
        geop:missing_value = -9.99e+08f ;
        geop:longname = "Geopotential [m^2/s^2]" ;
        geop:units = "m^2/s^2" ;
    float relh(time, lev, lat, lon) ;
        relh:missing_value = -9.99e+08f ;
        relh:longname = "Relative Humidity [%]" ;
        relh:units = "%" ;
    float vor(time, lev, lat, lon) ;
        vor:missing_value = -9.99e+08f ;
        vor:longname = "Vorticity [s^-1]" ;
        vor:units = "s^-1" ;
    float div(time, lev, lat, lon) ;
        div:missing_value = -9.99e+08f ;
        div:longname = "Divergence [s^-1]" ;
        div:units = "s^-1" ;
    float uwnd(time, lev, lat, lon) ;
        uwnd:missing_value = -9.99e+08f ;
        uwnd:longname = "U-wind [m/s]" ;
        uwnd:units = "m/s" ;
    float vwnd(time, lev, lat, lon) ;
        vwnd:missing_value = -9.99e+08f ;
        vwnd:longname = "V-wind [m/s]" ;
        vwnd:units = "m/s" ;
    float sfp(time, lat, lon) ;
        sfp:missing_value = -9.99e+08f ;
        sfp:longname = "Surface Pressure [Pa]" ;
        sfp:units = "Pa" ;

// global attributes:
        :NCO = "4.0.2" ;
}
Special thanks to Jennifer Adams, from the GrADS project. Not only did she provide this great test file, but she also built my branch distribution and tested the fix for me! Thanks, Jennifer! Thanks also to Quincey of the HDF5 team for helping me sort out the best way to read an HDF5 file.
Now I just have to make sure that parallel I/O is working OK, and then 4.1.2 will be ready for release!