A last-minute change before the 4.1 release ensures that this common case will get good performance.
There is a terrible performance hit if your chunk cache is too small to hold even one chunk, and your data are deflated.
Since the default HDF5 chunk cache size is 1 MB, this is not hard to do.
So I have added code such that, when a file is opened, if the data are
compressed, and if the chunksize is greater than the default chunk cache
size for that var, then the chunk cache is increased to a multiple of
the chunk size.
The code looks like this:
   /* Is this a deflated variable with a chunksize greater than the
    * current cache size? */
   if (!var->contiguous && var->deflate)
   {
      chunk_size_bytes = 1;
      for (d = 0; d < var->ndims; d++)
         chunk_size_bytes *= var->chunksizes[d];
      if (var->type_info->size)
         chunk_size_bytes *= var->type_info->size;
      else
         chunk_size_bytes *= sizeof(char *);
#define NC_DEFAULT_NUM_CHUNKS_IN_CACHE 10
#define NC_DEFAULT_MAX_CHUNK_CACHE 67108864
      if (chunk_size_bytes > var->chunk_cache_size)
      {
         var->chunk_cache_size = chunk_size_bytes * NC_DEFAULT_NUM_CHUNKS_IN_CACHE;
         if (var->chunk_cache_size > NC_DEFAULT_MAX_CHUNK_CACHE)
            var->chunk_cache_size = NC_DEFAULT_MAX_CHUNK_CACHE;
         if ((retval = nc4_reopen_dataset(grp, var)))
            return retval;
      }
   }
I am setting the chunk cache to 10 times the chunk size, up to 64 MB max. Reasonable? Comments are welcome.
The timing results show a clear difference. First, two runs without any per-variable caching; the second run sets a 64 MB file-level chunk cache, which speeds up reading considerably. (The last number in each row is the average read time for a horizontal layer, in microseconds.)
bash-3.2$ ./tst_ar4_3d pr_A1_z1_256_128_256.nc
256 128 256 1.0 1 0 836327 850607
bash-3.2$ ./tst_ar4_3d -c 68000000 pr_A1_z1_256_128_256.nc
256 128 256 64.8 1 0 833453 3562
Without the cache it is over 200 times slower.
Now I have turned on automatic variable caches when appropriate:
bash-3.2$ ./tst_ar4_3d pr_A1_z1_256_128_256.nc
256 128 256 1.0 1 0 831470 3568
In this run, although no file-level cache was turned on, I got the same response time. That's because when opening the file the netCDF library noticed that this deflated variable had a chunk size bigger than the default cache size, and used a bigger cache.
All of this work is in support of the general netCDF user writing very large files, and specifically in support of the AR-5 effort.
The only downside is that, if you open a file with many such variables on a machine with very little memory, you will run out of memory.