A last-minute change before the 4.1 release ensures that this common case will get good performance.
There is a terrible performance hit if your chunk cache is too small to hold even one chunk, and your data are deflated.
Since the default HDF5 chunk cache size is 1 MB, this is not hard to do.
So I have added code such that, when a file is opened, if the data are
compressed, and if the chunksize is greater than the default chunk cache
size for that var, then the chunk cache is increased to a multiple of
the chunk size.
The code looks like this:
   /* Is this a deflated variable with a chunksize greater than the
    * current cache size? */
   if (!var->contiguous && var->deflate)
   {
      chunk_size_bytes = 1;
      for (d = 0; d < var->ndims; d++)
         chunk_size_bytes *= var->chunksizes[d];
      if (var->type_info->size)
         chunk_size_bytes *= var->type_info->size;
      else
         chunk_size_bytes *= sizeof(char *);
#define NC_DEFAULT_NUM_CHUNKS_IN_CACHE 10
#define NC_DEFAULT_MAX_CHUNK_CACHE 67108864
      if (chunk_size_bytes > var->chunk_cache_size)
      {
         var->chunk_cache_size = chunk_size_bytes * NC_DEFAULT_NUM_CHUNKS_IN_CACHE;
         if (var->chunk_cache_size > NC_DEFAULT_MAX_CHUNK_CACHE)
            var->chunk_cache_size = NC_DEFAULT_MAX_CHUNK_CACHE;
         if ((retval = nc4_reopen_dataset(grp, var)))
            return retval;
      }
   }
I am setting the chunk cache to 10 times the chunk size, up to 64 MB max. Reasonable? Comments are welcome.
The timing results show a clear difference. First, two runs without any per-variable caching; the second run sets a 64 MB file-level chunk cache, which speeds up reading considerably. (The last number in each row is the average read time for a horizontal layer, in microseconds.)
bash-3.2$ ./tst_ar4_3d pr_A1_z1_256_128_256.nc
256 128 256 1.0 1 0 836327 850607
bash-3.2$ ./tst_ar4_3d -c 68000000 pr_A1_z1_256_128_256.nc
256 128 256 64.8 1 0 833453 3562
Without the cache it is over 200 times slower.
Now I have turned on automatic variable caches when appropriate:
bash-3.2$ ./tst_ar4_3d pr_A1_z1_256_128_256.nc
256 128 256 1.0 1 0 831470 3568
In this run, although no file-level cache was turned on, I got the same response time. That's because when opening the file the netCDF library noticed that this deflated variable had a chunk size bigger than the default cache size, and used a bigger cache.
All of this work is in support of the general netCDF user writing very large files, and specifically in support of the AR-5 effort.
The only downside is that, if you open a file with many such variables on a machine with very little memory, you will run out of memory.