Proof That the New Default Chunk Cache in 4.1 Improves Performance

A last-minute change before the 4.1 release ensures good performance in a common case, described below.

There is a terrible performance hit if your data are deflated and your chunk cache is too small to hold even one chunk.

Since the default HDF5 chunk cache size is 1 MB, this is not hard to do.

So I have added code such that, when a file is opened, if a variable's data are deflated and its chunk size is greater than the default chunk cache size for that variable, then the chunk cache for that variable is increased to a multiple of the chunk size.

The code looks like this:

/* Is this a deflated variable with a chunksize greater than the
 * current cache size? */
if (!var->contiguous && var->deflate)
{
   chunk_size_bytes = 1;
   for (d = 0; d < var->ndims; d++)
     chunk_size_bytes *= var->chunksizes[d];
   if (var->type_info->size)
     chunk_size_bytes *= var->type_info->size;
   else
     chunk_size_bytes *= sizeof(char *);
#define NC_DEFAULT_NUM_CHUNKS_IN_CACHE 10
#define NC_DEFAULT_MAX_CHUNK_CACHE 67108864
   if (chunk_size_bytes > var->chunk_cache_size)
   {
     var->chunk_cache_size = chunk_size_bytes * NC_DEFAULT_NUM_CHUNKS_IN_CACHE;
     if (var->chunk_cache_size > NC_DEFAULT_MAX_CHUNK_CACHE)
        var->chunk_cache_size = NC_DEFAULT_MAX_CHUNK_CACHE;
     if ((retval = nc4_reopen_dataset(grp, var)))
        return retval;
   }
}

I am setting the chunk cache to 10 times the chunk size, up to 64 MB max. Reasonable? Comments are welcome.
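To make the arithmetic concrete, here is a small stand-alone sketch. The 256 x 128 x 256 float chunk shape is an assumption inferred from the test file used below; with it, one chunk is 32 MB, so ten chunks would be 320 MB and the cache gets capped at 64 MB:

#include <stdio.h>

#define NC_DEFAULT_NUM_CHUNKS_IN_CACHE 10
#define NC_DEFAULT_MAX_CHUNK_CACHE 67108864

int main(void)
{
   /* Assumed chunk shape: 256 x 128 x 256 floats = 32 MB per chunk. */
   size_t chunk_size_bytes = (size_t)256 * 128 * 256 * sizeof(float);
   size_t cache_size = chunk_size_bytes * NC_DEFAULT_NUM_CHUNKS_IN_CACHE;

   /* The 10x target (320 MB) exceeds the cap, so the cache ends up at 64 MB. */
   if (cache_size > NC_DEFAULT_MAX_CHUNK_CACHE)
      cache_size = NC_DEFAULT_MAX_CHUNK_CACHE;

   printf("chunk: %zu bytes, cache: %zu bytes\n", chunk_size_bytes, cache_size);
   return 0;
}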

The timing results show a clear difference. First, two runs without any per-variable caching; the second run sets a 64 MB file-level chunk cache, which speeds up reads considerably. (The last number in each row is the average read time for a horizontal layer, in microseconds.)

bash-3.2$ ./tst_ar4_3d  pr_A1_z1_256_128_256.nc 
256     128     256     1.0             1       0           836327       850607

bash-3.2$ ./tst_ar4_3d -c 68000000 pr_A1_z1_256_128_256.nc
256     128     256     64.8            1       0           833453       3562

Without an adequate cache, reads are over 200 times slower.
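The 64 MB file-level cache in the second run is presumably set with nc_set_chunk_cache, which changes the default chunk cache for files opened after the call. A minimal sketch under that assumption (error handling abbreviated):

#include <netcdf.h>

/* Sketch: set a ~64 MB default chunk cache, then open the file.
 * The setting affects files opened after the call, not files already open. */
static int
open_with_big_cache(const char *path, int *ncidp)
{
   int retval;

   if ((retval = nc_set_chunk_cache(68000000, 1009, 0.75)))
      return retval;
   return nc_open(path, NC_NOWRITE, ncidp);
}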

Now I have turned on automatic variable caches when appropriate:

bash-3.2$ ./tst_ar4_3d  pr_A1_z1_256_128_256.nc 
256     128     256     1.0             1       0           831470       3568

In this run, although no file-level cache was set, I got the same read time. That's because, when opening the file, the netCDF library noticed that this deflated variable had a chunk size bigger than the default cache size and opened a bigger cache.
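One way to see the cache the library chose is to query it after the open with nc_get_var_chunk_cache. A sketch, with the variable name "pr" assumed from the test file name:

#include <stdio.h>
#include <netcdf.h>

/* Sketch: report the per-variable chunk cache chosen by the library. */
static int
report_cache(int ncid)
{
   int varid, retval;
   size_t size, nelems;
   float preemption;

   if ((retval = nc_inq_varid(ncid, "pr", &varid)))
      return retval;
   if ((retval = nc_get_var_chunk_cache(ncid, varid, &size, &nelems, &preemption)))
      return retval;
   printf("chunk cache: %zu bytes, %zu slots, preemption %g\n",
          size, nelems, preemption);
   return NC_NOERR;
}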

All of this work is in support of the general netCDF user writing very large files, and specifically in support of the AR-5 effort.

The only downside is that, if you open a file with many such variables on a machine with very little memory, you will run out of memory.

Comments:

I think an improvement would be to make sure the chunk cache is large enough to hold at least one chunk, even in the case that the chunk size turns out to be more than NC_DEFAULT_MAX_CHUNK_CACHE.

I suggest the following replacement code:


if (chunk_size_bytes > NC_DEFAULT_MAX_CHUNK_CACHE)
{
   /* Big chunks: only room for one chunk in the cache. */
   var->chunk_cache_size = chunk_size_bytes;
}
else if (chunk_size_bytes * NC_DEFAULT_NUM_CHUNKS_IN_CACHE > NC_DEFAULT_MAX_CHUNK_CACHE)
{
   /* Pretty big chunks: only room for a few chunks in the cache. */
   var->chunk_cache_size = (NC_DEFAULT_MAX_CHUNK_CACHE / chunk_size_bytes) * chunk_size_bytes;
}
else
{
   /* Room for the default number of chunks in the cache. */
   var->chunk_cache_size = chunk_size_bytes * NC_DEFAULT_NUM_CHUNKS_IN_CACHE;
}

This would make sure that the performance was reasonable by default for accessing deflated data even in the case of very large chunks.

Posted by Russ Rew on January 15, 2010 at 02:08 AM MST #

Hey, Ed, I have noticed that the memory usage has gone up. What about the case where one chunk is all you need, say, when the chunk size is the same as the fixed dimensions? Then multiplying by 10 is a big waste. Can the max cache size just be set to the size of the fixed dimensions? Say I have a bunch of variables like this: float var(unlimited, nz, ny, nx). Then the max chunk cache should be nz*ny*nx*(size of float), right? -- Ted

Posted by Ted Mansell on March 10, 2010 at 08:28 PM MST #

Howdy Ted!

Sorry it took me so long to reply to your blog message. I didn't have email notification turned on, so I didn't even know anyone was replying! ;-)

For memory usage, first try the latest daily snapshot. Since the 4.1.2-beta2 release I have made major improvements in memory use and removed all memory leaks. Get the snapshot for these fixes:

ftp://ftp.unidata.ucar.edu/pub/netcdf/snapshot/netcdf-4-daily.tar.gz

As for the default cache size, it is just the default. If you want to change it, call nc_set_var_chunk_cache and set the cache to whatever size suits your needs.
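For example, a minimal sketch of setting a 64 MB cache for one variable (the path and variable name are made up for illustration, and error handling is abbreviated):

#include <netcdf.h>

/* Sketch: give one variable a 64 MB chunk cache with 1009 slots and the
 * default preemption. The path and variable name are hypothetical. */
static int
set_var_cache(const char *path, const char *varname)
{
   int ncid, varid, retval;

   if ((retval = nc_open(path, NC_NOWRITE, &ncid)))
      return retval;
   if ((retval = nc_inq_varid(ncid, varname, &varid)))
      return retval;
   if ((retval = nc_set_var_chunk_cache(ncid, varid, 67108864, 1009, 0.75)))
      return retval;
   return nc_close(ncid);
}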

Thanks!

Posted by Ed Hartnett on November 30, 2010 at 12:45 AM MST #
