The cache can really mess up benchmarking!
For example:
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h -c
cs[0]   cs[1]   cs[2]   cache(MB)   deflate   shuffle   read_hor(us)   read_time_ser(us)
64      256     128     4.0         0         0         66             2102
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h
cs[0]   cs[1]   cs[2]   cache(MB)   deflate   shuffle   read_hor(us)   read_time_ser(us)
64      256     128     4.0         0         0         1859           2324282
In the first run of tst_ar4_3d, with the -c option, the sample data file
is first created and then read. The read time for the time series read
is really low, because the file (having just been created) is still
loaded in a disk cache somewhere in the OS or in the disk hardware.
When I clear the cache and rerun without the -c option, the sample data
file is not created; it is assumed to already exist. Since the cache has
been cleared, the time series read has to read the data from disk, and
it takes about 1000 times longer (2,324,282 us instead of 2,102 us).
Well, that's why they invented disk caches.
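For anyone wondering, a cache-clearing script like clear_cache.sh can be
as simple as this (a minimal sketch; the exact commands depend on the OS,
and note that nothing here can flush a cache inside the disk hardware
itself):

#!/bin/bash
# Sketch of a minimal cache-clearing script along the lines of
# clear_cache.sh. Needs root, hence the "sudo bash" in the runs above.

sync    # flush any dirty pages out to disk first

if [ -e /proc/sys/vm/drop_caches ]; then
    # Linux: drop the page cache, dentries, and inodes.
    echo 3 > /proc/sys/vm/drop_caches
else
    # Mac OS X: purge flushes the OS disk cache.
    purge
fi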
This leads me to believe that my horizontal read times are fake too,
because first I am doing a time series read, thus loading some or all
of the file into cache. I see that I need to break that out into a
separate test, or perhaps make the order of the two tests controllable
from the command line.
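Something like this wrapper is what I have in mind (just a sketch; the
-t and -H options, to run only the time series read or only the
horizontal read, don't exist in tst_ar4_3d yet, so they are
hypothetical):

#!/bin/bash
# Sketch of a fairer benchmark loop. The -t (time series read only) and
# -H (horizontal read only) options are hypothetical; they are what I
# would need to add to tst_ar4_3d.
set -e

# Create the sample data file once, up front; its timings are discarded.
./tst_ar4_3d -h -c > /dev/null

# Run each read as a separate test, from a cold cache, so that neither
# read can warm the cache for the other.
for opt in -H -t; do
    sudo bash clear_cache.sh
    ./tst_ar4_3d -h $opt
done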
Oy, this benchmarking stuff is tricky business! I thought I had found
some really good performance for netCDF-4, but now I am not sure. I need
to look again more carefully and make sure that I am not being faked
out by the caches.
Ed