The cache can really mess up benchmarking!
For example:
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h -c
cs[0]   cs[1]   cs[2]   cache(MB)   deflate   shuffle   read_hor(us)   read_time_ser(us)
64      256     128     4.0         0         0         66             2102
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h
cs[0]   cs[1]   cs[2]   cache(MB)   deflate   shuffle   read_hor(us)   read_time_ser(us)
64      256     128     4.0         0         0         1859           2324282
In the first run of tst_ar4_3d, with the -c option, the sample data file
is first created and then read. The read time for the time series read
is really low, because the file (having just been created) is still
loaded in a disk cache somewhere in the OS or in the disk hardware.
When I clear the cache and rerun without the -c option, the sample data
file is not created; it is assumed to already exist. Since the cache has
been cleared, the time series read has to read the data from disk, and
it takes about 1000 times longer (2,324,282 us instead of 2,102 us).
Well, that's why they invented disk caches.
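For anyone wondering, a cache-clearing script like clear_cache.sh can be
as simple as this (a minimal sketch; the exact commands depend on the OS,
and note that nothing here can flush a cache inside the disk hardware
itself):

#!/bin/bash
# Sketch of a minimal cache-clearing script along the lines of
# clear_cache.sh. Needs root, hence the "sudo bash" in the runs above.

sync    # flush any dirty pages out to disk first

if [ -e /proc/sys/vm/drop_caches ]; then
    # Linux: drop the page cache, dentries, and inodes.
    echo 3 > /proc/sys/vm/drop_caches
else
    # Mac OS X: purge flushes the OS disk cache.
    purge
fi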
This leads me to believe that my horizontal read times are fake too,
because first I am doing a time series read, thus loading some or all
of the file into cache. I see that I need to break that out into a
separate test, or perhaps make the order of the two tests controllable
from the command line.
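Something like this wrapper is what I have in mind (just a sketch; the
-t and -H options, to run only the time series read or only the
horizontal read, don't exist in tst_ar4_3d yet, so they are
hypothetical):

#!/bin/bash
# Sketch of a fairer benchmark loop. The -t (time series read only) and
# -H (horizontal read only) options are hypothetical; they are what I
# would need to add to tst_ar4_3d.
set -e

# Create the sample data file once, up front; its timings are discarded.
./tst_ar4_3d -h -c > /dev/null

# Run each read as a separate test, from a cold cache, so that neither
# read can warm the cache for the other.
for opt in -H -t; do
    sudo bash clear_cache.sh
    ./tst_ar4_3d -h $opt
done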
Oy, this benchmarking stuff is tricky business! I thought I had found
some really good performance for netCDF-4, but now I am not sure. I need
to look again more carefully and make sure that I am not being faked
out by the caches.
Ed