The HDF5 chunk cache must be large enough to hold an uncompressed chunk.
Here are some test runs showing that a large enough cache is very
important when reading compressed data. If the chunk cache is not big
enough, then the data have to be uncompressed again and again.
The first run below uses the default 1 MB chunk cache. The second uses a
16 MB cache. Note that the times to read the first time step are
comparable, but the run with the large cache has a much lower average
time, because each chunk is only uncompressed one time.
bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 pr_A1_z1_64_128_256.nc -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
64    128   256   1.0       1       0       387147           211280
bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 pr_A1_z1_64_128_256.nc -h \
> -c 16000000 pr_A1_z1_64_128_256.nc
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
64    128   256   15.3      1       0       320176           4558
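The gap between the two runs above can be checked with a little arithmetic. The following is a minimal sketch, not part of tst_ar4: the helper names are hypothetical, and it assumes the variable holds 4-byte floats and that the default cache is 1 MiB (1048576 bytes). Neither assumption is stated in the output above.

```python
from math import prod

def chunk_bytes(chunk_shape, itemsize):
    """Return the uncompressed size, in bytes, of one chunk."""
    return prod(chunk_shape) * itemsize

def fits_in_cache(chunk_shape, itemsize, cache_bytes):
    """True if a single uncompressed chunk fits in the chunk cache."""
    return chunk_bytes(chunk_shape, itemsize) <= cache_bytes

chunk_shape = (64, 128, 256)   # cs[0], cs[1], cs[2] from the runs above
itemsize = 4                   # assumption: 4-byte float data

print(chunk_bytes(chunk_shape, itemsize))                 # 8388608 bytes (8 MiB)
print(fits_in_cache(chunk_shape, itemsize, 1024 * 1024))  # False: default-sized cache
print(fits_in_cache(chunk_shape, itemsize, 16_000_000))   # True: the -c 16000000 run
```

Under these assumptions one chunk is 8 MiB, so it cannot stay resident in a 1 MiB cache and has to be uncompressed on each access, while the 16000000-byte cache holds it with room to spare.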
For comparison, here are the times for the netCDF-4/HDF5 file which is not compressed:
bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 -h pr_A1_64_128_256.nc
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
64    128   256   1.0       0       0       459              1466
And here's the same run on the classic netCDF version of the file:
bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 -h \
> pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12.nc
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
0     0     0     0.0       0       0       2172             1538
So the winner for performance is the uncompressed netCDF-4/HDF5 file, with
the best read time for the first time step and the best average read time.
Next comes the netCDF classic file, then the compressed netCDF-4/HDF5 file,
which, even with the 16 MB cache, takes two orders of magnitude longer than
the classic file for the first time step, but then catches up so that the
average read time is only about 3 times that of the classic file.
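The ratios behind this ranking can be recomputed directly from the timings printed in the runs above. This is a small illustrative sketch; the dictionary keys and the slowdown helper are hypothetical names, and only the numbers come from the test output.

```python
# (1st_read_hor, avg_read_hor) in microseconds, copied from the runs above.
timings = {
    "compressed, 1 MB cache": (387147, 211280),
    "compressed, 16 MB cache": (320176, 4558),
    "uncompressed netCDF-4": (459, 1466),
    "classic": (2172, 1538),
}

def slowdown(t, baseline):
    """How many times longer t is than baseline."""
    return t / baseline

classic_first, classic_avg = timings["classic"]
for name, (first, avg) in timings.items():
    print(f"{name:24s} first: {slowdown(first, classic_first):8.2f}x  "
          f"avg: {slowdown(avg, classic_avg):8.2f}x")
```

Relative to the classic file, the compressed file with the 16 MB cache comes out at roughly 147 times the first-read time and roughly 3 times the average read time.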
The file sizes show that this read penalty is probably not worth it:
pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12.nc 204523236
pr_A1_z1_64_128_256.nc 185543248
pr_A1_64_128_256.nc 209926962
So the compressed netCDF-4/HDF5 file saves only about 19 MB out of about 205 MB, or about 9%.
The uncompressed netCDF-4/HDF5 file is about 5 MB larger than the classic file, or about 2.6% larger.
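The size comparison can be reproduced from the byte counts listed above. This is a minimal sketch with a hypothetical helper name; it measures both netCDF-4/HDF5 files against the classic file, which is an assumption about the intended baseline.

```python
# File sizes in bytes, copied from the listing above.
classic = 204523236        # pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12.nc
compressed = 185543248     # pr_A1_z1_64_128_256.nc
uncompressed = 209926962   # pr_A1_64_128_256.nc

def percent_change(size, baseline):
    """Size difference relative to baseline, in percent (negative means smaller)."""
    return 100.0 * (size - baseline) / baseline

print(classic - compressed)                           # 18979988 bytes saved
print(round(percent_change(compressed, classic), 1))  # -9.3
print(uncompressed - classic)                         # 5403726 bytes added
print(round(percent_change(uncompressed, classic), 1))  # 2.6
```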