Showing entries tagged [cache]

Large-Enough Cache Very Important When Reading Compressed NetCDF-4/HDF5 Data

The HDF5 chunk cache must be large enough to hold an uncompressed chunk.

Here are some test runs showing that a large enough cache is very important when reading compressed data. If the chunk cache is not big enough, the data have to be decompressed again and again.

The first run below uses the default 1MB chunk cache. The second uses a 16 MB cache. Note that the times to read the first time step are comparable, but the run with the large cache has a much lower average time, because each chunk is only uncompressed one time.
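A quick back-of-the-envelope check (assuming the variable is stored as 4-byte floats) shows why the default cache falls short here: one 64 x 128 x 256 chunk holds 64 * 128 * 256 * 4 bytes = 8 MB uncompressed, far more than the default 1 MB chunk cache, but well within the 16 MB cache used in the second run.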

bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 pr_A1_z1_64_128_256.nc -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us)   avg_read_hor(us)
64    128   256   1.0       1       0       387147             211280

bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 pr_A1_z1_64_128_256.nc -h \
> -c 16000000 pr_A1_z1_64_128_256.nc
cs[0] cs[1] cs[2] cache(MB)  deflate shuffle 1st_read_hor(us)   avg_read_hor(us)
64    128   256   15.3       1       0       320176             4558

For comparison, here are the times for the netCDF-4/HDF5 file that is not compressed:

bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 -h pr_A1_64_128_256.nc
cs[0] cs[1] cs[2] cache(MB)  deflate shuffle 1st_read_hor(us)  avg_read_hor(us)
64    128   256   1.0        0       0       459               1466

And here's the same run on the classic netCDF version of the file:

bash-3.2$ sudo ./clear_cache.sh && ./tst_ar4 -h \
> pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12.nc
cs[0] cs[1] cs[2] cache(MB)  deflate shuffle 1st_read_hor(us)  avg_read_hor(us)
0     0     0     0.0        0       0       2172              1538

So for performance the winner is the uncompressed NetCDF-4/HDF5 file, with the best read time for the first time step and the best average read time. Next comes the netCDF classic file, then the compressed netCDF-4/HDF5 file, which takes two orders of magnitude longer than the classic file for the first time step, but then catches up so that its average read time is only about 4 times slower than the classic file.

The file sizes show that this read penalty is probably not worth it:

pr_A1.20C3M_8.CCSM.atmm.1870-01_cat_1999-12.nc    204523236
pr_A1_z1_64_128_256.nc                            185543248
pr_A1_64_128_256.nc                               209926962

So the compressed NetCDF-4/HDF5 file saves only 20 MB out of about 200, about 10%.

The uncompressed NetCDF-4/HDF5 file is 5 MB larger than the classic file, or about 2.5% larger. 
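For anyone who wants to try this from the C API rather than the benchmark program, here is a minimal sketch of enlarging the per-variable chunk cache before reading. It assumes netCDF 4.1 or later (where nc_set_var_chunk_cache() is available), and the variable name "pr" is my guess, not something taken from the file.

#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define ERR(e) do { if (e) { fprintf(stderr, "%s\n", nc_strerror(e)); exit(1); } } while (0)

int main(void)
{
    int ncid, varid, ret;

    ret = nc_open("pr_A1_z1_64_128_256.nc", NC_NOWRITE, &ncid); ERR(ret);
    ret = nc_inq_varid(ncid, "pr", &varid); ERR(ret);

    /* 16 MB cache, 1009 hash slots (an arbitrary prime), preemption 0.75.
       With 8 MB uncompressed chunks this lets each chunk be inflated once. */
    ret = nc_set_var_chunk_cache(ncid, varid, 16000000, 1009, 0.75); ERR(ret);

    /* ... read the horizontal slices with nc_get_vara_float() ... */

    ret = nc_close(ncid); ERR(ret);
    return 0;
}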

Smaller Chunk Sizes For Unlimited Dimension

More tests...

pr_A1_4_64_128.nc pr_A1_8_64_128.nc pr_A1_16_64_128.nc pr_A1_32_64_128.nc \
pr_A1_64_64_128.nc
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
0     0     0     0.0       0       0       2155             1603
4     64    128   1.0       0       0       7021             1567
8     64    128   1.0       0       0       14084            1538
16    64    128   1.0       0       0       82906            1570
32    64    128   1.0       0       0       145295           2138
64    64    128   1.0       0       0       21960            2825
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
0     0     0     0.0       0       0       2399157          9181
4     64    128   1.0       0       0       2434194          15954
8     64    128   1.0       0       0       2317802          13627
16    64    128   1.0       0       0       1531121          12686
32    64    128   1.0       0       0       1299189          12265
64    64    128   1.0       0       0       863365           2356
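For reference, chunk sizes like those above are chosen when the variable is defined. Here is a minimal sketch using nc_def_var_chunking() for the 4_64_128 case; the dimension names and the lat/lon lengths are illustrative assumptions, not read from the real AR-4 file.

#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define ERR(e) do { if (e) { fprintf(stderr, "%s\n", nc_strerror(e)); exit(1); } } while (0)

int main(void)
{
    int ncid, dimids[3], varid, ret;
    size_t chunks[3] = {4, 64, 128};   /* small chunk along the record dimension */

    ret = nc_create("pr_chunked.nc", NC_NETCDF4, &ncid); ERR(ret);
    ret = nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]); ERR(ret);
    ret = nc_def_dim(ncid, "lat", 128, &dimids[1]); ERR(ret);
    ret = nc_def_dim(ncid, "lon", 256, &dimids[2]); ERR(ret);
    ret = nc_def_var(ncid, "pr", NC_FLOAT, 3, dimids, &varid); ERR(ret);
    ret = nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks); ERR(ret);

    /* ... write data with nc_put_vara_float() ... */

    ret = nc_close(ncid); ERR(ret);
    return 0;
}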

File Size and Chunking in NetCDF-4 on AR-4 Data File

Trying to pick chunksizes can be hard!

Chunk sizes (cs[0]_cs[1]_cs[2])   Size difference from classic (MB)
1_128_128        0.33
1_128_256        0.25
1_128_32         0.86
1_16_128         1.56
1_16_256         0.86
1_16_32          5.75
1_64_128         0.51
1_64_256         0.33
1_64_32          1.56
10_128_128       0.18
10_128_256       0.17
10_128_32        0.23
10_16_128        0.3
10_16_256        0.23
10_16_32         0.72
10_64_128        0.2
10_64_256        0.18
10_64_32         0.3
1024_128_128     64.12
1024_128_256     64.12
1024_128_32      64.12
1024_16_128      64.12
1024_16_256      64.12
1024_16_32       64.13
1024_64_128      64.12
1024_64_256      64.12
1024_64_32       64.12
1560_128_128     0.16
1560_128_256     0.16
1560_128_32      0.16
1560_16_128      0.16
1560_16_256      0.16
1560_16_32       0.16
1560_64_128      0.16
1560_64_256      0.16
1560_64_32       0.16
256_128_128      30.57
256_128_256      30.57
256_128_32       30.57
256_16_128       30.58
256_16_256       30.57
256_16_32        30.59
256_64_128       30.57
256_64_256       30.57
256_64_32        30.58
classic          0
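The big numbers are easy to explain once you remember that HDF5 allocates whole chunks. Assuming the variable is 1560 x 128 x 256 floats (which matches the 1560_*_* rows above but is my inference, not something stated in the file), a little arithmetic reproduces the worst cases:

#include <stdio.h>

int main(void)
{
    size_t dims[3]   = {1560, 128, 256};
    size_t chunks[3] = {1024, 128, 128};
    size_t elem_size = 4;                 /* NC_FLOAT */
    size_t nchunks = 1, data = elem_size, alloc = elem_size;

    for (int i = 0; i < 3; i++) {
        size_t n = (dims[i] + chunks[i] - 1) / chunks[i];  /* chunks per dim, rounded up */
        nchunks *= n;
        data *= dims[i];
        alloc *= n * chunks[i];
    }
    printf("%zu chunks, %.1f MB of data, %.1f MB allocated (%.1f MB of padding)\n",
           nchunks, data / 1e6, alloc / 1e6, (alloc - data) / 1e6);
    return 0;
}

That prints 4 chunks, 204.5 MB of data, 268.4 MB allocated (64.0 MB of padding) for the 1024_128_128 case - essentially the 64.12 MB in the table, with the small remainder coming from per-chunk index overhead. The same per-chunk overhead, spread across the 99,840 tiny chunks of the 1_16_32 case, is presumably what costs it 5.75 MB even though that shape wastes no padding at all.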

NetCDF-4 AR-4 Timeseries Reads and Cache Sizes

Faster time series for the people!

What HDF5 chunk cache sizes are good for reading timeseries data in netCDF-4? I'm sure you have wondered - I know I have. Now we know: 0.5 to 4 MB. Bigger caches just slow things down. Now that came as a surprise!

The first three numbers are the chunk sizes for the three dimensions of the main data variable; the next column is the chunk cache size in MB, and the two after that show the deflate setting (0 = none) and the shuffle filter (0 = none). The chunk sizes and filter settings are the same for every run, because the same input file is used throughout - only the chunk cache size is changed when (re-)opening the file. The Unix file cache is cleared between runs.

The two times shown are the time in microseconds to read the first time series, and the average time to read a time series once all the time series have been read.
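For reference, a single time-series read of the kind being timed here looks roughly like the following sketch; the variable name "pr", the 1560-step time length, and the lat/lon indices are my assumptions rather than values from the file.

#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define NTIME 1560

int main(void)
{
    int ncid, varid, ret;
    float series[NTIME];
    size_t start[3] = {0, 42, 42};       /* arbitrary lat/lon indices */
    size_t count[3] = {NTIME, 1, 1};     /* every time step, one point */

    if ((ret = nc_open("pr_A1_256_128_128.nc", NC_NOWRITE, &ncid)) ||
        (ret = nc_inq_varid(ncid, "pr", &varid)) ||
        (ret = nc_get_vara_float(ncid, varid, start, count, series)) ||
        (ret = nc_close(ncid)))
    {
        fprintf(stderr, "%s\n", nc_strerror(ret));
        exit(1);
    }
    printf("first value: %g\n", series[0]);
    return 0;
}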

*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
256   128   128   0.5       0       0       1279615          2589
256   128   128   1.0       0       0       1279613          2641
256   128   128   4.0       0       0       1298543          2789
256   128   128   16.0      0       0       1470297          34603
256   128   128   32.0      0       0       1470360          34541

Note that for cache sizes of 4 MB and below, the first time-series read took 1.2 - 1.3 s, and the average time was 0.0026 - 0.0028 s. But when I increased the chunk cache to 16 MB and 32 MB, the time for the first read went to about 1.5 s, and the average time for all reads went to 0.035 s - an order of magnitude jump!

I have repeated these tests a number of times, always with this result for chunk cache buffers 16 MB and above.

I am planning on changing the netCDF-4.1 default to 1 MB, which is the HDF5 default. (I guess we should have listened to the HDF5 team in the first place.)

What Cache Size Should be Used to Read AR-4/AR-5 3D Data?

A question that has puzzled the greatest minds of history...

The not-yet-checked-in script nc_test4/run_bm_cache.sh tests reading a sample 3D data file with different sized caches.

Because of a weird increase in time for horizontal reads at the 16 MB cache size, I re-ran the test twice more to make sure I got the same results. And I did. I have no explanation for why 16 MB works so poorly.

The current netCDF-4 default cache size is 4 MB (which does fine), but I note that the original HDF5 default of 1 MB does even better. Perhaps I should just leave this cache alone as a default choice and give users the HDF5 settings...
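For completeness, the default cache this benchmark is exercising can be inspected and overridden from the C API before any file is opened - a minimal sketch, assuming netCDF 4.1 or later where nc_get_chunk_cache() and nc_set_chunk_cache() exist:

#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

int main(void)
{
    size_t size, nelems;
    float preemption;
    int ncid, ret;

    if ((ret = nc_get_chunk_cache(&size, &nelems, &preemption)))
        { fprintf(stderr, "%s\n", nc_strerror(ret)); exit(1); }
    printf("default cache: %zu bytes, %zu slots, preemption %.2f\n",
           size, nelems, preemption);

    /* Use a 1 MB cache (the HDF5 default) for files opened after this call. */
    if ((ret = nc_set_chunk_cache(1048576, nelems, preemption)))
        { fprintf(stderr, "%s\n", nc_strerror(ret)); exit(1); }

    if ((ret = nc_open("pr_A1_256_128_128.nc", NC_NOWRITE, &ncid)))
        { fprintf(stderr, "%s\n", nc_strerror(ret)); exit(1); }
    /* ... benchmark reads ... */
    nc_close(ncid);
    return 0;
}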

bash-3.2$ ./run_bm_cache.sh
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1291104
256   128   128   1.0       0       0       1298621
256   128   128   4.0       0       0       1306983
256   128   128   16.0      0       0       1472738
256   128   128   32.0      0       0       1497533
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2308
256   128   128   1.0       0       0       2291
256   128   128   4.0       0       0       2453
256   128   128   16.0      0       0       11609
256   128   128   32.0      0       0       2603

SUCCESS!!!

bash-3.2$ ./run_bm_cache.sh 
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1290340
256   128   128   1.0       0       0       1281898
256   128   128   4.0       0       0       1306885
256   128   128   16.0      0       0       1470175
256   128   128   32.0      0       0       1497529
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2298
256   128   128   1.0       0       0       2292
256   128   128   4.0       0       0       2335
256   128   128   16.0      0       0       11572
256   128   128   32.0      0       0       1841

SUCCESS!!!

bash-3.2$ ./run_bm_cache.sh 
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1298650
256   128   128   1.0       0       0       1298636
256   128   128   4.0       0       0       1565326
256   128   128   16.0      0       0       1497482
256   128   128   32.0      0       0       1497529

cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2303
256   128   128   1.0       0       0       2287
256   128   128   4.0       0       0       2280
256   128   128   16.0      0       0       11584
256   128   128   32.0      0       0       1830

SUCCESS!!!
