The Point of All These Tests
05 January 2010
It's all about finding a good set of default chunk sizes for netCDF-4.1.
Tests seem to indicate that, for the 3D data, a chunk size
of 32 or 64 along the unlimited dimension provides a good trade-off
between time-series and time-step read performance, without inflating the
file size too much.
This makes intuitive sense as well. Larger chunk sizes mean that any
leftover chunks (i.e., chunks that are only partially filled with data)
take up more space on disk and make the file bigger.
Here are some numbers from the latest tests. The top row is the netCDF classic format case. These are the time-step reads:
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
0 0 0 0.0 0 0 35974 3125
32 64 128 1.0 0 0 261893 2931
32 64 256 1.0 0 0 132380 3563
32 128 128 1.0 0 0 151692 3657
32 128 256 1.0 0 0 8063 2219
64 64 128 1.0 0 0 133339 4264
64 64 256 1.0 0 0 28208 3359
64 128 128 1.0 0 0 27536 3051
64 128 256 1.0 0 0 110620 2043
Here are the time series reads:
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
0 0 0 0.0 0 0 3257952 8795
32 64 128 1.0 0 0 1427863 15069
32 64 256 1.0 0 0 2219838 4394
32 128 128 1.0 0 0 2054724 4668
32 128 256 1.0 0 0 3335330 4347
64 64 128 1.0 0 0 1041324 3581
64 64 256 1.0 0 0 1893643 2995
64 128 128 1.0 0 0 1942810 3024
64 128 256 1.0 0 0 3210923 3975
For the time-series test, we see that smaller chunk sizes for the horizontal dimensions and larger chunk sizes for the time dimension work better.
For the horizontal reads the opposite holds: larger chunk sizes for the horizontal dimensions and smaller chunk sizes along the time dimension work better.
Maybe the answer *is* to go with the current default scheme, but just make the sizes of the chunks that it writes much bigger.
I would really like 64 x 64 x 128 for the data above, except for the (possibly spurious) high value for the first horizontal read in that case.
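For reference, here is how such a default could be set explicitly through the netCDF-4 API; a minimal sketch, where the variable name and dimension setup are made up for illustration and are not from the benchmark code:

```c
#include <netcdf.h>

/* Sketch: define a 3D float variable with explicit 64 x 64 x 128 chunks
   (time x lat x lon). Returns a netCDF error status, 0 on success. */
int define_chunked_var(int ncid, int dimids[3], int *varidp)
{
    size_t chunks[3] = {64, 64, 128};
    int stat;

    if ((stat = nc_def_var(ncid, "pr", NC_FLOAT, 3, dimids, varidp)))
        return stat;
    /* NC_CHUNKED selects chunked (rather than contiguous) storage. */
    return nc_def_var_chunking(ncid, *varidp, NC_CHUNKED, chunks);
}
```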
Narrowing In On Correct Chunksizes For the 3D AR-4 Data
04 January 2010
We're getting there...
It seems clear that Quincey's original advice is good: use large, squarish chunks.
My former scheme of default chunk sizes worked reasonably well for the
innermost dimensions (it used the full length of each dimension), but
using a chunk size of 1 for unlimited dimensions was bad for read
performance.
Here are some read numbers for chunk sizes in what I believe is the correct range:
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
0 0 0 0.0 0 0 7087 1670
64 128 256 1.0 0 0 510 1549
128 128 256 1.0 0 0 401 1688
256 128 256 1.0 0 0 384 1679
64 128 256 1.0 1 0 330548 211382
128 128 256 1.0 1 0 618035 420617
Note that the last two are deflated versions of the data, and are roughly two to three orders of magnitude slower to read as a result.
The first line is the netCDF classic file. The non-deflated HDF5 files
easily beat the read performance of the classic file, probably because
the HDF5 files are in native endianness and the netCDF classic file has
to be converted from big-endian to little-endian for this platform.
What is odd is that the HDF5 files have a higher average read time than
their first read time. I don't get that. I expected that the first read
would always be the longest wait, but once you started, subsequent reads
would be faster. But not for these uncompressed HDF5 files. I am clearing the cache between each read.
Here's my timing code:
/* Read the data variable in horizontal slices. */
start[0] = 0;
start[1] = 0;
start[2] = 0;
count[0] = 1;
count[1] = LAT_LEN;
count[2] = LON_LEN;
/* Read (and time) the first one. */
if (gettimeofday(&start_time, NULL)) ERR;
if (nc_get_vara_float(ncid, varid, start, count, hor_data)) ERR_RET;
if (gettimeofday(&end_time, NULL)) ERR;
if (timeval_subtract(&diff_time, &end_time, &start_time)) ERR;
read_1_us = (int)diff_time.tv_sec * MILLION + (int)diff_time.tv_usec;
/* Read (and time) all the rest. */
if (gettimeofday(&start_time, NULL)) ERR;
for (start[0] = 1; start[0] < TIME_LEN; start[0]++)
   if (nc_get_vara_float(ncid, varid, start, count, hor_data)) ERR_RET;
if (gettimeofday(&end_time, NULL)) ERR;
if (timeval_subtract(&diff_time, &end_time, &start_time)) ERR;
avg_read_us = ((int)diff_time.tv_sec * MILLION + (int)diff_time.tv_usec +
               read_1_us) / TIME_LEN;
File Size and Chunking in NetCDF-4 on AR-4 Data File
04 January 2010
Trying to pick chunksizes can be hard!
chunk sizes (cs[0]_cs[1]_cs[2])   size difference vs. classic (MB)
1_128_128 0.33
1_128_256 0.25
1_128_32 0.86
1_16_128 1.56
1_16_256 0.86
1_16_32 5.75
1_64_128 0.51
1_64_256 0.33
1_64_32 1.56
10_128_128 0.18
10_128_256 0.17
10_128_32 0.23
10_16_128 0.3
10_16_256 0.23
10_16_32 0.72
10_64_128 0.2
10_64_256 0.18
10_64_32 0.3
1024_128_128 64.12
1024_128_256 64.12
1024_128_32 64.12
1024_16_128 64.12
1024_16_256 64.12
1024_16_32 64.13
1024_64_128 64.12
1024_64_256 64.12
1024_64_32 64.12
1560_128_128 0.16
1560_128_256 0.16
1560_128_32 0.16
1560_16_128 0.16
1560_16_256 0.16
1560_16_32 0.16
1560_64_128 0.16
1560_64_256 0.16
1560_64_32 0.16
256_128_128 30.57
256_128_256 30.57
256_128_32 30.57
256_16_128 30.58
256_16_256 30.57
256_16_32 30.59
256_64_128 30.57
256_64_256 30.57
256_64_32 30.58
classic 0
NetCDF-4 AR-4 Timeseries Reads and Cache Sizes
04 January 2010
Faster time series for the people!
What HDF5 chunk cache sizes are good for reading timeseries data
in netCDF-4? I'm sure you have wondered - I know I have. Now we know:
0.5 to 4 MB. Bigger caches just slow things down. Now that came as a
surprise!
The first three numbers are the chunk sizes of the 3 dimensions of the
main data variable. The next two columns show the deflate (0 = none) and
shuffle filter (0 = none). These are all the same for every run,
because the same input file is used for all these runs - only the chunk
cache size is changed when (re-)opening the file. The Unix file cache is
cleared between each run.
The two times shown are the number of microseconds to read the first
time series of the data, and the average time to read a time series
after all time series are read.
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
256 128 128 0.5 0 0 1279615 2589
256 128 128 1.0 0 0 1279613 2641
256 128 128 4.0 0 0 1298543 2789
256 128 128 16.0 0 0 1470297 34603
256 128 128 32.0 0 0 1470360 34541
Note that for cache sizes of 4 MB and below, the first time series read took
1.2 - 1.3 s, and the average time was .0025 - .0028 s. But when I
increased the chunk cache to 16 MB and 32 MB, the time for the first read
went to 1.5 s, and the avg time for all reads went to .035 s - an order
of magnitude jump!
I have repeated these tests a number of times, always with this result for chunk cache buffers 16 MB and above.
I am planning on changing the netCDF-4.1 default to 1 MB, which is the
HDF5 default. (I guess we should have listened to the HDF5 team in the
first place.)
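The cache in question is the HDF5 chunk cache that netCDF-4.1 exposes per file. A sketch of setting it to the planned 1 MB before opening a file; the nelems and preemption values here are illustrative, not the benchmark's settings:

```c
#include <netcdf.h>

/* Sketch: set the chunk cache to 1 MB, then open a file read-only.
   Returns a netCDF error status, 0 on success. */
int open_with_1mb_cache(const char *path, int *ncidp)
{
    int stat;

    /* 1 MB cache, 1009 hash slots, 0.75 preemption policy (illustrative). */
    if ((stat = nc_set_chunk_cache(1048576, 1009, 0.75f)))
        return stat;
    return nc_open(path, NC_NOWRITE, ncidp);
}
```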
What Cache Size Should be Used to Read AR-4/AR-5 3D Data?
03 January 2010
A question that has puzzled the greatest minds of history...
The not-yet-checked-in script nc_test4/run_bm_cache.sh tests reading a sample 3D data file with different sized caches.
Because of a weird increase in time for horizontal reads at the 16 MB cache
size, I re-ran the test twice more to make sure I got the same results.
And I did. No explanation why 16 MB works so poorly.
The current netCDF-4 default cache size is 4 MB (which does fine), but I
note that the original HDF5 default of 1 MB does even better. Perhaps I
should just leave this cache alone as a default choice, and give users
the HDF5 settings...
bash-3.2$ ./run_bm_cache.sh
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256 128 128 0.5 0 0 1291104
256 128 128 1.0 0 0 1298621
256 128 128 4.0 0 0 1306983
256 128 128 16.0 0 0 1472738
256 128 128 32.0 0 0 1497533
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256 128 128 0.5 0 0 2308
256 128 128 1.0 0 0 2291
256 128 128 4.0 0 0 2453
256 128 128 16.0 0 0 11609
256 128 128 32.0 0 0 2603
SUCCESS!!!
bash-3.2$ ./run_bm_cache.sh
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256 128 128 0.5 0 0 1290340
256 128 128 1.0 0 0 1281898
256 128 128 4.0 0 0 1306885
256 128 128 16.0 0 0 1470175
256 128 128 32.0 0 0 1497529
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256 128 128 0.5 0 0 2298
256 128 128 1.0 0 0 2292
256 128 128 4.0 0 0 2335
256 128 128 16.0 0 0 11572
256 128 128 32.0 0 0 1841
SUCCESS!!!
bash-3.2$ ./run_bm_cache.sh
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256 128 128 0.5 0 0 1298650
256 128 128 1.0 0 0 1298636
256 128 128 4.0 0 0 1565326
256 128 128 16.0 0 0 1497482
256 128 128 32.0 0 0 1497529
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256 128 128 0.5 0 0 2303
256 128 128 1.0 0 0 2287
256 128 128 4.0 0 0 2280
256 128 128 16.0 0 0 11584
256 128 128 32.0 0 0 1830
SUCCESS!!!