The Point of All These Tests

It's all about finding a good set of default chunk sizes for netCDF-4.1.

Tests seem to indicate that, for the 3D data, a chunk size of 32 or 64 along the unlimited dimension provides a good performance trade-off between time series and time step reads, without inflating the file size too much.

This makes intuitive sense as well. Larger chunk sizes mean that any leftover chunks (i.e., chunks that are only partially filled with data) take up more space on disk and make the file bigger.
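
For context, a program can also set its own chunk sizes instead of relying on the library defaults being tuned here. A minimal sketch (ncid is assumed to be a netCDF-4 file already created with nc_create, and ERR is the same error-check macro used in the benchmark code later in this post):

    /* Define a 3D variable with explicit chunk sizes: 32 along the
       unlimited (time) dimension, the full length of lat and lon. */
    size_t chunks[3] = {32, 128, 256};
    int dimids[3], varid;

    if (nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0])) ERR;
    if (nc_def_dim(ncid, "lat", 128, &dimids[1])) ERR;
    if (nc_def_dim(ncid, "lon", 256, &dimids[2])) ERR;
    if (nc_def_var(ncid, "pr", NC_FLOAT, 3, dimids, &varid)) ERR;
    if (nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks)) ERR;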

Here are some numbers from the latest tests. The first row is the netCDF classic format case. These are the time step (horizontal) reads.

cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
0     0     0     0.0       0       0       35974            3125
32    64    128   1.0       0       0       261893           2931
32    64    256   1.0       0       0       132380           3563
32    128   128   1.0       0       0       151692           3657
32    128   256   1.0       0       0       8063             2219
64    64    128   1.0       0       0       133339           4264
64    64    256   1.0       0       0       28208            3359
64    128   128   1.0       0       0       27536            3051
64    128   256   1.0       0       0       110620           2043

Here are the time series reads:

cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
0     0     0     0.0       0       0       3257952          8795
32    64    128   1.0       0       0       1427863          15069
32    64    256   1.0       0       0       2219838          4394
32    128   128   1.0       0       0       2054724          4668
32    128   256   1.0       0       0       3335330          4347
64    64    128   1.0       0       0       1041324          3581
64    64    256   1.0       0       0       1893643          2995
64    128   128   1.0       0       0       1942810          3024
64    128   256   1.0       0       0       3210923          3975

For the time series test, we see that smaller chunk sizes along the horizontal dimensions and larger chunk sizes along the time dimension work better.

For the horizontal reads, we see that larger chunk sizes along the horizontal dimensions and smaller chunk sizes along the time dimension work better.
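
To make the two access patterns concrete, here is roughly what the start/count vectors look like for one read of each kind (hor_data and ser_data are assumed to be float buffers of the appropriate sizes; the actual horizontal-read loop from the benchmark appears later in this post):

    size_t start[3] = {0, 0, 0};
    size_t count[3];

    /* A time step (horizontal) read: one time, all lat and lon. */
    count[0] = 1; count[1] = LAT_LEN; count[2] = LON_LEN;
    if (nc_get_vara_float(ncid, varid, start, count, hor_data)) ERR;

    /* A time series read: all times at one lat/lon point. */
    count[0] = TIME_LEN; count[1] = 1; count[2] = 1;
    if (nc_get_vara_float(ncid, varid, start, count, ser_data)) ERR;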

Maybe the answer *is* to go with the current default scheme, but just make the sizes of the chunks that it writes much bigger.

I would really like 64 x 64 x 128 for the data above, except for the (possibly spurious) high value for the first horizontal read in that case.

Narrowing In On Correct Chunksizes For the 3D AR-4 Data

We're getting there...

It seems clear that Quincey's original advice is good: use large, squarish chunks.

My former scheme of default chunk sizes worked tolerably for the innermost dimensions (it used the full length of the dimension), but using a chunksize of 1 for unlimited dimensions was bad for read performance.

Here are some read numbers for chunksizes in what I believe is the correct range:

cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
0     0     0     0.0       0       0       7087             1670
64    128   256   1.0       0       0       510              1549
128   128   256   1.0       0       0       401              1688
256   128   256   1.0       0       0       384              1679
64    128   256   1.0       1       0       330548           211382
128   128   256   1.0       1       0       618035           420617

Note that the last two are deflated versions of the data, and are roughly two to three orders of magnitude slower to read as a result.

The first line is the netCDF classic file. The non-deflated HDF5 files easily beat the read performance of the classic file, probably because the HDF5 files are in native endianness and the netCDF classic file has to be converted from big-endian to little-endian for this platform.

What is odd is that the HDF5 files have a higher average read time than their first read time. I don't get that. I expected that the first read would always be the longest wait, but once you started, subsequent reads would be faster. But not for these uncompressed HDF5 files. I am clearing the cache between each read.

Here's my timing code:

    /* Read the data variable in horizontal slices. */
    start[0] = 0;
    start[1] = 0;
    start[2] = 0;
    count[0] = 1;
    count[1] = LAT_LEN;
    count[2] = LON_LEN;

    /* Read (and time) the first one. */
    if (gettimeofday(&start_time, NULL)) ERR;
    if (nc_get_vara_float(ncid, varid, start, count, hor_data)) ERR_RET;
    if (gettimeofday(&end_time, NULL)) ERR;
    if (timeval_subtract(&diff_time, &end_time, &start_time)) ERR;
    read_1_us = (int)diff_time.tv_sec * MILLION + (int)diff_time.tv_usec;

    /* Read (and time) all the rest. */
    if (gettimeofday(&start_time, NULL)) ERR;
    for (start[0] = 1; start[0] < TIME_LEN; start[0]++)
       if (nc_get_vara_float(ncid, varid, start, count, hor_data)) ERR_RET;
    if (gettimeofday(&end_time, NULL)) ERR;
    if (timeval_subtract(&diff_time, &end_time, &start_time)) ERR;
    avg_read_us = ((int)diff_time.tv_sec * MILLION + (int)diff_time.tv_usec +
                   read_1_us) / TIME_LEN; 
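
(timeval_subtract is not shown above; it is the usual struct timeval difference helper, along the lines of the example in the glibc manual, here written with the same MILLION constant as the benchmark code:)

    /* Subtract the struct timeval values x - y, storing the result in
       result. Return 1 if the difference is negative, otherwise 0. */
    static int
    timeval_subtract(struct timeval *result, struct timeval *x, struct timeval *y)
    {
       /* Perform the carry for the later subtraction by updating y. */
       if (x->tv_usec < y->tv_usec) {
          int nsec = (y->tv_usec - x->tv_usec) / MILLION + 1;
          y->tv_usec -= MILLION * nsec;
          y->tv_sec += nsec;
       }
       if (x->tv_usec - y->tv_usec > MILLION) {
          int nsec = (x->tv_usec - y->tv_usec) / MILLION;
          y->tv_usec += MILLION * nsec;
          y->tv_sec -= nsec;
       }

       /* Compute the difference; tv_usec is certainly positive. */
       result->tv_sec = x->tv_sec - y->tv_sec;
       result->tv_usec = x->tv_usec - y->tv_usec;

       /* Return 1 if the result is negative. */
       return x->tv_sec < y->tv_sec;
    }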

File Size and Chunking in NetCDF-4 on AR-4 Data File

Trying to pick chunksizes can be hard!

chunk sizes    Size difference vs. classic (MB)
1_128_128      0.33
1_128_256      0.25
1_128_32       0.86
1_16_128       1.56
1_16_256       0.86
1_16_32        5.75
1_64_128       0.51
1_64_256       0.33
1_64_32        1.56
10_128_128     0.18
10_128_256     0.17
10_128_32      0.23
10_16_128      0.3
10_16_256      0.23
10_16_32       0.72
10_64_128      0.2
10_64_256      0.18
10_64_32       0.3
1024_128_128   64.12
1024_128_256   64.12
1024_128_32    64.12
1024_16_128    64.12
1024_16_256    64.12
1024_16_32     64.13
1024_64_128    64.12
1024_64_256    64.12
1024_64_32     64.12
1560_128_128   0.16
1560_128_256   0.16
1560_128_32    0.16
1560_16_128    0.16
1560_16_256    0.16
1560_16_32     0.16
1560_64_128    0.16
1560_64_256    0.16
1560_64_32     0.16
256_128_128    30.57
256_128_256    30.57
256_128_32     30.57
256_16_128     30.58
256_16_256     30.57
256_16_32      30.59
256_64_128     30.57
256_64_256     30.57
256_64_32      30.58
classic        0
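
To see where numbers like 64 MB come from, assume the data variable is 1560 x 128 x 256 floats of 4 bytes each (1560 looks like the full time length, judging by the 1560_* rows, with a 128 x 256 horizontal grid). A time chunk length of 1024 needs two chunks to cover 1560 time steps, so 2048 - 1560 = 488 time steps' worth of chunk space is allocated but never filled: 488 x 128 x 256 x 4 bytes, or about 64 MB. A time chunk length of 256 needs 7 chunks, wasting 1792 - 1560 = 232 steps, or about 30 MB. Chunk sizes that divide the dimension lengths evenly (like 1560, or the lat/lon chunk sizes used here) waste essentially nothing beyond per-chunk metadata, which is why those rows sit near zero.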

NetCDF-4 AR-4 Timeseries Reads and Cache Sizes

Faster time series for the people!

What HDF5 chunk cache sizes are good for reading timeseries data in netCDF-4? I'm sure you have wondered - I know I have. Now we know: 0.5 to 4 MB. Bigger caches just slow things down. Now that came as a surprise!

The first three numbers are the chunk sizes for the three dimensions of the main data variable, and the fourth column is the chunk cache size in MB. The next two columns show the deflate (0 = none) and shuffle filter (0 = none) settings. The chunk sizes and filters are the same for every run, because the same input file is used for all these runs - only the chunk cache size is changed when (re-)opening the file. The Unix file cache is cleared between each run.

The two times shown are the number of microseconds to read the first time series of the data, and the average time per time series once all the time series have been read.

*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
256   128   128   0.5       0       0       1279615          2589
256   128   128   1.0       0       0       1279613          2641
256   128   128   4.0       0       0       1298543          2789
256   128   128   16.0      0       0       1470297          34603
256   128   128   32.0      0       0       1470360          34541

Note that for cache sizes up to 4 MB, the first time series read took 1.2 - 1.3 s, and the average time was .0026 - .0028 s. But when I increased the chunk cache to 16 MB and 32 MB, the time for the first read went to about 1.5 s, and the avg time for all reads went to .035 s - an order of magnitude jump!

I have repeated these tests a number of times, always with this result for chunk cache buffers 16 MB and above.

I am planning on changing the netCDF-4.1 default to 1 MB, which is the HDF5 default. (I guess we should have listened to the HDF5 team in the first place.)
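
For anyone who wants to override whatever default we ship, netCDF-4.1 exposes the HDF5 chunk cache parameters directly. A rough sketch (FILE_NAME, ncid, and varid are placeholders here; 521 slots and 0.75 preemption are just the HDF5 defaults, not recommendations from these tests):

    /* Use a 1 MB chunk cache for files opened or created after this call. */
    if (nc_set_chunk_cache(1048576, 521, 0.75)) ERR;
    if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;

    /* Or adjust the cache for a single variable in an already-open file. */
    if (nc_set_var_chunk_cache(ncid, varid, 1048576, 521, 0.75)) ERR;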

What Cache Size Should be Used to Read AR-4/AR-5 3D Data?

A question that has puzzled the greatest minds of history...

The not-yet-checked-in script nc_test4/run_bm_cache.sh tests reading a sample 3D data file with different sized caches.

Because of a weird increase in time for horizontal reads at the 16 MB cache size, I re-ran the test twice more to make sure I got the same results. And I did. I have no explanation for why 16 MB works so poorly.

The current netCDF-4 default cache size is 4 MB (which does fine), but I note that the original HDF5 default of 1 MB does even better. Perhaps I should just leave this cache alone as a default choice, and give users the HDF5 settings...

bash-3.2$ ./run_bm_cache.sh
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1291104
256   128   128   1.0       0       0       1298621
256   128   128   4.0       0       0       1306983
256   128   128   16.0      0       0       1472738
256   128   128   32.0      0       0       1497533
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2308
256   128   128   1.0       0       0       2291
256   128   128   4.0       0       0       2453
256   128   128   16.0      0       0       11609
256   128   128   32.0      0       0       2603

SUCCESS!!!

bash-3.2$ ./run_bm_cache.sh 
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1290340
256   128   128   1.0       0       0       1281898
256   128   128   4.0       0       0       1306885
256   128   128   16.0      0       0       1470175
256   128   128   32.0      0       0       1497529
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2298
256   128   128   1.0       0       0       2292
256   128   128   4.0       0       0       2335
256   128   128   16.0      0       0       11572
256   128   128   32.0      0       0       1841

SUCCESS!!!

bash-3.2$ ./run_bm_cache.sh 
*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
256   128   128   0.5       0       0       1298650
256   128   128   1.0       0       0       1298636
256   128   128   4.0       0       0       1565326
256   128   128   16.0      0       0       1497482
256   128   128   32.0      0       0       1497529

cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
256   128   128   0.5       0       0       2303
256   128   128   1.0       0       0       2287
256   128   128   4.0       0       0       2280
256   128   128   16.0      0       0       11584
256   128   128   32.0      0       0       1830

SUCCESS!!!
