Narrowing In On Correct Chunksizes For the 3D AR-4 Data

We're getting there...

It seems clear that Quincey's original advice is good: use large, squarish chunks.

My former scheme of default chunk sizes worked reasonably well for the innermost dimensions (it used the full length of the dimension), but using a chunksize of 1 for unlimited dimensions turned out to be bad for read performance.
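
For anyone following along, chunk sizes are set per variable when the file is created. Here's a minimal sketch using the netCDF-4 C API - the dimension lengths, dimension names, and variable name are assumptions for illustration, not read from the AR-4 file, and error checking is omitted:

    #include <netcdf.h>

    int main()
    {
       int ncid, dimids[3], varid;
       size_t chunks[3] = {64, 128, 256};   /* time, lat, lon chunk sizes */

       /* Error checking omitted for brevity. */
       nc_create("example.nc", NC_NETCDF4, &ncid);
       nc_def_dim(ncid, "time", NC_UNLIMITED, &dimids[0]);
       nc_def_dim(ncid, "lat", 128, &dimids[1]);
       nc_def_dim(ncid, "lon", 256, &dimids[2]);
       nc_def_var(ncid, "pr", NC_FLOAT, 3, dimids, &varid);

       /* Ask for chunked storage with these chunk sizes instead of the defaults. */
       nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunks);
       nc_close(ncid);
       return 0;
    }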

Here are some read numbers for chunk sizes in what I believe is the correct range:

cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_hor(us) avg_read_hor(us)
0     0     0     0.0       0       0       7087             1670
64    128   256   1.0       0       0       510              1549
128   128   256   1.0       0       0       401              1688
256   128   256   1.0       0       0       384              1679
64    128   256   1.0       1       0       330548           211382
128   128   256   1.0       1       0       618035           420617

Note that the last two rows are deflated versions of the data, and are two to three orders of magnitude slower to read as a result.
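
The deflated versions were presumably created by turning on the deflate filter when the variable was defined; continuing the sketch above, that's one extra call before nc_close() (the deflate level actually used isn't stated, so level 1 here is an assumption):

       /* Shuffle off, deflate on, deflate level 1 (the level is an assumption). */
       nc_def_var_deflate(ncid, varid, 0, 1, 1);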

The first line is the netCDF classic file. The non-deflated HDF5 files easily beat the read performance of the classic file, probably because the HDF5 files are in native endianness and the netCDF classic file has to be converted from big-endian to little-endian for this platform.

What is odd is that the HDF5 files have a higher average read time than their first read time. I don't get that. I expected that the first read would always be the longest wait, but once you started, subsequent reads would be faster. But not for these uncompressed HDF5 files. I am clearing the cache between each read.
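
(For the record, one common way to clear the Unix file cache on Linux - an assumption about how it was done here, and it requires root - is to sync and then write to /proc/sys/vm/drop_caches:)

    #include <stdio.h>
    #include <unistd.h>

    /* Assumed cache-clearing method (Linux only, requires root): flush dirty
       pages with sync(), then drop the page cache by writing "3" to
       /proc/sys/vm/drop_caches. */
    static void clear_file_cache(void)
    {
       FILE *fp;
       sync();
       if ((fp = fopen("/proc/sys/vm/drop_caches", "w")))
       {
          fputs("3\n", fp);
          fclose(fp);
       }
    }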

Here's my timing code:

    /* Read the data variable in horizontal slices. */
    start[0] = 0;
    start[1] = 0;
    start[2] = 0;
    count[0] = 1;
    count[1] = LAT_LEN;
    count[2] = LON_LEN;

    /* Read (and time) the first one. */
    if (gettimeofday(&start_time, NULL)) ERR;
    if (nc_get_vara_float(ncid, varid, start, count, hor_data)) ERR_RET;
    if (gettimeofday(&end_time, NULL)) ERR;
    if (timeval_subtract(&diff_time, &end_time, &start_time)) ERR;
    read_1_us = (int)diff_time.tv_sec * MILLION + (int)diff_time.tv_usec;

    /* Read (and time) all the rest. */
    if (gettimeofday(&start_time, NULL)) ERR;
    for (start[0] = 1; start[0] < TIME_LEN; start[0]++)
       if (nc_get_vara_float(ncid, varid, start, count, hor_data)) ERR_RET;
    if (gettimeofday(&end_time, NULL)) ERR;
    if (timeval_subtract(&diff_time, &end_time, &start_time)) ERR;
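
    /* The average includes the first read: total time for all TIME_LEN
       reads divided by TIME_LEN. */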
    avg_read_us = ((int)diff_time.tv_sec * MILLION + (int)diff_time.tv_usec +
                   read_1_us) / TIME_LEN; 
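
The timeval_subtract() helper isn't shown here; the following is a sketch of one way to write it, modeled on the example in the GNU C Library manual (the actual helper used in the benchmark may differ). It stores x - y in *result and returns 1 if the difference is negative:

    #include <sys/time.h>

    static int
    timeval_subtract(struct timeval *result, struct timeval *x, struct timeval *y)
    {
       /* Perform the carry for the later subtraction by updating y. */
       if (x->tv_usec < y->tv_usec)
       {
          int nsec = (y->tv_usec - x->tv_usec) / 1000000 + 1;
          y->tv_usec -= 1000000 * nsec;
          y->tv_sec += nsec;
       }
       if (x->tv_usec - y->tv_usec > 1000000)
       {
          int nsec = (x->tv_usec - y->tv_usec) / 1000000;
          y->tv_usec += 1000000 * nsec;
          y->tv_sec -= nsec;
       }

       /* Compute the difference; tv_usec is now certainly positive. */
       result->tv_sec = x->tv_sec - y->tv_sec;
       result->tv_usec = x->tv_usec - y->tv_usec;

       /* Return 1 if the difference is negative. */
       return x->tv_sec < y->tv_sec;
    }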

File Size and Chunking in NetCDF-4 on AR-4 Data File

Trying to pick chunksizes can be hard!

chunk sizes (time_lat_lon)    size difference from classic (MB)
1_128_128         0.33
1_128_256         0.25
1_128_32          0.86
1_16_128          1.56
1_16_256          0.86
1_16_32           5.75
1_64_128          0.51
1_64_256          0.33
1_64_32           1.56
10_128_128        0.18
10_128_256        0.17
10_128_32         0.23
10_16_128         0.3
10_16_256         0.23
10_16_32          0.72
10_64_128         0.2
10_64_256         0.18
10_64_32          0.3
1024_128_128     64.12
1024_128_256     64.12
1024_128_32      64.12
1024_16_128      64.12
1024_16_256      64.12
1024_16_32       64.13
1024_64_128      64.12
1024_64_256      64.12
1024_64_32       64.12
1560_128_128      0.16
1560_128_256      0.16
1560_128_32       0.16
1560_16_128       0.16
1560_16_256       0.16
1560_16_32        0.16
1560_64_128       0.16
1560_64_256       0.16
1560_64_32        0.16
256_128_128      30.57
256_128_256      30.57
256_128_32       30.57
256_16_128       30.58
256_16_256       30.57
256_16_32        30.59
256_64_128       30.57
256_64_256       30.57
256_64_32        30.58
classic           0
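
Most of the pattern in this table can be reproduced with a back-of-the-envelope calculation: HDF5 allocates whole chunks, so any chunk that sticks out past the edge of the variable is padding on disk. Here's a sketch of that estimate - it assumes the variable is 1560 x 128 x 256 floats (which fits the table, but isn't stated above), and it ignores per-chunk metadata, which is why the small-chunk rows still show a little overhead:

    #include <stdio.h>

    int main()
    {
       /* Assumed dimension lengths (time, lat, lon); not stated in the post. */
       size_t len[3] = {1560, 128, 256};
       size_t chunk[3] = {1024, 128, 128};   /* one row from the table above */
       double allocated = sizeof(float), actual = sizeof(float);
       int d;

       for (d = 0; d < 3; d++)
       {
          size_t nchunks = (len[d] + chunk[d] - 1) / chunk[d];   /* round up */
          allocated *= (double)(nchunks * chunk[d]);
          actual *= (double)len[d];
       }
       printf("chunk padding: %.2f MB\n", (allocated - actual) / 1e6);
       return 0;
    }

For the 1024_128_128 row this gives about 64 MB of padding, which is nearly all of the 64.12 MB in the table; for chunk shapes that divide the dimensions evenly (like 1_16_32), the padding is zero and the whole overhead is per-chunk metadata.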

NetCDF-4 AR-4 Timeseries Reads and Cache Sizes

Faster time series for the people!

What HDF5 chunk cache sizes are good for reading timeseries data in netCDF-4? I'm sure you have wondered - I know I have. Now we know: 0.5 to 4 MB. Bigger caches just slow this down. Now that came as a surprise!

The first three numbers are the chunk sizes of the three dimensions of the main data variable, and the fourth column is the HDF5 chunk cache size. The next two columns show the deflate setting (0 = none) and the shuffle filter (0 = off). The chunk sizes and filters are the same for every run, because the same input file is used for all these runs - only the chunk cache size is changed when (re-)opening the file. The Unix file cache is cleared between each run.

The two times shown are the number of microseconds to read the first time series of the data, and the average time per time series once all the time series have been read.
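
For reference, the netCDF-4.1 C API lets you set the chunk cache that will be used for subsequently opened files with nc_set_chunk_cache(), or for a single variable with nc_set_var_chunk_cache(). Here's a sketch of how one of these runs might look - the variable name, number of cache slots (nelems), and preemption value are assumptions, and error checking is omitted:

    #include <netcdf.h>

    #define TIME_LEN 1560   /* assumed time dimension length */

    int main()
    {
       int ncid, varid;
       size_t start[3] = {0, 0, 0};
       size_t count[3] = {TIME_LEN, 1, 1};   /* one full time series at one point */
       float ts_data[TIME_LEN];

       /* Set a 4 MB default chunk cache before the file is opened; the nelems
          and preemption arguments here are assumptions. */
       nc_set_chunk_cache(4 * 1024 * 1024, 1009, 0.75);
       nc_open("pr_A1_256_128_128.nc", NC_NOWRITE, &ncid);
       nc_inq_varid(ncid, "pr", &varid);   /* variable name is an assumption */

       /* Read one time series: all time steps at one lat/lon point. */
       nc_get_vara_float(ncid, varid, start, count, ts_data);
       nc_close(ncid);
       return 0;
    }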

*** Benchmarking pr_A1 file pr_A1_256_128_128.nc with various HDF5 chunk caches...
cs[0] cs[1] cs[2] cache(MB) deflate shuffle 1st_read_ser(us) avg_read_ser(us)
256   128   128   0.5       0       0       1279615          2589
256   128   128   1.0       0       0       1279613          2641
256   128   128   4.0       0       0       1298543          2789
256   128   128   16.0      0       0       1470297          34603
256   128   128   32.0      0       0       1470360          34541

Note that for cache sizes of 4 MB or less, the first time series read took about 1.3 s, and the average time was .0026 - .0028 s. But when I increased the chunk cache to 16 MB and 32 MB, the time for the first read went to about 1.5 s, and the average time for all reads went to .035 s - an order of magnitude jump!

I have repeated these tests a number of times, always with this result for chunk cache buffers 16 MB and above.

I am planning on changing the netCDF-4.1 default to 1 MB, which is the HDF5 default. (I guess we should have listened to the HDF5 team in the first place.)
