More tests...
Test files: pr_A1_4_64_128.nc pr_A1_8_64_128.nc pr_A1_16_64_128.nc pr_A1_32_64_128.nc pr_A1_64_64_128.nc
cs[0]  cs[1]  cs[2]  cache(MB)  deflate  shuffle  1st_read_hor(us)  avg_read_hor(us)
    0      0      0        0.0        0        0              2155              1603
    4     64    128        1.0        0        0              7021              1567
    8     64    128        1.0        0        0             14084              1538
   16     64    128        1.0        0        0             82906              1570
   32     64    128        1.0        0        0            145295              2138
   64     64    128        1.0        0        0             21960              2825
cs[0]  cs[1]  cs[2]  cache(MB)  deflate  shuffle  1st_read_ser(us)  avg_read_ser(us)
    0      0      0        0.0        0        0           2399157              9181
    4     64    128        1.0        0        0           2434194             15954
    8     64    128        1.0        0        0           2317802             13627
   16     64    128        1.0        0        0           1531121             12686
   32     64    128        1.0        0        0           1299189             12265
   64     64    128        1.0        0        0            863365              2356
It's all about finding a good set of default chunk sizes for netCDF-4.1
Tests seem to indicate that, for the 3D data, a chunk size
of 32 or 64 for the unlimited dimension provides a good trade-off
between time series and time step read performance, without inflating the
file size too much.
This makes intuitive sense as well. Larger chunk sizes mean that any
leftover chunks (i.e. chunks that are only partially filled with data)
take up more space on the disk and make the file bigger.
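This effect is easy to quantify: HDF5 allocates storage in whole chunks, so partially filled edge chunks still occupy a full chunk's worth of space on disk. Here is a minimal sketch of the overhead calculation; the variable dimensions used (1560 time steps on a 64 x 128 grid) are hypothetical, not taken from the tests above.

```python
import math

def allocated_vs_used(shape, chunks):
    """Return (used_elements, allocated_elements) for a chunked variable.

    HDF5 allocates whole chunks, so a partially filled edge chunk
    still occupies a full chunk's worth of space on disk.
    """
    used = math.prod(shape)
    allocated = math.prod(
        math.ceil(dim / c) * c for dim, c in zip(shape, chunks)
    )
    return used, allocated

# Hypothetical 3D variable: 1560 time steps on a 64 x 128 grid.
shape = (1560, 64, 128)
for cs0 in (4, 8, 16, 32, 64):
    used, alloc = allocated_vs_used(shape, (cs0, 64, 128))
    # Fractional on-disk overhead from partially filled time chunks.
    print(cs0, round(alloc / used - 1, 3))
```

When the chunk size does not evenly divide the dimension length, the last row of chunks is only partially filled, and the overhead grows with the chunk size along that dimension.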
Here are some numbers from the latest tests. The top row (chunk sizes of zero) is the netCDF classic format case. These are the time step reads.
cs[0]  cs[1]  cs[2]  cache(MB)  deflate  shuffle  1st_read_hor(us)  avg_read_hor(us)
    0      0      0        0.0        0        0             35974              3125
   32     64    128        1.0        0        0            261893              2931
   32     64    256        1.0        0        0            132380              3563
   32    128    128        1.0        0        0            151692              3657
   32    128    256        1.0        0        0              8063              2219
   64     64    128        1.0        0        0            133339              4264
   64     64    256        1.0        0        0             28208              3359
   64    128    128        1.0        0        0             27536              3051
   64    128    256        1.0        0        0            110620              2043
Here are the time series reads:
cs[0]  cs[1]  cs[2]  cache(MB)  deflate  shuffle  1st_read_ser(us)  avg_read_ser(us)
    0      0      0        0.0        0        0           3257952              8795
   32     64    128        1.0        0        0           1427863             15069
   32     64    256        1.0        0        0           2219838              4394
   32    128    128        1.0        0        0           2054724              4668
   32    128    256        1.0        0        0           3335330              4347
   64     64    128        1.0        0        0           1041324              3581
   64     64    256        1.0        0        0           1893643              2995
   64    128    128        1.0        0        0           1942810              3024
   64    128    256        1.0        0        0           3210923              3975
For the time series test, we see that smaller chunk sizes for the horizontal dimensions work better, and larger chunk sizes for the time dimension work better.
For the horizontal read, the opposite holds: larger chunk sizes for the horizontal dimensions work better, and smaller chunk sizes along the time dimension.
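This pattern falls out of counting how many chunks each access pattern touches. A quick sketch, using hypothetical dataset dimensions (1560 time steps on a 256 x 512 horizontal grid, not the actual test dimensions):

```python
import math

def chunks_touched(read_shape, chunks):
    """Number of chunks that must be visited to satisfy a hyperslab read."""
    return math.prod(
        math.ceil(extent / c) for extent, c in zip(read_shape, chunks)
    )

# Hypothetical dataset: 1560 time steps on a 256 x 512 horizontal grid.
T, Y, X = 1560, 256, 512
for chunks in [(32, 64, 128), (64, 64, 128), (64, 128, 256)]:
    hor = chunks_touched((1, Y, X), chunks)  # one full time step
    ser = chunks_touched((T, 1, 1), chunks)  # one full time series
    print(chunks, "horizontal:", hor, "series:", ser)
```

A time step read touches every chunk covering the horizontal plane, so bigger horizontal chunks mean fewer chunks to visit; a time series read touches every chunk along the time axis, so a bigger time chunk means fewer chunks to visit. Any fixed chunk shape trades one against the other.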
Maybe the answer *is* to go with the current default scheme, but just make the sizes of the chunks that it writes much bigger.
I would really like to use 64 x 64 x 128 for the data above, except for the (possibly spurious) high value for the first horizontal read in that case.