NetCDF-4 Horizontal Data Read Performance with Cache Clearing
03 January 2010
Here are my numbers for horizontal reads with different chunk and cache sizes.
The times are the time to read each horizontal slice, reading all of them in
sequence. I realize that reading just one horizontal slice would give
different (much higher) times. The reason is that when I read the first
horizontal level, the various caches along the way start filling up with the
following levels, so when I read those I get very low times. Reading this way
allows the caching to work. Reading just one horizontal level and then
stopping the program (to clear the cache) would be the worst-case scenario
for the caching.
But what should I be optimizing for? Reading all horizontal levels, or just reading one?
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us)
0 0 0 0.0 0 0 1527
1 16 32 1.0 0 0 1577
1 16 128 1.0 0 0 1618
1 16 256 1.0 0 0 1515
1 64 32 1.0 0 0 1579
1 64 128 1.0 0 0 1586
1 64 256 1.0 0 0 1584
1 128 32 1.0 0 0 1593
1 128 128 1.0 0 0 1583
1 128 256 1.0 0 0 1571
10 16 32 1.0 0 0 2128
10 16 128 1.0 0 0 2520
10 16 256 1.0 0 0 4309
10 64 32 1.0 0 0 4083
10 64 128 1.0 0 0 1751
10 64 256 1.0 0 0 1713
10 128 32 1.0 0 0 1692
10 128 128 1.0 0 0 1862
10 128 256 1.0 0 0 1749
256 16 32 1.0 0 0 10594
256 16 128 1.0 0 0 3681
256 16 256 1.0 0 0 3074
256 64 32 1.0 0 0 3656
256 64 128 1.0 0 0 3042
256 64 256 1.0 0 0 2773
256 128 32 1.0 0 0 3828
256 128 128 1.0 0 0 2335
256 128 256 1.0 0 0 1581
1024 16 32 1.0 0 0 35622
1024 16 128 1.0 0 0 2759
1024 16 256 1.0 0 0 2912
1024 64 32 1.0 0 0 2875
1024 64 128 1.0 0 0 2868
1024 64 256 1.0 0 0 3816
1024 128 32 1.0 0 0 2780
1024 128 128 1.0 0 0 2558
1024 128 256 1.0 0 0 1628
1560 16 32 1.0 0 0 154450
1560 16 128 1.0 0 0 3063
1560 16 256 1.0 0 0 3700
NetCDF-4 Performance With Cache Clearing
03 January 2010
Now I have made some changes to my timing program, and I think I am getting better (i.e. more realistic) times.
First, I now clear the cache before each read.
Second, I don't read the horizontal sections and the time series in the
same program run - whichever one is done first loads the cache for the
other and gives unrealistically low times. Now I time these separately.
OK, so here are some time series read times. The first row is netCDF classic data:
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_time_ser(us)
1 16 32 1.0 0 0 2434393
1 16 128 1.0 0 0 2411127
1 16 256 1.0 0 0 2358892
1 64 32 1.0 0 0 2455963
1 64 128 1.0 0 0 2510818
1 64 256 1.0 0 0 2482509
1 128 32 1.0 0 0 2480481
1 128 128 1.0 0 0 2489436
1 128 256 1.0 0 0 2504924
10 16 32 1.0 0 0 1146593
10 16 128 1.0 0 0 1156650
10 16 256 1.0 0 0 1259026
10 64 32 1.0 0 0 1150427
10 64 128 1.0 0 0 2384334
10 64 256 1.0 0 0 2438587
10 128 32 1.0 0 0 1258380
10 128 128 1.0 0 0 2521213
10 128 256 1.0 0 0 2528927
256 16 32 1.0 0 0 174062
256 16 128 1.0 0 0 358613
256 16 256 1.0 0 0 404662
256 64 32 1.0 0 0 400489
256 64 128 1.0 0 0 688528
256 64 256 1.0 0 0 1267521
256 128 32 1.0 0 0 404422
256 128 128 1.0 0 0 1374661
256 128 256 1.0 0 0 2445647
1024 16 32 1.0 0 0 78718
1024 16 128 1.0 0 0 346506
1024 16 256 1.0 0 0 378813
1024 64 32 1.0 0 0 340703
1024 64 128 1.0 0 0 665649
1024 64 256 1.0 0 0 1269936
1024 128 32 1.0 0 0 380796
1024 128 128 1.0 0 0 1269627
1024 128 256 1.0 0 0 2513330
1560 16 32 1.0 0 0 58124
1560 16 128 1.0 0 0 332641
1560 16 256 1.0 0 0 372587
1560 64 32 1.0 0 0 323445
1560 64 128 1.0 0 0 635165
1560 64 256 1.0 0 0 1263225
1560 128 32 1.0 0 0 372226
1560 128 128 1.0 0 0 1265999
1560 128 256 1.0 0 0 2712887
These numbers make more sense. It takes about 2.3 seconds to read the time series from the classic file.
Ed
Demonstrating Caching and Its Effect on Timing
02 January 2010
The cache can really mess up benchmarking!
For example:
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h -c
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 66 2102
bash-3.2$ sudo bash clear_cache.sh && ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 1859 2324282
In the first run of tst_ar4_3d, with the -c option, the sample data file
is first created and then read. The read time for the time series read
is really low, because the file (having just been created) is still
loaded in a disk cache somewhere in the OS or in the disk hardware.
When I clear the cache and rerun without the -c option, the sample data
file is not created, it is assumed to already exist. Since the cache has
been cleared, the time series read has to read the data from disk, and
it takes 1000 times longer.
Well, that's why they invented disk caches.
This leads me to believe that my horizontal read times are fake too,
because I first do a time series read, thus loading some or all of the
file into the cache. I need to break that out into a separate test, I
see, or perhaps make the order of the two tests controllable from the
command line.
Oy, this benchmarking stuff is tricky business! I thought I had found
some really good performance for netCDF-4, but now I am not sure. I need
to look again more carefully and make sure that I am not being faked
out by the caches.
Ed
Effects of Clearing the Cache on Benchmarks
02 January 2010
How to win friends and influence benchmarks...
I note that I have a shell script in my nc_test4 directory,
clear_cache.sh. I have to run it with sudo, but when I do, it has a
dramatic effect on the time that the time series read takes.
The following uses the new (not yet checked in) test program
tst_ar4_3d.c, which sets up a simpler proxy data file for the
AR-4 tests. I want to show that a simpler file (but with a data variable
of the same size) has performance similar to the slightly more dressed-up
pr_A1 file from AR-4 that I got from Gary, because my simpler
file is easier to create in a test program.
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 1420 2281847
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 81 3159
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 76 2983
bash-3.2$ sudo bash clear_cache.sh
bash-3.2$ ./tst_ar4_3d -h
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
64 256 128 4.0 0 0 1410 2504315
Wow, what a difference a cleared cache makes!
Here's the clear_cache.sh script:
#!/bin/bash -x
# Clear the disk caches.
sync
echo 3 > /proc/sys/vm/drop_caches
More Cache Size Benchmarks
31 December 2009
Why does increasing cache size slow down time series access so much?
bash-3.2$ ./tst_ar4 -h pr_A1_256_128_128.nc
cs[0] cs[1] cs[2] cache(MB) deflate shuffle read_hor(us) read_time_ser(us)
256 128 128 0.5 0 0 217 2773
256 128 128 1.0 0 0 214 1935
256 128 128 4.0 0 0 214 1929
256 128 128 32.0 0 0 160 84440
256 128 128 128.0 0 0 129 82407