Here are my numbers for doing horizontal reads with different cache sizes.
The times are the time to read each horizontal level, when reading all of
them. I realize that reading just one horizontal level will give different
(much higher) times. The reason is that when I read the first horizontal
level, the various caches along the way start filling up with the
following levels, so when I then read those levels I get very low times.
Reading all of the levels this way allows the caching to work. Reading just
one horizontal level and then stopping the program (to clear the cache) is
the worst-case scenario for the caching.
But what should I be optimizing for? Reading all horizontal levels? Or just reading one level?
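
One way to keep both cases visible in a single run is to time every level separately and report the first read next to the average over all levels. This is only a minimal sketch: read_level is a hypothetical callable standing in for whatever call actually reads one horizontal level, and the function and key names are illustrative, not part of any library.

```python
import time
from statistics import mean


def time_levels(read_level, n_levels):
    """Time each horizontal-level read; return microseconds in read order.

    read_level is a hypothetical callable taking the level index.
    """
    times_us = []
    for k in range(n_levels):
        start = time.perf_counter()
        read_level(k)
        times_us.append((time.perf_counter() - start) * 1e6)
    return times_us


def summarize(times_us):
    """Separate the first read from the reads that follow it."""
    first = times_us[0]
    rest = times_us[1:]
    return {
        "first_us": first,              # read made before the caches fill
        "mean_all_us": mean(times_us),  # the "read all levels" figure
        "mean_rest_us": mean(rest) if rest else first,
    }
```

Calling summarize(time_levels(read_level, n_levels)) then gives one number for the "read just one level" case and one for the "read everything" case. The first read is only a true worst case if the program starts with empty caches, so this separates the two figures but does not by itself clear any cache.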
cs[0]  cs[1]  cs[2]  cache(MB)  deflate  shuffle  read_hor(us)
    0      0      0        0.0        0        0          1527
    1     16     32        1.0        0        0          1577
    1     16    128        1.0        0        0          1618
    1     16    256        1.0        0        0          1515
    1     64     32        1.0        0        0          1579
    1     64    128        1.0        0        0          1586
    1     64    256        1.0        0        0          1584
    1    128     32        1.0        0        0          1593
    1    128    128        1.0        0        0          1583
    1    128    256        1.0        0        0          1571
   10     16     32        1.0        0        0          2128
   10     16    128        1.0        0        0          2520
   10     16    256        1.0        0        0          4309
   10     64     32        1.0        0        0          4083
   10     64    128        1.0        0        0          1751
   10     64    256        1.0        0        0          1713
   10    128     32        1.0        0        0          1692
   10    128    128        1.0        0        0          1862
   10    128    256        1.0        0        0          1749
  256     16     32        1.0        0        0         10594
  256     16    128        1.0        0        0          3681
  256     16    256        1.0        0        0          3074
  256     64     32        1.0        0        0          3656
  256     64    128        1.0        0        0          3042
  256     64    256        1.0        0        0          2773
  256    128     32        1.0        0        0          3828
  256    128    128        1.0        0        0          2335
  256    128    256        1.0        0        0          1581
 1024     16     32        1.0        0        0         35622
 1024     16    128        1.0        0        0          2759
 1024     16    256        1.0        0        0          2912
 1024     64     32        1.0        0        0          2875
 1024     64    128        1.0        0        0          2868
 1024     64    256        1.0        0        0          3816
 1024    128     32        1.0        0        0          2780
 1024    128    128        1.0        0        0          2558
 1024    128    256        1.0        0        0          1628
 1560     16     32        1.0        0        0        154450
 1560     16    128        1.0        0        0          3063
 1560     16    256        1.0        0        0          3700