On 4/25/2011 2:04 PM, Peter Cornillon wrote:
On Apr 25, 2011, at 3:51 PM, John Caron wrote:
On 4/25/2011 1:46 PM, Peter Cornillon wrote:
On Apr 25, 2011, at 3:42 PM, John Caron wrote:
On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
yes, internal compression. All the files were made from netcdf3
files using NCO with the options:
ncks -4 -L 1
The results so far show file sizes reduced to between 40% and
1/100th of the original. If requests for internally compressed
data are cached differently than requests to netcdf3 files, we
want to take that into account when we do the tests, so that we
do not just see the effect of differential caching.
When we have done tests on just local files, the reads were about
8 times slower from a compressed file. But Rich Signell has
found that the combination of serialization/bandwidth is the
bottleneck, and you hardly notice the difference in a remote
access situation. That is what we want to find out, because we
run on very little money and with compression as mentioned above
our RAIDs would go a lot farther, as long as the hit to the access
time is not too great.
Thanks,
-Roy
In netcdf4/hdf5, compression is tied to the chunking. Each chunk is
individually compressed, and must be completely decompressed to
retrieve even one value from that chunk. So the trick is to make
your chunks correspond to your "common cases" of data access. If
that's possible, you should find that compressed access is faster
than non-compressed access, because IO is smaller. But it will be
highly dependent on that.
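
A rough sketch of the point above, in Python. It is not the netCDF-4/HDF5
library: the standard zlib module stands in for the deflate filter, and the
names write_chunks, read_value and CHUNK_LEN are hypothetical. It only
illustrates that fetching a single value means inflating the whole chunk
that holds it.

```python
import struct
import zlib

CHUNK_LEN = 1000  # values per chunk (hypothetical choice for illustration)


def write_chunks(values, chunk_len=CHUNK_LEN):
    """Pack float64 values into independently deflated chunks."""
    chunks = []
    for start in range(0, len(values), chunk_len):
        block = values[start:start + chunk_len]
        raw = struct.pack(f"<{len(block)}d", *block)
        # Deflate level 1, the same level as the ncks -L 1 option above.
        chunks.append(zlib.compress(raw, 1))
    return chunks


def read_value(chunks, index, chunk_len=CHUNK_LEN):
    """Return (value, number of values inflated to get it)."""
    chunk_no, offset = divmod(index, chunk_len)
    # The whole chunk must be inflated, even for a single value.
    raw = zlib.decompress(chunks[chunk_no])
    value = struct.unpack_from("<d", raw, offset * 8)[0]
    return value, len(raw) // 8


values = [float(i % 50) for i in range(10_000)]
chunks = write_chunks(values)
value, inflated = read_value(chunks, 4321)
print(value, inflated)  # 21.0 1000
```

With a chunk of 1000 values, one read costs 1000 values of decompression. If
the common access pattern wants all 1000, that cost is well spent; if it
wants one, a smaller or differently shaped chunk would waste less.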
John, is there a loss of efficiency when compressing chunks compared
to compressing the entire file? I vaguely recall that for some
compression algorithms, compression efficiency is a function of the
volume of data compressed.
Peter
Hi Peter:
I think dictionary methods such as deflate get better as the file
size goes up, but the tradeoff here is to try to decompress only the
data you actually want. Decompressing very large files can be very
costly.
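
One way to get a feel for this tradeoff is to deflate the same bytes whole
and in independent chunks of different sizes. The sketch below uses Python's
standard zlib module as a stand-in for the HDF5 deflate filter. The test
data is synthetic and the helper name chunked_size is hypothetical, so the
numbers it prints say nothing about real netCDF files; only the trend is of
interest.

```python
import zlib


def chunked_size(data, chunk_bytes, level=1):
    """Total compressed size when each chunk is deflated on its own."""
    return sum(
        len(zlib.compress(data[i:i + chunk_bytes], level))
        for i in range(0, len(data), chunk_bytes)
    )


# Synthetic, repetitive test data (an assumption; real data will differ).
data = b"".join(b"%08d," % (i % 500) for i in range(50_000))

# Size when the whole buffer is deflated in one go.
whole = len(zlib.compress(data, 1))

# Total size for several chunk sizes, each chunk deflated independently.
sizes = {n: chunked_size(data, n) for n in (256, 4096, 65536)}

for chunk_bytes, total in sizes.items():
    print(f"chunk={chunk_bytes:6d} bytes  total={total:7d}  whole={whole}")
```

On data like this, very small chunks compress noticeably worse than the
whole buffer, since every chunk starts with an empty dictionary and carries
its own header. Larger chunks close the gap, at the price of inflating more
data per read.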
Yes, this is why I chunk. The reason that I asked the question is that
this might influence the chunk size that one chooses.
yup!