On Apr 25, 2011, at 3:51 PM, John Caron wrote:
> On 4/25/2011 1:46 PM, Peter Cornillon wrote:
>>
>> On Apr 25, 2011, at 3:42 PM, John Caron wrote:
>>
>>> On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
>>>> Yes, internal compression. All the files were made from netCDF-3 files
>>>> using NCO with the options:
>>>>
>>>> ncks -4 -L 1
>>>>
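>>>> (For reference, a complete invocation of that form, with placeholder
>>>> input and output file names, would look something like
>>>>
>>>>     ncks -4 -L 1 in3.nc out4.nc
>>>>
>>>> where -4 requests netCDF-4 output and -L 1 sets deflate level 1.)
>>>>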
>>>> The results so far show the compressed files ranging from 40% of the
>>>> original size down to 1/100th of it. If requests for internally
>>>> compressed data are cached differently than requests to netCDF-3 files,
>>>> we want to take that into account when we do the tests, so that we do
>>>> not just see the effect of differential caching.
>>>>
>>>> When we have done tests on just local files, the reads were about 8
>>>> times slower from a compressed file. But Rich Signell has found that the
>>>> combination of serialization/bandwidth is the bottleneck, and you hardly
>>>> notice the difference in a remote-access situation. That is what we want
>>>> to find out, because we run on very little money, and with compression
>>>> as mentioned above our RAIDs would go a lot farther, as long as the hit
>>>> to the access time is not too great.
>>>>
>>>> Thanks,
>>>>
>>>> -Roy
>>>
>>> in netcdf4/hdf5, compression is tied to the chunking. Each chunk is
>>> individually compressed, and must be completely decompressed to retrieve
>>> even one value from that chunk. So the trick is to make your chunks
>>> correspond to your "common cases" of data access. If that's possible, you
>>> should find that compressed access is faster than non-compressed access,
>>> because the IO is smaller, but it will be highly dependent on that.
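>>>
>>> (As a concrete sketch: assuming a file with dimensions named time, lat
>>> and lon, and assuming the common case is reading one full time step at a
>>> time, you could rechunk and compress it with the stock nccopy utility,
>>> e.g.
>>>
>>>     nccopy -d1 -c time/1,lat/180,lon/360 in.nc out.nc
>>>
>>> The dimension names, chunk sizes, and file names here are placeholders;
>>> the point is to make each chunk line up with a typical request.)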
>>
>> John, is there a loss of efficiency when compressing chunks compared to
>> compressing the entire file? I vaguely recall that for some compression
>> algorithms, compression efficiency is a function of the volume of data
>> compressed.
>>
>> Peter
>>
>
> Hi Peter:
>
> I think dictionary methods such as deflate compress better as the amount of
> data goes up, but the tradeoff here is that you want to decompress only the
> data you actually need. Decompressing very large files can be very costly.
Yes, this is why I chunk. The reason I asked is that the answer might
influence the chunk size one chooses.
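(For what it is worth, you can check what chunking and deflate level a file
actually ended up with from ncdump's special attributes, e.g.

    ncdump -h -s out4.nc

which shows _ChunkSizes, _DeflateLevel, _Shuffle, etc.; the file name is
just a placeholder.)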
Peter
>
> John
>
> _______________________________________________
> thredds mailing list
> thredds@xxxxxxxxxxxxxxxx
> For list information or to unsubscribe, visit:
> http://www.unidata.ucar.edu/mailing_lists/
--
Peter Cornillon
215 South Ferry Road                 Telephone: (401) 874-6283
Graduate School of Oceanography      Fax: (401) 874-6283
University of Rhode Island           Internet: pcornillon@xxxxxxxxxxx
Narragansett, RI 02882 USA