On Apr 25, 2011, at 3:42 PM, John Caron wrote:
> On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
>> Yes, internal compression. All the files were made from netcdf3 files using
>> NCO with the options:
>>
>> ncks -4 -L 1
>>
>> The results so far show file sizes ranging from 40% of the original down to
>> 1/100th of the original. If the internally compressed data requests are
>> cached differently than requests to netcdf3 files, we want to take that into
>> account when we do the tests, so that we do not just see the effect of
>> differential caching.
>>
>> When we have done tests on just local files, the reads were about 8 times
>> slower from a compressed file. But Rich Signell has found that the
>> combination of serialization/bandwidth is the bottleneck, and you hardly
>> notice the difference in a remote access situation. That is what we want to
>> find out, because we run on very little money and with compression as
>> mentioned above our RAIDs would go a lot farther, as long as the hit to the
>> access time is not too great.
>>
>> Thanks,
>>
>> -Roy
>
> In netcdf4/hdf5, compression is tied to the chunking. Each chunk is
> individually compressed, and must be completely decompressed to retrieve even
> one value from that chunk. So the trick is to make your chunks correspond to
> your "common cases" of data access. If that's possible, you should find that
> compressed access is faster than non-compressed access, because IO is
> smaller. But it will be highly dependent on that.
John, is there a loss of efficiency when compressing chunks compared to
compressing the entire file? I vaguely recall that for some compression
algorithms, compression efficiency is a function of the volume of data
compressed.
Peter
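
A rough way to get a feel for both points (per-chunk compression versus
whole-stream compression, and having to decompress a full chunk to read one
value) is a small sketch like the one below. It is only an illustration under
stated assumptions: it calls Python's zlib module directly rather than going
through netCDF-4/HDF5, it uses deflate level 1 to mirror the "-L 1" setting
mentioned above, the data are synthetic and deliberately repetitive, and the
helper names compress_chunks and read_byte are made up for this example. The
size gap it shows depends entirely on the data and the chunk size, so it says
nothing about what real files will do.

```python
import zlib


def compress_chunks(data, chunk_size, level=1):
    """Compress each fixed-size chunk independently (hypothetical helper)."""
    return [
        zlib.compress(data[start:start + chunk_size], level)
        for start in range(0, len(data), chunk_size)
    ]


def read_byte(chunks, chunk_size, index):
    """Fetch one byte: the whole containing chunk must be decompressed first."""
    chunk = zlib.decompress(chunks[index // chunk_size])
    return chunk[index % chunk_size]


# Synthetic data (an assumption, not real model output): a 256-byte pattern
# repeated 4096 times, 1 MiB in total.
data = bytes(range(256)) * 4096
chunk_size = 1024  # deliberately small so the per-chunk cost is visible

whole = zlib.compress(data, 1)
chunks = compress_chunks(data, chunk_size)
chunked_total = sum(len(c) for c in chunks)

print("original bytes:      ", len(data))
print("whole-stream deflate:", len(whole))
print("per-chunk deflate:   ", chunked_total)
print("byte 123456 via its chunk:", read_byte(chunks, chunk_size, 123456))
```

Each chunk starts with no history, so redundancy that spans chunk boundaries
cannot be exploited and every chunk pays its own stream overhead. Larger
chunks narrow the gap, at the price of more data to decompress per small read.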
>
>>
>>
>>
>> On Apr 25, 2011, at 12:28 PM, John Caron wrote:
>>
>>> On 4/25/2011 11:30 AM, Roy Mendelssohn wrote:
>>>> Hi All:
>>>>
>>>> We just converted one of our larger datasets (larger in terms of the
>>>> number of files that are aggregated) into compressed netCDF4. There is a
>>>> substantial savings in storage, but we wanted to do a series of tests to
>>>> see what hit in access time we would take, if any, since many of our
>>>> users will make requests involving a lot of time periods.
>>>>
>>>> In order to design these tests properly, we need to get a better
>>>> understanding of how the TDS handles netcdf4 datasets that have
>>>> compression. Are the decompressed data cached, or more accurately cached
>>>> any differently from data read from an uncompressed series of netcdf3
>>>> files, or since the decompression is handled automatically on the read, is
>>>> everything handled the same after that?
>>>>
>>>> We would also be interested in other people's experience with compressed
>>>> netcdf4 files in TDS, in particular when the extracts are not synoptic,
>>>> but cover a lot of time periods in a region, or make a lot of very small
>>>> calls to a large number of time periods - such as we need to do for
>>>> tagging data.
>>>>
>>>> Thanks for any info,
>>>>
>>>> -Roy
>>> Hi Roy:
>>>
>>> I assume you mean internally compressed, not externally (like zipping up a
>>> file)?
>>>
>>> _______________________________________________
>>> thredds mailing list
>>> thredds@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe, visit:
>>> http://www.unidata.ucar.edu/mailing_lists/
>> **********************
>> "The contents of this message do not reflect any position of the U.S.
>> Government or NOAA."
>> **********************
>> Roy Mendelssohn
>> Supervisory Operations Research Analyst
>> NOAA/NMFS
>> Environmental Research Division
>> Southwest Fisheries Science Center
>> 1352 Lighthouse Avenue
>> Pacific Grove, CA 93950-2097
>>
>> e-mail: Roy.Mendelssohn@xxxxxxxx (Note new e-mail address)
>> voice: (831)-648-9029
>> fax: (831)-648-8440
>> www: http://www.pfeg.noaa.gov/
>>
>> "Old age and treachery will overcome youth and skill."
>> "From those who have been given much, much will be expected"
>
--
Peter Cornillon
215 South Ferry Road                 Telephone: (401) 874-6283
Graduate School of Oceanography      Fax: (401) 874-6283
University of Rhode Island           Internet: pcornillon@xxxxxxxxxxx
Narragansett, RI 02882 USA