Re: [thredds] How are compressed netcdf4 files handled in TDS

On Apr 25, 2011, at 3:42 PM, John Caron wrote:

> On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
>> yes, internal compression.  All the files were made from netcdf3 files using 
>> NCO with the options:
>> 
>> ncks -4 -L 1
>> 
>> The results so far show file sizes ranging from 40% of the original down to 
>> 1/100th of the original.  If requests for internally compressed data 
>> are cached differently than requests to netcdf3 files, we want to 
>> take that into account when we do the tests, so that we do not just see the 
>> effect of differential caching.
>> 
>> When we have done tests on just local files, the reads were about 8 times 
>> slower from a compressed file.  But Rich Signell has found that the 
>> combination of serialization/bandwidth is the bottleneck, and you hardly 
>> notice the difference in a remote access situation.  That is what we want to 
>> find out, because we run on very little money and with compression as 
>> mentioned above our RAIDs would go a lot farther, as long as the hit to the 
>> access time is not too great.
>> 
>> Thanks,
>> 
>> -Roy
> 
> In netcdf4/hdf5, compression is tied to the chunking. Each chunk is 
> individually compressed, and must be completely decompressed to retrieve even 
> one value from that chunk. So the trick is to make your chunks correspond to 
> your "common cases" of data access. If that's possible, you should find that 
> compressed access is faster than non-compressed access, because the IO is 
> smaller. But it will be highly dependent on that.
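
To illustrate the point about matching chunks to the common access pattern, here is a minimal Python sketch that counts how many chunks a request touches, and therefore how many values must be decompressed. The variable shape (1000, 180, 360), the two chunk shapes, and the function names are hypothetical, chosen only for illustration; this is not how the netCDF-4 library or TDS is implemented.

```python
from math import prod

def chunks_touched(start, count, chunk_shape):
    """Number of chunks that intersect a hyperslab request (start, count)."""
    per_dim = []
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k              # index of first chunk hit along this dimension
        last = (s + c - 1) // k     # index of last chunk hit along this dimension
        per_dim.append(last - first + 1)
    return prod(per_dim)

def values_decompressed(start, count, chunk_shape):
    """Every touched chunk must be decompressed in full."""
    return chunks_touched(start, count, chunk_shape) * prod(chunk_shape)

# Hypothetical variable with dimensions (time, lat, lon) = (1000, 180, 360).
# Request: a full time series at one grid point (1000 values wanted).
start, count = (0, 90, 180), (1000, 1, 1)

layouts = {
    "one chunk per time step": (1, 180, 360),   # suits synoptic (map) requests
    "long in time, small in space": (1000, 10, 10),  # suits time-series requests
}

for name, chunk_shape in layouts.items():
    n = chunks_touched(start, count, chunk_shape)
    v = values_decompressed(start, count, chunk_shape)
    print(f"{name}: {n} chunks touched, {v} values decompressed")
```

With the first layout the time-series request has to decompress every chunk in the variable; with the second it decompresses a single chunk. A synoptic request would show the opposite ranking, which is why the chunk shape has to follow the dominant access pattern.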

John, is there a loss of efficiency when compressing chunks compared to 
compressing the entire file? I vaguely recall that for some compression 
algorithms, compression efficiency is a function of the volume of data 
compressed.
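
One rough way to check this is to compress the same bytes once as a whole and once in independent pieces. The sketch below uses Python's standard zlib module at level 1 (the same deflate level as `-L 1`). The synthetic data and the chunk sizes are assumptions for illustration only; real netCDF-4 variables will behave differently depending on their content and on filters such as shuffle.

```python
import zlib

def compressed_size_whole(data, level=1):
    """Size in bytes after compressing the whole buffer as one stream."""
    return len(zlib.compress(data, level))

def compressed_size_chunked(data, chunk_bytes, level=1):
    """Total size when each chunk is compressed independently."""
    total = 0
    for offset in range(0, len(data), chunk_bytes):
        total += len(zlib.compress(data[offset:offset + chunk_bytes], level))
    return total

# Synthetic data (an assumption, not real model output):
# a repeating ramp of 16-bit little-endian integers, 400,000 bytes in total.
data = b"".join((i % 500).to_bytes(2, "little") for i in range(200_000))

whole = compressed_size_whole(data)
print(f"whole buffer: {whole} bytes")
for chunk_bytes in (256, 4096, 65536):
    chunked = compressed_size_chunked(data, chunk_bytes)
    print(f"chunks of {chunk_bytes} bytes: {chunked} bytes "
          f"({chunked / whole:.1f}x the whole-buffer size)")
```

Each independently compressed chunk carries its own stream overhead and cannot reuse patterns seen in earlier chunks, so very small chunks tend to compress worse. How much that matters for a given dataset is something only a test on the actual files can show.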

Peter

> 
>> 
>> 
>> 
>> On Apr 25, 2011, at 12:28 PM, John Caron wrote:
>> 
>>> On 4/25/2011 11:30 AM, Roy Mendelssohn wrote:
>>>> Hi All:
>>>> 
>>>> We just converted one of our larger datasets (larger in terms of the 
>>>> number of files that are aggregated) into compressed netCDF4. There is a 
>>>> substantial savings in storage, but we wanted to do a series of tests to 
>>>> see what hit in access time we would take, if any, since many of our 
>>>> users will make requests involving a lot of time periods.
>>>> 
>>>> In order to design these tests properly, we need to get a better 
>>>> understanding of how the TDS handles netcdf4 datasets that have 
>>>> compression.  Are the decompressed data cached, or more accurately cached 
>>>> any differently from data read from an uncompressed series of netcdf3 
>>>> files, or since the decompression is handled automatically on the read, is 
>>>> everything handled the same after that?
>>>> 
>>>> We would also be interested in other people's experience with compressed 
>>>> netcdf4 files in TDS, in particular when the extracts are not synoptic, 
>>>> but cover a lot of time periods in a region, or make a lot of very small 
>>>> calls to a large number of time periods, such as we need to do for 
>>>> tagging data.
>>>> 
>>>> Thanks for any info,
>>>> 
>>>> -Roy
>>> Hi Roy:
>>> 
>>> I assume you mean internally compressed, not externally (like zipping up a 
>>> file) ?
>>> 
>>> _______________________________________________
>>> thredds mailing list
>>> thredds@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe,  visit: 
>>> http://www.unidata.ucar.edu/mailing_lists/
>> **********************
>> "The contents of this message do not reflect any position of the U.S. 
>> Government or NOAA."
>> **********************
>> Roy Mendelssohn
>> Supervisory Operations Research Analyst
>> NOAA/NMFS
>> Environmental Research Division
>> Southwest Fisheries Science Center
>> 1352 Lighthouse Avenue
>> Pacific Grove, CA 93950-2097
>> 
>> e-mail: Roy.Mendelssohn@xxxxxxxx (Note new e-mail address)
>> voice: (831)-648-9029
>> fax: (831)-648-8440
>> www: http://www.pfeg.noaa.gov/
>> 
>> "Old age and treachery will overcome youth and skill."
>> "From those who have been given much, much will be expected"
> 

--
Peter Cornillon
215 South Ferry Road                 Telephone: (401) 874-6283
Graduate School of Oceanography      Fax: (401) 874-6283
University of Rhode Island           Internet: pcornillon@xxxxxxxxxxx
Narragansett, RI 02882   USA

