On 4/25/2011 1:50 PM, Roy Mendelssohn wrote:
Hi John:
On Apr 25, 2011, at 12:42 PM, John Caron wrote:
On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
yes, internal compression. All the files were made from netcdf3 files using
NCO with the options:
ncks -4 -L 1
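Spelled out with placeholder file names, that invocation looks something like:

    ncks -4 -L 1 in.nc out.nc

where -4 requests netCDF-4 output and -L 1 applies deflate level 1.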
The results so far show file sizes ranging from 40% of the original down to 1/100th of the original. If requests for internally compressed data are cached differently than requests to netcdf3 files, we want to take that into account when we do the tests, so that we do not just see the effect of differential caching.
When we have done tests on just local files, the reads were about 8 times slower from a compressed file. But Rich Signell has found that the combination of serialization/bandwidth is the bottleneck, and you hardly notice the difference in a remote-access situation. That is what we want to find out, because we run on very little money, and with compression as mentioned above our RAIDs would go a lot farther, as long as the hit to the access time is not too great.
Thanks,
-Roy
In netcdf4/hdf5, compression is tied to the chunking. Each chunk is individually compressed, and must be completely decompressed to retrieve even one value from that chunk. So the trick is to make your chunks correspond to your "common cases" of data access. If that's possible, you should find that compressed access is faster than non-compressed access, because the IO is smaller. But it will be highly dependent on the chunking matching the access pattern.
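For example, if the common case is reading one full lat/lon field per time step, an NCO invocation along these lines (the dimension names, sizes, and file names here are just an illustration) makes each chunk exactly one time slice:

    ncks -4 -L 1 --cnk_dmn time,1 --cnk_dmn lat,180 --cnk_dmn lon,360 in.nc out.nc

You can check the chunk shapes and deflate level a file ended up with using ncdump -hs, which prints the special virtual attributes such as _ChunkSizes and _DeflateLevel.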
Hi John:
But do you cache the entire chunk that you decompress, or do you toss it? So if I make a second request that hits data in that chunk, is it saved, or is it reread from the file?
-Thanks,
-Roy
There is currently no caching by the CDM of decompressed netcdf-4 chunks. The CDM caches small variables (< 4000 bytes), and the OS/controller will cache the disk blocks which contain the compressed data. Adding an internal cache for netcdf-4 is probably a good idea.