On 4/25/2011 1:37 PM, Roy Mendelssohn wrote:
Yes, internal compression. All the files were made from netCDF-3 files using
NCO with the options:
ncks -4 -L 1
So far the compressed files range from 40% of the original size down to 1/100th
of the original size. If requests for internally compressed data are cached
differently than requests to netCDF-3 files, we want to take that into account
when we do the tests, so that we do not just see the effect of differential
caching.
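As an illustration only: netCDF-4 deflate compression uses zlib, and `-L 1` requests deflate level 1. The sketch below uses Python's zlib on two made-up float32 arrays (not the actual dataset) to show one plausible reason the savings vary so widely between files. A field dominated by a repeated fill value shrinks to a tiny fraction of its size, while varied values compress far less. The function name `deflate_ratio` and both sample arrays are hypothetical.

```python
import math
import struct
import zlib

def deflate_ratio(values, level=1):
    """Compressed size / raw size for float32 values deflated with zlib at `level`."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, level)) / len(raw)

n = 100_000
# Hypothetical field that is mostly fill value (e.g. a land-masked ocean grid).
fill_heavy = [-9999.0] * n
# Hypothetical field of smoothly varying, non-repeating values.
varied = [1000.0 * math.sin(i * 0.37) for i in range(n)]

print(f"fill-heavy: {deflate_ratio(fill_heavy):.4f}")
print(f"varied:     {deflate_ratio(varied):.4f}")
```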
When we tested with local files only, reads from a compressed file were about 8
times slower. But Rich Signell has found that serialization and network
bandwidth together are the bottleneck, so you hardly notice the difference when
access is remote. That is what we want to confirm, because we run on very little
money, and with compression like that described above our RAIDs would go a lot
farther, as long as the hit to access time is not too great.
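The reasoning above is essentially arithmetic. If transfer time dominates, a slower server-side read barely changes the total. The back-of-the-envelope sketch below uses only hypothetical numbers and assumes the client receives the same uncompressed values in both cases, so the transfer term does not change. The function `request_seconds` is illustrative and not part of any TDS API.

```python
def request_seconds(read_s, slowdown, payload_mb, bandwidth_mb_s):
    """Hypothetical model: server-side read time (scaled by the decompression
    slowdown) plus time to serialize and ship the payload to the client."""
    return read_s * slowdown + payload_mb / bandwidth_mb_s

# Hypothetical request: 0.5 s to read locally, 50 MB returned over a 1 MB/s link.
plain = request_seconds(0.5, 1.0, 50.0, 1.0)       # 50.5 s
compressed = request_seconds(0.5, 8.0, 50.0, 1.0)  # 54.0 s
print(f"remote slowdown: {compressed / plain:.2f}x")  # prints remote slowdown: 1.07x
```

Under these assumptions the 8x local penalty becomes roughly a 7% remote penalty. With a faster link or a smaller payload, the decompression term matters proportionally more.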
Thanks,
-Roy
In netCDF-4/HDF5, compression is tied to chunking. Each chunk is compressed
individually and must be decompressed completely to retrieve even one value from
it. So the trick is to make your chunks correspond to your "common cases" of
data access. If that's possible, you should find that compressed access is
faster than uncompressed access, because there is less I/O. But the result will
depend heavily on how well the chunking matches the access pattern.
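Here is a minimal sketch of the per-chunk behavior just described. It uses Python's zlib on a hypothetical 1-D float32 array cut into fixed 1024-value chunks, so it mimics the idea rather than HDF5's actual storage format. Fetching a single value still inflates its entire chunk. `CHUNK`, `write_chunks`, and `read_value` are hypothetical names.

```python
import struct
import zlib

CHUNK = 1024  # values per chunk (hypothetical chunk size)

def write_chunks(values, chunk=CHUNK, level=1):
    """Split float32 values into fixed-size chunks and deflate each chunk independently."""
    chunks = []
    for start in range(0, len(values), chunk):
        part = values[start:start + chunk]
        chunks.append(zlib.compress(struct.pack(f"<{len(part)}f", *part), level))
    return chunks

def read_value(chunks, index, chunk=CHUNK):
    """Return (value, bytes inflated): the whole containing chunk is decompressed."""
    raw = zlib.decompress(chunks[index // chunk])
    (value,) = struct.unpack_from("<f", raw, (index % chunk) * 4)
    return value, len(raw)

values = [float(i % 50) for i in range(10_000)]
chunks = write_chunks(values)
print(read_value(chunks, 5_123))  # (23.0, 4096): one value, a full chunk inflated
```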
On Apr 25, 2011, at 12:28 PM, John Caron wrote:
On 4/25/2011 11:30 AM, Roy Mendelssohn wrote:
Hi All:
We just converted one of our larger datasets (larger in terms of the number of
files that are aggregated) into compressed netCDF-4. The storage savings are
substantial, but we want to run a series of tests to see what hit in access
time, if any, we would take, since many of our users make requests involving a
lot of time periods.
To design these tests properly, we need a better understanding of how the TDS
handles compressed netCDF-4 datasets. Are the decompressed data cached, or more
precisely, cached any differently from data read from an uncompressed series of
netCDF-3 files? Or, since decompression is handled automatically on read, is
everything handled the same after that?
We would also be interested in other people's experience with compressed
netCDF-4 files in TDS, in particular when the extracts are not synoptic but cover
many time periods in a region, or when they make many very small requests across
a large number of time periods, as we need to do for tagging data.
Thanks for any info,
-Roy
Hi Roy:
I assume you mean internally compressed, not externally compressed (like
zipping up a file)?
_______________________________________________
thredds mailing list
thredds@xxxxxxxxxxxxxxxx
For list information or to unsubscribe, visit:
http://www.unidata.ucar.edu/mailing_lists/
**********************
"The contents of this message do not reflect any position of the U.S. Government or
NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097
e-mail: Roy.Mendelssohn@xxxxxxxx (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/
"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"