Nilesh -- As Dave Allured pointed out, if you want to use a standard
netCDF format, your options are limited.
We were facing a similar dilemma in needing to efficiently store large
amounts of climate data, and we opted to create a netCDF variant that
keeps the data compressed and uses an index to uncompress small blocks
of the data as requested. We have been using it very successfully for
over a decade, and most of the Regional Climate Centers' data is stored
as 'compressed netCDF'. We generally see a 90-97% reduction in file
sizes, with benchmarked access equal to or slightly faster than standard
netCDF files (especially over a network or off slower storage devices).
If you do go this route, you have to realize that you are on your own
and that you will have to uncompress any files you want to send to
other researchers.
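
For anyone curious what per-block compression with an index looks like
in practice, here is a minimal C sketch using zlib. It is not the NRCC
library's actual on-disk format (which also has to preserve the netCDF
header and variable layout); the block size, the index structure, and
the function names are all illustrative assumptions.

/*
 * Illustrative sketch only -- not the NRCC 'compressed netCDF' on-disk
 * format, just the general idea: compress fixed-size blocks of data
 * independently with zlib and keep an index of (offset, length) pairs
 * so a later read can seek to and inflate a single block on demand.
 */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define BLOCK_FLOATS 4096                /* floats per uncompressed block */

typedef struct { long offset; uLong clen; } block_entry;

/* Compress n floats block by block into fp; fill idx with one entry per
 * block and return the number of blocks written, or -1 on error. */
static long write_compressed_blocks(FILE *fp, const float *data, size_t n,
                                    block_entry *idx)
{
    uLong  bound   = compressBound(BLOCK_FLOATS * sizeof(float));
    Bytef *cbuf    = malloc(bound);
    long   nblocks = 0;
    size_t i;

    for (i = 0; i < n; i += BLOCK_FLOATS, nblocks++) {
        size_t count = (n - i < BLOCK_FLOATS) ? n - i : BLOCK_FLOATS;
        uLongf clen  = bound;
        if (compress2(cbuf, &clen, (const Bytef *)(data + i),
                      count * sizeof(float), Z_BEST_SPEED) != Z_OK) {
            free(cbuf);
            return -1;
        }
        idx[nblocks].offset = ftell(fp);  /* where this block starts */
        idx[nblocks].clen   = clen;       /* and how long it is      */
        fwrite(cbuf, 1, clen, fp);
    }
    free(cbuf);
    return nblocks;
}

/* Read back one block: seek to its offset and inflate only that block. */
static int read_block(FILE *fp, const block_entry *e,
                      float *out, size_t nfloats)
{
    Bytef *cbuf = malloc(e->clen);
    uLongf dlen = nfloats * sizeof(float);
    int    status;

    fseek(fp, e->offset, SEEK_SET);
    fread(cbuf, 1, e->clen, fp);
    status = uncompress((Bytef *)out, &dlen, cbuf, e->clen);
    free(cbuf);
    return status == Z_OK ? 0 : -1;
}

A real implementation would also write the index itself to the file
(appended at the end, say, or kept in a sidecar) so that reads can
locate blocks without rescanning the data.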
Given your particular situation, you may be more interested in looking
at some other options:
1. HDF5 supports a compressed (chunked, deflated) storage format and has
an interface similar to netCDF's; see the sketch after this list.
2. If you are using Linux, you may be able to use a compressed file
system mounted in loopback mode to keep the netCDF files compressed but
access them through the standard netCDF libraries. This is effectively
what my library modifications do, on a per-file rather than
per-filesystem basis. This is probably most effective in a read-only
situation.
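
Regarding option 1, here is a minimal sketch of a chunked,
deflate-compressed HDF5 dataset using the HDF5 1.6 C API. The file
name, dataset name, compression level, and chunk shape are illustrative
assumptions; the dimensions simply mirror the 172 x 172 x 22 grid with
24 hourly steps described in the question below.

/*
 * Hedged sketch (HDF5 1.6 C API): create a chunked dataset with zlib
 * ("deflate") compression enabled. All names and sizes are made up for
 * the example; the data here is all zeros, so it compresses extremely
 * well.
 */
#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    hsize_t dims[4]  = {24, 22, 172, 172};   /* time, level, row, col   */
    hsize_t chunk[4] = { 1,  1, 172, 172};   /* compress one level-slice
                                                at a time               */
    float *data = calloc(24 * 22 * 172 * 172, sizeof(float));

    hid_t file  = H5Fcreate("emissions.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(4, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 4, chunk);            /* chunking is required     */
    H5Pset_deflate(dcpl, 6);                 /* ... for zlib compression */

    hid_t dset = H5Dcreate(file, "O3", H5T_NATIVE_FLOAT, space, dcpl);
    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    free(data);
    return 0;
}

Chunking one level-slice at a time keeps each compressed unit small, so
reading a single layer only inflates that chunk -- the same access
pattern the block index above provides.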
Just some thoughts.
--Bill Noon
Northeast Regional Climate Center
Cornell University
On Aug 1, 2006, at 12:29 PM, Nilesh Lahoti wrote:
Dear Sir,
We are an air quality modeling group at Rutgers University, New Jersey.
We process emissions and run simulation models for our study of the
long-range transport of ozone and particulate matter, both for research
and for regulatory work.
The netCDF library works great for us. However, I came across one
particular issue with netCDF and would like to ask whether there is a
solution to this problem, or something that can be done to improve its
performance. When we process emissions for our three-dimensional grid
of size 172 x 172 x 22 for a 24-hour period with hourly data, the file
size is around 1 gigabyte (GB). Many cells have zero values, so the
floating-point pollutant values stored in the netCDF file are zero.
When we use the gzip utility on Unix to compress these files, the file
size drops to almost 10 MB, which saves us 99% of the disk space. The
question then arises: if netCDF is such a compact scientific data
format, is it possible to suppress these zero values of the
floating-point variables, or is there any switch that can be used to
handle zero values and reduce the file size?
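
For reference, the effect is easy to reproduce: the short C program
below fills one hourly 172 x 172 x 22 field of floats with mostly zeros
and compresses it with zlib (the same deflate algorithm gzip uses). The
sprinkling of nonzero cells is an arbitrary assumption just to make the
buffer non-trivial.

/*
 * Small, illustrative check: a float field that is mostly zeros
 * shrinks by orders of magnitude under deflate.
 */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int main(void)
{
    size_t n      = 172 * 172 * 22;           /* one hourly 3-D field */
    uLong  srclen = n * sizeof(float);
    float *field  = calloc(n, sizeof(float));  /* zero-filled          */
    size_t i;

    for (i = 0; i < n; i += 1000)              /* a few nonzero cells  */
        field[i] = 1.0f;

    uLongf clen = compressBound(srclen);
    Bytef *cbuf = malloc(clen);
    compress2(cbuf, &clen, (const Bytef *)field, srclen,
              Z_DEFAULT_COMPRESSION);

    printf("%lu bytes -> %lu bytes (%.1f%% smaller)\n",
           (unsigned long)srclen, (unsigned long)clen,
           100.0 * (1.0 - (double)clen / srclen));

    free(cbuf);
    free(field);
    return 0;
}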
Looking forward to hearing from you.
from,
Nilesh Lahoti
Research Specialist
CCL, EOHSI,
Rutgers University
Email: nilesh@xxxxxxxxxxxxxxxxxxx
Phone: 732-445-1416
==============================================================================
To unsubscribe netcdfgroup, visit:
http://www.unidata.ucar.edu/mailing-list-delete-form.html
==============================================================================