Nilesh,
Since the Netcdf format is a simple matrix of fixed-width cells, there
is no simple way to save space by not storing zero values.
I think you are saying that a standard scientific file format is
important to you. Since you have had such good luck with gridded data
in Netcdf, I suggest that you stay with it. Consider these options to
reduce archival file size:
1. Keep your current Netcdf format, but store your files gzip'ed. Make
uncompressing a standard part of opening the file. Many application
languages will let you call the shell to gunzip into a temporary file
and delete it when you are done, so you can automate this. gunzip is
rather fast, as I recall. As you stated, your file size is reduced by
99%.
2. Netcdf 16-bit packed format. Reduces file size by 50% relative to
32-bit floats. You get 16 bits for your combined precision and dynamic
range.
3. Netcdf 8-bit packed format. Reduces file size by 75% relative to
32-bit floats. You get 8 bits for your combined precision and dynamic
range.
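To make the precision trade-off in options 2 and 3 concrete, here is a rough sketch of the packing arithmetic in plain Python. It assumes the usual Netcdf packing convention, in which a reader recovers each value as packed * scale_factor + add_offset. The function names and the sample values are made up for illustration, and nothing here reads or writes an actual Netcdf file.

```python
def compute_packing(vmin, vmax, nbits=16):
    """Return (scale_factor, add_offset) mapping [vmin, vmax] onto signed
    nbits-wide integers. Hypothetical helper, not part of any Netcdf library."""
    # Use 2**nbits - 2 steps so the packed codes span a symmetric range and
    # the most negative code stays free for a missing-value marker.
    n_steps = 2 ** nbits - 2
    scale_factor = (vmax - vmin) / n_steps if vmax > vmin else 1.0
    add_offset = (vmax + vmin) / 2.0
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Quantize floats to integer codes."""
    return [round((v - add_offset) / scale_factor) for v in values]


def unpack(codes, scale_factor, add_offset):
    """Recover approximate floats using the packing convention."""
    return [c * scale_factor + add_offset for c in codes]


def max_roundtrip_error(values, nbits):
    """Largest absolute error after packing and unpacking at nbits."""
    scale, offset = compute_packing(min(values), max(values), nbits)
    restored = unpack(pack(values, scale, offset), scale, offset)
    return max(abs(a - b) for a, b in zip(values, restored))


if __name__ == "__main__":
    # Made-up emission values, mostly zero, with a wide dynamic range.
    sample = [0.0, 0.0, 0.0, 1.5e-3, 0.25, 12.0]
    for nbits in (16, 8):
        print(nbits, "bits: max error", max_roundtrip_error(sample, nbits))
```

The worst-case error is about half of scale_factor, which is the data range divided by roughly 2^16 or 2^8. With the 8-bit format, small nonzero values can round to the same code as zero, so check your smallest meaningful emission values before choosing it.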
It is possible to write support for a custom, non-Netcdf or
contorted-Netcdf format to efficiently hold sparse data and exclude
zeros. This would be very costly in terms of programming time and lack
of compatibility. I recommend against this, and I say that as one who
has done it the wrong way a few times. ;-)
--Dave Allured
CIRES Climate Diagnostics Center (CDC)
NOAA/ESRL, Physical Sciences Division (PSD)
Nilesh Lahoti wrote:
Dear Sir,
We are an air quality modeling group at Rutgers University, New Jersey.
We process emissions and run simulation models to study the long-range
transport of ozone and particulate matter, both for our research and
for regulatory work.
The netCDF library works great for us. However, I came across one
particular issue with netCDF and would like to ask whether there is a
solution, or something that can be done to improve its performance.
When we process emissions for our three-dimensional grid of size
(172 x 172 x 22) over a 24-hour period of hourly data, the file size
is around 1 gigabyte (GB). Several cells have zero values, so the
floating-point pollutant values stored for them in the netCDF file are
zero. When we use the gzip utility on Unix to compress these files, the
file size drops to almost 10 MB, which saves us 99% of disk space. So
the question is: if netCDF is the most compact scientific format, is it
possible to suppress the zero values of a floating-point variable, or
is there any switch that can be used to handle zero values and reduce
the file size?
Looking forward to hearing from you.
from,
Nilesh Lahoti
Research Specialist
CCL, EOHSI,
Rutgers University
Email: nilesh@xxxxxxxxxxxxxxxxxxx
Phone: 732-445-1416
===============================================================================
To unsubscribe netcdfgroup, visit:
http://www.unidata.ucar.edu/mailing-list-delete-form.html
===============================================================================