More options for compression.
Dave Allured wrote:
Nilesh,
Since the Netcdf format is a simple matrix of fixed-width cells, there is no
simple way to save space by not storing zero values.
I think you are saying that a standard scientific file format is
important to you. Since you have had such good luck with gridded data
in Netcdf, I suggest that you stay with it. Consider these options to
reduce archival file size:
1. Keep your current Netcdf format, but store your files gzip'ed. Make
uncompressing a standard part of opening the file. Many application
languages let you call the shell to gunzip into a temporary file, read it,
and delete it afterward, so you can automate this (see the first sketch
after this list). gunzip is rather fast, as I recall. As you stated, your
file size is reduced by 99%.
The Netcdf-Java 2.2 library looks for ".Z", ".zip", ".gzip", ".gz", or ".bz2"
file extensions, and if one is found, it uncompresses/unzips and then reads
from the uncompressed file. It caches the unzipped file and can clean up the
cache area automatically, deleting older files to keep the cache size within
a specified limit. The next time the file is opened, it first checks whether
the uncompressed version already exists in the cache (see the second sketch
after this list). This works well in read-only applications like servers.
Writing is usually done once, and we haven't tried to optimize that.
2. Netcdf 16-bit packed format. Reduce file size by 50%. You get 16
bits for your combined precision and dynamic range.
3. Netcdf 8-bit packed format. Reduce file size by 75%. You get 8
bits for your combined precision and dynamic range.
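To automate option 1, here is a minimal sketch of the shell-out approach in
Java; the file names are hypothetical, and any language that can launch a
subprocess works the same way:

import java.io.File;
import java.io.IOException;

import ucar.nc2.NetcdfFile;

public class GunzipThenRead {
    public static void main(String[] args) throws IOException, InterruptedException {
        File gz  = new File("precip_1998.nc.gz");   // hypothetical archived file
        File tmp = new File("precip_1998.nc");      // temporary uncompressed copy

        // "gunzip -c" decompresses to stdout, leaving the .gz archive untouched.
        ProcessBuilder pb = new ProcessBuilder("gunzip", "-c", gz.getPath());
        pb.redirectOutput(tmp);
        int status = pb.start().waitFor();
        if (status != 0)
            throw new IOException("gunzip failed with exit status " + status);

        // Read the uncompressed copy with Netcdf-Java, then delete it.
        NetcdfFile nc = NetcdfFile.open(tmp.getPath());
        try {
            System.out.println(nc.getVariables());
        } finally {
            nc.close();
            tmp.delete();
        }
    }
}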
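If you are reading through the Netcdf-Java 2.2 library mentioned above, the
decompression step can instead be left to the library: pass the compressed
file name straight to NetcdfFile.open and it will unzip into its disk cache
and read the cached copy. A second minimal sketch, again with a hypothetical
file name; the DiskCache call is optional and is my assumption about how to
point the cache at a particular directory:

import java.io.IOException;

import ucar.nc2.NetcdfFile;
import ucar.nc2.util.DiskCache;

public class ReadCompressedDirectly {
    public static void main(String[] args) throws IOException {
        // Optional: point the library's unzip cache at a writable directory.
        // (Assumption: ucar.nc2.util.DiskCache is the cache used for unzipping.)
        DiskCache.setRootDirectory("/tmp/nj22_cache");

        // The ".gz" extension tells the library to uncompress into its cache
        // and read from the cached copy; later opens reuse the cached file.
        NetcdfFile nc = NetcdfFile.open("precip_1998.nc.gz");
        try {
            System.out.println(nc);   // prints a CDL-style summary of the file
        } finally {
            nc.close();
        }
    }
}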
If you use the standard attributes "scale_factor" and "add_offset", the
Netcdf-Java 2.2 library will optionally handle the packing in a transparent
way, i.e., promote the variable from byte or short to float or double and
apply the scale and offset. Again, this is only on the reading side.
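To make options 2 and 3 concrete, here is a sketch of one common way to
choose scale_factor and add_offset for 16-bit packing and to apply the
standard unpacking rule, unpacked = packed * scale_factor + add_offset; the
data range is hypothetical:

public class PackingMath {
    public static void main(String[] args) {
        // Hypothetical data range for the variable being packed.
        double dataMin = 0.0, dataMax = 500.0;

        // 16-bit packing: reserve Short.MIN_VALUE for the fill value, leaving
        // 2^16 - 2 usable intervals across the data range.
        double scaleFactor = (dataMax - dataMin) / (Math.pow(2, 16) - 2);
        double addOffset   = dataMin + (Math.pow(2, 15) - 1) * scaleFactor;

        // Pack one sample value into a short, then unpack it with the standard
        // convention: unpacked = packed * scale_factor + add_offset.
        double value    = 123.456;
        short  packed   = (short) Math.round((value - addOffset) / scaleFactor);
        double unpacked = packed * scaleFactor + addOffset;

        System.out.printf("scale_factor = %g, add_offset = %g%n", scaleFactor, addOffset);
        System.out.printf("value = %g, packed = %d, unpacked = %g%n", value, packed, unpacked);
    }
}

The same arithmetic applies to 8-bit packing; only the bit counts change, and
the precision you get back is correspondingly coarser.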
These features are available only to Java applications.