2008 Unidata NetCDF Workshop for Developers and Data Providers > Formats and Performance
7.3 Using Less Space for Data
Using less disk space for your data can also reduce access time.
- Pack floating-point data into a narrower (and lower-precision)
numeric type, for example pack 32-bit floats into 16-bit shorts or
8-bit bytes using the scale_factor and add_offset attributes. The NCO
utilities ncpack or ncpdq will pack variables into smaller types.
- Store whole files compressed. This has the drawback that a whole
file must be uncompressed to access even a small subset of its data.
- The netCDF Java library uncompresses files with ".Z", ".zip",
".gzip", ".gz", or
".bz2" extensions and caches the expanded file for subsequent reads,
keeping cache size within a specified limit. This works well in
read-only applications like servers.
- Use sparse data structures for sparse grids with lots of
missing data, for example store only array indices and valid values.
Other sparse formats such as CSR (Compressed Sparse Row) may be useful.
- With Linux, use a compressed file system in loopback mode to
keep files compressed but access them efficiently.
- See the compressed netCDF library from Cornell, which uses zlib to
compress blocks of data in a classic format netCDF file (but only
works with netCDF version 3.3.1).
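
The packing tip above can be sketched in a few lines of plain Python.
This is only an illustration under stated assumptions, not what ncpack
or ncpdq do internally: the helper names (compute_packing, pack,
unpack) and the sample data are hypothetical, and the most negative
short is deliberately left unused so that it could serve as a fill
value. The only convention taken from netCDF is that readers recover
values as packed * scale_factor + add_offset.

```python
import array


def compute_packing(values, nbits=16):
    """Return (scale_factor, add_offset) that map values onto signed nbits integers."""
    vmin, vmax = min(values), max(values)
    # Use 2**nbits - 2 steps so the most negative integer stays free (e.g. for a fill value).
    scale_factor = (vmax - vmin) / (2 ** nbits - 2)
    if scale_factor == 0:
        scale_factor = 1.0  # constant data: any nonzero scale works
    add_offset = (vmax + vmin) / 2
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Quantize floats to 16-bit signed shorts (array type code 'h')."""
    return array.array(
        "h", (round((v - add_offset) / scale_factor) for v in values)
    )


def unpack(packed, scale_factor, add_offset):
    """Apply the netCDF convention: unpacked = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


# Hypothetical sample data: 200 temperature-like values in kelvin.
temps = [250.0 + 0.37 * i for i in range(200)]
scale_factor, add_offset = compute_packing(temps)
packed = pack(temps, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)
max_error = max(abs(a - b) for a, b in zip(temps, restored))

print("bytes per value:", packed.itemsize)
print("scale_factor:", scale_factor, "add_offset:", add_offset)
print("largest round-trip error:", max_error)
```

The round-trip error is bounded by half of scale_factor, which is the
precision given up in exchange for halving the storage of 32-bit
floats.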
NetCDF-4 note:
netCDF-4 provides per-variable compression. If you access only a
small amount of data in a large file, only a small part of the data
gets uncompressed.
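
Why compressing data in blocks avoids uncompressing everything can be
shown with zlib from the Python standard library. This is a sketch of
the general idea only, not the actual storage layout of netCDF-4 or of
the Cornell library; the chunk size, the helper names (compress_chunks,
read_value), and the sample data are all assumptions made for
illustration.

```python
import struct
import zlib

CHUNK = 1000  # values per block (hypothetical size chosen for illustration)


def compress_chunks(values, chunk_size=CHUNK):
    """Compress the data as independent blocks of 32-bit little-endian floats."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        block = values[start:start + chunk_size]
        raw = struct.pack(f"<{len(block)}f", *block)
        chunks.append(zlib.compress(raw))
    return chunks


def read_value(chunks, index, chunk_size=CHUNK):
    """Read one value by uncompressing only the block that contains it."""
    chunk_no, offset = divmod(index, chunk_size)
    raw = zlib.decompress(chunks[chunk_no])
    return struct.unpack_from("<f", raw, offset * 4)[0]


# Hypothetical sample data: 10,000 values with a repeating pattern.
data = [float(i % 50) for i in range(10_000)]
chunks = compress_chunks(data)
value = read_value(chunks, 4321)

raw_size = 4 * len(data)
compressed_size = sum(len(c) for c in chunks)
print("blocks:", len(chunks))
print("raw bytes:", raw_size, "compressed bytes:", compressed_size)
print("value at index 4321:", value)
```

Reading one value touches a single block, whereas a file compressed
as a whole would have to be expanded from the start.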