2010 Unidata NetCDF Workshop > Formats and Performance
12.5 Using Less Space for Data
Using less disk space for your data can also reduce access time.
- Pack floating-point data into a narrower (and lower precision)
numeric type, for example pack 32-bit floats into 16-bit shorts or
8-bit bytes using the scale_factor and add_offset attributes.
The NCO utilities ncpack or ncpdq will pack variables into smaller
types.
- Store whole files compressed. This has the drawback that the whole
file must be uncompressed to access even a small subset of the data.
- The netCDF Java library uncompresses files with ".Z", ".zip",
".gzip", ".gz", or ".bz2" extensions and caches the expanded file for
subsequent reads, keeping cache size within a specified limit. This
works well in read-only applications like servers.
- Use sparse data structures for sparse grids with lots of
missing data, for example store only array indices and valid values.
Other sparse formats such as CSR (Compressed Sparse Row) may be useful.
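The packing arithmetic behind scale_factor and add_offset can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the NCO implementation. It assumes the usual convention unpacked = packed * scale_factor + add_offset and a signed integer target. The function names and the sample temperatures are hypothetical, and no packed value is reserved for a fill value here.

```python
import struct

def compute_packing(values, nbits=16):
    """Return (scale_factor, add_offset) that map the data range onto
    the full range of a signed nbits-wide integer."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return 1.0, float(vmin)  # constant data: every value packs to 0
    scale_factor = (vmax - vmin) / (2 ** nbits - 1)
    add_offset = vmin + 2 ** (nbits - 1) * scale_factor
    return scale_factor, add_offset

def pack(values, scale_factor, add_offset, nbits=16):
    """Quantize floats to integers, clamped to the signed nbits range."""
    lo, hi = -(2 ** (nbits - 1)), 2 ** (nbits - 1) - 1
    return [max(lo, min(hi, round((v - add_offset) / scale_factor)))
            for v in values]

def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats: packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]

# Hypothetical sample data (temperatures in kelvin).
temps = [250.0, 273.15, 288.6, 301.25, 310.0]
scale_factor, add_offset = compute_packing(temps)
packed = pack(temps, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)

# Compare storage: 4 bytes per 32-bit float, 2 bytes per 16-bit short.
float_bytes = struct.pack(f"<{len(temps)}f", *temps)
short_bytes = struct.pack(f"<{len(packed)}h", *packed)

# The price of packing is a quantization error of about scale_factor / 2.
max_error = max(abs(a - b) for a, b in zip(temps, restored))
```

The packed shorts take half the space of the floats, and each restored value differs from the original by at most about half of scale_factor, which is the precision lost by packing.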
NetCDF-4 note:
NetCDF-4 provides per-variable compression. If you only
access a small amount of data in a large file, only a small part of
the data gets uncompressed.
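Why a small read uncompresses only a small part of the data can be illustrated with a sketch in which each chunk of a variable is compressed independently. This is not the netCDF-4 API or its on-disk layout, only the general idea expressed with Python's standard zlib module. The chunk size, function names, and sample data are assumptions made for the example.

```python
import struct
import zlib

CHUNK = 1000  # values per chunk (hypothetical chunk size)

def write_chunks(values, chunk=CHUNK):
    """Compress each chunk of 32-bit floats independently."""
    chunks = []
    for start in range(0, len(values), chunk):
        block = values[start:start + chunk]
        raw = struct.pack(f"<{len(block)}f", *block)
        chunks.append(zlib.compress(raw))
    return chunks

def read_value(chunks, index, chunk=CHUNK):
    """Read one value, decompressing only the chunk that holds it."""
    raw = zlib.decompress(chunks[index // chunk])
    return struct.unpack_from("<f", raw, 4 * (index % chunk))[0]

# Hypothetical, highly repetitive data so that it compresses well.
data = [float(i % 50) for i in range(10_000)]
chunks = write_chunks(data)

# Only 1 of the 10 chunks is decompressed to answer this read.
value = read_value(chunks, 4321)

compressed_size = sum(len(c) for c in chunks)
raw_size = 4 * len(data)
```

By contrast, a file compressed as a whole would have to be expanded completely before the same value could be read.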