2007 Unidata NetCDF Workshop for Developers and Data Providers > Performance
7.2 Using Less Space for Data
Using less disk space for your data can also reduce access time, since
fewer bytes have to be read from disk.
To use less disk space for large numeric arrays:
- Pack data into a narrower (and lower-precision) numeric type, for
example pack 32-bit floats into 16-bit shorts or 8-bit bytes using the
scale_factor and add_offset attributes. The NCO utilities ncpack or
ncpdq will pack variables into narrower types.
- Store whole files compressed. This has the drawback that the whole
file must be uncompressed to access even a small subset of the data.
- The netCDF Java library uncompresses files with ".Z", ".zip", ".gzip",
".gz", or ".bz2" extensions and caches the expanded file for subsequent
reads, keeping the cache within a specified size limit. This works well
for read-only applications such as servers.
- Use sparse data structures for sparse grids with lots of missing
data, for example store only the array indices and values of the valid
points. Other sparse formats such as CSR (Compressed Sparse Row) may
also be useful.
- Use HDF5 or netCDF-4 "chunking", a multidimensional tiling of arrays
in which data are accessed a tile at a time and each tile is compressed
separately. If you access only a small amount of data in a large file,
only the tiles containing that data need to be uncompressed. With
chunking, you can also choose which variables to compress instead of
compressing the entire file.
- Use the compressed netCDF library from Cornell, which uses zlib to
compress blocks of data in a classic format netCDF file.
- On Linux, use a compressed file system in loopback mode to keep
files compressed but still access them efficiently.
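Packing with scale_factor and add_offset follows the netCDF convention unpacked = packed * scale_factor + add_offset. The Python sketch below shows the arithmetic for packing floats into 16-bit signed integers. The way it chooses the two attributes (spreading [min, max] evenly over the full signed range) is one common choice and is an assumption here, not necessarily what ncpdq does. It also does not reserve a packed value for a _FillValue, which a real file usually needs.

```python
def pack_params(values, nbits=16):
    """Choose scale_factor and add_offset so [min, max] spans the signed nbits range."""
    vmin, vmax = min(values), max(values)
    nintervals = 2 ** nbits - 1  # steps between the smallest and largest packed values
    scale_factor = (vmax - vmin) / nintervals if vmax > vmin else 1.0
    # Shift so that vmin maps to -2**(nbits-1) and vmax maps to 2**(nbits-1) - 1.
    add_offset = vmin + 2 ** (nbits - 1) * scale_factor
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset, nbits=16):
    """Convert floats to integers; clamp to the representable signed range."""
    lo, hi = -(2 ** (nbits - 1)), 2 ** (nbits - 1) - 1
    return [max(lo, min(hi, round((v - add_offset) / scale_factor))) for v in values]


def unpack(packed, scale_factor, add_offset):
    """Apply the netCDF unpacking rule: packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


# Usage with made-up temperatures (kelvin).
temps = [271.3, 280.15, 295.7, 301.02]
sf, off = pack_params(temps)
packed = pack(temps, sf, off)
restored = unpack(packed, sf, off)
max_err = max(abs(a - b) for a, b in zip(temps, restored))
print(packed, max_err)
```

Storing each value in 2 bytes instead of 4 halves the space. The price is a quantization error of at most scale_factor / 2 per value.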
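The sparse-storage bullet suggests keeping only the indices and values of valid points, or using a format such as CSR. The Python sketch below builds both from a small 2-D grid. FILL is a hypothetical missing-value marker. In a netCDF file, each resulting list could be written as a 1-D variable along a dimension whose length is the number of valid points. That layout is an illustration only, not a prescribed convention.

```python
FILL = -999.0  # hypothetical missing-value marker


def to_coordinate_list(grid, fill=FILL):
    """Keep only the (row, col) indices and values of valid points."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(grid):
        for j, v in enumerate(row):
            if v != fill:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def to_csr(grid, fill=FILL):
    """Compressed Sparse Row: values, their column indices, and per-row start offsets."""
    values, col_index, row_ptr = [], [], [0]
    for row in grid:
        for j, v in enumerate(row):
            if v != fill:
                values.append(v)
                col_index.append(j)
        row_ptr.append(len(values))  # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return values, col_index, row_ptr


def csr_get(values, col_index, row_ptr, i, j, fill=FILL):
    """Look up grid[i][j]; points not stored are missing."""
    for k in range(row_ptr[i], row_ptr[i + 1]):
        if col_index[k] == j:
            return values[k]
    return fill


grid = [
    [FILL, 1.5, FILL, FILL],
    [FILL, FILL, FILL, FILL],
    [2.0, FILL, FILL, 3.25],
]
rows, cols, vals = to_coordinate_list(grid)
values, col_index, row_ptr = to_csr(grid)
print(rows, cols, vals)
print(values, col_index, row_ptr)
```

The coordinate list stores two indices per valid point. CSR replaces the row indices with one offset per row, which is smaller when rows hold several valid points each.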
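The chunking bullet claims that reading a small subset of a large chunked, compressed variable requires uncompressing only a small part of it. The Python sketch below checks that claim with arithmetic alone: given a hyperslab (start, count) and a chunk shape, it lists the chunks the read overlaps. The array shape and 100 x 100 chunk shape are hypothetical. The HDF5 and netCDF-4 libraries do this bookkeeping internally, so this is not their code.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return index tuples of chunks overlapping the hyperslab [start, start + count)."""
    ranges = []
    for s, c, cs in zip(start, count, chunk_shape):
        first = s // cs            # chunk holding the first element read
        last = (s + c - 1) // cs   # chunk holding the last element read
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def total_chunks(shape, chunk_shape):
    """Number of chunks covering the whole array (partial edge chunks count)."""
    n = 1
    for d, cs in zip(shape, chunk_shape):
        n *= -(-d // cs)  # ceiling division
    return n


shape = (1000, 1000)
chunk_shape = (100, 100)
touched = chunks_touched(start=(250, 480), count=(10, 40), chunk_shape=chunk_shape)
print(f"{len(touched)} of {total_chunks(shape, chunk_shape)} chunks need decompressing")
# prints "2 of 100 chunks need decompressing"
```

The 40 columns starting at index 480 straddle a chunk boundary at 500, so two chunks are read instead of one. Choosing chunk shapes that match typical access patterns reduces this overhead.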
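The netCDF Java bullet describes a strategy: uncompress a whole file once, cache the expanded copy for later reads, and keep the cache within a size limit. The Python sketch below imitates that strategy for ".gz" files only, using the standard gzip module. It is not the netCDF Java implementation. The function names and the oldest-first eviction policy are assumptions made for illustration.

```python
import gzip
import os
import shutil
import tempfile


def open_cached(path, cache_dir, max_cache_bytes):
    """Return a path to an uncompressed copy of a .gz file, reusing a cached copy."""
    os.makedirs(cache_dir, exist_ok=True)
    name = os.path.basename(path)
    if not name.endswith(".gz"):
        return path  # not compressed: read it directly
    cached = os.path.join(cache_dir, name[:-3])
    if os.path.exists(cached):
        os.utime(cached)  # cache hit: mark as recently used
    else:
        with gzip.open(path, "rb") as src, open(cached, "wb") as dst:
            shutil.copyfileobj(src, dst)  # expand the whole file once
        _evict(cache_dir, max_cache_bytes, keep=cached)
    return cached


def _evict(cache_dir, max_cache_bytes, keep):
    """Delete least recently used files until the cache fits the size limit."""
    entries = [os.path.join(cache_dir, f) for f in os.listdir(cache_dir)]
    entries.sort(key=os.path.getmtime)  # oldest first
    total = sum(os.path.getsize(p) for p in entries)
    for p in entries:
        if total <= max_cache_bytes:
            break
        if p != keep:
            total -= os.path.getsize(p)
            os.remove(p)


# Usage: create a small compressed file, then open it twice.
work = tempfile.mkdtemp()
src = os.path.join(work, "example.nc.gz")
payload = b"not really netCDF, just bytes" * 100
with gzip.open(src, "wb") as f:
    f.write(payload)
cache = os.path.join(work, "cache")
first = open_cached(src, cache, max_cache_bytes=10_000_000)
second = open_cached(src, cache, max_cache_bytes=10_000_000)  # served from the cache
```

This design suits read-only use, as the bullet notes. If the compressed original changed, the stale cached copy would still be returned.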