2007 Unidata NetCDF Workshop for Developers and Data Providers > Performance
7.1 NetCDF-3 File Format
Understanding the netCDF format can make clear
why some netCDF operations are more expensive than others.
A netCDF classic or 64-bit offset file is stored in three parts:
- The header, containing information about dimensions,
attributes, and variables
- The fixed-size data, containing
data values for variables that
don't have an unlimited dimension
- The record data, containing data
values for variables that have an unlimited dimension
By default, the header has almost no extra space; it is just large
enough to contain the dimensions, variables, and attributes
(including all the attribute values) rounded up to a whole number of disk blocks.
- Advantage: netCDF files are compact, requiring very little
overhead
- Disadvantage: operations that require the header to grow
(such as adding new dimensions or variables) require moving all the
data by copying it
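The trade-off above can be sketched as a toy cost model. This is not netCDF library code: the `layout` struct and the functions `round_up` and `bytes_moved_on_growth` are hypothetical names introduced only for illustration, and the block size passed to `round_up` is an assumption, not a value taken from the format.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be nonzero). */
size_t round_up(size_t n, size_t align)
{
    assert(align > 0);
    return ((n + align - 1) / align) * align;
}

/* Hypothetical model of the three parts of a classic-format file. */
struct layout {
    size_t header_used;   /* bytes of header actually occupied          */
    size_t header_alloc;  /* bytes set aside for the header, incl. slack */
    size_t fixed_bytes;   /* size of the fixed-size data section         */
    size_t record_bytes;  /* size of the record data section             */
};

/* Bytes of data that would have to be copied if the header grew by
 * `extra` bytes: zero while the growth fits in the slack, otherwise
 * everything stored after the header. */
size_t bytes_moved_on_growth(const struct layout *f, size_t extra)
{
    if (f->header_used + extra <= f->header_alloc)
        return 0;
    return f->fixed_bytes + f->record_bytes;
}
```

With a 1000-byte header rounded up to an assumed 512-byte block, only 24 bytes of slack remain, so in this model a small new attribute fits in place while a new variable definition of 100 bytes would force every data byte to be copied.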
To avoid copying data when the file schema changes:
- Either create all necessary dimensions, variables, and attributes
before writing data, or
- reserve extra space in the file header for later additions
(using nc__enddef() in C, or NF__ENDDEF() or NF90__ENDDEF() in Fortran;
not available in the Java interface)
- There is also an NCO program that adds extra space in a
netCDF header, if you forget to do it when the file is
created.
- NetCDF-4 files do not have a contiguous header for metadata, so
none of this is necessary for netCDF-4.