2008 Unidata NetCDF Workshop for Developers and Data Providers > Formats and Performance
7.1 Classic File Format
Understanding the netCDF classic format can make clear why
modifying the schema of an existing netCDF file may be expensive.
A netCDF classic or 64-bit offset file is stored in three parts:
- The header, containing information about dimensions, attributes, and variables (the schema)
- The fixed-size data, containing data values for variables that don't have an unlimited dimension
- The record data, containing data values for variables that have an unlimited dimension
By default, the header has almost no extra space; it is just large
enough to contain the dimensions, attributes
(including all attribute values), and variable metadata,
rounded up to a whole number of disk blocks.
- Advantage: netCDF files are compact, with little overhead
- Disadvantage: schema changes that enlarge the header may require moving all the data, for example:
  - defining new dimensions, variables, or attributes in an existing file
  - renaming existing dimensions, variables, or attributes with longer names
  - redefining an attribute to have values of a larger type or more values, such as a longer string value
To avoid copying data when the file schema changes:
- Either create all necessary dimensions, variables, and attributes before writing data, or
- reserve extra space in the file header for later additions (using nc__enddef() in C, NF__ENDDEF() or NF90__ENDDEF() for Fortran, or the setExtraHeaderBytes() method of NetcdfFileWritable for Java)
- There is also an NCO program (ncks, with its --hdr_pad option) that adds extra space to a netCDF header if you forget to reserve it when the file is created
NetCDF-4 note:
NetCDF-4 files do not store the schema in a single contiguous header, so none of this is necessary for netCDF-4! Schema additions are efficient.