Hi Roy,
> I am using ncgen (version 4) to create a file. Just out of the
> box, it creates a netCDF-3 file much larger than I care to work with,
> so I want to create a netCDF-4 file; even default compression is fine.
> "man ncgen" tells me:
>
> Special Attributes
> Special, virtual, attributes can be specified to provide
> performance-related information about the file format and about
> variable properties. The file must be a netCDF-4 file for these to
> take effect.
>
> These special virtual attributes are not actually part of the file,
> they are merely a convenient way to set miscellaneous properties of
> the data in CDL
>
> The special attributes currently supported are as follows: ‘_Format’,
> ‘_Fletcher32’, ‘_ChunkSizes’, ‘_Endianness’, ‘_DeflateLevel’,
> ‘_Shuffle’, and ‘_Storage’.
>
> ‘_Format’ is a global attribute specifying the netCDF format variant.
> Its value must be a single string matching one of ‘classic’, ‘64-bit
> offset’, ‘netCDF-4’, or ‘netCDF-4 classic model’.
>
> The rest of the special attributes are all variable attributes.
> Essentially all of them map to some corresponding ‘nc_def_var_XXX’
> function as defined in the netCDF-4 API. For the attributes that are
> essentially boolean (_Fletcher32, _Shuffle, and _NOFILL), the value
> true can be specified by using the strings ‘true’ or ‘1’, or by using
> the integer 1. The value false expects either ‘false’, ‘0’, or the
> integer 0. The actions associated with these attributes are as
> follows.
>
> 1. ‘_Fletcher32’ sets the ‘fletcher32’ property for a
> variable.
> 2. ‘_Endianness’ is either ‘little’ or ‘big’, depending
> on how the variable is stored when first written.
> 3. ‘_DeflateLevel’ is an integer between 0 and 9
> inclusive if compression has been specified for the
> variable.
> 4. ‘_Shuffle’ specifies if the shuffle filter should
> be used.
> 5. ‘_Storage’ is ‘contiguous’ or ‘chunked’.
> 6. ‘_ChunkSizes’ is a list of chunk sizes for each
> dimension of the variable
>
> **********************
> I know very little about any of this, and some examples would have been
> helpful. My dimensions
> are (time, altitude, latitude, longitude), and an example for one variable with
> these dimensions
> to achieve a reasonable amount of compression would be of great help.
The attributes relevant to compression are the deflation level, whether
shuffle is used or not, and less directly the chunk shapes. Any
variable that is compressed will automatically use chunked storage.
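In CDL, those settings might look like the following sketch. The variable
name (temperature), the dimension lengths, and the chunk sizes are made up
for illustration; only the special attribute names come from the ncgen man
page quoted above. The ncgen step is skipped if ncgen is not on your PATH.

```shell
# Write a small CDL file that requests compression via special attributes.
# "temperature", the dimension lengths, and the chunk sizes are hypothetical.
cat > example.cdl <<'EOF'
netcdf example {
dimensions:
    time = UNLIMITED ;
    altitude = 10 ;
    latitude = 180 ;
    longitude = 360 ;
variables:
    float temperature(time, altitude, latitude, longitude) ;
        temperature:_Storage = "chunked" ;
        temperature:_ChunkSizes = 1, 1, 180, 360 ;
        temperature:_DeflateLevel = 1 ;
        temperature:_Shuffle = "true" ;

// global attributes:
        :_Format = "netCDF-4 classic model" ;
}
EOF

# Generate the netCDF file only if ncgen is installed.
if command -v ncgen >/dev/null 2>&1; then
    ncgen -o example.nc example.cdl
fi
```

Running "ncdump -h -s example.nc" afterwards should list the special
attributes that were actually applied to the variable.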
Although you could just use ncgen to edit the attributes before creating
the file, it's easier to use nccopy on the output from ncgen to see the
effect on compression of deflation level, shuffling, and chunk shapes.
If you're going to use the nccopy utility, use the one from version 4.2
or later, which allows setting deflation level, shuffling, and chunking
with command-line options.
As an example, here's how you could test use of deflation level 1, with
shuffling and default chunking, on input infile.nc, with output to a
netCDF-4 classic-model file, outfile.nc:
    nccopy -d 1 -s infile.nc outfile.nc
If you knew that the data at the same altitude varied less than with
other dimensions, you might try specifying that chunks (the unit of
compression and access) each have only one altitude level, which you
could specify with
    nccopy -d 1 -s -c altitude/1 file.nc compressed.nc
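To compare settings side by side, a small shell function along these lines
could copy the same input at several deflation levels and print the
resulting file sizes. This is only a sketch: the function name, the choice
of levels 1, 5, and 9, and the output file naming are arbitrary assumptions,
and it uses only the -d and -s options described in this message.

```shell
# Hypothetical helper: copy one input file at several deflation levels
# (with shuffling) and report the size in bytes of each output file.
compare_deflate_levels() {
    infile="$1"
    for level in 1 5 9; do
        # e.g. infile.nc -> infile_d1.nc, infile_d5.nc, infile_d9.nc
        outfile="${infile%.nc}_d${level}.nc"
        nccopy -d "$level" -s "$infile" "$outfile" || return 1
        bytes=$(wc -c < "$outfile" | tr -d ' ')
        printf 'level %s: %s bytes\n' "$level" "$bytes"
    done
}
```

You would call it as "compare_deflate_levels infile.nc" and pick the lowest
level that gives acceptable file sizes.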
Here's the documentation for the -d, -s, and -c options to nccopy (which
currently seem to be out-of-date on our web site):
-d n
Specify deflation level (level of compression) for variable data
in output. 0 corresponds to no compression and 9 to maximum
compression, with higher levels of compression requiring
marginally more time to compress or uncompress than lower levels.
Compression achieved may also depend on chunking parameters, which
will use default chunking in the current nccopy implementation.
If this option is specified for a classic format or 64-bit offset
format input file, it is not necessary to also specify that the
output should be netCDF-4 classic model, as that will be the
default. If this option is not specified and the input file has
compressed variables, the compression will still be preserved in
the output, using the same chunking as in the input.
Note that nccopy requires all variables to be compressed using the
same compression level, but the API has no such restriction.
With a program you can customize compression for each variable
independently.
-s
Specify shuffling of variable data bytes before compression or
after decompression. This option is ignored unless a non-zero
deflation level is specified. Turning shuffling on sometimes
improves compression.
-c chunkspec
Specify chunking (multidimensional tiling) for variable data in
the output, useful to specify the units of disk access,
compression, or other filters such as checksums. The chunkspec
argument is a string of comma-separated associations, each
specifying a dimension name, a `/' character, and optionally the
corresponding chunk length for that dimension. No blanks should
appear in the chunkspec string, except possibly escaped blanks
that are part of a dimension name. A chunkspec must name at least
one dimension, and may omit dimensions which are not to be chunked
or for which the default chunk length is desired. If a dimension
name is followed by a `/' character but no subsequent chunk
length, the actual dimension length is assumed. If copying a
classic model file to a netCDF-4 output file and not naming all
dimensions in the chunkspec, unnamed dimensions will also use the
actual dimension length for the chunk length. An example of a
chunkspec for variables that use the `m' and `n' dimensions might
be `m/100,n/200' to specify 100 by 200 chunks. To see the
chunking resulting from copying with a chunkspec, use the `-s'
option of ncdump on the output file.
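For your four dimensions, a chunkspec that puts one time step and one
altitude level in each chunk, with full latitude and longitude extents,
might be applied and then checked as in this sketch. The function name is
hypothetical, and the dimension names are assumed to be spelled exactly as
in your message; the trailing `/' with no length relies on the behavior
described in the -c documentation above.

```shell
# Hypothetical helper: recompress with one (time, altitude) slice per chunk,
# then print the header with special attributes to confirm the chunking.
rechunk_and_inspect() {
    infile="$1"
    outfile="$2"
    # "latitude/" and "longitude/" request the full dimension length.
    nccopy -d 1 -s -c time/1,altitude/1,latitude/,longitude/ \
        "$infile" "$outfile" || return 1
    # -h prints the header only; -s adds _ChunkSizes, _DeflateLevel, etc.
    ncdump -h -s "$outfile"
}
```

Calling "rechunk_and_inspect infile.nc outfile.nc" should show the
_ChunkSizes, _DeflateLevel, and _Shuffle attributes for each variable in the
output.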
--Russ