Re: [netcdfgroup] Setting special attributes using ncgen

Hi Roy,

> I am using ncgen (version 4) to create a file.  Just out of the
> box, it creates a netCDF-3 file much larger than I care to work with,
> so I want to create a netCDF-4 file; even default compression is fine.
> "man ncgen" tells me:
> 
> Special Attributes
> Special, virtual, attributes can be specified to provide
> performance-related information about the file format and about
> variable properties. The file must be a netCDF-4 file for these to
> take effect.
> 
> These special virtual attributes are not actually part of the file,
> they are merely a convenient way to set miscellaneous properties of
> the data in CDL
> 
> The special attributes currently supported are as follows: ‘_Format’,
> ‘_Fletcher32’, ‘_ChunkSizes’, ‘_Endianness’, ‘_DeflateLevel’,
> ‘_Shuffle’, and ‘_Storage’.
> 
> ‘_Format’ is a global attribute specifying the netCDF format variant.
> Its value must be a single string matching one of ‘classic’, ‘64-bit
> offset’, ‘netCDF-4’, or ‘netCDF-4 classic model’.
> 
> The rest of the special attributes are all variable attributes.
> Essentially all of them map to some corresponding ‘nc_def_var_XXX’
> function as defined in the netCDF-4 API. For the attributes that are
> essentially boolean (_Fletcher32, _Shuffle, and _NOFILL), the value
> true can be specified by using the strings ‘true’ or ‘1’, or by using
> the integer 1. The value false expects either ‘false’, ‘0’, or the
> integer 0. The actions associated with these attributes are as
> follows.
> 
>            1.    ‘_Fletcher32’ sets the ‘fletcher32’ property for a
>                  variable. 
>            2.    ‘_Endianness’ is either ‘little’ or ‘big’, depending
>                  on how the variable is stored when first written.
>            3.    ‘_DeflateLevel’ is an integer between 0 and 9
>                  inclusive if compression has been specified for the
>                  variable. 
>            4.    ‘_Shuffle’ specifies if the shuffle filter should
>                  be used. 
>            5.    ‘_Storage’ is ‘contiguous’ or ‘chunked’.
>            6.    ‘_ChunkSizes’ is a list of chunk sizes for each
>                  dimension of the variable 
> 
> **********************
> I know very little about any of this, and some examples would have been
> helpful.  My dimensions are (time, altitude, latitude, longitude), and an
> example for one variable with these dimensions that achieves a reasonable
> amount of compression would be of great help.

The attributes relevant to compression are the deflation level, whether
shuffle is used or not, and less directly the chunk shapes.  Any
variable that is compressed will automatically use chunked storage.
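
If you do want to go the CDL route, a minimal sketch might look like the
following.  The variable name, the dimension sizes, and the file names are
made up for illustration, and the chunk shape (one time step and one
altitude level per chunk) is only a starting guess, not a recommendation
for your data:

```shell
#!/bin/sh
# Write a small CDL file that sets the compression-related special attributes.
# "temperature", the dimension sizes, and example.cdl are hypothetical.
cat > example.cdl <<'EOF'
netcdf example {
dimensions:
    time = UNLIMITED ;
    altitude = 10 ;
    latitude = 180 ;
    longitude = 360 ;
variables:
    float temperature(time, altitude, latitude, longitude) ;
        temperature:_Storage = "chunked" ;
        temperature:_ChunkSizes = 1, 1, 180, 360 ;
        temperature:_DeflateLevel = 1 ;
        temperature:_Shuffle = "true" ;

// global attributes:
        :_Format = "netCDF-4 classic model" ;
}
EOF

# Generate the netCDF file only if ncgen is installed.
if command -v ncgen >/dev/null 2>&1; then
    ncgen -o example.nc example.cdl
else
    echo "ncgen not found; wrote example.cdl only"
fi
```

The special attributes go after the variable declaration, and _Format goes
with the global attributes.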

Although you could just use ncgen to edit the attributes before creating
the file, it's easier to use nccopy on the output from ncgen to see the
effect on compression of deflation level, shuffling, and chunk shapes.
If you're going to use the nccopy utility, use the one from version 4.2
or later, which allows setting deflation level, shuffling, and chunking
with command-line options.

As an example, here's how you could test use of deflation level 1, with
shuffling and default chunking, on input infile.nc, with output to a
netCDF-4 classic-model file, outfile.nc:

  nccopy -d 1 -s infile.nc outfile.nc

If you knew that values at a given altitude were more alike than values
across altitudes, you might try making each chunk (the unit of
compression and access) hold only one altitude level, which you could
specify with

  nccopy -d 1 -s -c altitude/1 file.nc compressed.nc
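
To compare several settings side by side, something like the following
sketch could be used.  The try_copy helper and the output file names are
hypothetical, and infile.nc stands for whatever ncgen produced; when nccopy
or the input file is missing, the script only prints the command it would
have run:

```shell
#!/bin/sh
# Sketch: copy one input file with different nccopy options and list the sizes.
# infile.nc and the output names are placeholders.
infile=infile.nc

# try_copy OUTFILE [nccopy options...]  (hypothetical helper)
try_copy() {
    outfile=$1
    shift
    if command -v nccopy >/dev/null 2>&1 && [ -f "$infile" ]; then
        nccopy "$@" "$infile" "$outfile" && ls -l "$outfile"
    else
        echo "would run: nccopy $* $infile $outfile"
    fi
}

try_copy d1.nc              -d 1                  # deflation only
try_copy d1_shuffle.nc      -d 1 -s               # deflation plus shuffle
try_copy d9_shuffle.nc      -d 9 -s               # maximum deflation
try_copy d1_shuffle_alt1.nc -d 1 -s -c altitude/1 # one altitude level per chunk
```

Comparing the sizes reported by ls with the size of the original file shows
which combination is worth keeping.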

Here's the documentation for the -d, -s, and -c options to nccopy (which
currently seems to be out of date on our web site):

 -d   n
      Specify deflation level (level of compression) for variable data
      in output.  0 corresponds to no compression and 9 to maximum
      compression, with higher levels of compression requiring
      marginally more time to compress or uncompress than lower levels.
      Compression achieved may also depend on chunking parameters, which
      will use default chunking in the current nccopy implementation.
      If this option is specified for a classic format or 64-bit offset
      format input file, it is not necessary to also specify that the
      output should be netCDF-4 classic model, as that will be the
      default.  If this option is not specified and the input file has
      compressed variables, the compression will still be preserved in
      the output, using the same chunking as in the input.

      Note that nccopy requires all variables to be compressed using the
      same compression level, but the API has no such restriction.
      With a program you can customize compression for each variable
      independently.

 -s
      Specify shuffling of variable data bytes before compression or
      after decompression.  This option is ignored unless a non-zero
      deflation level is specified.  Turning shuffling on sometimes
      improves compression.

 -c   chunkspec
      Specify chunking (multidimensional tiling) for variable data in
      the output, useful to specify the units of disk access,
      compression, or other filters such as checksums.  The chunkspec
      argument is a string of comma-separated associations, each
      specifying a dimension name, a `/' character, and optionally the
      corresponding chunk length for that dimension.  No blanks should
      appear in the chunkspec string, except possibly escaped blanks
      that are part of a dimension name.  A chunkspec must name at least
      one dimension, and may omit dimensions which are not to be chunked
      or for which the default chunk length is desired.  If a dimension
      name is followed by a `/' character but no subsequent chunk
      length, the actual dimension length is assumed.  If copying a
      classic model file to a netCDF-4 output file and not naming all
      dimensions in the chunkspec, unnamed dimensions will also use the
      actual dimension length for the chunk length.  An example of a
      chunkspec for variables that use the `m' and `n' dimensions might
      be `m/100,n/200' to specify 100 by 200 chunks.  To see the
      chunking resulting from copying with a chunkspec, use the `-s'
      option of ncdump on the output file.
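
As a rough sketch of that last check, the following prints only the
storage-related special attributes from the header of an output file.
compressed.nc is a placeholder name, and the grep pattern is just one way
to filter the ncdump output:

```shell
#!/bin/sh
# Sketch: show the chunking and compression settings of an output file.
# compressed.nc is a placeholder for the file written by nccopy.
outfile=compressed.nc
pattern='_(Storage|ChunkSizes|DeflateLevel|Shuffle)'

if command -v ncdump >/dev/null 2>&1 && [ -f "$outfile" ]; then
    # -h prints only the header; -s adds the special virtual attributes
    ncdump -h -s "$outfile" | grep -E "$pattern"
else
    echo "ncdump or $outfile not available; nothing to inspect"
fi
```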

--Russ



