Hi Russ:
Thanks. But the reason I don't want to use nccopy, and want to use ncgen from
the start, is that the netcdf3 version of the file will be about 4 GB. What I
did was create a small version of the file, use NCO to convert it to netcdf4
with the settings I commonly use, and then look at the special attributes it
created with ncdump -s. I added them to my cdl, ran ncgen, and it worked
quite nicely, especially since I am creating an empty array that will be filled
as the output of my model runs, so the default netcdf4 file is minuscule.
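As an illustration of the kind of CDL this produces, here is a minimal Python sketch that writes the special attributes for one variable. The dataset name `example`, the variable name `temp`, the dimension lengths, and the chunk sizes are all hypothetical placeholders, not values from the actual file; the attribute names and the `_Format` value are the ones listed in the ncgen man page quoted below.

```python
def cdl_with_special_attributes(name, dims, var, chunks, deflate=1, shuffle=True):
    """Return CDL text for one float variable with netCDF-4 special attributes.

    dims   -- list of (dimension name, length) pairs (hypothetical values)
    chunks -- one chunk length per dimension, in the same order as dims
    """
    if len(chunks) != len(dims):
        raise ValueError("need exactly one chunk length per dimension")
    dim_names = ", ".join(d for d, _ in dims)
    lines = ["netcdf %s {" % name, "dimensions:"]
    lines += ["    %s = %d ;" % (d, n) for d, n in dims]
    lines += [
        "variables:",
        "    float %s(%s) ;" % (var, dim_names),
        '        %s:_Storage = "chunked" ;' % var,
        "        %s:_ChunkSizes = %s ;" % (var, ", ".join(str(c) for c in chunks)),
        "        %s:_DeflateLevel = %d ;" % (var, deflate),
        '        %s:_Shuffle = "%s" ;' % (var, "true" if shuffle else "false"),
        "",
        "// global attributes:",
        '        :_Format = "netCDF-4" ;',
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical grid: the sizes are made up for the example.
    dims = [("time", 10), ("altitude", 1), ("latitude", 160), ("longitude", 320)]
    print(cdl_with_special_attributes("example", dims, "temp", [1, 1, 16, 16]))
```

Saving the printed text to a .cdl file and running ncgen on it should give an empty, compressed netCDF-4 file, since no data section is included.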
Mainly I don't know enough about the different settings and what they do to the
file to know which options to use. I know these settings come more from HDF5
than from you guys, but is there an idiot's guide to this for those of us who
know little about it but are finding it to be of great use? I know Rich Signell
has done a little bit of timing on the tradeoffs between size and read time for
the various settings; do you know of others who have? We are pretty much
interested in reads, using a lot of separate files aggregated in THREDDS,
where the user will want a time series or a time series from a region. Using
NCO notation, if we go "ncks -4 -L 1 -O" then we get really good compression
but relatively slow reads, and if we go "ncks -O -4 -L 1 --cnk_dmn lat,16
--cnk_dmn lon,16" the reads are right up there but the compression is so-so.
Are there any good guidelines for this, or is it just trial and error?
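One way to reason about this tradeoff before resorting to trial and error is to count how many chunks a read has to touch, since a chunk is the unit of compression and access. The Python sketch below does only that arithmetic. The grid size, the read positions, and the whole-grid chunk shape are hypothetical and are not what ncks or the netCDF library choose by default; only the 16 by 16 lat/lon chunking comes from the command above.

```python
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Number of chunks overlapped by a hyperslab read.

    start, count -- first index and number of elements along each dimension
    chunk_shape  -- chunk length along each dimension
    """
    per_dim = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c              # index of the first chunk overlapped
        last = (s + n - 1) // c     # index of the last chunk overlapped
        per_dim.append(last - first + 1)
    return prod(per_dim)


def values_decompressed(start, count, chunk_shape):
    """Values that must be decompressed, since chunks are read whole."""
    return chunks_touched(start, count, chunk_shape) * prod(chunk_shape)


if __name__ == "__main__":
    # Hypothetical file holding one time step of a 1600 x 3200 lat/lon grid.
    point = ((0, 800, 1600), (1, 1, 1))      # one grid cell
    region = ((0, 800, 1600), (1, 64, 64))   # 64 x 64 cell box
    for label, shape in [("16x16 chunks", (1, 16, 16)),
                         ("whole-grid chunk", (1, 1600, 3200))]:
        for name, (start, count) in [("point", point), ("region", region)]:
            print(label, name,
                  chunks_touched(start, count, shape),
                  values_decompressed(start, count, shape))
```

Small chunks mean far fewer values are decompressed per point read, but each chunk is compressed on its own, which may be part of why the 16 by 16 case compresses less well. Actual timings also depend on caching and disk layout, which this count ignores.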
This is becoming a very important issue for us (assuming we are still around
next year) because we are starting to serve out a lot of 1 km data. Storing in
netcdf4 saves us a huge amount of space, but we don't want a huge speed hit on
the reads.
Thanks.
-Roy
On Apr 8, 2012, at 3:43 PM, Russ Rew wrote:
> Hi Roy,
>
>> I am using ncgen (version 4) to create a file. Just out of the
>> box, it creates a netcdf3 file much larger than I care to work with,
>> so I want to create a netcdf4 file; even default compression is fine.
>> "man ncgen" tells me:
>>
>> Special Attributes
>> Special, virtual, attributes can be specified to provide
>> performance-related information about the file format and about
>> variable properties. The file must be a netCDF-4 file for these to
>> take effect.
>>
>> These special virtual attributes are not actually part of the file,
>> they are merely a convenient way to set miscellaneous properties of
>> the data in CDL
>>
>> The special attributes currently supported are as follows: ‘_Format’,
>> ‘_Fletcher32’, ‘_ChunkSizes’, ‘_Endianness’, ‘_DeflateLevel’,
>> ‘_Shuffle’, and ‘_Storage’.
>>
>> ‘_Format’ is a global attribute specifying the netCDF format variant.
>> Its value must be a single string matching one of ‘classic’, ‘64-bit
>> offset’, ‘netCDF-4’, or ‘netCDF-4 classic model’.
>>
>> The rest of the special attributes are all variable attributes.
>> Essentially all of them map to some corresponding ‘nc_def_var_XXX’
>> function as defined in the netCDF-4 API. For the attributes that are
>> essentially boolean (_Fletcher32, _Shuffle, and _NOFILL), the value
>> true can be specified by using the strings ‘true’ or ‘1’, or by using
>> the integer 1. The value false expects either ‘false’, ‘0’, or the
>> integer 0. The actions associated with these attributes are as
>> follows.
>>
>> 1. ‘_Fletcher32’ sets the ‘fletcher32’ property for a
>> variable.
>> 2. ‘_Endianness’ is either ‘little’ or ‘big’, depending
>> on how the variable is stored when first written.
>> 3. ‘_DeflateLevel’ is an integer between 0 and 9
>> inclusive if compression has been specified for the
>> variable.
>> 4. ‘_Shuffle’ specifies if the shuffle filter should
>> be used.
>> 5. ‘_Storage’ is ‘contiguous’ or ‘chunked’.
>> 6. ‘_ChunkSizes’ is a list of chunk sizes for each
>> dimension of the variable
>>
>> **********************
>> I know very little about any of this, and some examples would have been
>> helpful. My dimensions are (time, altitude, latitude, longitude), and an
>> example for one variable with these dimensions that achieves a reasonable
>> amount of compression would be of great help.
>
> The attributes relevant to compression are the deflation level, whether
> shuffle is used or not, and less directly the chunk shapes. Any
> variable that is compressed will automatically use chunked storage.
>
> Although you could just use ncgen to edit the attributes before creating
> the file, it's easier to use nccopy on the output from ncgen to see the
> effect on compression of deflation level, shuffling, and chunk shapes.
> If you're going to use the nccopy utility, use the one from version 4.2
> or later, which allows setting deflation level, shuffling, and chunking
> with command-line options.
>
> As an example, here's how you could test use of deflation level 1, with
> shuffling and default chunking, on input infile.nc, with output to a
> netCDF-4 classic-model file, outfile.nc:
>
> nccopy -d 1 -s infile.nc outfile.nc
>
> If you knew that the data at the same altitude varied less than with
> other dimensions, you might try specifying that chunks (the unit of
> compression and access) each have only one altitude level, which you
> could specify with
>
> nccopy -d 1 -s -c altitude/1 file.nc compressed.nc
>
> Here's the documentation for the -d, -s, and -c options to nccopy (which
> currently seem to be out-of-date on our web site):
>
> -d n
> Specify deflation level (level of compression) for variable data
> in output. 0 corresponds to no compression and 9 to maximum
> compression, with higher levels of compression requiring
> marginally more time to compress or uncompress than lower levels.
> Compression achieved may also depend on chunking parameters, which
> will use default chunking in the current nccopy implementation.
> If this option is specified for a classic format or 64-bit offset
> format input file, it is not necessary to also specify that the
> output should be netCDF-4 classic model, as that will be the
> default. If this option is not specified and the input file has
> compressed variables, the compression will still be preserved in
> the output, using the same chunking as in the input.
>
> Note that nccopy requires all variables to be compressed using the
> same compression level, but the API has no such restriction.
> With a program you can customize compression for each variable
> independently.
>
> -s
> Specify shuffling of variable data bytes before compression or
> after decompression. This option is ignored unless a non-zero
> deflation level is specified. Turning shuffling on sometimes
> improves compression.
>
> -c chunkspec
> Specify chunking (multidimensional tiling) for variable data in
> the output, useful to specify the units of disk access,
> compression, or other filters such as checksums. The chunkspec
> argument is a string of comma-separated associations, each
> specifying a dimension name, a `/' character, and optionally the
> corresponding chunk length for that dimension. No blanks should
> appear in the chunkspec string, except possibly escaped blanks
> that are part of a dimension name. A chunkspec must name at least
> one dimension, and may omit dimensions which are not to be chunked
> or for which the default chunk length is desired. If a dimension
> name is followed by a `/' character but no subsequent chunk
> length, the actual dimension length is assumed. If copying a
> classic model file to a netCDF-4 output file and not naming all
> dimensions in the chunkspec, unnamed dimensions will also use the
> actual dimension length for the chunk length. An example of a
> chunkspec for variables that use the `m' and `n' dimensions might
> be `m/100,n/200' to specify 100 by 200 chunks. To see the
> chunking resulting from copying with a chunkspec, use the `-s'
> option of ncdump on the output file.
>
> --Russ
>
>
**********************
"The contents of this message do not reflect any position of the U.S.
Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
1352 Lighthouse Avenue
Pacific Grove, CA 93950-2097
e-mail: Roy.Mendelssohn@xxxxxxxx (Note new e-mail address)
voice: (831)-648-9029
fax: (831)-648-8440
www: http://www.pfeg.noaa.gov/
"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.