It seems to me that if some standard conventions were established for
dealing with compressed netCDF variables, this would be fairly simple.
See below... >>>>
Rich Lysakowski
Digital Equipment Corporation
Laboratory and R&D Systems Engineering
==========================================================================
From: DECWRL::"russ@xxxxxxxxxxxxxxxx" "Russ Rew" 17-Sep-92 11:58
To: RIVERS@xxxxxxxxxxxxxxxxxxx
CC: netcdfgroup@xxxxxxxxxxxxxxxx
Subj: Data compression
Hi
Mark Rivers (rivers@xxxxxxxxxxxxxxxxxxx) asks:
> Are there any plans to add data compression to netCDF? We are strongly
> leaning towards switching from our present (local) data file format to
> netCDF. The only feature which we will have to give up is data
> compression. We are presently using either run-length encoding or a simple
> form of linear predictive coding. Both of these are loss-free. Linear
> predictive coding typically reduces the size of our 32 bit integer data
> files by a factor of 3, which is significant.
>
> It seems like it could be a very worthwhile addition.
We have no plans to add data compression to netCDF (although we do plan to
eventually add a form of data packing previously described on this mailing
list).
Implementing hyperslab access and direct access to individual array values
becomes considerably more complicated if compression is to be supported.
Consider how you might devise any effective compression scheme if the
elements of an array variable can be filled in any order or as
cross-sections in any direction. NetCDF permits writing elements in one
order and reading them later in different orders.
Some compression methods require that all the data to be compressed are
known before starting the compression. Techniques like run-length encoding
or anything that depends on exploiting similarities in nearby values can't
be used if nearby values aren't all known at the time some of the data are to
be written.
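
As a rough illustration of why run-length encoding depends on neighboring
values being available together, here is a minimal Python sketch. The
function names and the sample data are hypothetical and are not part of
netCDF; the point is only that the same values compress well when runs are
visible to the encoder and not at all when they are not.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal neighbours into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Values seen by the encoder in storage order: the runs are visible.
in_order = [0, 0, 0, 0, 7, 7, 7, 7]
# The same values seen in an interleaved order (illustrative only):
# no two neighbours match, so nothing collapses.
interleaved = [0, 7, 0, 7, 0, 7, 0, 7]

print(len(rle_encode(in_order)))     # 2 pairs
print(len(rle_encode(interleaved)))  # 8 pairs
```
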
An alternative that can be implemented above the netCDF library is to adopt
a convention for compressed data that uses a "compression" attribute to
encode the method of compression, e.g.
x:compression = "rle" ;
for run-length encoding of the data in a variable x. Then when you write
the data, compress them into a bland array of bytes and write all the bytes.
Note that it would be difficult to define the size of such a variable in
advance, since its compressed size depends on its values. You would also
have to give up on hyperslab access for such variables, but instead read the
compressed array in all at once and uncompress it before using it.
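
A sketch of how such a convention might look from the application side, in
Python. This does not call the netCDF library: a plain dictionary stands in
for the variable and its attributes, and the byte layout of each run (a
signed 32-bit value followed by an unsigned 32-bit count, big-endian) is an
assumption made for illustration, not an established convention.

```python
import struct


def rle_pack(values):
    """Run-length encode 32-bit integers into a flat byte string.

    Each run becomes 8 bytes: signed value, then unsigned run length.
    (Assumed layout, for illustration only.)
    """
    out = bytearray()
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        out += struct.pack(">iI", values[i], j - i)
        i = j
    return bytes(out)


def rle_unpack(blob):
    """Decode the byte string produced by rle_pack."""
    values = []
    for value, count in struct.iter_unpack(">iI", blob):
        values.extend([value] * count)
    return values


def write_compressed(values):
    """Stand-in for writing variable x with a "compression" attribute."""
    return {"attributes": {"compression": "rle"}, "bytes": rle_pack(values)}


def read_compressed(var):
    """Read all the bytes at once and uncompress them before use."""
    if var["attributes"].get("compression") != "rle":
        raise ValueError("unsupported compression method")
    return rle_unpack(var["bytes"])


x = write_compressed([5, 5, 5, 5, -1, -1, 9])
print(len(x["bytes"]))     # 3 runs * 8 bytes = 24
print(read_compressed(x))  # [5, 5, 5, 5, -1, -1, 9]
```

The length of the byte array is known only after rle_pack has run, which is
the difficulty with defining the variable's size in advance.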
--Russ
>>>> How about storing the size of the compressed variable in an attached
>>>> attribute?
>>>>
>>>> That is:
>>>>
>>>> x:compression = "rle" ;
>>>> x:compressed_size = "57000";
>>>> x:compressed_size_unit = "byte";
>>>>
>>>>
>>>> You can get the size of a variable before you compress for the first
>>>> time. And you know the size of it after you compress it.
>>>> Am I off the mark on that one?
>>>>
>>>> If the compressed variable is large, i.e., it won't all fit in memory
>>>> at once, then you will have problems dealing with it on a small
>>>> receiving machine. Russ is right that you couldn't do hyperslab
>>>> access on a compressed variable and would have to read it all in,
>>>> uncompress it, and then do hyperslab access.
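
A minimal Python sketch of the size-attribute idea, under stated
assumptions: a dictionary stands in for the variable's attributes, zlib
from the Python standard library is swapped in as the compressor instead of
run-length encoding, and the "shape" entry and the function names are
hypothetical helpers. The size is recorded after compression, checked on
read, and a hyperslab is taken only after the whole variable has been
uncompressed.

```python
import struct
import zlib


def store(values, shape):
    """Compress 32-bit integers and record the compressed size."""
    raw = struct.pack(f">{len(values)}i", *values)
    blob = zlib.compress(raw)
    attrs = {
        "compression": "zlib",               # stand-in for "rle"
        "compressed_size": str(len(blob)),   # known only after compressing
        "compressed_size_unit": "byte",
        "shape": shape,                      # hypothetical helper entry
    }
    return attrs, blob


def load_slab(attrs, blob, rows, cols):
    """Read everything, uncompress, then extract a 2-D cross-section."""
    if len(blob) != int(attrs["compressed_size"]):
        raise ValueError("compressed size mismatch")
    nrows, ncols = attrs["shape"]
    flat = struct.unpack(f">{nrows * ncols}i", zlib.decompress(blob))
    # Row-major indexing into the fully uncompressed array.
    return [[flat[r * ncols + c] for c in cols] for r in rows]


attrs, blob = store(list(range(12)), (3, 4))
print(load_slab(attrs, blob, range(1, 3), range(2, 4)))  # [[6, 7], [10, 11]]
```
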
>>>>
>>>> We are using conventions for a couple of our variables in the
>>>> analytical laboratory data standards. We don't mind because that is
>>>> what standards are all about.
>>>>
>>>> -- Rich
% ====== Internet headers and postmarks (see DECWRL::GATEWAY.DOC) =====
% Date: Thu, 17 Sep 1992 08:46:21 -0600
% From: Russ Rew <russ@xxxxxxxxxxxxxxxx>
% Message-Id: <9209171446.AA27573@xxxxxxxxxxxxxxxxxxxxxx>
% To: RIVERS@xxxxxxxxxxxxxxxxxxx
% Cc: netcdfgroup@xxxxxxxxxxxxxxxx
% In-Reply-To: Mark Rivers's message of Wed, 16 Sep 1992 22:38:37 EDT
<920916223837.22800687@xxxxxxxxxxxxxxxxxxx>
% Subject: Data compression