CF Section 8.2, "Compression by Gathering", has this example (with my modifications to make it more readable):
dimensions:
lat=73;
lon=96;
cindex=2381;
depth=4;
variables:
float data(depth, cindex);
int cindex(cindex);
cindex:compress="lat lon";
float depth(depth);
float lat(lat);
float lon(lon);
data:
cindex=363, 364, 365, ...;
"Since cindex(0)=363, for instance, we know that data(*,0) maps on to point 363 of the original data with dimensions (lat,lon). This corresponds to indices (3,75), i.e., 363 = 3*96 + 75."
Let's call cindex a compression dimension; it is identified by the compress attribute, which lists the dimensions that were gathered. Equivalently, we could store the indices separately, e.g. use cindex(cindex,2), where 0 = lat index and 1 = lon index, instead of using stride arithmetic (363 = 3*96 + 75).
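To make the stride arithmetic concrete, here is a minimal Python sketch. The helper names unravel and ravel are my own (hypothetical, not from CF); the only things taken from the example are the lat=73, lon=96 dimensions, the first three index values, and C (row-major) ordering.

```python
# Dimension sizes from the example above; helper names are hypothetical.
NLAT, NLON = 73, 96

def unravel(flat_index, nlon=NLON):
    """Split a gathered index into (lat index, lon index), C (row-major) order."""
    return divmod(flat_index, nlon)

def ravel(lat_index, lon_index, nlon=NLON):
    """Inverse of unravel: combine (lat index, lon index) into one flat index."""
    return lat_index * nlon + lon_index

# First three stored values of the compression variable in the example.
cindex = [363, 364, 365]

# The "store the indices separately" alternative: one (lat, lon) pair per point.
index_pairs = [unravel(n) for n in cindex]

print(unravel(363))   # (3, 75), matching 363 = 3*96 + 75
print(index_pairs)
```

Either form carries the same information; the flat index is simply half the size to store.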
Note that the lat, lon coordinates are orthogonal so this is a 2D regular (rectified) grid. The compression just saves storage space.
It is probably worth adding the coordinates attribute to the data variable for clarity and completeness, e.g.:
float data(depth, cindex);
data:coordinates = "lon lat" ;
A compression dimension logically expands to its list of compressed dimensions, e.g.:
float data(depth, lat, lon);
with missing values at the places not stored. Thus, we keep the rule "coordinate dims must be a subset of data dims".
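The logical expansion can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name expand is hypothetical, NaN stands in for the missing value, and the tiny 2 x 3 grid is made up to keep the output readable.

```python
import math

def expand(compressed, cindex, nlat, nlon, fill=math.nan):
    """Scatter gathered values back onto a full (nlat, nlon) grid.

    Cells that were not stored keep the fill value (NaN here, as a
    stand-in for whatever missing value the file actually uses).
    """
    full = [[fill] * nlon for _ in range(nlat)]
    for value, flat in zip(compressed, cindex):
        j, i = divmod(flat, nlon)   # same stride arithmetic as the CF example
        full[j][i] = value
    return full

# Made-up 2 x 3 grid with three stored points at flat indices 1, 3 and 5.
grid = expand([10.0, 20.0, 30.0], [1, 3, 5], nlat=2, nlon=3)
for row in grid:
    print(row)
```

For a variable like data(depth, cindex), the same scatter would be applied once per depth level.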
However, CF Section 5.3, "Reduced Horizontal Grid", has a different example of Compression by Gathering:
"A "reduced" longitude-latitude grid is one in which the points are arranged along constant latitude lines with the number of points on a latitude line decreasing toward the poles. Storing this type of gridded data in two-dimensional arrays wastes space, and results in the presence of missing values in the 2D coordinate variables."
dimensions:
londim = 128 ;
latdim = 64 ;
cindex= 6144 ;
variables:
float data(cindex) ;
data:coordinates = "lon lat" ;
int cindex(cindex);
cindex:compress = "latdim londim";
float lon(cindex) ;
float lat(cindex) ;
"data(n) is associated with the coordinate values lon(n), lat(n). Compressed grid index (n) would be assigned to 2D index (j,i) (C index conventions) where j = cindex(n) / 128 and i = cindex(n) - 128*j".
If we logically expand the compression dimension, we get:
float data(latdim, londim) ;
data:coordinates = "lon lat" ;
int cindex(cindex);
cindex:compress = "latdim londim";
float lon(latdim, londim) ;
float lat(latdim, londim) ;
So CF is trying to deal with 2D lat lon coordinates here. Actually, I think the common case for reduced grids is 1D latitude coordinates and variable-length 2D longitude coordinates (e.g. reduced Gaussian grids). So probably the example should be
dimensions:
londim = 128 ;
latdim = 64 ;
cindex= 6144 ;
variables:
float data(cindex) ;
data:coordinates = "lon lat" ;
int cindex(cindex);
cindex:compress = "latdim londim";
float lon(cindex) ;
float lat(latdim) ;
which is logically expanded to
float data(latdim, londim) ;
data:coordinates = "lon lat" ;
int cindex(cindex);
cindex:compress = "latdim londim";
float lon(latdim, londim) ;
float lat(latdim) ;
The main problem with this is that we've really got a "ragged array", not a 2D rectified grid with missing data.
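A small Python sketch shows what "ragged" means here. Everything in it is hypothetical: the 4 x 8 grid, the row lengths, and in particular the assumption that each row's points are left-justified in the expanded 2D array (CF only requires that the compression variable say where each point goes).

```python
from collections import Counter

def points_per_row(cindex, nlon):
    """Count how many stored points land in each row of the expanded grid.

    Rows with no stored points at all are omitted from the result.
    """
    counts = Counter(flat // nlon for flat in cindex)
    return [counts[j] for j in sorted(counts)]

# Made-up reduced grid: 4 latitude rows, at most 8 longitudes per row,
# with fewer points toward the "poles" (first and last rows).
nlon = 8
row_lengths = [4, 8, 8, 4]

# Assumption: points are left-justified within each row of the 2D array.
cindex = [j * nlon + i for j, n in enumerate(row_lengths) for i in range(n)]

print(len(cindex))                   # number of stored points
print(points_per_row(cindex, nlon))  # row lengths recovered from cindex
```

The per-row counts differ, so the expanded array is really a set of rows of different lengths padded out with missing values.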
Since this blog post is getting long, let me continue in part 2.