
11 October 2008

CF Section 8.2 compression by gathering has this example (with my mods to make it more readable):

dimensions:
  lat=73;
  lon=96;
  cindex=2381;
  depth=4;

variables:
  float data(depth, cindex);

  int cindex(cindex);
    cindex:compress="lat lon";

  float depth(depth);
  float lat(lat);
  float lon(lon);

data:
  cindex = 363, 364, 365, ...;

"Since landpoint(0)=363, for instance, we know that data(*,0) maps on to point 363 of the original data with dimensions (lat,lon). This corresponds to indices (3,75), i.e., 363 = 3*96 + 75."
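The stride arithmetic in that passage is ordinary row-major (C order) index flattening. Here is a minimal Python sketch of both directions, assuming the lat and lon sizes from the example above (73 and 96). The function names are hypothetical and are not part of any netCDF library.

```python
NLAT, NLON = 73, 96  # lat and lon sizes from the Section 8.2 example


def gather_index(j: int, i: int, nlon: int = NLON) -> int:
    """Flatten a (lat, lon) index pair into a compressed index (C order)."""
    return j * nlon + i


def scatter_index(k: int, nlat: int = NLAT, nlon: int = NLON) -> tuple[int, int]:
    """Recover the (lat, lon) index pair from a compressed index."""
    if not 0 <= k < nlat * nlon:
        raise ValueError(f"compressed index {k} is outside the {nlat}x{nlon} grid")
    j, i = divmod(k, nlon)  # j = k // nlon, i = k - nlon * j
    return j, i


print(scatter_index(363))   # (3, 75)
print(gather_index(3, 75))  # 363
```

For non-negative indices, Python's `divmod` gives the same result as C integer division and remainder, which is what the CF text assumes.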

Let's call cindex a compression dimension. It is identified by the compress attribute, which lists the dimensions that were gathered. Equivalently, we could store the indices separately, e.g. use cindex(cindex, 2), where column 0 holds the lat index and column 1 the lon index, instead of using stride arithmetic (363 = 3*96 + 75).
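As a sketch of that alternative, the following converts 1D compressed index values into the two-column form (column 0 = lat index, column 1 = lon index) and back. Plain Python lists stand in for netCDF variables, and the helper names are hypothetical.

```python
def to_index_pairs(cindex: list[int], nlon: int) -> list[list[int]]:
    """Expand flat compressed indices into [lat_index, lon_index] rows,
    i.e. the cindex(cindex, 2) layout."""
    return [list(divmod(k, nlon)) for k in cindex]


def from_index_pairs(pairs: list[list[int]], nlon: int) -> list[int]:
    """Collapse [lat_index, lon_index] rows back into flat C-order indices."""
    return [j * nlon + i for j, i in pairs]


cindex = [363, 364, 365]
pairs = to_index_pairs(cindex, nlon=96)
print(pairs)                        # [[3, 75], [3, 76], [3, 77]]
print(from_index_pairs(pairs, 96))  # [363, 364, 365]
```

The two-column form doubles the size of the index variable. In exchange, a reader no longer has to do stride arithmetic or assume C ordering.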

Note that the lat, lon coordinates are orthogonal so this is a 2D regular (rectified) grid. The compression just saves storage space.

It is probably worth adding the coordinates attribute to the data variable for clarity and completeness, e.g.:

  float data(depth, cindex);
    data:coordinates = "lon lat" ;

A compression dimension logically expands to its list of compressed dimensions, e.g.:

  float data(depth, lat, lon);

with missing values at the points that were not stored. Thus we keep the rule that "coordinate dimensions must be a subset of the data variable's dimensions".
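A minimal sketch of that logical expansion for a single depth level follows. It assumes NaN as the missing value, whereas a real CF file would normally use the variable's `_FillValue` or `missing_value`. It also uses a tiny hypothetical grid instead of the 73 x 96 one. Applying it to each depth level in turn gives data(depth, lat, lon).

```python
import math


def expand(gathered: list[float], cindex: list[int],
           nlat: int, nlon: int, fill: float = math.nan) -> list[list[float]]:
    """Scatter gathered values back onto a full (lat, lon) grid.

    Points that were not stored are set to `fill`.
    """
    if len(gathered) != len(cindex):
        raise ValueError("data and cindex must have the same length")
    grid = [[fill] * nlon for _ in range(nlat)]
    for value, k in zip(gathered, cindex):
        j, i = divmod(k, nlon)  # C-order stride arithmetic
        grid[j][i] = value
    return grid


# Hypothetical 3 x 4 grid where only four points were stored
grid = expand([1.0, 2.0, 3.0, 4.0], [1, 2, 5, 10], nlat=3, nlon=4)
for row in grid:
    print(row)
```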

However, CF Section 5.3 Reduced Horizontal Grid has a different example of Compression by Gathering:

"A "reduced" longitude-latitude grid is one in which the points are arranged along constant latitude lines with the number of points on a latitude line decreasing toward the poles. Storing this type of gridded data in two-dimensional arrays wastes space, and results in the presence of missing values in the 2D coordinate variables."

dimensions:
  londim = 128 ;
  latdim = 64 ;
  cindex= 6144 ;

variables:
  float data(cindex) ;
    data:coordinates = "lon lat" ;

  int cindex(cindex);
    cindex:compress = "latdim londim";

  float lon(cindex) ;
  float lat(cindex) ;

"PS(n) is associated with the coordinate values lon(n), lat(n). Compressed grid index (n) would be assigned to 2D index (j,i) (C index conventions) where j = rgrid(n) / 128 and i = rgrid(n) - 128*j".

If we logically expand the compression dimension, we get:

  float data(latdim, londim) ;
    data:coordinates = "lon lat" ;

  int cindex(cindex);
    cindex:compress = "latdim londim";

  float lon(latdim, londim) ;
  float lat(latdim, londim) ;

So CF is trying to deal with 2D lat/lon coordinates here. Actually, I think the common case for reduced grids (e.g. reduced Gaussian grids) is a 1D latitude coordinate plus a variable-length 2D longitude coordinate. So the example should probably be:

dimensions:
  londim = 128 ;
  latdim = 64 ;
  cindex= 6144 ;

variables:
  float data(cindex) ;
    data:coordinates = "lon lat" ;

  int cindex(cindex);
    cindex:compress = "latdim londim";

  float lon(cindex) ;
  float lat(latdim) ;

which is logically expanded to

  float data(latdim, londim) ;
    data:coordinates = "lon lat" ;

  int cindex(cindex);
    cindex:compress = "latdim londim";

  float lon(latdim, londim) ;
  float lat(latdim) ;

The main problem with this is that we've really got a "ragged array", not a 2D rectified grid with missing data.
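To see the ragged structure directly, here is a sketch that groups the gathered values by latitude row. It assumes that cindex holds C-order flat indices into (latdim, londim), as in the Section 5.3 example. The lengths of the resulting rows are the per-latitude point counts that a ragged representation would carry. The function name and the tiny grid are hypothetical.

```python
from collections import defaultdict


def to_ragged_rows(gathered: list[float], cindex: list[int],
                   nlon: int) -> dict[int, list[float]]:
    """Group gathered values by latitude row (latdim index).

    Assumes cindex values are C-order flat indices into (latdim, londim),
    so the row is cindex // nlon. Within each row, values keep the order
    in which they appear in the gathered array.
    """
    rows: dict[int, list[float]] = defaultdict(list)
    for value, k in zip(gathered, cindex):
        rows[k // nlon].append(value)
    return dict(rows)


# Hypothetical reduced grid: 3 latitude rows, up to 4 longitudes,
# with fewer points on the "polar" rows 0 and 2
cindex = [0, 1, 4, 5, 6, 7, 8, 9]
values = [10.0, 11.0, 20.0, 21.0, 22.0, 23.0, 30.0, 31.0]
rows = to_ragged_rows(values, cindex, nlon=4)
counts = [len(rows.get(j, [])) for j in range(3)]
print(counts)  # [2, 4, 2]
```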

Since this blog post is getting long, let me continue in part 2.
