coordinate variables

Jonathan Gregory (jmgregory@meto.gov.uk)
Wed, 16 Jul 1997 14:35:14 +0100 (BST)

Our proposed netCDF conventions for climate data (GDT=Gregory, Drach and Tett)
did not include any discussion of non-rectilinear grids or multidimensional
coordinates, but we hoped (section 10) that people with experience of these
would comment on ways to handle them. In subsequent discussion by email with
Stephen Walker, I suggested that we could use our proposed associate attribute
(section 19) to identify the multidimensional coordinate variables associated
with a data variable. GDT proposed this attribute for coordinate variables
only, but the extension to data variables is fairly natural, I think. In this
posting I try to describe how this would work, its similarities to what has
been brought up in previous discussion, and its relationship to other aspects
of GDT.

THE ASSOCIATE ATTRIBUTE IN GDT

Firstly, let me describe GDT's proposed "associate" attribute. This was
suggested with one-dimensional coordinate variables in mind. The cases it was
intended for include Steve Emmerson's wire example, and John Caron's motivating
examples of scattered points (1) and trajectories (9). In fact, the trajectory
and scattered points were given as examples in GDT (sections 19 and 20
respectively). We said, rather vaguely, that associated coordinate variables
should be used "where an axis has a number of alternative ways of being
labelled, providing different kinds of information".

The case of scattered points might look like this:

  dimensions:
    time=10;
    point=12;
    StringMaxLength=32;

  variables:
    float temperature(time,point);
    short point(point);
      point:associate="height,lat,lon,sitename"; // spaces are optional
    float height(point);
    float lat(point);
    float lon(point);
    char sitename(point,StringMaxLength);
    double time(time);

In this instance, the main coordinate variable "point" would probably be an
index (0,1,2,...). Its function is to bear the associate attribute. It could be
meaningful, though - it might be a WMO station number, perhaps. GDT require it
to be present, but this requirement has been criticised by some. If it were
dropped from GDT, we could instead move the associate attribute to the data
variable i.e. remove the "point" variable and its attribute above, and have
instead

      temperature:associate="height,lat,lon,sitename";

The wire or trajectory case would be represented as:

  dimensions:
    distance=100;

  variables:
    float temperature(distance); // temperature along spiral
    float distance(distance); // distance along spiral
      distance:associate="z,rho,theta"; // cylindrical coordinate system (CCS)
    float z(distance); // CCS height
    float rho(distance); // distance from CCS centre axis
    float theta(distance); // CCS azimuth

In this case, the main coordinate variable "distance" probably has physical
content.

In these cases, the associate attribute can be attached to the main coordinate
variable, rather than to the data variable. This is OK because all the
associated coordinates apply to that axis alone (they are a function solely of
distance along the wire). It is efficient because the same associations will
probably apply for any data variable that has that axis. For instance,
precipitation rate at the scattered points:

    float precipitation(time,point);

where the points need just the same latitude, longitude, height and name as
for temperature. Since this information is associated to the point axis, which
is shared by temperature and precipitation, it only needs to be recorded once
in the file.
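A minimal sketch of how a reading program might find the associations that a data variable inherits from its axes. It is written in Python against a hypothetical dictionary model of the file header (variable name mapped to dimensions and attributes), not a real netCDF library, and it assumes only what is described above: a main coordinate variable has the same name as its dimension, and its associate attribute is a comma-separated list in which spaces are optional.

```python
# Hypothetical in-memory model of a file header: variable name -> dimensions
# and attributes. This stands in for a netCDF reader; it is not a real API.
HEADER = {
    "temperature": {"dims": ("time", "point"), "attrs": {}},
    "precipitation": {"dims": ("time", "point"), "attrs": {}},
    "point": {"dims": ("point",),
              "attrs": {"associate": "height, lat,lon ,sitename"}},
    "height": {"dims": ("point",), "attrs": {}},
    "lat": {"dims": ("point",), "attrs": {}},
    "lon": {"dims": ("point",), "attrs": {}},
    "sitename": {"dims": ("point", "StringMaxLength"), "attrs": {}},
    "time": {"dims": ("time",), "attrs": {}},
}


def parse_associate(value):
    """Split an associate attribute into variable names (spaces optional)."""
    return [name.strip() for name in value.split(",") if name.strip()]


def associations_by_axis(header, data_var):
    """For each dimension of data_var, list the variables named in the
    associate attribute of that dimension's main coordinate variable."""
    result = {}
    for dim in header[data_var]["dims"]:
        coord = header.get(dim)
        # A main coordinate variable has the same name as its one dimension.
        if coord is not None and coord["dims"] == (dim,):
            result[dim] = parse_associate(coord["attrs"].get("associate", ""))
        else:
            result[dim] = []
    return result


print(associations_by_axis(HEADER, "precipitation"))
# -> {'time': [], 'point': ['height', 'lat', 'lon', 'sitename']}
```

Both temperature and precipitation pick up the same four associated variables from the single attribute on point, which is the economy described above.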

But suppose you wanted to have a variable that did *not* have all those
associations. For instance, suppose you wanted to consider vertical profiles of
humidity at the scattered points; you now want their names, longitude and
latitude but not their height, as presumably a new height coordinate will be
introduced instead. This can be handled in two ways:

* Use a different dimension name, not "point". This means you have to have a
new main coordinate variable and repeat the lat, lon and name information in
new variables, since GDT require that associated coordinate variables have the
same dimension as the main coordinate variable. If we relaxed that requirement
to say that the dimensions need only be equal in size, not identical, the
point-dependent information would not have to be repeated. I am not so happy about
that, though.

* Put the association on the data variable instead of the main coordinate
variable, above. In this case, the associate attribute has to appear on each of
the relevant data variables individually, rather than once for all on the
coordinate variable.  This is less efficient, but the inefficiency is really
quite minor.

I think either approach should be allowed.
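If both placements are allowed, a reader has to look in both places. The Python sketch below uses a hypothetical dictionary model of the header (not a netCDF API) and assumes one simple merge rule that the proposal itself does not specify: take the names on the data variable first, then add any inherited through its main coordinate variables, skipping duplicates. With the association moved onto the data variables, humidity gets lat, lon and sitename but not height.

```python
# Hypothetical header for the second approach: the associations sit on the
# data variables, and the main coordinate variable "point" carries none.
HEADER = {
    "temperature": {"dims": ("time", "point"),
                    "attrs": {"associate": "height,lat,lon,sitename"}},
    "humidity": {"dims": ("time", "level", "point"),
                 "attrs": {"associate": "lat,lon,sitename"}},
    "point": {"dims": ("point",), "attrs": {}},
    "level": {"dims": ("level",), "attrs": {}},
    "time": {"dims": ("time",), "attrs": {}},
}


def split_names(value):
    """Comma-separated list of variable names; spaces are optional."""
    return [name.strip() for name in value.split(",") if name.strip()]


def all_associates(header, data_var):
    """Names from data_var's own associate attribute, followed by any
    inherited from its main coordinate variables (assumed merge rule)."""
    found = split_names(header[data_var]["attrs"].get("associate", ""))
    for dim in header[data_var]["dims"]:
        coord = header.get(dim)
        if coord is not None and coord["dims"] == (dim,):
            for name in split_names(coord["attrs"].get("associate", "")):
                if name not in found:
                    found.append(name)
    return found


print(all_associates(HEADER, "humidity"))     # -> ['lat', 'lon', 'sitename']
print(all_associates(HEADER, "temperature"))  # -> ['height', 'lat', 'lon', 'sitename']
```

The same function also covers the first approach, where the list would come from point instead, so a reader written this way need not know which placement the writer chose.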

GDT does not contain the second approach. However, to allow the associate
attribute on either the data or the coordinate variable is quite natural, as is
shown by John Caron's remark about the close relationship between scattered
points in 1D or 2D arrangements. What if we want to record the information that
the 12 points are not completely irregular, but in some 3x4 non-rectilinear
arrangement? We now cannot have a main coordinate variable any more (that is,
one whose name is the same as the name of a dimension, since the points now
have two dimensions), but we could transfer the attribute to the data variable:

  dimensions:
    time=10;
    pointx=3;
    pointy=4;
    StringMaxLength=32;

  variables:
    float temperature(time,pointx,pointy);
      temperature:associate="height,lat,lon,sitename";
    float height(pointx,pointy);
    float lat(pointx,pointy);
    float lon(pointx,pointy);
    char sitename(pointx,pointy,StringMaxLength);
    double time(time);
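To see how the 1-D and 2-D layouts relate, here is a small Python sketch that converts between a flat point index and a (pointx, pointy) pair. It assumes the 12 points are numbered in the order netCDF itself stores a (pointx,pointy) array, with the last dimension varying fastest; any other numbering would need its own mapping. The function names are illustrative only.

```python
NX, NY = 3, 4  # sizes of the pointx and pointy dimensions in the example


def to_2d(p, ny=NY):
    """Flat point index -> (pointx, pointy), last dimension varying fastest."""
    return divmod(p, ny)


def to_flat(x, y, ny=NY):
    """(pointx, pointy) -> flat point index, the inverse of to_2d."""
    return x * ny + y


def reshape(values, nx=NX, ny=NY):
    """Rearrange a per-point list such as lat(point) into lat(pointx,pointy)."""
    if len(values) != nx * ny:
        raise ValueError(f"expected {nx * ny} values, got {len(values)}")
    return [values[x * ny:(x + 1) * ny] for x in range(nx)]


print(to_2d(5))                  # -> (1, 1)
print(reshape(list(range(12))))  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
```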

USING THE ASSOCIATE ATTRIBUTE FOR MULTIDIMENSIONAL COORDINATES

This situation is very similar to what Stephen Walker and Jason Waring use and
to Steve Emmerson's preferred approach for multidimensional coordinates. For a
three-dimensional field, it looks like this:

  dimensions:
    level=19;
    i=72;
    j=96;

  variables:
    float temperature(level,i,j);
      temperature:associate="lat,lon";
    short level(level);
    float lat(i,j);
    float lon(i,j);

This is not a proposal of GDT, but a possible extension. Nor is it the use
for which the associate attribute was intended, although the example in the
previous section suggests that the two are related. If I understand correctly what
has been said, it is not strictly speaking a "referential attribute" approach,
which would have

      temperature:i="lat";
      temperature:j="lon";

instead of

      temperature:associate="lat,lon";

I feel that referential attributes are not so good because

* An association is implied between i and lat, j and lon. As has been discussed
by several people, this is not the case. lat,lon supply a set of coordinates
which is an alternative to i,j ("manifold" and "base"). The same misleading
association is implied by the approach that Steve Emmerson deprecates, in which
there is no attribute and the lat,lon variables are named i,j.

* The idea introduces an "interaction" between the names of dimensions and
attributes, in that you have to be careful not to give them the same name
unless you intend the "reference". Suppose that a convention like GDT had
defined a particular meaning for a variable attribute named "level". This
attribute could not be used with that meaning on the temperature variable above
because it would be interpreted as referential. You might say that it is
unlikely that you would want to use it under those circumstances, but I think
it must be confusing at least to humans to have an attribute of a particular
name having different functions when attached to different variables in the
same file.
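The name clash in the second point can be made concrete. The Python sketch below models a hypothetical reader that follows the referential rule literally, treating any attribute whose name equals one of the variable's dimension names as a pointer to a coordinate variable, and compares it with reading only the associate attribute. The "level" attribute and its value are invented, standing for an attribute that a convention might define with some other meaning.

```python
def referential_links(dims, attrs):
    """Referential reading: an attribute named after one of the variable's
    dimensions is taken to name a coordinate variable for that dimension."""
    return {name: value for name, value in attrs.items() if name in dims}


def associate_links(attrs):
    """Associate reading: only the single attribute 'associate' is special."""
    value = attrs.get("associate", "")
    return [name.strip() for name in value.split(",") if name.strip()]


temperature_dims = ("level", "i", "j")
temperature_attrs = {
    "i": "lat",
    "j": "lon",
    "associate": "lat,lon",
    # Hypothetical attribute that some convention might give another meaning.
    "level": "model levels",
}

print(referential_links(temperature_dims, temperature_attrs))
# -> {'i': 'lat', 'j': 'lon', 'level': 'model levels'}
print(associate_links(temperature_attrs))  # -> ['lat', 'lon']
```

Under the referential reading the unrelated level attribute is picked up as if it named a coordinate variable, and lat is paired with i and lon with j; the associate reading reserves only one attribute name and implies no pairing.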

I don't think it matters very much whether the coordinate variables are called
lat,lon or i,j. Since the dimensions i,j do not correspond one-to-one with the
coordinate variables lat,lon, I think it would be good practice to give them
different names, but I don't think this need be the subject of a convention.

HOW SHOULD APPLICATIONS INTERPRET THIS?

There has been discussion about how applications should make use of these
associated coordinate variables. Personally, I do not think that a netCDF
convention should be prescriptive about this. It is really a problem for the
application program.  What is essential, as has been pointed out, is that
existing programs should not have difficulties.

A completely general application (plotting program, or whatever) can ignore the
associate attribute altogether and use the one-dimensional axes, which might
have plain indices for coordinate variables. This is guaranteed to work.
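The fallback just described is easy to state in code. This Python sketch (hypothetical function name, with dictionaries standing in for the file contents) returns the values a generic program would put on an axis: the main coordinate variable if there is one, otherwise plain indices. It never consults the associate attribute, which is why adding that attribute to a file cannot disturb such a program.

```python
def axis_values(dim_sizes, coord_values, dim):
    """Values a generic program can use for one axis, ignoring 'associate'.

    dim_sizes:    dimension name -> length
    coord_values: dimension name -> values of its main coordinate variable,
                  for those dimensions that have one
    """
    if dim in coord_values:
        return list(coord_values[dim])
    return list(range(dim_sizes[dim]))


# Made-up sizes and values, loosely following the (level, i, j) example.
dim_sizes = {"level": 3, "i": 4, "j": 5}
coord_values = {"level": [1000, 850, 500]}

print(axis_values(dim_sizes, coord_values, "level"))  # -> [1000, 850, 500]
print(axis_values(dim_sizes, coord_values, "i"))      # -> [0, 1, 2, 3]
```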

A more sophisticated application will presumably have some idea of what it
wants, or the ability to be told. In general, there might be several possible
sets of axes, and some choices which are pathological. I think it is
unavoidable that some kind of intelligence will have to be present in the
application, or some selection made by its user. If we have

  dimensions:
    i=72;
    j=96;

  variables:
    float temperature(i,j);
      temperature:associate="lat,lon,x,y";
    float lat(i,j);
    float lon(i,j);
    float x(i,j);
    float y(i,j);

the user or the program is unavoidably going to have to choose whether a plot
is to be made on lat-lon or x-y (perhaps distance) axes. If the plotting
program is able to plot a map background (coastlines etc.) when appropriate, it
must already know that this has to be done differently in the two cases, and
moreover that the map cannot be plotted in all cases (not for latitude-height,
for example). A program that calculates area-averages, similarly, must know how
to work out areas on a spherical surface. Such applications are already
sensitive to lat-lon (and presumably other special cases). They are free to
look for these kinds of special coordinate among the associated coordinate
variables if they want to, or if requested, and make appropriate checks that
the coordinates are sensible. All the netCDF file does is tell the application
where to look for the information, not what to do with it. All the netCDF
convention needs to do is to provide a method for attaching this information.

John Sheldon asked about an example like this:

  dimensions:
    x=5;
    y=10;
    z=12;

  variables:
    float pressure(x,y,z);
      pressure:associate="lat,lon,alt";
    float lat(x,y);
      lat:quantity="latitude";
    float lon(x,y);
      lon:quantity="longitude";
    float alt(x,y,z);
      alt:quantity="altitude";

where I have used the quantity and associate attributes of GDT. He asked, "Can
my application infer that 'lat,lon,alt' is an ordered list intended to point to
variables containing coordinates for x,y,z?" I think the answer to this is No.
The most basic application could plot the pressure on x,y,z axes with indices
for coordinate variables. A more advanced application would "know" about
latitude, longitude and height - for instance, if it complies with COARDS it
knows something about them, I suppose. It can therefore look in the associate
attribute to find coordinate variables to use instead of the one-dimensional
indices. It will be able to identify these by their quantity attributes in GDT
or their units in COARDS.
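Identification by quantity or units might be sketched as follows for John Sheldon's example. The dictionary model of the header is hypothetical. The quantity attribute is GDT's; for COARDS the sketch recognises only latitude and longitude, from the units degrees_north and degrees_east, which is a deliberately incomplete table, and altitude is found here only through its quantity attribute.

```python
# Hypothetical header model of John Sheldon's example.
HEADER = {
    "pressure": {"dims": ("x", "y", "z"), "attrs": {"associate": "lat,lon,alt"}},
    "lat": {"dims": ("x", "y"), "attrs": {"quantity": "latitude"}},
    "lon": {"dims": ("x", "y"), "attrs": {"quantity": "longitude"}},
    "alt": {"dims": ("x", "y", "z"), "attrs": {"quantity": "altitude"}},
}

# COARDS-style units recognised here (deliberately incomplete).
UNITS_TO_QUANTITY = {"degrees_north": "latitude", "degrees_east": "longitude"}


def identify(header, data_var):
    """Map recognised quantities to associated variable names. The order of
    names in the associate attribute is not used, only their attributes."""
    found = {}
    for name in header[data_var]["attrs"].get("associate", "").split(","):
        name = name.strip()
        if name not in header:
            continue
        attrs = header[name]["attrs"]
        quantity = attrs.get("quantity") or UNITS_TO_QUANTITY.get(attrs.get("units"))
        if quantity is not None:
            found[quantity] = name
    return found


print(identify(HEADER, "pressure"))
# -> {'latitude': 'lat', 'longitude': 'lon', 'altitude': 'alt'}
```

Because each variable is recognised by its own attributes, the result would be the same if the attribute listed alt,lon,lat, which is why the answer to the ordering question can be No.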

The application should do some checking to see whether the dimensions of these
coordinate variables are acceptable - for instance, it wants as many coordinate
variables as it has dimensions, with each dimension referred to at least once.
I do not think that such restrictions should be in the netCDF convention. I
feel that Gary Granger's description of mappings, for example, is valuable as a
way to think about the possible axes which might be used for plotting, etc.,
but that such information is really in the realm of the application, and is not
a property of the data itself.
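The check mentioned above belongs to the application, and might look like the Python sketch below (the function name is hypothetical). The two conditions stated in the text are implemented directly; the extra requirement that a coordinate variable use only dimensions of the data variable is an added assumption, not part of the text, included so that unrelated variables are rejected.

```python
def dimensions_acceptable(data_dims, coord_dims):
    """data_dims:  dimension names of the data variable, e.g. ("x", "y", "z")
    coord_dims: one tuple of dimension names per chosen coordinate variable

    Accept if there are as many coordinate variables as dimensions and every
    dimension is referred to at least once (the conditions in the text). As an
    added assumption, each coordinate variable may use only those dimensions.
    """
    if len(coord_dims) != len(data_dims):
        return False
    used = set()
    for dims in coord_dims:
        if not set(dims) <= set(data_dims):
            return False
        used.update(dims)
    return used == set(data_dims)


xyz = ("x", "y", "z")
print(dimensions_acceptable(xyz, [("x", "y"), ("x", "y"), ("x", "y", "z")]))  # -> True
print(dimensions_acceptable(xyz, [("x", "y"), ("x", "y")]))                   # -> False
print(dimensions_acceptable(xyz, [("x", "y"), ("x", "y"), ("x", "y")]))       # -> False
```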

COMPONENTS AND BOUNDARIES

GDT also have material which applies to some others of John Caron's motivating
examples. Hybrid coordinates (4) and time components (11) are what we regard as
"component coordinate variables" (section 18). These are formally very similar
to associated coordinate variables. The difference is that the main coordinate
variable is a function of the components, whereas associated coordinates are
merely alternative or additional information. For hybrid coordinates, we have

  dimensions:
    eta=19;

  variables:
    float eta(eta);
      eta:component="pressure,sigma";
    float pressure(eta);
    float sigma(eta);

This means that the hybrid vertical coordinate eta is, at each level, a
function of the pressure and sigma values. In this case, it is necessary to
store these values in the file as eta cannot be uniquely decomposed into
pressure and sigma. eta is also stored in the file in order to provide a
monotonic main coordinate variable to order the axis, and so that the
application does not have to know how to compute it from its components. For
time, on the other hand, it is not necessary to store the components because a
unique decomposition can be done, and time is so important that we assume the
application will know how to do this. We have quite a lot to suggest about time
coordinates (sections 24-29) as it is such a complicated subject.
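The non-uniqueness can be illustrated with numbers. The text does not say which function of pressure and sigma defines eta, so this Python sketch assumes one common hybrid formulation: the pressure of a level is p = pressure + sigma * ps for surface pressure ps, and eta is that pressure divided by a reference surface pressure p0 when ps = p0, i.e. eta = pressure/p0 + sigma. The value p0 = 1000 hPa and the component values are made up.

```python
import math

P0 = 1000.0  # assumed reference surface pressure (hPa), for illustration only


def level_pressure(pressure, sigma, ps):
    """Pressure (hPa) of a hybrid level for surface pressure ps (assumed form)."""
    return pressure + sigma * ps


def eta_of(pressure, sigma, p0=P0):
    """Hybrid coordinate value: the level pressure divided by p0 when ps == p0."""
    return pressure / p0 + sigma


# Two made-up levels with different components but the same eta.
a = (100.0, 0.6)
b = (300.0, 0.4)
print(math.isclose(eta_of(*a), eta_of(*b)))  # -> True (both 0.7)
print(level_pressure(*a, ps=500.0), level_pressure(*b, ps=500.0))  # -> 400.0 500.0
```

Both levels have eta = 0.7, yet at a surface pressure of 500 hPa they lie at 400 and 500 hPa; they coincide only where ps equals p0. Under this assumed formulation, that is why the components have to be stored rather than recovered from eta.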

The example of edge coordinates (5) is referred to as "boundaries" in GDT
(section 21). This is not the same kind of coordinate. Boundaries are not
additional axes for the data, but additional information about the coordinate
values of other axes, to which they are attached. In the example above we may
envisage that the data refers to "layers" in the atmosphere. Each layer has
a characteristic level, given by its main coordinates, and lower and upper
limits. GDT supply the boundary information like this:

  variables:
    float eta(eta);
      eta:component="pressure,sigma";
      eta:bounds="bounds_eta"; // name of the boundary coordinate is arbitrary
    float bounds_eta(2,eta); // 0=boundary with smaller eta value, 1=larger
    float pressure(eta);
      pressure:bounds="bounds_pressure";
    float bounds_pressure(2,eta); // elements correspond to those of bounds_eta
    float sigma(eta);
      sigma:bounds="bounds_sigma";
    float bounds_sigma(2,eta);
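A reader can use the boundary variables directly, for instance to get layer thicknesses. This Python sketch follows the layout above, with bounds[0][k] the boundary with the smaller eta value and bounds[1][k] the larger; the eta and bound values are invented, and the check that each main coordinate value lies within its layer is an assumption drawn from the description of a characteristic level with lower and upper limits.

```python
import math

# Made-up values for three layers, laid out like bounds_eta(2,eta).
eta = [0.9, 0.5, 0.15]
bounds_eta = [
    [0.7, 0.3, 0.0],  # index 0: boundary with the smaller eta value
    [1.0, 0.7, 0.3],  # index 1: boundary with the larger eta value
]


def thicknesses(bounds):
    """Layer thickness in coordinate units: larger bound minus smaller bound."""
    lower, upper = bounds
    return [u - l for l, u in zip(lower, upper)]


def within_bounds(values, bounds):
    """True if every coordinate value lies between its layer's two bounds."""
    lower, upper = bounds
    return all(l <= v <= u for v, l, u in zip(values, lower, upper))


print([round(t, 6) for t in thicknesses(bounds_eta)])  # -> [0.3, 0.4, 0.3]
print(within_bounds(eta, bounds_eta))                  # -> True
```

Since the elements of bounds_pressure and bounds_sigma correspond to those of bounds_eta rather than being sorted by their own values, index 0 there is not necessarily the smaller value, so differences computed the same way may be negative.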

Although GDT does not discuss multidimensional coordinates, I think that the
idea of boundaries can be easily extended to cover them. The convention I would
suggest is

    float alt(x,y,z);
      alt:quantity="altitude";
      alt:bounds="bounds_alt";
    float bounds_alt(2,2,2,x,y,z);

where the first index of bounds_alt is 0 for the alt at the lower x value and
1 at the higher x value, the second does the same for y and the third for z.
Thus, bounds_alt[*][*][*][i][j][k] are the altitudes at the eight vertices of
a cube surrounding the point [i][j][k]. (Whether these boundary
dimensions should be leading or trailing is a subsidiary question.)
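The index order proposed for bounds_alt can be checked with a small Python sketch that uses nested lists in place of a netCDF array, with the boundary dimensions leading as in the declaration above. The grid sizes and altitude values are generated from a made-up formula purely so that each vertex is distinguishable; cell_vertices is a hypothetical helper.

```python
from itertools import product

NX, NY, NZ = 2, 1, 1  # tiny made-up grid sizes for x, y, z

# bounds_alt[a][b][c][i][j][k]: a, b, c are 0 for the lower and 1 for the
# higher x, y, z side of point (i, j, k). Values are invented so that the
# thousands digit is i, hundreds the z level (k + c), tens b and units a.
bounds_alt = [[[[[[1000.0 * i + 100.0 * (k + c) + 10.0 * b + a
                   for k in range(NZ)]
                  for j in range(NY)]
                 for i in range(NX)]
                for c in (0, 1)]
               for b in (0, 1)]
              for a in (0, 1)]


def cell_vertices(bounds, i, j, k):
    """Altitudes at the eight vertices around point (i, j, k), keyed by
    (a, b, c) offsets along x, y and z."""
    return {(a, b, c): bounds[a][b][c][i][j][k]
            for a, b, c in product((0, 1), repeat=3)}


vertices = cell_vertices(bounds_alt, 1, 0, 0)
print(len(vertices))        # -> 8
print(vertices[(0, 0, 0)])  # -> 1000.0
print(vertices[(1, 1, 1)])  # -> 1111.0
```

If the boundary dimensions were trailing instead, only the subscript order inside cell_vertices would change.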

Jonathan Gregory