Re: GDT proposals

John Caron (caron@ucar.edu)
Fri, 18 Jul 1997 09:42:53 -0600

Hi Jonathan:

Here are some comments on your comments.

> 
> THE ASSOCIATE ATTRIBUTE IN GDT
> 
> Firstly, let me describe GDT's proposed "associate" attribute. This was
> suggested with one-dimensional coordinate variables in mind. The cases it was
> intended for include Steve Emmerson's wire example, and John Caron's motivating
> examples of scattered points (1) and trajectories (9). In fact, the trajectory
> and scattered points were given as examples in GDT (section 19 and 20
> respectively). We said, rather vaguely, that associated coordinate variables
> should be used "where an axis has a number of alternative ways of being
> labelled, providing different kinds of information".
> 
> The case of scattered points might look like this:
> 
>   dimensions:
>     time=10;
>     point=12;
>     StringMaxLength=32;
> 
>   variables:
>     float temperature(time,point);
>     short point(point);
>       point:associate="height,lat,lon,sitename"; // spaces are optional
>     float height(point);
>     float lat(point);
>     float lon(point);
>     char sitename(point,StringMaxLength);
>     double time(time);
> 
> In this instance, the main coordinate variable "point" would probably be an
> index (0,1,2,...). Its function is to bear the associate attribute. It could be
> meaningful, though - it might be a WMO station number, perhaps. GDT require it
> to be present, but this requirement has been criticised by some. If it were
> dropped from GDT, we could instead move the associate attribute to the data
> variable i.e. remove the "point" variable and its attribute above, and have
> instead
> 
>       temperature:associate="height,lat,lon,sitename";

This looks exactly like naming a coordinate system for the variable
temperature.

The use of the words "associate" and "main" has the connotation that the
"main" is in some sense fundamental, yet in this case it's more or less a
dummy. The requirement that the "main" coord system has coordinate
functions that are all 1-dimensional seems unmotivated to me.

> 
> The wire or trajectory case would be represented as:
> 
>   dimensions:
>     distance=100;
> 
>   variables:
>     float temperature(distance); // temperature along spiral
>     float distance(distance); // distance along spiral
>       distance:associate="z,rho,theta"; // cylindrical coordinate system (CCS)
>     float z(distance); // CCS height
>     float rho(distance); // distance from CCS centre axis
>     float theta(distance); // CCS azimuth
> 
> In this case, the main coordinate variable "distance" probably has physical
> content.
> 
> In these cases, the associate attribute can be attached to the main coordinate
> variable, rather than to the data variable. This is OK because all the
> associated coordinates apply to that axis alone (they are a function solely of
> distance along the wire). It is efficient because the same associations will
> probably apply for any data variable that has that axis. For instance,
> precipitation rate at the scattered points:
> 
>     float precipitation(time,point);
> 
> where the points need just the same latitude, longitude, height and name as
> for temperature. Since this information is associated to the point axis, which
> is shared by temperature and precipitation, it only needs to be recorded once
> in the file.

One way to interpret this example is that you have specified two coordinate
systems, one is "distance", and the other is "z,rho,theta". Attaching the
second to the first emphasizes in some sense that it is a coordinate
transformation, rather than an alternative system. Of course, any two
complete coordinate systems for the same variable can be thought of as
being transformations of each other (meaning there is a function T such
that T * Cs1 = Cs2).
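
A rough Python sketch of this reading, for the wire example. The dict layout
and the names parse_associate and coordinate_systems are made up for
illustration; they are not part of netCDF or GDT. It treats the main
coordinate variable as one system and each associate list as another:

```python
# Hypothetical in-memory model of the quoted wire example (not a netCDF API).
variables = {
    "temperature": {"dims": ("distance",), "attrs": {}},
    "distance": {"dims": ("distance",), "attrs": {"associate": "z,rho,theta"}},
    "z": {"dims": ("distance",), "attrs": {}},
    "rho": {"dims": ("distance",), "attrs": {}},
    "theta": {"dims": ("distance",), "attrs": {}},
}


def parse_associate(value):
    """Split an associate attribute on commas; spaces are optional."""
    return tuple(name.strip() for name in value.split(",") if name.strip())


def coordinate_systems(var_name, variables):
    """Return the coordinate systems implied for one data variable."""
    var = variables[var_name]
    # First system: the main coordinate variables (same name as a dimension).
    main = tuple(d for d in var["dims"] if d in variables)
    systems = [main] if main else []
    # Further systems: associate lists found on the data variable itself
    # or on any of its main coordinate variables.
    holders = [var_name] + [d for d in main if d != var_name]
    for holder in holders:
        value = variables[holder]["attrs"].get("associate")
        if value:
            systems.append(parse_associate(value))
    return systems


print(coordinate_systems("temperature", variables))
# [('distance',), ('z', 'rho', 'theta')]
```

Looking on both the data variable and the coordinate variable is only one
possible rule; the convention would have to say which takes precedence.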

> 
> But suppose you wanted to have a variable that did *not* have all those
> associations. For instance, suppose you wanted to consider vertical profiles of
> humidity at the scattered points; you now want their names, longitude and
> latitude but not their height, as presumably a new height coordinate will be
> introduced instead. This can be handled in two ways:
> 
> * Use a different dimension name, not "point". This means you have to have a
> new main coordinate variable and repeat the lat, lon and name information in
> new variables, since GDT require that associated coordinate variables have the
> same dimension as the main coordinate variable. If we relaxed that requirement
> to say that the dimension had only to be equal, not identical, the point-
> dependent information would not have to be repeated. I am not so happy about
> that, though.
> 
> * Put the association on the data variable instead of the main coordinate
> variable, above. In this case, the associate attribute has to appear on each of
> the relevant data variables individually, rather than once for all on the
> coordinate variable.  This is less efficient, but the inefficiency is really
> quite minor.
> 
> I think either approach should be allowed.
> 
> GDT does not contain the second approach. However, to allow the associate
> attribute on either the data or the coordinate variable is quite natural, as is
> shown by John Caron's remark about the close relationship between scattered
> points in 1D or 2D arrangements. What if we want to record the information that
> the 12 points are not completely irregular, but in some 3x4 non-rectilinear
> arrangement? We now cannot have a main coordinate variable any more (that is,
> one whose name is the same as the name of a dimension, since the points now
> have two dimensions), but we could transfer the attribute to the data variable:
> 
>   dimensions:
>     time=10;
>     pointx=3;
>     pointy=4;
>     StringMaxLength=32;
> 
>   variables:
>     float temperature(time,pointx,pointy);
>       temperature:associate="height,lat,lon,sitename";
>     float height(pointx,pointy);
>     float lat(pointx,pointy);
>     float lon(pointx,pointy);
>     char sitename(pointx,pointy,StringMaxLength);
>     double time(time);

Again, this looks exactly like a specification of a coordinate system as a
list of coordinate functions.

> 
> USING THE ASSOCIATE ATTRIBUTE FOR MULTIDIMENSIONAL COORDINATES
> 
> This situation is very similar to what Stephen Walker and Jason Waring use and
> to Steve Emmerson's preferred approach for multidimensional coordinates. For a
> three-dimensional field, it looks like this:
> 
>   dimensions:
>     level=19;
>     i=72;
>     j=96;
> 
>   variables:
>     float temperature(level,i,j);
>       temperature:associate="lat,lon";
>     short level(level);
>     float lat(i,j);
>     float lon(i,j);
> 
> This is not a proposal of GDT, but a possible extension. This is not the use
> for which the associate attribute was intended, although the example in the
> previous section suggests that it is related. If I understand correctly what
> has been said, it is not strictly speaking a "referential attribute" approach,
> which would have
> 
>       temperature:i="lat"
>       temperature:j="lon"
> 
> instead of
> 
>       temperature:associate="lat,lon";
> 
> I feel that referential attributes are not so good because
> 
> * An association is implied between i and lat, j and lon. As has been discussed
> by several people, this is not the case. lat,lon supply a set of coordinates
> which is an alternative to i,j ("manifold" and "base"). The same misleading
> association is implied by the approach that Steve Emmerson deprecates, in which
> there is no attribute and the lat,lon variables are named i,j.
> 
> * The idea introduces an "interaction" between the names of dimensions and
> attributes, in that you have to be careful not to give them the same name
> unless you intend the "reference". Suppose that a convention like GDT had
> defined a particular meaning for a variable attribute named "level".  This
> attribute could not be used with that meaning on the temperature variable above
> because it would be interpreted as referential. You might say that it is
> unlikely that you would want to use it under those circumstances, but I think
> it must be confusing at least to humans to have an attribute of a particular
> name having different functions when attached to different variables in the
> same file.
> 
> I don't think it matters very much whether the coordinate variables are called
> lat,lon or i,j. Since the dimensions i,j do not correspond one-to-one with the
> coordinate variables lat,lon, I think it would be good practice to give them
> different names, but I don't think this need be the subject of a convention.
> 
> HOW SHOULD APPLICATIONS INTERPRET THIS?
> 
> There has been discussion about how applications should make use of these
> associated coordinate variables. Personally, I do not think that a netCDF
> convention should be prescriptive about this. It is really a problem for the
> application program.  What is essential, as has been pointed out, is that
> existing programs should not have difficulties.
> 
> A completely general application (plotting program, or whatever) can ignore the
> associate attribute altogether and use the one-dimensional axes, which might
> have plain indices for coordinate variables. This is guaranteed to work.

But it's also the trivial case that doesn't particularly need our frenzied
email intervention. More interesting is to try to capture the meanings we
actually want to infer in our applications as best we can.

> 
> A more sophisticated application will presumably have some idea of what it
> wants, or the ability to be told. In general, there might be several possible
> sets of axes, and some choices which are pathological. I think it is
> unavoidable that some kind of intelligence will have to be present in the
> application, or some selection made by its user. If we have
> 
>   dimensions:
>     i=72;
>     j=96;
> 
>   variables:
>     float temperature(i,j);
>       temperature:associate="lat,lon,x,y";
>     float lat(i,j);
>     float lon(i,j);
>     float x(i,j);
>     float y(i,j);
> 
> the user or the program is unavoidably going to have to choose whether a plot
> is to be made on lat-lon or x-y (perhaps distance) axes. 

In this case, why not (lat,y) or (lon,x)? It would be better to actually
capture the two intelligent choices, (lat,lon) and (x,y).
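
To make the point concrete, here is a small Python sketch. The explicit
grouping at the end is a hypothetical representation, not a proposed
attribute syntax. A flat list of four names gives an application six
possible pairs for a 2D variable, of which only two are intended:

```python
from itertools import combinations

# The flat associate list from the quoted example.
associate = "lat,lon,x,y"
names = associate.split(",")

# Every way of picking two coordinate variables for the dimensions (i, j).
candidate_pairs = list(combinations(names, 2))
print(len(candidate_pairs))  # 6

# Hypothetical explicit grouping that records only the intended systems.
intended_systems = [("lat", "lon"), ("x", "y")]

# Pairs a naive application might try but which nobody intended.
unintended = [pair for pair in candidate_pairs if pair not in intended_systems]
print(unintended)
# [('lat', 'x'), ('lat', 'y'), ('lon', 'x'), ('lon', 'y')]
```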

> If the plotting
> program is able to plot a map background (coastlines etc.) when appropriate, it
> must already know that this has to be done differently in the two cases, and
> moreover that the map cannot be plotted in all cases (not for latitude-height,
> for example). A program that calculates area-averages, similarly, must know how
> to work out areas on a spherical surface. Such applications are already
> sensitive to lat-lon (and presumably other special cases). They are free to
> look for these kinds of special coordinate among the associated coordinate
> variables if they want to, or if requested, and make appropriate checks that
> the coordinates are sensible. All the netCDF file does is tell the application
> where to look for the information, not what to do with it. All the netCDF
> convention needs to do is to provide a method for attaching this information.

This is a difficult and subtle point. How much meaning is in the metadata
and how much in the application? I would say most is in the application,
in general. But there are a number of very specific exceptions, mostly in
the area of georeferencing and maybe time, where we can profitably agree to
place meaning in the metadata.

> 
> John Sheldon asked about an example like this:
> 
>   dimensions:
>     x=5;
>     y=10;
>     z=12;
> 
>   variables:
>     float pressure(x,y,z);
>       pressure:associate="lat,lon,alt";
>     float lat(x,y);
>       lat:quantity="latitude";
>     float lon(x,y);
>       lon:quantity="longitude";
>     float alt(x,y,z);
>       alt:quantity="altitude";
> 
> where I have used the quantity and associate attributes of GDT. He asked, "Can
> my application infer that "lat,lon,alt" is an ordered list intended to point to
> variables containing coordinates for x,y,z?" I think the answer to this is No.
> The most basic application could plot the pressure on x,y,z axes with indices
> for coordinate variables. A more advanced application would "know" about
> latitude, longitude and height - for instance, if it complies with COARDS it
> knows something about them, I suppose. It can therefore look in the associate
> attribute to find coordinate variables to use instead of the one-dimensional
> indices. It will be able to identify these by their quantity attributes in GDT
> or their units in COARDS.
> 
> The application should do some checking to see whether the dimensions of these
> coordinate variables are acceptable - for instance, it wants as many coordinate
> variables as it has dimensions, with each dimension referred to at least once.
> I do not think that such restrictions should be in the netCDF convention. I
> feel that Gary Granger's description of mappings, for example, is valuable as a
> way to think about the possible axes which might be used for plotting, etc.,
> but that such information is really in the realm of the application, and is not
> a property of the data itself.

I think I disagree, although your caution is understandable. If by "axes"
you mean "coordinate functions", then we want to add metadata to indicate
the "possible axes" to the netCDF file, because it is a property of the
data.
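
Wherever the rule ends up living, the check described in the quoted text (as
many coordinate variables as dimensions, each dimension referred to at least
once) is easy to state. A minimal Python sketch, assuming the dimensions have
already been read into plain tuples; the name covers_dimensions is
hypothetical:

```python
def covers_dimensions(data_dims, coord_dims):
    """Check a candidate set of coordinate functions against a data variable.

    data_dims:  tuple of the data variable's dimension names
    coord_dims: dict mapping coordinate variable name -> its dimension names
    """
    # As many coordinate variables as the data variable has dimensions.
    if len(coord_dims) != len(data_dims):
        return False
    # Each dimension of the data variable is referred to at least once.
    used = set()
    for dims in coord_dims.values():
        used.update(dims)
    return set(data_dims) <= used


# John Sheldon's example: pressure(x,y,z) with lat(x,y), lon(x,y), alt(x,y,z).
pressure_dims = ("x", "y", "z")
coords = {"lat": ("x", "y"), "lon": ("x", "y"), "alt": ("x", "y", "z")}
print(covers_dimensions(pressure_dims, coords))  # True

# Dropping alt leaves two coordinate functions for three dimensions.
print(covers_dimensions(pressure_dims, {"lat": ("x", "y"), "lon": ("x", "y")}))  # False
```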

> 
> COMPONENTS AND BOUNDARIES
> 
> GDT also have material which applies to some other of John Caron's motivating
> examples. Hybrid coordinates (4) and time components (11) are what we regard as
> "component coordinate variables" (section 18). These are formally very similar
> to associated coordinate variables. The difference is that the main coordinate
> variable is a function of the components, whereas associated coordinates are
> merely alternative or additional information. For hybrid coordinates, we have
> 
>   dimensions:
>     eta=19;
> 
>   variables:
>     float eta(eta);
>       eta:component="pressure,sigma";
>     float pressure(eta);
>     float sigma(eta);
> 
> This means that the hybrid vertical coordinate eta is, at each level, a
> function of the pressure and sigma values. 

It's interesting to notice that even though eta = F(pressure, sigma)
mathematically, as netCDF variables pressure and sigma are functions of
the dimension eta.

> In this case, it is necessary to
> store these values in the file as eta cannot be uniquely decomposed into
> pressure and sigma. eta is also stored in the file in order to provide a
> monotonic main coordinate variable to order the axis, and so that the
> application does not have to know how to compute it from its components. For
> time, on the other hand, it is not necessary to store the components because a
> unique decomposition can be done, and time is so important that we assume the
> application will know how to do this. We have quite a lot to suggest about time
> coordinates (sections 24-29) as it is such a complicated subject.
> 
> The example of edge coordinates (5) is referred to as "boundaries" in GDT
> (section 21). This is not the same kind of coordinate. Boundaries are not
> additional axes for the data, but additional information about the coordinate
> values of other axes, to which they are attached. In the example above we may
> envisage that the data refers to "layers" in the atmosphere. Each layer has
> a characteristic level, given by its main coordinates, and lower and upper
> limits. GDT supply the boundary information like this:
> 
>   variables:
>     float eta(eta);
>       eta:component="pressure,sigma";
>       eta:bounds="bounds_eta"; // name of the boundary coordinate is arbitrary
>     float bounds_eta(2,eta); // 0=boundary with smaller eta value, 1=larger
>     float pressure(eta);
>       pressure:bounds="bounds_pressure";
>     float bounds_pressure(2,eta); // elements correspond to those of bounds_eta
>     float sigma(eta);
>       sigma:bounds="bounds_sigma";
>     float bounds_sigma(2,eta);
> 
> Although GDT does not discuss multidimensional coordinates, I think that the
> idea of boundaries can be easily extended to cover them. The convention I would
> suggest is
> 
>     float alt(x,y,z);
>       alt:quantity="altitude";
>       alt:bounds="bounds_alt";
>     float bounds_alt(2,2,2,x,y,z);
> 
> where the first dimension of bounds_alt is 0 for the alt at the lower x value,
> 1 at the higher x value, the second dimension for y bounds and the third for z
> bounds. Thus, bounds_alt[*][*][*][i][j][k] are the altitude at the eight
> vertices of a cube surrounding the point [i][j][k]. (Whether these boundary
> dimensions should be leading or trailing is a subsidiary question.)

Here you have a vector-valued function on multiple dimensions (x,y,z). It's
probably worth making a proposal about these two things separately, for
clarity. (Or perhaps this is just about bounds?)
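
For reference, the quoted bounds_alt(2,2,2,x,y,z) layout puts eight values
around each point, one per combination of lower/upper in x, y and z. A short
Python sketch of the index tuples involved; the function name vertex_indices
is made up for illustration:

```python
from itertools import product

# 0 = lower bound, 1 = upper bound, once for each of x, y and z.
corners = list(product((0, 1), repeat=3))
print(len(corners))  # 8


def vertex_indices(i, j, k):
    """Index tuples into bounds_alt(2,2,2,x,y,z) for the cell around [i][j][k]."""
    return [(a, b, c, i, j, k) for a, b, c in corners]


print(vertex_indices(3, 4, 5)[0])   # (0, 0, 0, 3, 4, 5)
print(vertex_indices(3, 4, 5)[-1])  # (1, 1, 1, 3, 4, 5)
```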

The idea of an "axis" is very intuitive, and even though I have suggested
that it is the same thing as a coordinate function, I want to refine that
somewhat. The reason is that for a georeferencing system as in this
example, we know there are three spatial axes. So it's natural to specify
three sets of axes; but to capture bounds we need two numbers per axis, so
our two obvious choices are:
	float var(x,y,z);
	   var:coordinates = "lat_bound lon_bound lev_bound";
	float lat_bound(x,y,z,2);
	float lon_bound(x,y,z,2);
	float lev_bound(x,y,z,2);

or something like:

	float var(x,y,z);
	   var:coordinates = "(lat1 lat2) (lon1 lon2) (lev1 lev2)";
	float lat1(x,y,z);
	float lat2(x,y,z);
	float lon1(x,y,z);
	float lon2(x,y,z);
	float lev1(x,y,z);
	float lev2(x,y,z);

Both emphasize that you have three "axes" (although I don't have a
definition of that yet), each with two coordinate values associated with each
domain point.
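
The second form needs a little parsing on the application side. A minimal
Python sketch, assuming the parenthesised syntax exactly as written above
(which is only a suggestion, not an agreed convention); the function name
parse_axis_groups is hypothetical:

```python
import re


def parse_axis_groups(value):
    """Split '(a b) (c d) ...' into one tuple of variable names per axis."""
    groups = re.findall(r"\(([^()]*)\)", value)
    return [tuple(group.split()) for group in groups]


axes = parse_axis_groups("(lat1 lat2) (lon1 lon2) (lev1 lev2)")
print(axes)
# [('lat1', 'lat2'), ('lon1', 'lon2'), ('lev1', 'lev2')]
print(len(axes))  # 3 axes, two coordinate functions each
```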

Here the fact that the coordinate functions are multidimensional is
incidental.