Our proposed netCDF conventions for climate data (GDT=Gregory, Drach and Tett) did not include any discussion of non-rectilinear grids or multidimensional coordinates, but we hoped (section 10) that people with experience of these would comment on ways to handle them. In subsequent discussion by email with Stephen Walker, I suggested that we could use our proposed associate attribute (section 19) to identify the multidimensional coordinate variables associated with a data variable. GDT proposed this attribute for coordinate variables only, but the extension to data variables is fairly natural, I think. In this posting I try to describe how this would work, its similarities to what has been brought up in previous discussion, and its relationship to other aspects of GDT.

THE ASSOCIATE ATTRIBUTE IN GDT

Firstly, let me describe GDT's proposed "associate" attribute. This was suggested with one-dimensional coordinate variables in mind. The cases it was intended for include Steve Emmerson's wire example, and John Caron's motivating examples of scattered points (1) and trajectories (9). In fact, the trajectory and scattered points were given as examples in GDT (section 19 and 20 respectively). We said, rather vaguely, that associated coordinate variables should be used "where an axis has a number of alternative ways of being labelled, providing different kinds of information". The case of scattered points might look like this:

  dimensions:
    time=10;
    point=12;
    StringMaxLength=32;
  variables:
    float temperature(time,point);
    short point(point);
      point:associate="height,lat,lon,sitename"; // spaces are optional
    float height(point);
    float lat(point);
    float lon(point);
    char sitename(point,StringMaxLength);
    double time(time);

In this instance, the main coordinate variable "point" would probably be an index (0,1,2,...). Its function is to bear the associate attribute. It could be meaningful, though - it might be a WMO station number, perhaps.
GDT require it to be present, but this requirement has been criticised by some. If it were dropped from GDT, we could instead move the associate attribute to the data variable, i.e. remove the "point" variable and its attribute above, and have instead

    temperature:associate="height,lat,lon,sitename";

The wire or trajectory case would be represented as:

  dimensions:
    distance=100;
  variables:
    float temperature(distance); // temperature along spiral
    float distance(distance); // distance along spiral
      distance:associate="z,rho,theta"; // cylindrical coordinate system (CCS)
    float z(distance); // CCS height
    float rho(distance); // distance from CCS centre axis
    float theta(distance); // CCS azimuth

In this case, the main coordinate variable "distance" probably has physical content.

In these cases, the associate attribute can be attached to the main coordinate variable, rather than to the data variable. This is OK because all the associated coordinates apply to that axis alone (they are a function solely of distance along the wire). It is efficient because the same associations will probably apply for any data variable that has that axis. For instance, precipitation rate at the scattered points:

    float precipitation(time,point);

where the points need just the same latitude, longitude, height and name as for temperature. Since this information is associated with the point axis, which is shared by temperature and precipitation, it only needs to be recorded once in the file.

But suppose you wanted to have a variable that did *not* have all those associations. For instance, suppose you wanted to consider vertical profiles of humidity at the scattered points; you now want their names, longitude and latitude but not their height, as presumably a new height coordinate will be introduced instead. This can be handled in two ways:

* Use a different dimension name, not "point".
  This means you have to have a new main coordinate variable and repeat the lat, lon and name information in new variables, since GDT require that associated coordinate variables have the same dimension as the main coordinate variable. If we relaxed that requirement to say that the dimension had only to be equal in size, not identical, the point-dependent information would not have to be repeated. I am not so happy about that, though.

* Put the association on the data variable instead of the main coordinate variable, as described above. In this case, the associate attribute has to appear on each of the relevant data variables individually, rather than once for all on the coordinate variable. This is less efficient, but the inefficiency is really quite minor.

I think either approach should be allowed. GDT does not contain the second approach. However, to allow the associate attribute on either the data or the coordinate variable is quite natural, as is shown by John Caron's remark about the close relationship between scattered points in 1D or 2D arrangements. What if we want to record the information that the 12 points are not completely irregular, but in some 3x4 non-rectilinear arrangement? We now cannot have a main coordinate variable any more (that is, one whose name is the same as the name of a dimension, since the points now have two dimensions), but we could transfer the attribute to the data variable:

  dimensions:
    time=10;
    pointx=3;
    pointy=4;
    StringMaxLength=32;
  variables:
    float temperature(time,pointx,pointy);
      temperature:associate="height,lat,lon,sitename";
    float height(pointx,pointy);
    float lat(pointx,pointy);
    float lon(pointx,pointy);
    char sitename(pointx,pointy,StringMaxLength);
    double time(time);

USING THE ASSOCIATE ATTRIBUTE FOR MULTIDIMENSIONAL COORDINATES

This situation is very similar to what Stephen Walker and Jason Waring use and to Steve Emmerson's preferred approach for multidimensional coordinates.
For a three-dimensional field, it looks like this:

  dimensions:
    level=19;
    i=72;
    j=96;
  variables:
    float temperature(level,i,j);
      temperature:associate="lat,lon";
    short level(level);
    float lat(i,j);
    float lon(i,j);

This is not a proposal of GDT, but a possible extension. This is not the use for which the associate attribute was intended, although the example in the previous section suggests that it is related. If I understand correctly what has been said, it is not strictly speaking a "referential attribute" approach, which would have

    temperature:i="lat"
    temperature:j="lon"

instead of

    temperature:associate="lat,lon";

I feel that referential attributes are not so good because

* An association is implied between i and lat, j and lon. As has been discussed by several people, this is not the case. lat,lon supply a set of coordinates which is an alternative to i,j ("manifold" and "base"). The same misleading association is implied by the approach that Steve Emmerson deprecates, in which there is no attribute and the lat,lon variables are named i,j.

* The idea introduces an "interaction" between the names of dimensions and attributes, in that you have to be careful not to give them the same name unless you intend the "reference". Suppose that a convention like GDT had defined a particular meaning for a variable attribute named "level". This attribute could not be used with that meaning on the temperature variable above because it would be interpreted as referential. You might say that it is unlikely that you would want to use it under those circumstances, but I think it must be confusing at least to humans to have an attribute of a particular name having different functions when attached to different variables in the same file.

I don't think it matters very much whether the coordinate variables are called lat,lon or i,j.
Since the dimensions i,j do not correspond one-to-one with the coordinate variables lat,lon, I think it would be good practice to give them different names, but I don't think this need be the subject of a convention.

HOW SHOULD APPLICATIONS INTERPRET THIS?

There has been discussion about how applications should make use of these associated coordinate variables. Personally, I do not think that a netCDF convention should be prescriptive about this. It is really a problem for the application program. What is essential, as has been pointed out, is that existing programs should not have difficulties. A completely general application (plotting program, or whatever) can ignore the associate attribute altogether and use the one-dimensional axes, which might have plain indices for coordinate variables. This is guaranteed to work.

A more sophisticated application will presumably have some idea of what it wants, or the ability to be told. In general, there might be several possible sets of axes, and some choices which are pathological. I think it is unavoidable that some kind of intelligence will have to be present in the application, or some selection made by its user. If we have

  dimensions:
    i=72;
    j=96;
  variables:
    float temperature(i,j);
      temperature:associate="lat,lon,x,y";
    float lat(i,j);
    float lon(i,j);
    float x(i,j);
    float y(i,j);

the user or the program is unavoidably going to have to choose whether a plot is to be made on lat-lon or x-y (perhaps distance) axes. If the plotting program is able to plot a map background (coastlines etc.) when appropriate, it must already know that this has to be done differently in the two cases, and moreover that the map cannot be plotted in all cases (not for latitude-height, for example). A program that calculates area-averages, similarly, must know how to work out areas on a spherical surface. Such applications are already sensitive to lat-lon (and presumably other special cases).
They are free to look for these kinds of special coordinate among the associated coordinate variables if they want to, or if requested, and make appropriate checks that the coordinates are sensible. All the netCDF file does is tell the application where to look for the information, not what to do with it. All the netCDF convention needs to do is to provide a method for attaching this information.

John Sheldon asked about an example like this:

  dimensions:
    x=5;
    y=10;
    z=12;
  variables:
    float pressure(x,y,z);
      pressure:associate="lat,lon,alt";
    float lat(x,y);
      lat:quantity="latitude";
    float lon(x,y);
      lon:quantity="longitude";
    float alt(x,y,z);
      alt:quantity="altitude";

where I have used the quantity and associate attributes of GDT. He asked, "Can my application infer that "lat,lon,alt" is an ordered list intended to point to variables containing coordinates for x,y,z?" I think the answer to this is No. The most basic application could plot the pressure on x,y,z axes with indices for coordinate variables. A more advanced application would "know" about latitude, longitude and height - for instance, if it complies with COARDS it knows something about them, I suppose. It can therefore look in the associate attribute to find coordinate variables to use instead of the one-dimensional indices. It will be able to identify these by their quantity attributes in GDT or their units in COARDS. The application should do some checking to see whether the dimensions of these coordinate variables are acceptable - for instance, it wants as many coordinate variables as it has dimensions, with each dimension referred to at least once.

I do not think that such restrictions should be in the netCDF convention. I feel that Gary Granger's description of mappings, for example, is valuable as a way to think about the possible axes which might be used for plotting, etc., but that such information is really in the realm of the application, and is not a property of the data itself.
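
To make the kind of application-side checking discussed in this section concrete, here is a minimal Python sketch. It is an illustration only and is not part of GDT or of any netCDF library: the file is modelled as a plain dictionary, and the names variables, associated_names and find_coordinates are hypothetical. It looks up associated coordinate variables by their quantity attribute, not by their position in the list, and applies the check that there are as many coordinate variables as dimensions, with each dimension referred to at least once.

```python
# Hypothetical in-memory model of the pressure example: each variable has a
# tuple of dimension names and a dict of attributes. A real application
# would read this information from the netCDF file instead.
variables = {
    "pressure": {"dims": ("x", "y", "z"), "attrs": {"associate": "lat,lon,alt"}},
    "lat": {"dims": ("x", "y"), "attrs": {"quantity": "latitude"}},
    "lon": {"dims": ("x", "y"), "attrs": {"quantity": "longitude"}},
    "alt": {"dims": ("x", "y", "z"), "attrs": {"quantity": "altitude"}},
}


def associated_names(var):
    """Split the associate attribute into variable names (spaces optional)."""
    text = var["attrs"].get("associate", "")
    return [name.strip() for name in text.split(",") if name.strip()]


def find_coordinates(variables, data_name, wanted):
    """Return {quantity: variable name} for the wanted quantities, or None.

    None means the application should fall back to the one-dimensional
    index axes. Lookup is by quantity attribute, never by list position.
    """
    data = variables[data_name]
    by_quantity = {}
    for name in associated_names(data):
        coord = variables.get(name)
        if coord is None:
            continue  # the named variable is not in the file; skip it
        quantity = coord["attrs"].get("quantity")
        if quantity in wanted:
            by_quantity[quantity] = name
    if set(by_quantity) != set(wanted):
        return None  # not every wanted quantity was found
    chosen = [variables[name] for name in by_quantity.values()]
    covered = set()
    for coord in chosen:
        if not set(coord["dims"]) <= set(data["dims"]):
            return None  # coordinate uses a dimension the data lacks
        covered |= set(coord["dims"])
    # As many coordinate variables as dimensions, each dimension used once
    # or more.
    if len(chosen) != len(data["dims"]) or covered != set(data["dims"]):
        return None
    return by_quantity


coords = find_coordinates(
    variables, "pressure", ["latitude", "longitude", "altitude"]
)
```

In this sketch, asking for latitude, longitude and altitude on the pressure variable succeeds, while asking for latitude and longitude alone is rejected because two coordinate variables cannot replace three dimensions. Which checks to apply, and what to do on failure, remain decisions for the application.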

COMPONENTS AND BOUNDARIES

GDT also have material which applies to some other of John Caron's motivating examples. Hybrid coordinates (4) and time components (11) are what we regard as "component coordinate variables" (section 18). These are formally very similar to associated coordinate variables. The difference is that the main coordinate variable is a function of the components, whereas associated coordinates are merely alternative or additional information. For hybrid coordinates, we have

  dimensions:
    eta=19;
  variables:
    float eta(eta);
      eta:component="pressure,sigma";
    float pressure(eta);
    float sigma(eta);

This means that the hybrid vertical coordinate eta is, at each level, a function of the pressure and sigma values. In this case, it is necessary to store these values in the file as eta cannot be uniquely decomposed into pressure and sigma. eta is also stored in the file in order to provide a monotonic main coordinate variable to order the axis, and so that the application does not have to know how to compute it from its components.

For time, on the other hand, it is not necessary to store the components because a unique decomposition can be done, and time is so important that we assume the application will know how to do this. We have quite a lot to suggest about time coordinates (sections 24-29) as it is such a complicated subject.

The example of edge coordinates (5) is referred to as "boundaries" in GDT (section 21). This is not the same kind of coordinate. Boundaries are not additional axes for the data, but additional information about the coordinate values of other axes, to which they are attached. In the example above we may envisage that the data refers to "layers" in the atmosphere. Each layer has a characteristic level, given by its main coordinates, and lower and upper limits.
GDT supply the boundary information like this:

  variables:
    float eta(eta);
      eta:component="pressure,sigma";
      eta:bounds="bounds_eta"; // name of the boundary coordinate is arbitrary
    float bounds_eta(2,eta); // 0=boundary with smaller eta value, 1=larger
    float pressure(eta);
      pressure:bounds="bounds_pressure";
    float bounds_pressure(2,eta); // elements correspond to those of bounds_eta
    float sigma(eta);
      sigma:bounds="bounds_sigma";
    float bounds_sigma(2,eta);

Although GDT does not discuss multidimensional coordinates, I think that the idea of boundaries can be easily extended to cover them. The convention I would suggest is

    float alt(x,y,z);
      alt:quantity="altitude";
      alt:bounds="bounds_alt";
    float bounds_alt(2,2,2,x,y,z);

where the first dimension of bounds_alt is 0 for the alt at the lower x value and 1 for the alt at the higher x value, the second dimension does the same for the y bounds and the third for the z bounds. Thus, bounds_alt[*][*][*][i][j][k] are the altitudes at the eight vertices of a cube surrounding the point [i][j][k]. (Whether these boundary dimensions should be leading or trailing is a subsidiary question.)

Jonathan Gregory