Stephen and Jason,

> Thanks for your comments on our proposal. Your message appears to
> contain two main themes - bounding coordinates, and dimensional
> attributes. Our comments on each of these are as follows:
>
> Bounding coordinates:
>
> You give the example of layers in the atmosphere, and the need to
> store coordinates for the top and bottom of these layers. Along
> the same lines, more generally, Gregory, Drach, and Tett say
> in section 21 of their proposal
>
> > NEW: Along a dimension, the values might relate to points (at the coordinate
> > values) or to contiguous or non-contiguous cells. The boundaries of the
> > cells should be defined as well as the cell coordinate values. The
> > convention is to define an additional two-dimensional ``boundary coordinate
> > variable'' with a left-hand dimension (trailing dimension in Fortran terms)
> > of size two.

What I meant to illustrate with my layer example, although it didn't come across very well, was that coordinates may be useful even if they are more general than monotonic values along an axis. In particular, atmospheric layers can (and do) overlap, and overlapping layers cannot, in general, be ordered. I did not mean to require matching boundaries on multiple layers. I was just looking for a real example of a coordinate-like variable that required multiple component values.

> Their proposal and your example both only deal with the 1-dimensional
> case. In two dimensions, a 'cell' will be defined by 4 points, and
> in the general curvilinear case, each such point is specified by 2 coordinate
> variable values (x and y, for example). In 3 dimensions, 8 points are needed
> (defining the corners of a 'cube' for the want of a better word), although
> particular cases (such as some model grids) might allow you to simplify this
> (4 (x,y) points and 2 values of z, for example).
> In general, the specification
> of the bounds of a 'cell' becomes quite messy for higher dimensions, and we
> don't (yet) have a good, general proposal for addressing this problem. Our
> model output files actually do store this information - we have 4 distinct
> horizontal grids stored in the files, representing cell centres, cell corners,
> and the centres of two adjacent faces, but at the moment, the intelligence
> to interpret these is hard wired into our processing software.

I was trying to present a plausible example in which 2 values are required to specify a single coordinate value (a layer). A more general higher-dimensional analogue of overlapping layers would be connected sets in higher dimensions, not necessarily rectangular cells; even in only two dimensions, representing such a general set might require an infinite number of values. Maybe this extrapolation to higher dimensions shows that my generalization of coordinates is invalid, but the layer example has proved useful for representation with a single netCDF dimension.

I think it would also be useful to have a single netCDF dimension for representing geographic or political regions, such as states, countries, or provinces. For example, climatology data could be represented by region using one netCDF dimension, "region", with each region's variable-length name stored in a corresponding "region" variable. I'm hoping such a "region" variable would qualify as a coordinate variable under the right generalization of the concept of coordinate.

> Dimension attributes:
>
> A possible drawback of our proposal is the need to maintain "coordinates"
> attribute strings for each data variable, even when several data variables
> have the same set of coordinate variables associated with them. Your proposal
> is to eliminate this possible duplication by using global attributes having
> dimension names, which list coordinate variables.
> As well as the drawbacks
> you mention, we see several other problems with this approach:
>
> Firstly, almost any variable might be considered to be a coordinate variable.
> For example, given a file fragment as follows:
>
>     dimensions:
>         d1 = ...;
>         d2 = ...;
>         d3 = ...;
>
>     variables:
>         data1(d1,d2,d3);
>         data2(d1,d2,d3);
>
>         coord1(d1);
>         coord2(d2,d3);
>         coord3(d2,d3);
>         coord4(d2,d3);
>         coord5(d2,d3);
>
> Your scheme would have global dimension attributes as follows:
>
>     :d1 = "coord1";
>     :d2 = "coord2 coord3 coord4 coord5";
>     :d3 = "coord2 coord3 coord4 coord5";

Evidently, I wasn't very clear in describing my scheme. It was not my intent to list every variable that uses a dimension as a coordinate variable for that dimension, but only a set of variables whose values uniquely determine an index along that dimension. This is analogous to a multi-field key for a relation in a relational database. Several candidate sets of fields might serve as a key for a relation, but only one set is declared to be *the key* for the relation. In your example above, if knowing the values of coord2 and coord3 were enough to determine the corresponding d2 index (by the intended meaning of d2, coord2, and coord3, not just the values in a particular dataset), then it would be sufficient to declare

    :d2 = "coord2 coord3";

> Incidentally, this seems to me to be perfectly valid, but it violates
> your requirement that:
>
> > No two tuples of coordinate variable values are the same for distinct
> > values of the dimension.

I meant that the *values* of the coordinate variables had to uniquely determine the dimension index, but again I wasn't being precise enough. Still, I think it is important to capture this property of a coordinate: the values of the coordinate uniquely determine the index of the corresponding dimension.
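
To make the key idea concrete, here is a minimal Python sketch, assuming 1-D coordinate arrays that all run along the same dimension. The helper name build_index and the layer heights are hypothetical illustrations, not part of netCDF or of any proposed convention. The helper treats a set of coordinate variables as a multi-field key: it maps each tuple of their values to a dimension index and rejects the set if any tuple repeats.

```python
from typing import Dict, Sequence, Tuple


def build_index(*coord_vars: Sequence) -> Dict[Tuple, int]:
    """Map each tuple of coordinate values to its index along one dimension.

    Hypothetical helper: every argument is assumed to be a 1-D sequence
    over the same dimension. Raises ValueError if the values of the given
    variables do not uniquely determine the index (i.e. are not a key).
    """
    lengths = {len(v) for v in coord_vars}
    if len(lengths) != 1:
        raise ValueError("need one or more coordinate variables of equal length")
    index: Dict[Tuple, int] = {}
    for i, key in enumerate(zip(*coord_vars)):
        if key in index:
            raise ValueError(f"indices {index[key]} and {i} share values {key}")
        index[key] = i
    return index


# Hypothetical overlapping atmospheric layers (heights in metres): neither
# the bottom nor the top values alone are unique, but each (bottom, top) pair is.
bottom = [0.0, 0.0, 500.0]
top = [1000.0, 500.0, 1000.0]
layer = build_index(bottom, top)
print(layer[(0.0, 500.0)])  # → 1
```

Calling build_index(bottom) on its own raises ValueError, because the bottom height 0.0 occurs at two indices. Note that such a check only tests the values actually present in one dataset, whereas a key declaration is meant to express the intended meaning of the variables.
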
The property is automatically satisfied by a one-dimensional coordinate variable with monotonically increasing or decreasing values, but even if the coordinate values could not be ordered, I think you would agree that they should be unique. For example, if countries are used as a dimension, the character coordinate variable had better not name the same country twice. What I was trying to say is that for coordinates with multiple components, the same combination of component values must never occur more than once, since those values must uniquely determine a dimension index.

> However the main point is that in some circumstances one may wish to consider
> data2 as a coordinate variable for data1, or vice versa. In that case, the
> global dimension attributes become:
>
>     :d1 = "data1 data2 coord1";
>     :d2 = "data1 data2 coord2 coord3 coord4 coord5";
>     :d3 = "data1 data2 coord2 coord3 coord4 coord5";
>
> This initially looks fine, but in fact it adds absolutely no information
> to the file, as all it does is explicitly state the dimensional dependence
> of each variable in the file - something that can already be found out by
> (perhaps somewhat tedious) inspection of each variable.

Exactly, and I agree that this would not be a desirable convention. It's not the one I meant to propose.

> So, if we do allow data variables to be coordinates for other data variables,
> then your dimension attribute proposal adds no information at all. If we don't
> allow this, then all it really does is to identify the set of variables which
> we do consider to be coordinate variables. This could be done more clearly
> by having a single global attribute, as follows:
>
>     :coordinate_variables = "coord1 coord2 coord3 coord4 coord5";
>
> This is quite like our original proposal, but avoids the problem of having
> to maintain coordinate attributes for each data variable.
> Bindings between
> coordinate variables and data variables must then be worked out on the
> basis of which dimensions they have in common (keeping in mind that the
> dimensions of a coordinate variable must always be a subset of the dimensions
> of the associated data variable). If this was adopted, someone should write
> and disseminate a subroutine or set of routines which identify these bindings!

I don't think these bindings can be discovered merely by examining which dimensions data variables have in common, because I think the bindings provide information about the intended meaning of the data that is not contained in the declarations of which variables depend on which dimensions. Going back to the layer example, there may be many variables that depend on a layer dimension; stating that a particular combination of two of these variables uniquely determines the layer index captures meaning in the data that applications can use.

> The main limitation of the above is that it allows less flexibility in
> the association of data variables and coordinate variables. Using our
> original proposal, for example, we could write:
>
>     data1:coordinates = "coord1 coord2 coord3";
>     data2:coordinates = "coord1 coord2 coord3 coord4 coord5";
>
> signifying that coord4 and coord5 were appropriate coordinates for data2,
> but not data1. The global attribute approach doesn't allow this, but that
> may not be a big sacrifice in most applications. If great flexibility really
> is required, perhaps a nested approach could be used, where the "coordinates"
> attribute for a variable is used only if it is present, and otherwise a
> global "coordinate_variables" attribute is used.

This is a good point. But the above doesn't tell me whether a tuple of values (coord1, coord2, coord3) really represents a single conceptual coordinate for data2, or whether data2 is really a 5-dimensional variable with no relation among its coordinates.
Perhaps this is asking too much of conventions, but I'm hoping that if the author of a dataset knows some relation among coordinates and variables that must be true, that relation can somehow be represented in declarations.

> Thanks again for your comments, and we would welcome more on the material
> above. We have also copied this message to Jonathan Gregory, and have also
> had some correspondence with him on other aspects of our proposal. We have
> not sent this message to the netCDF group, due to its length, but feel free
> to forward it if you think that is appropriate.

I agree that the netcdfgroup as a whole doesn't seem very interested in these coordinate conventions, but I think your proposal and comments are valuable, so I'm adding your posting and this reply to the coordinate conventions archive. I've also added John Caron to the CC: list, since I think he's interested and I value his insights on the subject. If anyone in this subgroup feels I shouldn't include their postings in the archive for anyone else to read, please let me know.

> Regards,
> Stephen Walker
> Jason Waring

Thanks again for your comments.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@unidata.ucar.edu                     http://www.unidata.ucar.edu