Hi,

Stephen Walker and Jason Waring wrote:

> We are (some of the) numerical modellers at CSIRO Division of Marine
> Research in Hobart, Australia, and have been following the recent (and
> historical) thread regarding coordinate conventions with much interest.
> The model we use here for coastal and estuarine work uses curvilinear
> coordinates and stores data in netCDF files. For years we have done this
> in an ad-hoc way, using neither conventional coordinate variables nor
> referential attributes, but rather depending on hard-wired intelligence
> in our processing and plotting software. Some sort of convention for
> representing curvilinear grids which is compatible with the wider
> community would clearly be of great benefit.
>
> Recently, Russ Rew posted [reference to] a draft document by Jonathan
> Gregory, Bob Drach and Simon Tett, essentially describing extensions
> to the COARDS conventions. While it is clear that much thought and
> work has gone into this document, we feel it is too specific, or too
> 'high level' for our needs. For example, it essentially perpetuates
> the idea of coordinate variables as having only 1 dimension, and
> accommodates 'rotated' grids by specifying the position of a shifted
> North Pole. Neither of these concepts is useful to those of us who use
> more general curvilinear grids, or who use grids which are not defined
> in lon,lat space. We feel that some lower level, more generic
> conventions may be of use to a wider community.

Yes, after a first (superficial) reading of the draft proposal (at http://www-pcmdi.llnl.gov/drach/netCDF.html), I thought that it included a specification for multidimensional coordinate variables, because of their description of 2-dimensional "boundary coordinate variables" and 2-dimensional "string-valued coordinate variables". On reading the proposal again, I agree that it still essentially recommends "classic" 1-dimensional coordinate variables that use the same name as a dimension.
Walker and Waring give a very clear explanation of the need for more general coordinate variables, in which different coordinate variables may have different numbers of dimensions. In general, I like their proposal; however, I want to explore a modification, which I describe below after a concrete example.

This example is similar to the "boundary coordinate variable" of Gregory, Drach, and Tett. Some atmospheric variables are defined on layers, each of which is specified by a bottom level and a top level. For example, "relative humidity in boundary layer", "helicity", "lifted index", etc. are model output parameters that are defined for specific layers of the atmosphere.

Layer Example Using A Multidimensional Coordinate Variable
==========================================================

Representing a "layer" coordinate requires two values, the bottom and top of the layer, so one possible representation for such a coordinate would use a two-dimensional coordinate variable whose shape is the number of layers by 2:

    dimensions:
        bndlay = 5 ;     // boundary layers
        lon = 93 ;
        lat = 65 ;
        bot_top = 2 ;    // bottom and top of layer
    variables:
        float RH_bndlay(bndlay, lat, lon) ;
            RH_bndlay:long_name = "relative humidity in boundary layer" ;
            RH_bndlay:units = "percent" ;
        float bndlay(bndlay, bot_top) ;
            bndlay:long_name = "layer between 2 pressure levels" ;
            bndlay:units = "hPa" ;

Note that, unlike some other multidimensional coordinate variable examples that have been presented, there is no need for a "bot_top" coordinate variable: the relation between the values of bndlay(i, 0) and bndlay(i, 1) makes clear which is the bottom and which is the top of a layer, since pressure decreases with height and the larger value must therefore be the bottom.
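As a minimal sketch (in Python, which the original discussion does not use), here is how a reader program might recover the bottom and top of each layer from the two-dimensional bndlay coordinate variable. It assumes the values have already been read into nested lists; the pressure values and the helper name layer_bounds are hypothetical, chosen only for illustration.

```python
# Hypothetical stand-in for bndlay(bndlay, bot_top), in hPa.
# The values are illustrative only; they do not come from any real model.
bndlay = [
    [1000.0, 970.0],
    [970.0, 940.0],
    [940.0, 910.0],
    [910.0, 880.0],
    [880.0, 850.0],
]


def layer_bounds(pair):
    """Return (bottom, top) for one layer from its two pressure values.

    Pressure decreases with height, so the larger value is the bottom of
    the layer, whichever bot_top index it happens to be stored at.
    """
    top, bottom = sorted(pair)
    return bottom, top


for i, pair in enumerate(bndlay):
    bottom, top = layer_bounds(pair)
    print(f"layer {i}: bottom {bottom} hPa, top {top} hPa")
```

Because the sketch infers the bottom from the values themselves, it would work whichever index of bot_top held the bottom; a convention could still fix that order, but the representation does not require it.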
Layer Example Using A Referential Attribute
===========================================

A different representation might instead use two one-dimensional coordinate variables for the bottom and top, respectively, of each layer:

    dimensions:
        bndlay = 5 ;     // boundary layers
        lon = 93 ;
        lat = 65 ;
    variables:
        float RH_bndlay(bndlay, lat, lon) ;
            RH_bndlay:long_name = "relative humidity in boundary layer" ;
            RH_bndlay:units = "percent" ;
        float bndlay_bot(bndlay) ;
            bndlay_bot:long_name = "bottom of layer" ;
            bndlay_bot:units = "hPa" ;
        float bndlay_top(bndlay) ;
            bndlay_top:long_name = "top of layer" ;
            bndlay_top:units = "hPa" ;

and now we somehow have to associate the bndlay_bot and bndlay_top coordinates with the bndlay dimension, as Walker and Waring point out:

> The problem [...] is that if you want the dataset to be self
> describing then you need some further mechanism to identify the
> association between data variables and coordinate variables. A number
> of people have identified referential attributes as the solution to
> this problem.

Walker and Waring's Proposal 1 does this with a "coordinates" attribute that names the coordinate variables. In the above example, this adds the variable attribute

            RH_bndlay:coordinates = "bndlay_bot bndlay_top lat lon" ;

Dimension Attributes
====================

I think each of these two representations is reasonable for a "layer" dimension convention, and I have a hard time choosing between them. The first representation (multidimensional coordinate variables) doesn't require any extra conventional attributes, only a new interpretation for multidimensional variables that have the same name as a dimension. The second representation (referential attributes) is more general, because it permits using different units or other attributes (e.g. different long_name attributes, as above) for the multiple components of a coordinate.
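The second representation relies on a reader program matching the names in a "coordinates" attribute to the dimensions of the data variable. A minimal Python sketch of that matching follows; it assumes a hypothetical in-memory model (a dict from variable name to dimension names and attributes), not any netCDF library's API, and it assumes lat and lon are classic coordinate variables even though the CDL fragment above does not declare them. The helper name coords_by_dimension is illustrative only.

```python
# Hypothetical metadata model: variable name -> (dimension names, attributes).
# lat and lon are assumed to exist as classic coordinate variables.
variables = {
    "RH_bndlay": (("bndlay", "lat", "lon"),
                  {"coordinates": "bndlay_bot bndlay_top lat lon"}),
    "bndlay_bot": (("bndlay",), {"units": "hPa"}),
    "bndlay_top": (("bndlay",), {"units": "hPa"}),
    "lat": (("lat",), {}),
    "lon": (("lon",), {}),
}


def coords_by_dimension(varname, variables):
    """Group the names in a variable's "coordinates" attribute by dimension.

    A coordinate variable is attached to every dimension of the data
    variable that it shares, so a 2-D curvilinear lat(y, x) would be
    listed under both y and x.
    """
    dims, attrs = variables[varname]
    names = attrs.get("coordinates", "").split()
    result = {dim: [] for dim in dims}
    for name in names:
        coord_dims = variables[name][0]
        for dim in dims:
            if dim in coord_dims:
                result[dim].append(name)
    return result


print(coords_by_dimension("RH_bndlay", variables))
# {'bndlay': ['bndlay_bot', 'bndlay_top'], 'lat': ['lat'], 'lon': ['lon']}
```

Note that the association is recorded on RH_bndlay itself, so every other data variable on the bndlay dimension would need its own copy of the same attribute.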
One (arguably minor) problem with referential attributes as proposed is that they are English-centric, using the word "coordinates" to represent something fundamental about the structure of the data. This is in contrast to the "classic" coordinate variable convention, which merely uses the fact that a dimension and a variable have the same name. Another possibility that avoids this problem is to use a global attribute with the same name as a variable to refer to the variable's coordinates. In the example above, the CDL notation for this global "coordinate attribute" would be:

            :RH_bndlay = "bndlay_bot bndlay_top lat lon" ;

This is really not ideal either, because it shares another problem with the "coordinates" attribute convention: if there were multiple variables that each used the same layer dimension, each would require "bndlay_bot bndlay_top" in its coordinate attribute. When information is duplicated like this, it is tedious to create and difficult to update, and it may become inconsistent if it is changed in one place but not in others.

A possibility that avoids this problem would exploit another name identification among netCDF components: a global "dimension attribute" would be an attribute with the same name as a dimension, and would name the coordinate variables for that dimension. In our layer example:

            :bndlay = "bndlay_bot bndlay_top" ;

With this convention, the variable "coordinates" attribute is no longer necessary, and the information about what the bndlay dimension means is in one place.

In summary, here is a proposal for this version of referential attributes, which I'll call "dimension attributes":

    dimensions:
        d1 = ... ;
        d2 = ... ;
        d3 = ... ;
        ...
    variables:
        float coord1(d1, d2, ...) ;
        char coord2(d1, d3) ;
        ...

    // global attributes:
            :d1 = "coord1 coord2 ..." ;
            :d2 = "coord3" ;
            ...

A "dimension attribute" is a global attribute with the same name as a dimension and of character type.
It names the coordinate variables associated with that dimension, as a blank-separated list of variable names. Each coordinate variable named in such a list must include that dimension as one of its dimensions. No two distinct indices along the dimension may have the same tuple of coordinate variable values. This latter requirement may seem unnecessary, but I don't think it would make sense, for example, to have two layers with the same bottom and top: in that case you really have only one layer. This is also consistent with an intuitive understanding of what a coordinate is: a tuple of coordinates uniquely identifies a point in the domain of a function.

An addendum to the proposal would establish defaults for dimension attributes, making all existing "classic" 1-dimensional coordinate variables fit this convention: if there is no global attribute for a dimension, it is treated as though the dimension attribute named exactly one variable, the coordinate variable with the same name as the dimension.

I can see a couple of drawbacks to this proposal:

 - A global attribute with the same name as a dimension is not an obvious place to store information about coordinate variables; using a named attribute such as "coordinates" makes it clearer what is going on.

 - There can be only one global attribute with the same name as a dimension. Is a list of coordinate variables for the dimension the most fundamental "attribute" of the dimension? Or are there other conflicting uses for global attributes with the same name as dimensions?

I think I would like to work out several more examples with this proposal, to see whether it's adequate to represent sigma vertical coordinates, curvilinear grids, etc. But I'll be away from my email until 30 June, so please don't take my lack of response as a lack of interest. Also, the archive of netCDF coordinate conventions postings at http://www.unidata.ucar.edu/software/netcdf/coords/ may not be updated until I return.
--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@unidata.ucar.edu                      http://www.unidata.ucar.edu