We are (some of the) numerical modellers at CSIRO Division of Marine Research in Hobart, Australia, and have been following the recent (and historical) thread regarding coordinate conventions with much interest. The model we use here for coastal and estuarine work uses curvilinear coordinates and stores data in netCDF files. For years we have done this in an ad hoc way, using neither conventional coordinate variables nor referential attributes, and relying instead on hard-wired intelligence in our processing and plotting software. A convention for representing curvilinear grids that is compatible with the wider community would clearly be of great benefit.

Recently, Russ Rew posted a draft document by Jonathan Gregory, Bob Drach and Simon Tett, essentially describing extensions to the COARDS conventions. While it is clear that much thought and work has gone into this document, we feel it is too specific, or too 'high level', for our needs. For example, it essentially perpetuates the idea of coordinate variables as having only one dimension, and accommodates 'rotated' grids by specifying the position of a shifted North Pole. Neither of these concepts is useful to those of us who use more general curvilinear grids, or who use grids which are not defined in lon,lat space. We feel that some lower-level, more generic conventions may be of use to a wider community.

Below are our (rather long) ideas on the subject, based largely on our reading of the very useful archive of postings maintained by Russ Rew.

In the general case, a netCDF file is capable of storing multi-dimensional arrays of data. A general file fragment might look like this:

    dimensions:
        d1 = size1;    // or perhaps UNLIMITED
        d2 = size2;
        d3 = size3;
        ...
    variables:
        float data(d1, d2, d3, ...);

The usual situation is that, for each data value at a given position defined by the indices (d1, d2, d3, ...), you want to be able to associate or evaluate a number of other quantities (which we will call coordinates). So the netCDF file will contain a number of other variables which store these 'coordinate' values. Here are some very general examples of such coordinate variables, with comments interspersed.

Example 1:

    float d1(d1);
    float d2(d2);
    ...

These are examples of the 'classic' one-dimensional coordinate variables which conform to the existing netCDF conventions.

Example 2:

    float d1(d1, d2, d3, ...);
    float d2(d1, d2, d3, ...);
    ...

These are examples of the multi-dimensional extension to the coordinate variable convention, proposed by a number of people. As in Example 1, each coordinate variable here has a name which is the same as one of the dimension names.

Example 3:

    float coord1(d1);
    float coord2(d1, d2, d3, ...);
    float coord3(d1, d3);
    float coord4(d3);
    float coord5(d3, d4, ...);
    ...

This is a much more general example. Note that the variable names are not necessarily the same as the dimension names, and that different coordinate variables might have different numbers of dimensions, although their dimensions must always be a subset of the dimensions present in the associated data variable(s). Note, however, that this example does not (yet) specify any way of associating these coordinate variables with the data variable. This, by the way, is the current state of our model output files.

Problems:

The problem with Example 1 is that it excludes the easy representation of many types of coordinate 'grids', as previously discussed by many people. It serves the purpose well only when there is a one-to-one mapping between data dimensions and coordinate quantities, and when each coordinate is a function of only one data dimension.

Example 2 generalises the concept of a coordinate variable in a fairly natural way.
Some have commented that it violates the existing one-dimensional convention, while others point out that the violation arises only when the data are such that the existing convention is inadequate anyway. More seriously, Example 2 does not allow more coordinate variables than there are dimensions, and a number of people have discussed this issue with regard to time coordinates or vertical coordinates (or even spatial coordinates; see the posting by Rich Signell in October 1992). From a purely mathematical (and aesthetic) point of view, we also find confusing and illogical the implied statement that d1, for example, depends on things other than d1. There is a real temptation here to confuse the roles of data dimensions and coordinates; it is important to note that in this situation there is no longer a one-to-one mapping between them.

The problem with Example 3 is that, if you want the dataset to be self-describing, you need some further mechanism to identify the association between data variables and coordinate variables. A number of people have identified referential attributes as the solution to this problem.

Our approach:

The above problems may be severe, mild, or irrelevant, depending on your particular application. We favour the third approach above for the following reasons:

- We use curvilinear grids, so Example 1 is not really useful.
- We store coordinates in various projection spaces, so we need more coordinate variables than we have dimensions. This makes Example 2 of fairly limited use.

Proposal 1:

Our first proposal is a low-level, general way to specify associations between data variables and coordinates in a netCDF file:

- Each data variable has an attribute called 'coordinates' which lists the coordinate variables associated with that data variable.
- Each coordinate variable has dimensions which are a subset of the dimensions of the associated data variable(s).
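To make the two rules concrete, here is a minimal Python sketch of what a generic application might do with them. It is not tied to any netCDF library: as an assumption for illustration, the file header is held in plain dictionaries, the variables are those of Example 3 with the '...' replaced by a fourth dimension d4, and the function names coordinates_of and check_coordinates are hypothetical. Since the choice of separator is still open, the sketch accepts commas and/or white space.

```python
import re

# Hypothetical stand-in for a netCDF header: each variable maps to its
# dimension names and its attributes (Example 3, with '...' made concrete).
VARIABLES = {
    "data":   {"dims": ("d1", "d2", "d3", "d4"),
               "attrs": {"coordinates": "coord1 coord2 coord3 coord4 coord5"}},
    "coord1": {"dims": ("d1",), "attrs": {}},
    "coord2": {"dims": ("d1", "d2", "d3", "d4"), "attrs": {}},
    "coord3": {"dims": ("d1", "d3"), "attrs": {}},
    "coord4": {"dims": ("d3",), "attrs": {}},
    "coord5": {"dims": ("d3", "d4"), "attrs": {}},
}


def coordinates_of(variables, name):
    """Return the names listed in the 'coordinates' attribute of `name`.

    The separator is not yet agreed, so split on commas and/or white space.
    """
    listing = variables[name]["attrs"].get("coordinates", "")
    return [token for token in re.split(r"[,\s]+", listing) if token]


def check_coordinates(variables, name):
    """Check both rules of the proposal for data variable `name`.

    Returns a list of problem descriptions; an empty list means every listed
    coordinate variable exists and its dimensions are a subset of the
    data variable's dimensions.
    """
    data_dims = set(variables[name]["dims"])
    problems = []
    for coord in coordinates_of(variables, name):
        if coord not in variables:
            problems.append(f"{coord}: not defined in file")
            continue
        extra = sorted(set(variables[coord]["dims"]) - data_dims)
        if extra:
            problems.append(f"{coord}: dimensions {extra} not used by {name}")
    return problems


if __name__ == "__main__":
    print(coordinates_of(VARIABLES, "data"))
    print(check_coordinates(VARIABLES, "data"))
```

Run as a script, it prints the five coordinate names followed by an empty list, since every coordinate variable in Example 3 satisfies the subset rule. With a real file, the dictionaries would be filled from whatever netCDF interface the application already uses.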
With this proposal, the file fragment from Example 3 becomes:

    dimensions:
        d1 = size1;
        d2 = size2;
        d3 = size3;
        ...
    variables:
        float data(d1, d2, d3, ...);
            data:coordinates = "coord1 coord2 coord3 coord4 coord5";
            // probably other attributes here as well
        float coord1(d1);
        float coord2(d1, d2, d3, ...);
        float coord3(d1, d3);
        float coord4(d3);
        float coord5(d3, d4, ...);

Note that this is essentially identical to the 'independent_variables' attribute proposed by Rich Signell in 1992, and similar to suggestions by others since then. Note also that this approach is compatible in principle with the existing conventions, and with the multi-dimensional coordinate variable proposals: it merely adds a single extra attribute per data variable. People can still use dimension names as coordinate variable names if they want to, and they can still have one-dimensional coordinate variables if their data grids warrant it.

We stress that this is a generic, low-level, and very general proposal. Likewise, we are happy to leave details such as whether to separate names with commas or white space to future debate.

But how do I use this?

The main thing missing from the above proposal is that it does not address how a 'generic' netCDF application is supposed to handle coordinate variables once they have been identified. To find this extra information, it is natural to use the attributes of the coordinate variables themselves, and there are already very useful conventions which may help here. For example, in a 'well behaved' netCDF file, each coordinate variable would have long_name and units attributes, which a generic application could use (we have omitted such attributes from the examples above for brevity and clarity). So, for example, an application expecting to find latitude and longitude values could examine the units of each of the coordinate variables, hoping to find strings like "degrees_east" and "degrees_north".
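A sketch of that units-based search follows, again in Python with the header held in plain dictionaries (an assumption standing in for a real netCDF reader). The variable names follow the salinity example at the end of this message, with cell_z left out for brevity; the function find_lat_lon and the sets of accepted unit strings are hypothetical choices for illustration, and a real application would probably need to recognise other spellings of geographic units.

```python
# Hypothetical header: a data variable and its coordinate variables,
# each reduced to the attributes this search needs.
VARIABLES = {
    "salt":     {"attrs": {"units": "1",
                           "coordinates": "t cell_y cell_x cell_lat cell_lon"}},
    "t":        {"attrs": {"units": "seconds since 1990-01-01 00:00:00 +10"}},
    "cell_y":   {"attrs": {"units": "metres"}},
    "cell_x":   {"attrs": {"units": "metres"}},
    "cell_lat": {"attrs": {"units": "degrees_north"}},
    "cell_lon": {"attrs": {"units": "degrees_east"}},
}

# Unit strings taken to mark geographic coordinates (an assumed, minimal list).
LATITUDE_UNITS = {"degrees_north"}
LONGITUDE_UNITS = {"degrees_east"}


def find_lat_lon(variables, name):
    """Return (latitude, longitude) coordinate names for data variable `name`.

    Candidates come from the 'coordinates' attribute and are classified
    purely by their units; None is returned where nothing matches.
    """
    listing = variables[name]["attrs"].get("coordinates", "")
    lat = lon = None
    # Accept commas as well as white space, since the separator is undecided.
    for coord in listing.replace(",", " ").split():
        units = variables.get(coord, {}).get("attrs", {}).get("units")
        if lat is None and units in LATITUDE_UNITS:
            lat = coord
        elif lon is None and units in LONGITUDE_UNITS:
            lon = coord
    return lat, lon


if __name__ == "__main__":
    print(find_lat_lon(VARIABLES, "salt"))
```

Called on "salt", it returns ("cell_lat", "cell_lon"). The projected cell_x and cell_y are skipped because their units ("metres") do not identify them as geographic; units alone also cannot say which projection they belong to.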
It may be necessary to add further information which is not covered by current attribute conventions; for example, each coordinate variable might have a 'quantity' attribute (as previously suggested) or a 'coordinate_type' attribute. We would welcome suggestions on this point.

As a final concrete example, we show below a file fragment which uses the above ideas (including an extra attribute for coordinate variables) to describe salinity output from our model:

    dimensions:
        n = UNLIMITED;
        k = 10;
        j = 100;
        i = 100;
    variables:
        double t(n);
            t:long_name = "Time";
            t:units = "seconds since 1990-01-01 00:00:00 +10";
            t:coordinate_type = "time";
        double cell_z(n,k,j,i);
            cell_z:long_name = "Z coordinate at cell centres";
            cell_z:units = "metres";
            cell_z:coordinate_type = "height";
        double cell_y(j,i);
            cell_y:long_name = "Y coordinate at cell centres";
            cell_y:units = "metres";
            cell_y:coordinate_type = "Y, projection=AMG_zone_55";
        double cell_x(j,i);
            cell_x:long_name = "X coordinate at cell centres";
            cell_x:units = "metres";
            cell_x:coordinate_type = "X, projection=AMG_zone_55";
        double cell_lat(j,i);
            cell_lat:long_name = "Latitude at cell centres";
            cell_lat:units = "degrees_north";
            cell_lat:coordinate_type = "latitude";
        double cell_lon(j,i);
            cell_lon:long_name = "Longitude at cell centres";
            cell_lon:units = "degrees_east";
            cell_lon:coordinate_type = "longitude";
        double salt(n,k,j,i);
            salt:long_name = "Salinity";
            salt:units = "1";
            salt:coordinates = "t cell_z cell_y cell_x cell_lat cell_lon";

Any comments on any of the above would be most welcome.

Stephen Walker, Jason Waring
Email: Stephen.Walker@marine.csiro.au, Jason.Waring@marine.csiro.au
CSIRO Marine Research
GPO Box 1538, Hobart
Tasmania, AUSTRALIA
Phone: 03 6232 5298
Fax: 03 6232 5123