Re: coordinate systems in netcdf (again)

Russ Rew (russ@unidata.ucar.edu)
Fri, 20 Jun 1997 14:09:07 -0600

Hi,

Stephen Walker and Jason Waring wrote:

> We are (some of the) numerical modellers at CSIRO Division of Marine
> Research in Hobart, Australia, and have been following the recent (and
> historical) thread regarding coordinate conventions with much interest.
> The model we use here for coastal and estuarine work uses curvilinear
> coordinates and stores data in netCDF files. For years we have done this
> in an ad-hoc way, using neither conventional coordinate variables nor
> referential attributes, but rather depending on hard-wired intelligence
> in our processing and plotting software. Some sort of convention for
> representing curvilinear grids which is compatible with the wider
> community would clearly be of great benefit.
>  
> Recently, Russ Rew posted [reference to] a draft document by Jonathan
> Gregory, Bob Drach and Simon Tett, essentially describing extensions
> to the COARDS conventions. While it is clear that much thought and
> work has gone into this document, we feel it is too specific, or too
> 'high level' for our needs. For example, it essentially perpetuates
> the idea of coordinate variables as having only 1 dimension, and
> accommodates 'rotated' grids by specifying the position of a shifted
> North Pole. Neither of these concepts is useful to those of us who use
> more general curvilinear grids, or who use grids which are not defined
> in lon,lat space. We feel that some lower level, more generic
> conventions may be of use to a wider community.

Yes, after a first (superficial) reading of the draft proposal (at
http://www-pcmdi.llnl.gov/drach/netCDF.html), I thought that it included
a specification for multidimensional coordinate variables, because of
their description of 2-dimensional "boundary coordinate variables", and
2-dimensional "string-valued coordinate variables".  On reading the
proposal again, I agree that it still essentially recommends "classic"
1-dimensional coordinate variables that use the same name as a
dimension.

Walker and Waring give a very clear explanation of the need for more
general coordinate variables, in which different coordinate variables
may have different numbers of dimensions.  In general, I like their
proposal; however, I want to explore a modification that I describe
below, after a concrete example.  This example is similar to the
"boundary coordinate variable" of Gregory, Drach, and Tett.

Some atmospheric variables are defined on layers, each of which is
specified by a bottom level and top level.  For example, "relative
humidity in boundary layer", "helicity", "lifted index", etc. are model
output parameters that are defined for specific layers of the
atmosphere.

Layer Example Using A Multidimensional Coordinate Variable
==========================================================

Representing a "layer coordinate" requires two values, the bottom and
top of the layer, so one possible representation would use a
two-dimensional coordinate variable, dimensioned for the number of
layers by 2:

    dimensions:

	bndlay = 5 ;           // boundary layers
	lon =  93 ;
	lat =  65 ;
	bot_top = 2 ;          // bottom and top of layer

    variables:

	float   RH_bndlay(bndlay, lat, lon) ;
		RH_bndlay:long_name = "relative humidity in boundary layer" ;
		RH_bndlay:units = "percent" ;

	float	bndlay(bndlay, bot_top) ;
		bndlay:long_name = "layer between 2 pressure levels" ;
		bndlay:units = "hPa" ;

Note that unlike some other multidimensional coordinate variable
examples that have been presented, there is no need for a "bot_top"
coordinate variable; the relation between the values of bndlay(i, 0) and
bndlay(i, 1) makes clear which is the bottom and which is the top of a
layer.
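
As a rough illustration of that last point (not part of the original
proposal), the sketch below models bndlay(bndlay, bot_top) as a list of
pairs and recovers bottom and top from the values alone.  The pressure
values and the helper name bottom_and_top are hypothetical; the only
physical assumption is that pressure decreases with height, so the
larger value of each pair is the bottom of the layer.

```python
# Hypothetical values for bndlay(bndlay, bot_top), in hPa.
# These numbers are made up for illustration; they are not from the post.
bndlay = [
    (1000.0, 970.0),
    (970.0, 940.0),
    (940.0, 910.0),
    (910.0, 880.0),
    (880.0, 850.0),
]


def bottom_and_top(layer):
    """Return (bottom, top) of a pressure layer.

    Assumes pressure decreases with height, so the higher pressure
    is the bottom of the layer regardless of storage order.
    """
    a, b = layer
    return (max(a, b), min(a, b))


layers = [bottom_and_top(layer) for layer in bndlay]
```

For a coordinate that increases with height (for example, altitude in
metres) the comparison would be reversed, so a reader still needs the
units or long_name to know which relation applies.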

Layer Example Using A Referential Attribute
===========================================

A different representation might instead use two one-dimensional
coordinate variables for the bottom and top, respectively, of each
layer:

    dimensions:

	bndlay = 5 ;           // boundary layers
	lon =  93 ;
	lat =  65 ;

    variables:

	float   RH_bndlay(bndlay, lat, lon) ;
		RH_bndlay:long_name = "relative humidity in boundary layer" ;
		RH_bndlay:units = "percent" ;

	float	bndlay_bot(bndlay) ;
		bndlay_bot:long_name = "bottom of layer" ;
		bndlay_bot:units = "hPa" ;

	float	bndlay_top(bndlay) ;
		bndlay_top:long_name = "top of layer" ;
		bndlay_top:units = "hPa" ;

and now we somehow have to associate the bndlay_bot and bndlay_top
coordinates with the bndlay dimension, as Walker and Waring point out:

> The problem [...] is that if you want the dataset to be self
> describing then you need some further mechanism to identify the
> association between data variables and coordinate variables. A number
> of people have identified referential attributes as the solution to
> this problem.

Walker and Waring's Proposal 1 does this with a "coordinates" attribute,
naming the coordinate variables.  In the above example, this adds the
variable attribute

      RH_bndlay:coordinates = "bndlay_bot bndlay_top lat lon";
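
A minimal sketch of how a reader might interpret such an attribute
follows.  It uses a plain dictionary as a stand-in for the netCDF file
rather than any real netCDF library, and the function names are
hypothetical.  The lat and lon coordinate variables are also an
assumption: the example above declares only the lat and lon dimensions.

```python
# Toy stand-in for the file: variable name -> (dimensions, attributes).
variables = {
    "RH_bndlay": (
        ("bndlay", "lat", "lon"),
        {"coordinates": "bndlay_bot bndlay_top lat lon"},
    ),
    "bndlay_bot": (("bndlay",), {"units": "hPa"}),
    "bndlay_top": (("bndlay",), {"units": "hPa"}),
    # Assumed classic coordinate variables; not declared in the example.
    "lat": (("lat",), {}),
    "lon": (("lon",), {}),
}


def coordinates_of(name):
    """Split a variable's "coordinates" attribute into variable names."""
    _, attrs = variables[name]
    return attrs.get("coordinates", "").split()


def coordinates_by_dimension(name):
    """Group the named coordinate variables under each dimension of name."""
    dims, _ = variables[name]
    grouped = {d: [] for d in dims}
    for coord in coordinates_of(name):
        for d in variables[coord][0]:
            if d in grouped:
                grouped[d].append(coord)
    return grouped


grouped = coordinates_by_dimension("RH_bndlay")
```

Here grouped associates bndlay_bot and bndlay_top with the bndlay
dimension, and lat and lon with their own dimensions.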

Dimension Attributes
====================

I think each of these two representations is reasonable for a "layer"
dimension convention, and have a hard time choosing between them.  The
first representation (multidimensional coordinate variables) doesn't
require any extra conventional attributes, only a new interpretation for
multidimensional variables that have the same name as a dimension.  The
second convention (referential attributes) is more general, because it
permits using different units or other attributes (e.g. different
long_name attributes as above) for the multiple components of a
coordinate.

One (arguably minor) problem with referential attributes as proposed is
that they are English-centric, using the word "coordinates" to represent
something fundamental about the structure of the data.  This is in
contrast to the "classic" coordinate variable convention that merely
uses the fact that a dimension and variable have the same name.  Another
possibility that avoids this problem is to use a global attribute with
the same name as a variable to refer to the variable's coordinates.  In
the example above, the CDL notation for the global "coordinate
attribute" would be:

      :RH_bndlay = "bndlay_bot bndlay_top lat lon";

This is really not ideal either; it shares another problem with the
"coordinates" attribute convention.  If there were multiple variables
that each used the same layer dimension, each would require "bndlay_bot
bndlay_top" in its coordinate attribute.  When information is
duplicated like this, it becomes tedious to create, difficult to update,
and may become inconsistent if it's changed in one place but not in
others.  

A possibility that avoids this problem would exploit another name
identification among netCDF components: a global "dimension attribute"
would be an attribute with the same name as a dimension, and would name
coordinate variables for that dimension.  In our layer example:

      :bndlay = "bndlay_bot bndlay_top";

With this convention, the variable "coordinates" attribute is no longer
necessary, and the information about what the bndlay dimension means is
in one place.

In summary, here is a proposal for this version of referential
attributes, which I'll call "dimension attributes":

  dimensions:
     d1 = ...;
     d2 = ...;
     d3 = ...;
     ...
  // global attributes
     :d1 = "coord1 coord2 ...";
     :d2 = "coord3";
     ...
  variables:
     float coord1(d1, d2, ...);
     char  coord2(d1, d3);
     ...

  A "dimension attribute" is a global attribute with the same name as a
  dimension and of character type.  It names coordinate variables
  associated with that dimension, in the form of a list of names of
  variables.

  Each coordinate variable mentioned in such a list must include the
  dimension that the attribute is named for as one of its dimensions.

  No two tuples of coordinate variable values are the same for distinct
  values of the dimension.

This last requirement may seem unnecessary, but I don't think it would
make sense, for example, to have two layers with the same bottom and
top: in that case you really only have one layer.  And this is
consistent with an intuitive understanding of what a coordinate is: a
tuple of coordinates uniquely identifies a point in the domain of a
function.
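
The two requirements on coordinate variables could be checked
mechanically.  The sketch below is one possible check, not part of the
proposal: it again uses dictionaries in place of a real netCDF file,
the data values are hypothetical, and it handles only one-dimensional
coordinate variables, which is enough for the layer example.

```python
# Toy stand-in for a netCDF file; all values are hypothetical.
global_attrs = {"bndlay": "bndlay_bot bndlay_top"}
variables = {
    "bndlay_bot": {
        "dims": ("bndlay",),
        "data": [1000.0, 970.0, 940.0, 910.0, 880.0],
    },
    "bndlay_top": {
        "dims": ("bndlay",),
        "data": [970.0, 940.0, 910.0, 880.0, 850.0],
    },
}


def check_dimension_attribute(dim):
    """Return a list of rule violations for the dimension attribute of dim."""
    problems = []
    names = global_attrs[dim].split()

    # Rule: every listed variable must be dimensioned by dim.
    for name in names:
        if dim not in variables[name]["dims"]:
            problems.append(f"{name} is not dimensioned by {dim}")
    if problems:
        return problems

    # Rule: no two index values along dim share the same coordinate tuple.
    # Only 1-D coordinate variables are handled in this sketch.
    tuples = list(zip(*(variables[name]["data"] for name in names)))
    if len(set(tuples)) != len(tuples):
        problems.append(f"duplicate coordinate tuples along {dim}")
    return problems


problems = check_dimension_attribute("bndlay")
```

With the values above, problems is empty.  Giving two layers the same
bottom and top would produce a "duplicate coordinate tuples" entry.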

An addendum to the proposal would establish defaults for dimension
attributes, making all existing "classic" 1-dimensional coordinates fit
this convention:

  If there is no dimension attribute for a dimension, the dimension is
  treated as if its dimension attribute named only one variable, a
  coordinate variable with the same name as the dimension.
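
A small sketch of how a reader might apply this default, using the
layer example's dimension attribute.  The dictionary and function names
are hypothetical stand-ins, not calls from any netCDF library.

```python
# Global attributes of the layer example; only bndlay has a
# dimension attribute, so lat and lon fall back to the default.
global_attrs = {"bndlay": "bndlay_bot bndlay_top"}


def coordinate_variables(dim):
    """Names of the coordinate variables for dim.

    Falls back to the classic convention (a variable with the same
    name as the dimension) when no dimension attribute exists.
    """
    if dim in global_attrs:
        return global_attrs[dim].split()
    return [dim]


def coordinates_for(var_dims):
    """Coordinate variable names for a variable with dimensions var_dims."""
    return [name for d in var_dims for name in coordinate_variables(d)]


rh_coords = coordinates_for(("bndlay", "lat", "lon"))
```

For RH_bndlay(bndlay, lat, lon) this yields the same four names that
the per-variable "coordinates" attribute listed, but any other variable
using the bndlay dimension gets them from the same single attribute.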

I can see a couple of drawbacks to this proposal:

 - A global attribute with the same name as a dimension is not an
   obvious place to store information about coordinate variables; using
   a named attribute such as "coordinates" makes it clearer what is
   going on.

 - There can be only one global attribute with the same name as a
   dimension.  Is a list of coordinate variables for the dimension the
   most fundamental "attribute" of the dimension?  Or are there other
   conflicting uses for global attributes with the same name as
   dimensions?

I think I would like to work out several more examples with this
proposal, to see if it's adequate to represent sigma vertical
coordinates, curvilinear grids, etc.  

But I'll be away from my email until 30 June, so please don't assume my
lack of response indicates lack of interest.  Also the archive of netCDF
coordinate conventions postings at
http://www.unidata.ucar.edu/software/netcdf/coords/ may not get updated
until I return.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@unidata.ucar.edu                     http://www.unidata.ucar.edu