Re: coordinate systems in netcdf (again)

Russ Rew (russ@unidata.ucar.edu)
Tue, 10 Jun 1997 01:10:20 -0600

John,

> Attached is a long attempt at defining coordinate systems in a
> formalized way, along with proposals for (what else?) netcdf conventions
> on coordinate variables, and generalized coordinate systems.
> 
> I'm a bit rusty at this sort of thing, so I'm hoping others might have a
> look at it and give me some feedback.  Perhaps someone somewhere else
> has made a formalized specification in a more succinct way.  If so,
> I'd appreciate a pointer to it.

It's not formalized, but there's a `Coordinate Systems Overview' at:

    http://www.utexas.edu/depts/grg/gcraft/notes/coordsys/coordsys.html

that describes lots of coordinate systems used for geography and
geodesy.  Some of these seem so complex or ad hoc that we probably would
not want a set of netCDF coordinate conventions extensive enough to
encompass all of them.

> Anyway, I'm muddling around trying to capture what a coordinate system
> is in a precise way, trying to make it as general as possible.  I might
> be wrong on some fundamental level, and I'd appreciate understanding
> that if you can explain it.  Thanks!

I think you've got rectilinear coordinate systems specified clearly, but
there may be a problem in trying to use vector spaces and linear algebra
terminology to define coordinate systems that aren't vector spaces.
What are the basis vectors for a coordinate system based on (lat, lon,
height)?  They can't be (1, 0, 0), (0, 1, 0), and (0, 0, 1), in
(radians, radians, meters) because in a vector space, every vector has a
unique representation as a linear combination of the basis vectors, but
(lat, lon, height) and (lat, lon+2*pi, height) represent the same
element.
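
The non-uniqueness can be checked numerically.  The following is only a
minimal sketch: it assumes a spherical Earth with a nominal radius of
6371 km, and `to_cartesian' is a hypothetical helper written for this
illustration, not part of any netCDF interface.

```python
import math

def to_cartesian(lat, lon, height, radius=6371000.0):
    """Map (lat, lon, height) in (radians, radians, meters) to (x, y, z).

    Assumes a spherical Earth of the given radius; a real geodetic
    conversion would use an ellipsoid.
    """
    r = radius + height
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

a = (0.5, 1.0, 100.0)
b = (0.5, 1.0 + 2 * math.pi, 100.0)

p = to_cartesian(*a)
q = to_cartesian(*b)

# The coordinate tuples differ, yet they name the same point in space
# (compared here to within a micrometer).
tuples_differ = a != b
same_point = all(math.isclose(u, v, rel_tol=0.0, abs_tol=1e-6)
                 for u, v in zip(p, q))
```

Both `tuples_differ' and `same_point' come out true, so the mapping from
coordinate tuples to points is not one-to-one, which is what a basis of
a vector space would require.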

I would also like to consider the possibility of a more general notion
of coordinates, for example treating climatology data so that `month'
could be a dimension with a corresponding coordinate variable in a
dataset such as:

     ...
    dimensions:
	    lat   = 19;
	    lon   = 36;
	    month = 12;
    variables:
	    float average_temperature(month, lat, lon);
	    // coordinate variables
	    float lat(lat);
	    float lon(lon);
	    // `month' doesn't currently qualify as a coordinate variable,
	    char month(3,month) = "jan","feb","mar",...,"dec";
     ...

Here `month' might be considered a _nominal_ coordinate variable, from a
useful categorization of value types that Harvey Davies once pointed out:

  nominal:
	  Values are not ordered, e.g. `country'.  Operations such as 
	  min, max, and sort are not defined for such data.  `Closest to'
	  must be an exact match.
  ordinal:
	  Data can be ordered, but not sensibly subtracted, e.g.
	  `house_number' in street addresses or `FAA_level_number'.
	  Such data cannot be interpolated.
  interval:
	  Subtraction of values is meaningful, but ratios are
	  not, e.g. Celsius temperatures.  Such data can be interpolated.
  ratio:
	  Ratios of data values are meaningful, e.g. Kelvin temperatures.
          Logarithms and geometric means are possible for such data.

Coordinate variables may make sense for all of these categories, but for
nominal or ordinal coordinates, vector spaces don't seem to apply.
Harvey proposed a `measurement_level' attribute to specify the value
type according to this terminology, so an application would not attempt
meaningless operations on inappropriate data values or coordinates.
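
As a rough sketch of how an application might use such an attribute, the
following maps each level to the operations it permits and uses that to
choose between nearest-value and exact-match lookup.  The table
`ALLOWED', the operation names, and both functions are hypothetical and
written only for this illustration; treating ordinal coordinates as
exact-match only is my assumption, since the categorization above says
only that such values cannot sensibly be subtracted.

```python
# Hypothetical table: operations permitted at each measurement level,
# following the four categories described above.
ALLOWED = {
    "nominal":  {"equals"},
    "ordinal":  {"equals", "min", "max", "sort"},
    "interval": {"equals", "min", "max", "sort", "subtract", "interpolate"},
    "ratio":    {"equals", "min", "max", "sort", "subtract", "interpolate",
                 "ratio", "log"},
}

def check_operation(measurement_level, operation):
    """Return True if the operation is meaningful at this level."""
    if measurement_level not in ALLOWED:
        raise ValueError("unknown measurement_level: %r" % measurement_level)
    return operation in ALLOWED[measurement_level]

def nearest_index(values, target, measurement_level):
    """`Closest to' lookup along a coordinate.

    Uses distance where subtraction is meaningful; otherwise requires
    an exact match and raises ValueError if the target is absent.
    """
    if check_operation(measurement_level, "subtract"):
        return min(range(len(values)), key=lambda i: abs(values[i] - target))
    return values.index(target)

months = ["jan", "feb", "mar"]
lats = [10.0, 20.0, 30.0]

month_idx = nearest_index(months, "feb", "nominal")   # exact match
lat_idx = nearest_index(lats, 24.0, "interval")       # nearest value
```

Here a request for a month that is not in the list fails rather than
returning some arbitrary `nearest' string, while the latitude lookup is
free to pick the closest grid value.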

I agree with others in this thread that the simple netCDF conventions
for coordinates are currently too limited for some uses.  Extended
conventions may eliminate some of these limitations, but extensions must
be adopted carefully.  A new convention requires support from existing
and future netCDF software, making such software more difficult to
develop and maintain.

As a small step toward resolving how to extend the netCDF conventions
for coordinates, I have put together and will maintain a Web page
linking to netcdfgroup postings relevant to this subject:

   http://www.unidata.ucar.edu/software/netcdf/coords/

Reading through these, it's clear that some of the older postings
address the same issues as recent postings and propose similar
solutions.  For example, Richard Signell's 1992 posting `Suggestion for
Coordinate Mapping convention' and Eric Pepke's 1994 posting `Sigma and
Curvilinear Grids' propose conventions relevant to the current
discussion.  Lloyd Treinish's 1992 posting `netCDF and "complex" data'
provides some elaborate examples of the power of well-designed
referential attributes.

I'll also try to maintain proposals for extensions to netCDF coordinate
system conventions on a Web page, so those who are interested can help
to refine them without needing to include all the preceding context in
every posting.  (I believe there will always be a need for evolving other
discipline-specific conventions, for which existing mechanisms seem
adequate.)

Current candidates for convention extensions include multidimensional
coordinate variables and referential attributes.  If neither of these
turns out to be adequate for solving most problems of interest, I'm not
sure we would be better off adopting both of them.  It might be better
to just document them more clearly so that datasets can use them,
applications can support them, and future data users have a common
understanding of what these sorts of conventions mean and when they
are useful.

--Russ

_____________________________________________________________________

Russ Rew                                         UCAR Unidata Program
russ@unidata.ucar.edu                     http://www.unidata.ucar.edu