Hi,
I would like to present a way to generalize the definition of coordinate
variables to include multidimensional coordinates.
Here is the current definition from the netCDF User's Guide:
A variable with the same name as a dimension is called a coordinate
variable. It typically defines a physical coordinate corresponding to that
dimension. [deleted example] Note that each coordinate variable is a
vector and has a shape consisting of just the dimension with the same name.
Here is a new definition:
Each variable in a netCDF file has a shape described by its list of
dimensions. When the name of a variable that has one or more dimensions
does not match any of the names of its dimensions then the variable is
called a dependent variable. Let v(c1,c2,...,cn) be a dependent variable
with n dimensions. The coordinate variables for v are those variables
whose names match the name of one of v's dimensions, and whose own
dimension list is a subset (not necessarily proper) of the dimensions of v.
Note that when a coordinate variable is 1D then it defines a physical
coordinate corresponding to the dimension with the same name. In the
general case however no such correspondence exists.
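
As a minimal sketch of the definition above (not part of any netCDF library), the rule can be written in a few lines of Python. The function name classify and the dictionary layout, which maps each variable name to its tuple of dimension names, are assumptions made for illustration. The shapes are those of the example given in this message.

```python
def classify(variables):
    """Map each dependent variable to the list of its coordinate variables.

    `variables` maps a variable name to its tuple of dimension names
    (a hypothetical layout chosen for this sketch).
    """
    result = {}
    for name, dims in variables.items():
        # A dependent variable has one or more dimensions and a name that
        # matches none of its own dimension names.
        if not dims or name in dims:
            continue
        # A coordinate variable is named after one of the dependent
        # variable's dimensions, and its own dimension list is a subset
        # (not necessarily proper) of the dependent variable's dimensions.
        result[name] = [
            d for d in dims
            if d in variables and set(variables[d]) <= set(dims)
        ]
    return result


example = {
    "v": ("c4", "c3", "c2", "c1"),
    "c4": ("c4",),
    "c3": ("c3", "c2", "c1"),
    "c2": ("c2",),
    "c1": ("c2", "c1"),
}
print(classify(example))  # {'v': ['c4', 'c3', 'c2', 'c1']}
```

A variable such as x(x,y) would not be accepted as a coordinate variable for w(x), because y is not one of w's dimensions.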
Example:
dimensions:
c1 = 5;
c2 = 6;
c3 = 7;
c4 = UNLIMITED;
variables:
float v(c4,c3,c2,c1);
float c4(c4);
float c3(c3,c2,c1);
float c2(c2);
float c1(c2,c1);
The interpretation is that grid point (n,k,j,i) is located in some
coordinate system at ( c4(n), c3(k,j,i), c2(j), c1(j,i) ). How the
application actually determines where this position is in physical space is
outside of the scope of this definition. But, e.g., if one were following
the COARDS conventions then this would be determined by using the units
attribute conventions to determine the latitude, longitude, vertical, and
time coordinates.
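
The lookup described above can be sketched as follows. This is only an illustration: the function name location, the use of nested Python lists for the coordinate arrays, and all coordinate values are made up, and four records are assumed along the UNLIMITED dimension c4.

```python
def location(coords, shapes, dep_dims, index):
    """Return the coordinate tuple for one grid point of a dependent variable.

    coords:   coordinate variable name -> nested list of values
    shapes:   coordinate variable name -> tuple of its dimension names
    dep_dims: dimension names of the dependent variable, in order
    index:    one integer index per entry of dep_dims
    """
    idx = dict(zip(dep_dims, index))
    point = []
    for name in dep_dims:
        value = coords[name]
        # Index each coordinate variable only by its own dimensions.
        for d in shapes[name]:
            value = value[idx[d]]
        point.append(value)
    return tuple(point)


shapes = {"c4": ("c4",), "c3": ("c3", "c2", "c1"),
          "c2": ("c2",), "c1": ("c2", "c1")}
# Synthetic values, chosen so that each value encodes its own indices.
coords = {
    "c4": [6 * n for n in range(4)],
    "c3": [[[100 * k + 10 * j + i for i in range(5)]
            for j in range(6)] for k in range(7)],
    "c2": [10 * j for j in range(6)],
    "c1": [[10 * j + i for i in range(5)] for j in range(6)],
}
# Grid point (n,k,j,i) = (1,2,3,4) maps to (c4(1), c3(2,3,4), c2(3), c1(3,4)).
print(location(coords, shapes, ("c4", "c3", "c2", "c1"), (1, 2, 3, 4)))
# (6, 234, 30, 34)
```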
Responses to the objections to multidimensional coordinates that have come
up in recent postings:
1) Not backwards compatible with current convention.
When an application looks for a coordinate variable presumably there is
some default behavior for the case that no such variable is found, like
issuing an error message or using a default coordinate comprised of
index values. If an application that requires a 1D coordinate variable
finds a coordinate variable but does not check to see whether or not it is
1D then there is the potential for the application to break in an
unpredictable way if it encounters a multidimensional coordinate. But the
fix is simply to add this check: if a 1D variable is expected and a multiD
variable is found, follow the behavior already prescribed for the case
where no coordinate variable is found.
2) There is no correspondence between a dimension and a coordinate in
the case where coordinate variables are not 1D. So having them share a
name is confusing.
In the original definition of coordinate variables only the 1D case was
considered and so the connection between a dimension and its corresponding
coordinate was explicitly made. In the definition for the generalized
coordinate variable it is pointed out that one can only make the
correspondence for the special 1D case. One way to think about the case where
multiD coordinate variables are required is that the list of dimension
names is being used to provide a list of coordinate names.
3) It is confusing to use the same names to refer to different coordinates
in different coordinate systems.
One rebuttal to this (provided by the proposer of the objection; thanks
Steve) is that this is already true for the existing 1D coordinate
variables. I would add that dimensions and variables have separate name
spaces in netCDF, implying that they are distinguishable by context. So we
need to look at more than a name to avoid confusion (just like the parser
of a CDL file must).
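
To make the fix suggested in the response to objection 1 concrete, here is a minimal sketch of the check. It assumes an application whose default behavior, when no coordinate variable is found, is to use index values. The function name coordinate_1d and the dictionary layout are hypothetical, not taken from any existing application.

```python
def coordinate_1d(name, shapes, values, dim_len):
    """Return 1D coordinate values for dimension `name`.

    shapes:  variable name -> tuple of dimension names
    values:  variable name -> list of values
    dim_len: length of the dimension `name`
    """
    if shapes.get(name) == (name,):
        # A conventional 1D coordinate variable was found.
        return list(values[name])
    # Either no coordinate variable exists or it is multidimensional:
    # follow the same default in both cases (index values here).
    return list(range(dim_len))


shapes = {"c2": ("c2",), "c1": ("c2", "c1")}
values = {
    "c2": [10 * j for j in range(6)],
    "c1": [[10 * j + i for i in range(5)] for j in range(6)],
}
print(coordinate_1d("c2", shapes, values, 6))  # [0, 10, 20, 30, 40, 50]
print(coordinate_1d("c1", shapes, values, 5))  # [0, 1, 2, 3, 4]
```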
Summary:
The current 1D coordinate variable convention is extremely useful and it
covers the most common cases for describing a dependent variable's
coordinates. I believe that the generalization to multidimensional
coordinate variables for describing non-rectilinear coordinate systems
retains the simplicity that made the original convention so successful. It
does not address the issues of how to represent multiple coordinate systems
for the same dependent variable nor how to represent the coordinates for a
non-gridded collection of points. These will need to be resolved by new
conventions on attributes. A compelling argument for coordinate variables
is that no conventions on attributes are required and this should make them
the natural choice for describing a variable's "primary" coordinate system.
Any comments would be most appreciated.
Brian Eaton
eaton@xxxxxxxxxxxxx
NCAR, Climate and Global Dynamics Division.
Boulder, Colorado.