I presume that the recent lull in the firestorm over coordinate systems
is due to our patient rumination over the issues, and not the exhaustion
of the fuel or the firefighters...:^} Its been very useful to follow
other discussions, and I've been influenced by most of them, whether
I've agreed or disagreed, commented on them or not...
Anyway, I have come to some new impasses in my own thinking, so I've
decided to expose my current "works in progress", despite the fact that
it's not complete or satisfactory. I have new web pages:
New draft proposals at:
http://acd.ucar.edu/~caron/proposals.html
Improved version of "motivating examples" at
http://acd.ucar.edu/~caron/examples.html.
New draft formal definitions (not very coherent yet) at:
http://acd.ucar.edu/~caron/definitions.html.
I will also attach ascii versions below.
Important conclusions I have come to so far:
1) A coordinate system is best thought of as a function, and we
should carefully distinguish between its domain and its range.
2) A coordinate system is a list of coordinate functions.
Another is more tentative, and my current drafts are attempts to
implement it. It's motivated by asking "what is an axis" in the context
of a coordinate system. My tentative conclusion/decision is that in
order to capture the semantics we want for coordinate systems:
3) a coordinate function corresponds to our intuitive notion of an
axis in the coordinate space.
4) there should be a "natural" mapping from an axis to the real
numbers R; this gives us then a mapping from the entire coordinate
system range to Rn, the cartesian product space of the real numbers.
If you accept this, a lot of how the implementation has to look follows.
The main problem is how to naturally represent muli-valued coordinate
functions, like my examples 5 and 11. GDT spent a lot of time on this
issue in their proposals. I'm not happy with the (lack of) simplicity of
my solution yet.
However it does start to touch on the issues of data display, which we
should admit that we want to solve.
> Here we need to be careful to distinguish between the coordinates which
> locate the data in the physical world, and the coordinates which locate
> a given line element or pixel on a plot. They are not always the same.
> As far as I understand it, most of the discussion about coordinates
> has been about coordinates which locate the data in the real physical world,
> not plot coordinates (which may or may not be related to real-world
> coordinates
> in a simple way).
Obviously we aren't going to provide coordinates into "display space"
(pixels on a display or whatever). However we do want to provide
coordinates in "real world" space clearly enough that applications can
easily map to display space. So my solution of mapping to Rn is an
attempt to do this.
As John Sheldon insists, we still haven't made it yet to the "real
world", and so I've proposed a "geodetic" convention (still very
incomplete), much indebted to Steve Emmerson's posts, and tried to say
what the semantics of that coordinate system are. Since we are mostly an
earth science community, I havent proposed other coordinate systems like
Steve does, that would be useful in solid modeling applications. But if
some think that's important...
Finally, I'm feeling that we can't go too much further without actual
methods. For example, we can't really describe a coordinate
transformation from a projection surface to lat/lon without specifying
an initialization string to a reference implementation (for example the
USGS mapgen package). Similarly with a transformation from a vertical
coordinate to km above MSL, or whatever. Now we already have some
methods, namely those in the netcdf library itself, and to some extent
to the udunits library. We all accept those because Unidata is willing
to support/port them. Its possible that other software like mapgen could
be acceptable reference implementations. Then we can actually use
methods, and we wont be so limited by what we can represent in netcdf
files. Alternatively we wait for embedded java code in netcdf 5 (or is
it a minor upgrade to version 4, Russ? :^).
Proposed Conventions for Coordinate Systems in Netcdf
draft 7/29/97 by John Caron
Introduction
This document attempts to define metadata conventions for specifying
coordinate systems in netcdf files, and state as precisely as possible what
meaning application programs can infer from the metadata. While striving for
precision, this document will try to remain intuitive and informal. A more
formal, complete, and somewhat more rigorous set of definitions can be found
at http://acd.ucar.edu/~caron/definitions.html. Motivating examples can be
found at http://acd.ucar.edu/~caron/examples.html. The full discussion in
the netcdf user email group is at
http://www.unidata.ucar.edu/packages/netcdf/coords/ and a summary of other
proposals is at
http://www.unidata.ucar.edu/packages/netcdf/coords/proposals.html .
In order to facilitate implementation, these proposals will mark some of the
more advanced features as "level 2". Two conventions are thus
proposed: "level 1", and "level 2" that includes all of level 1.
1. General Coordinate Systems
1.1 Definitions. A coordinate system is best thought of as a function which
maps a point from an index domain (really just the set of array indices for
the variable) to a location in some user-defined coordinate space. We assume
that the maker of the netcdf file is recommending that the user of the file
view the data in the specified coordinate system.
A coordinate system is defined by a list of coordinate functions, each one
of which corresponds to an independent axis in the coordinate space. The
number of coordinate functions is the rank of the coordinate system. A
coordinate function will have a domain consisting of one or more dimensions,
and the number of dimensions in its domain is its dimensionality.
The value of a coordinate function may be a scalar (point), or a tuple of
values with meaning described below. In any case, the values refer to a
single coordinate axis, and application programs can assume that an axis
(and its coordinate function) has a natural mapping to a single dimension on
a display device.
An axis (and therefore the values of a coordinate functions) is always
considered ordered: if the coordinate values are numeric then through the
ordering on real numbers; if the coordinate values are string valued and the
function is one-dimensional, then through the nominal ordering of its
dimension; (Level 2) if the coordinate values are string valued but the
coordinate function is multi-dimensional, then the ordering may be
arbitrarily chosen by the application.
1.2 Specification. A coordinate system is specified in a netcdf file by an
attribute whose name begins with the keyword "coordinates" and whose value
is a blank or comma delimited list of coordinate functions. For example:
:coordinates = "xpos ypos time wavelength";
Users are encouraged to name their coordinate systems, using a period "." as
seperator, and must do so when they wish to specify more than one coordinate
system for the same variable:
:coordinates.latlon = "lat lon level";
:coordinates.stereo_projection = "x y z";
Any coordinate system must satisfy the following restrictions: 1) the domain
of each coordinate function must be composed of a subset of the dimensions
of any variable to which the coordinate system applies. 2) the coordinate
system, considered as a vector function, must map each point in its domain
to a unique point in its range.
1.3 Scope. A global attribute whose name begins with the keyword
"coordinates" defines a coordinate system for all variables in the file with
compatible domains. A variable attribute whose name begins with the keyword
"coordinates" defines a coordinate system for that variable and overrides a
global attribute of the same name. A variable attribute with a value equal
to a blank string or the string "none" undefines any global attribute of the
same name for that variable.
1.4 Coordinate Values.
(Level 1) A coordinate function that is a scalar netcdf variable is
considered a point along its coordinate axis.
(Level 2) A coordinate function that is multiply valued is specified by a
list of scalar netcdf variables enclosed in parenthesis:
:coordinates = "lat lon (lev_upper lev_lower lev_midpoint lev_label)";
Any netcdf variable used in a multiply-valued coordinate function must have
exactly one attribute whose name is "component", and whose value is "point",
"midpoint", "upper_bound", "lower_bound", or "name". If either the upper and
lower bounds exist, they both must exist and the value of the coordinate
function is considered to be the range [lower, upper]. A point and a
midpoint are synonyms, meaning a representative value, not necessily any
kind of mean of the interval. A coordinate function may specify both a range
and a midpoint or point, in which case the point or midpoint must be
included in the range. A label is a synonym for the coordinate value. There
may be multiple labels in a coordinate function.
1.5 Function Composition.
(level 2) A coordinate function may be a functional composition, specified
with an asterisk (*):
:coordinates = "lat*latidx lon*lonidx";
The composite function must obey the usual rules for coordinate functions.
1.6 Coordinate Variables. A variable with the same name as a dimension, with
that dimension in its domain, is the coordinate variable for that dimension.
It is recommended that coordinate variables be one-dimensional. In this case
the value of the variable must be strictly increasing or decreasing with
respect to the dimension index (the function is then said to be monotonic).
(Level 2) Multi-dimensional coordinate variables, while allowed, may be
misleading in their association with a single dimension, and are considered
less desirable. Multi-dimensional coordinate variables are not in general
monotonic, but they must satisfy the uniqueness property of any coordinate
systems of which they are part.
Coordinate variables define an "implicit" coordinate system for any variable
that uses those dimensions. For such a variable, find all dimensions with
coordinate variables: the implicit coordinate system is the one composed of
that list of coordinate variables. This can be turned off by adding a global
or variable attribute with name "coordinates.implicit" and value equal to a
blank string or the string "none". It is recommended that when multiple
coordinate systems are intended, that all be explicitly defined.
1.7. Keyword Aliasing. In order to minimize the "english-centricity" of
these conventions, all keywords can be aliased by defining a global
attribute of the form:
:alias.<keyword> = "my_ alias";
The keywords defined by this convention are:
"coordinates", "implicit"
"component", "point", "midpoint", "upper_bound", "lower_bound", "label"
"geodetic"
2. Geodetic Coordinate Systems
2.1 Definition and Specification. A geodetic coordinate system is a subtype
of a general coordinate system, whose locations can be placed in relation to
the earth. A geodetic coordinate system must specify exactly three
coordinate functions: either longitude, latitude and altitude above the
surface, or some other set of coordinates which in principle can be
transformed into those.
A geodetic coordinate system is specified by an attribute whose name begins
with "coordinates.geodetic" and whose value is the list of three coordinate
functions, in the order longitude, latitude and altitude:
:coordinates.geodetic = "lat lon pressure";
When the coordinate system has coordinate functions that are not latitude,
longitude and altitude, they should be listed in the order 1) generally
east-west, 2) generally north-south, 3) generally up-down. When there is no
obvious correspondence to these directions, then the order can be considered
a recommendation for a display to consider the first two coordinates as
"horizontal" and the third as "vertical" with respect to the earth.
----------------------------------------------------------------------------
Examples
Shorthand "solutions" of the motivating examples :
Example 1,2:
:coordinates = "lat lon";
Example 3:
:coordinates.latlon = "lat lon";
:coordinates.xy = "x y";
Example 4:
:coordinates.geodetic.hybrid = "lon lat hybrid";
:coordinates.geodetic.pressure = "lon lat pressure";
Example 5:
:coordinates = "(lev_bottom lev_top)";
float lev_bottom( level);
lev_bottom:component = "lower_bound";
float lev_top( level);
lev_top:component = "upper_bound";
Example 6:
:coordinates = "lev wavelength";
:coordinates = "rho theta z";
Example 7:
:coordinates = "lev";
:coordinates = "lat lon components";
Example 8 will need some notation not yet formally proposed, eg:
:coordinates = "lat(npoints,) lon(npoints,) lat(,npoints)
lon(,npoints)";
Example 9:
:coordinates.geodetic = "lon lat elevation";
Example 10:
:coordinates.geodetic = "lon lat pressure";
Example 11:
:coordinates = "year, day_of_year, second_of_day";
:coordinates = "generate_time, valid_time";
Example 12:
:coordinates = "lat*latidx, lon*lonidx, time"
Example 13:
:coordinates = "lon lat";
In solution one, we would have to allow missing values in coordinates,
which seems ok.
Coordinate Variables in Netcdf : Motivating Examples
Draft 7/29/97 by John Caron
Here are a number of examples that would be useful to cover in a general
coordinate system convention. I will use a shorthand to indicate the form of
the coordinate functions, rather than cdl syntax to indicate a proposed
convention. The idea is to test any proposal against these examples.
----------------------------------------------------------------------------
1. classic coordinate variables:
var(lat, lon)
lat(lat)
lon(lon)
2. scattered points; assign them a location:
var(npoints)
lat(npoints)
lon(npoints)
3. projective geometry:
var(lat, lon)
lat(lat,lon)
lon(lat,lon)
however, I think we could be clearer about this. var(lat, lon) isn't
really correct; its really a function of (x,y) on a projection surface.
So better is:
var(x, y)
x(x)
y(y)
lat(x,y)
lon(x,y)
4. hybrid coordinates:
var(lon, lat, lev)
lat(lat)
lon(lon)
lev(lon, lat, lev)
again, we're being vague; really we are describing two alternative lev
coordinates: hybrid(lev) and pressure(lon, lat, lev) so we can rewrite
as:
var(lon, lat, hybrid)
lon(lon)
lat(lat)
hybrid(hybrid)
pressure(lon, lat, hybrid)
5. edge coordinates, level is some kind of altitude coordinate:
var(level)
lev_bottom(level)
lev_top(level)
6. non- georeferencing coordinates. Up to now, all the examples have been
georeferencing coordinates. A different coordinate system that we use
in radiative transfer codes is
var(lev, wavelength)
Steve Emmerson's wire example for a spiral wire in a cylindrical
coordinate system (CCS), which is a spatial, but not georeferencing
coordinate system:
temp(s) // temperature along spiral
rho(s) // distance from CCS center axis
theta(s) // CCS azimuth
z(s) // CCS height
7. vector valued variables, for example:
vector(lev, 3)
velocity(lat,lon,component)
component(3) = "u", "v", "w"
8. correlations, using a dimension more than once:
precip(time, npoints)
precip.correlation( npoints, npoints)
lat(npoints)
lon(npoints)
Its worth specifying what the coordinate system is intended for the
variable precip.correlation. We can do so precisely by specifying the
coordinate system as a vector function, and explicitly state the
domains of the coordinate functions:
Cs(np,np) = (lat(np,), lon(np,), lat(,np), lon(,np)) -> (lat1,
lon1, lat2, lon2)
In case its not obvious, "(np,)" is a shorthand which says "take the
first index from the point in (np,np), and "(,np)" says "take the
second index from the point in (np,np).
9. a balloon or airplane trajectory:
temperature(time)
ch4(time)
lat(time)
lon(time)
elevation(time)
10. moving coordinate system. For example, satellite images tracking a
tropical hurricane keep the coordinate system centered on the hurricane
as it moves:
pressure(time, x, y)
lat(time, x, y)
lon(time, x, y)
11. multiple time coordinates:
var(time)
year(time)
day_of_year(time)
second_of_day(time)
and the famous NUWG case:
var(time)
generate_time(time)
valid_time(time)
12. sparse data variables. Adapted from Harvey Davies' example:
soil.temperature(time, land_point)
land_index(lat, lon) // if over land = land_point index, else -1
lat(lat)
lon(lon)
time(time)
or another way of getting a similar result:
soil.temperature(time, land_point)
latidx(land_point) // latitude index of ith land point
lonidx(land_point) // longitude index of ith land point
lat(lat)
lon(lon)
time(time)
13. reduced grid From CSM: the number and spacing of the longitude elements
decreases toward the pole:
Solution 1: use maximum number of longitude points, and put "missing
data" in the extra elements (note this violates the rule "no missing
data in coordinates", but it does seem natural):
var(nlat, max_lon)
lat(nlat)
lon(nlat,max_lon)
Solution 2: To save storage, you could store just the real points, but
you lose the 2-D "connectedness" of the coordinate system:
var(npoints)
lat(npoints)
lon(npoints)
Coordinate Variables in Netcdf : Definitions
Draft 7/21/97 by John Caron, with help from Brian Eaton and Russ Rew
The following tries to make formal definitions using the language of
abstract algebra. A standard reference is Algebra, Saunders MacLane and
Garrett Birkhoff, The Macmillan Company, 1967.
----------------------------------------------------------------------------
A dimension, d, is a named range of integers: d = {0,1,..size-1} (or d
{1,2,..size} if you prefer). A dimension is completely specified by the pair
(name, size).
An index domain, D, is a set constructed from the cartesian product of one
or more dimensions: D = d1 x d2 x .. x dn, where di are dimensions. The
points of D are thus tuples of integers. A projection Dp of D is a cartesian
product of a subset of the dimensions {di} that D is constructed from. (So a
point in Dp is just a point in D with 0, 1, or more indices missing). We
will also call the function p that maps D to Dp a projection of D.
A variable is a function v(D) -> C, where D is an index domain, v denotes
the function, and C is the range or codomain. The image of a function is the
set of points in C that are the values of the function. Since we consider
here only index domains, which are a finite set of points, the image of a
function is also always a finite set of points. In the context of netcdf
files, the values of a function have any of the possible data types of a
netcdf variable: double, int, string, etc.
A vector function is an ordered list of scalar functions with the same
domain, called component functions. A vector function thus maps points in D
to a tuple of values of its component scalar functions. In practice the
component functions may have domains that are projections of D. Formally
this is done by composing the component function with a projection function:
cf_formal = cf_actual * p, where * is functional composition and p is the
projection function which maps D to the domain of cf_actual.
An embedding E is an invertible map from a finite set C to Rn, the cartesian
n-product of the real numbers R. Each set of real numbers in Rn is called an
axis, so that the embedding E(C) -> Rn is a map from S onto n axes.
A coordinate function is a function whose codomain C is embedded into R. A
coordinate system is an ordered list of coordinate functions with the same
domain. The number of coordinate functions is the rank of the coordinate
system, and each is associated with a different axis of Rn ,
vector function that assigns unique physical values to the points in its
domain. It thus maps an n-tuple of integers in "index space" to an m-tuple
of reals, strings, etc. in "physical space", called a location. This mapping
must be one-to-one. Formally, we can write a coordinate system as a function
Cs(D) -> R, or equivalently Cs(D) = (C1(D), C2(D), ...,Cm(D)) -> (R1, R2,
..., Rm), where Ci(D) -> Ri is the ith scalar coordinate function. The
values of the coordinate functions are the coordinates of the coordinate
system.
For a variable v(Dv) -> Rv, and coordinate system Cs(Ds) -> Rs, Cs may be a
coordinate system for variable v when Ds is a projection of Dv. When Ds
Dv, Cs is a complete coordinate system for v, since then Cs assigns a unique
location to every value of v.
A spatial coordinate system is a coordinate system whose locations are in
3-dimensional space. A georeferencing coordinate system is a spatial
coordinate system which provides enough information to place its locations
in reference to the earth. A temporal coordinate system is one which
provides enough information to place its locations in real, physical time.