Attached is a long attempt at defining coordinate systems in a
formalized way, along with proposals for (what else?) netcdf conventions
on coordinate variables, and generalized coordinate systems.
I'm a bit rusty at this sort of thing, so I'm hoping others might have a
look at it and give me some feedback. Perhaps someone somewhere else
has made a formalized specification in a more succinct way. If so,
I'd appreciate a pointer to it.
Anyway, I'm muddling around trying to capture what a coordinate system
is in a precise way, trying to make it as general as possible. I might
be wrong on some fundamental level, and if so I'd appreciate an
explanation of where. Thanks!
(I couldn't read that attachment, so I'll just resend it here again.
Sorry for the duplication.)
--------
Dimension
A _dimension_ is a named range of integers = {0,1,..size-1}. A dimension
is completely specified by the pair (name, size). You can substitute {1..size}
in what follows if you prefer 1-based indexing.
--------
Variable
A _variable_ is a function whose domain is D0 x D1 x D2 x .. x Dn = D,
where D1 .. Dn are the dimensions of the variable, and n is its _rank_.
D0 = {0} is a one-point factor, included so that scalar variables of
rank 0 also have a domain, namely D0.
We can thus write a variable v in functional form as v = f(D) -> R,
where f denotes the function, and R is the range. We will use v as
identical to f in what follows.
In the context of netcdf files, we represent functions as scalar arrays,
and so are limited to directly representing only scalar functions; some further
convention is needed for vector functions.
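The two definitions above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the names index_range, domain and temperature are made up, the data is invented, and the single point of a rank-0 domain is represented by the empty tuple rather than by D0 = {0}.

```python
from itertools import product

def index_range(dim):
    """A dimension is the pair (name, size); its indices are 0 .. size-1."""
    name, size = dim
    return range(size)

def domain(dims):
    """All index tuples in D1 x D2 x .. x Dn (one empty tuple for rank 0)."""
    return list(product(*(index_range(d) for d in dims)))

# Hypothetical dimensions and a variable stored as a map from index
# tuple to scalar value, i.e. a function v = f(D) -> R.
lat = ("lat", 2)
lon = ("lon", 3)
temperature = {idx: 10.0 * idx[0] + idx[1] for idx in domain([lat, lon])}

print(len(domain([lat, lon])))  # 6 points for a rank-2 variable
print(domain([]))               # [()] : a single point for rank 0
```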
------------------
Coordinate Variable
A _coordinate variable_ is a variable that assigns physical values to a
dimension. It must be a strictly increasing or decreasing function, and
has a domain consisting of a single dimension: CVi(Di) -> Ri, so that CVi
is said to be a coordinate variable for dimension Di.
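A minimal sketch of the monotonicity requirement, assuming the coordinate variable's values are held in a plain Python list (the function name is_coordinate_variable and the sample values are hypothetical):

```python
def is_coordinate_variable(values):
    """True if the 1-D sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    increasing = all(a < b for a, b in pairs)
    decreasing = all(a > b for a, b in pairs)
    return increasing or decreasing

print(is_coordinate_variable([-90.0, 0.0, 90.0]))     # True (increasing)
print(is_coordinate_variable([1000.0, 850.0, 500.0]))  # True (decreasing)
print(is_coordinate_variable([0.0, 10.0, 10.0]))       # False (repeated value)
```

A repeated value fails the check because the mapping from index to physical value would no longer be invertible.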
-----------------
Coordinate System
If V is a vector space, a _coordinate system_ for V is a set of basis
vectors for V, along with units to give each coordinate physical meaning.
A _coordinate_ here is a synonym for basis vector.
Let D be a domain, D = D1 x D2 x .. Dn, and define a set of scalar
_coordinate functions_ fi(D) -> Ri. Let V be the vector space
(R1, R2, .. , Rn). Then the vector function Fcs = (f1, f2, ..., fn),
Fcs(D) -> V, is said to be a coordinate system for D if Fcs is invertible.
Given the discrete nature of D, Fcs is invertible if it is one-to-one,
meaning that distinct points in D map to distinct points in V.
Given a coordinate system Fcs for domain Dc, a variable v with domain Dv,
and Dc a subset of Dv, then Fcs is a coordinate system for v. If Dc = Dv,
then Fcs is a _complete_ coordinate system for v. The value Fcs(di) = vi
for a particular point di in the domain is the _position vector_ for di,
and the variable is said to be located at vi for point di, with respect to
the coordinate system Fcs. (I think "Dc is a subset of Dv" is not quite
right; I probably want to restrict Dc = D1 x D2 x .. Dk to be equal to
Dv = D1 x D2 x .. Dn with just some of the dimensions Di missing.)
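The one-to-one condition is easy to test on a discrete domain. The sketch below assumes coordinate functions are Python callables taking an index tuple; is_one_to_one and the lon/lat values are illustrative only.

```python
def is_one_to_one(coord_functions, domain_points):
    """Fcs = (f1, .., fn) is one-to-one if no two points share a position vector."""
    seen = set()
    for d in domain_points:
        position = tuple(f(d) for f in coord_functions)
        if position in seen:
            return False
        seen.add(position)
    return True

# Hypothetical data: four scattered points on a single dimension.
lon = [0.0, 10.0, 0.0, 10.0]
lat = [0.0, 0.0, 5.0, 5.0]
points = [(i,) for i in range(4)]
f_lon = lambda d: lon[d[0]]
f_lat = lambda d: lat[d[0]]

print(is_one_to_one([f_lon, f_lat], points))  # True: (lon, lat) pairs are distinct
print(is_one_to_one([f_lon], points))         # False: lon alone repeats
```

Note that neither lon nor lat is one-to-one by itself here; only the vector function (lon, lat) is, which is why the definition is stated for Fcs as a whole.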
A special case of a coordinate system is one where the coordinate
functions are coordinate variables, and so each depends on a single
dimension Di. Then
Fcs(D1 x D2 x .. x Dn) = (f1(D1), f2(D2), ... fn(Dn)), and Fcs is said to be an
_independent_ coordinate system.
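As a rough illustration of an independent coordinate system, the following builds Fcs from a list of 1-D coordinate variables. The helper independent_fcs and the lat/lon values are assumptions made for the example.

```python
from itertools import product

def independent_fcs(coordinate_variables):
    """Build Fcs(i1, .., in) = (cv1[i1], .., cvn[in]) from 1-D coordinate variables."""
    def fcs(index):
        return tuple(cv[i] for cv, i in zip(coordinate_variables, index))
    return fcs

# Hypothetical coordinate variables, each strictly monotonic.
lat = [-30.0, 0.0, 30.0]
lon = [0.0, 90.0]
fcs = independent_fcs([lat, lon])

points = list(product(range(len(lat)), range(len(lon))))
positions = [fcs(p) for p in points]
print(fcs((2, 1)))                           # (30.0, 90.0)
print(len(set(positions)) == len(points))    # True: Fcs is one-to-one
```

Because each coordinate variable is strictly monotonic, an independent Fcs is one-to-one without any further check.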
---------------------------
Coordinate Transformations
A coordinate transformation is an invertible mapping M between two
coordinate systems Fcs1 and Fcs2:
Fcs1 = M * Fcs2, M-1 * Fcs1 = Fcs2.
Here * is functional composition, and M-1 indicates the inverse of M.
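A small sketch of the composition, using a unit change from kilometres to metres as the mapping M. The choice of M, the helper compose, and the coordinate values are all assumptions for the example.

```python
def compose(outer, inner):
    """Functional composition: (outer * inner)(d) = outer(inner(d))."""
    return lambda d: outer(inner(d))

# Hypothetical Fcs2: positions in kilometres on a 3 x 2 grid.
x_km = [0.0, 1.5, 3.0]
y_km = [0.0, 2.0]

def fcs2(index):
    return (x_km[index[0]], y_km[index[1]])

def m(v):
    """M: kilometres to metres."""
    return tuple(1000.0 * c for c in v)

def m_inv(v):
    """M-1: metres back to kilometres."""
    return tuple(c / 1000.0 for c in v)

fcs1 = compose(m, fcs2)                  # Fcs1 = M * Fcs2
print(fcs1((1, 1)))                      # (1500.0, 2000.0)
print(compose(m_inv, fcs1)((1, 1)))      # (1.5, 2.0), i.e. Fcs2 again
```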
-------------------------------
Georeferencing Coordinate System
In a georeferencing coordinate system, or GCS for short, there are 3
spatial dimensions x,y,z, which correspond as much as possible to the
directions "east/west", "north/south" and "up/down", respectively. A GCS
is therefore a function
Fgcs(D) -> (x,y,z)
where x,y,z describe the variable's position or spatial extent in each of
the directions. Note that if describing spatial extent, two values are
needed for each direction, e.g. x = (xleft,xright) or z = (zhigh,zlow).
===========================================
Specifying Coordinate Systems in netcdf files.
We have seen that a general coordinate system is specified by a domain
D = D1 x D2 x .. Dn, a vector space V (and associated physical units for
the basis vectors), and an invertible function Fcs(D) -> V. Netcdf
semantics map domains to named dimensions, and units for coordinates are
already well supported. Variable arrays are fine for describing
single-valued functions. All that's really missing are vector valued
functions.
Here is a proposal for a netcdf convention for specifying coordinate
systems. The goal is to
1) build from existing practices
2) keep simple things simple
3) make it flexible enough to handle any coordinate system.
So the proposal is:
1) coordinate variables remain an elegant way to define the coordinate
system when possible.
2) allow the natural extension of coordinate variables to higher
dimensions.
Formally:
"A variable with the same name as a dimension is the coordinate
variable for that dimension. If V is a variable with domain
D1 x D2 .. Dn = D, let Dc be the subset of D with coordinate variables
defined. Then a coordinate system is defined on Dc with the function
Fcs(Dc) = (cv1(D1), cv2(D2) ...)
where the cvi's are the defined coordinate variables, and the Di's are
each subsets of D. For any such Dc, Fcs must be invertible."
Notice that coordinate variables are restricted to mapping D (in index
space) to D (in physical coordinate space). This is a Good thing, and we
try hard to define our dimensions so that we can do exactly that.
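To make the rule in 2) concrete, here is a rough sketch that finds Dc for a variable by looking for same-named 1-D variables. The in-memory dictionaries stand in for a file's header and are purely hypothetical; no netcdf library is used.

```python
# Hypothetical file description: dimension sizes and each variable's dimensions.
dimensions = {"time": 4, "lat": 3, "lon": 2}
variables = {
    "lat": ("lat",),
    "lon": ("lon",),
    "temperature": ("time", "lat", "lon"),
}

def coordinate_variables_for(var_name):
    """Dimensions of var_name that have a same-named variable over that dimension."""
    return [d for d in variables[var_name] if variables.get(d) == (d,)]

# Dc for temperature is lat x lon; time has no coordinate variable here.
print(coordinate_variables_for("temperature"))  # ['lat', 'lon']
```

In this example the coordinate system is not complete for temperature, since Dc omits the time dimension.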
3) more generally, allow the specification of coordinate systems using
attributes:
"A coordinate system can be defined by an attribute whose name starts
with the string 'coordinates' (case insensitive, optional trailing
description) and whose value is a (comma or blank delimited) list of
variable names in the same file that define the coordinate functions.
The domain Dc of the coordinate system is found by forming the product
of the set of any Di that is contained within the domains of the
coordinate functions. The coordinate system is defined by the function
Fcs(Dc) = (cv1(D1), cv2(D2) ...)
where the cvi's are the named coordinate functions."
This is meant to cover William Weibel's case of:
dimensions:
npoints = 541;
variables:
lon(npoints);
lat(npoints);
geopotential(npoints);
geopotential:coordinates = "lon lat";
and presumably any other coordinate system (?). It seems likely that
the case var(dim, dim) would have to be excluded, i.e. using the same
dimension twice in a variable declaration (?).
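One possible reading of rule 3), applied to the example above, is sketched here. The dictionaries again stand in for the file header, and coordinate_system_for is a made-up helper, not an existing netcdf call.

```python
# Hypothetical header for the scattered-point example.
variables = {
    "lon": ("npoints",),
    "lat": ("npoints",),
    "geopotential": ("npoints",),
}
attributes = {"geopotential": {"coordinates": "lon lat"}}

def coordinate_system_for(var_name):
    """Return (coordinate function names, dimensions forming Dc), or None."""
    for attr_name, value in attributes.get(var_name, {}).items():
        # Attribute name starts with 'coordinates', case insensitive.
        if attr_name.lower().startswith("coordinates"):
            names = value.replace(",", " ").split()
            dims = []
            for name in names:
                for d in variables[name]:
                    if d not in dims:
                        dims.append(d)
            return names, dims
    return None

print(coordinate_system_for("geopotential"))  # (['lon', 'lat'], ['npoints'])
```

Here Dc is the single dimension npoints, and Fcs(npoints) = (lon, lat) still has to be checked for invertibility against the actual data.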
4) allow vector valued coordinates, to cover the famous (gen_time,
valid_time) from NUWG:
"A vector valued coordinate function can be specified by enclosing in
parentheses a list of variables in the same file that define each
component of the coordinate function." E.g.:
geopotential:coordinates = "lon lat (gen_time, valid_time)";
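Parsing such an attribute value is straightforward. The sketch below is one possible interpretation of the syntax, with a hypothetical parse_coordinates helper; it returns plain names for scalar coordinate functions and tuples for parenthesised vector valued ones.

```python
import re

def parse_coordinates(value):
    """Split a coordinates attribute value; a parenthesised group becomes a tuple."""
    entries = []
    # Each match is either a parenthesised group or a bare variable name.
    for group, name in re.findall(r"\(([^)]*)\)|([^\s,()]+)", value):
        if group:
            entries.append(tuple(group.replace(",", " ").split()))
        elif name:
            entries.append(name)
    return entries

print(parse_coordinates("lon lat (gen_time, valid_time)"))
# ['lon', 'lat', ('gen_time', 'valid_time')]
```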
I still want to:
5) allow the specification of extents, as well as point positions, for
a coordinate function.
6) clarify a number of special things about georeferencing coordinate
systems
but I'm running out of gas, and I'm not totally sure this whole thing is
solid. So I'll stop and see if anyone can give me feedback one way or the
other.