Dear John,
Thank you for your posting. I found your summary of the position helpful, and
I agree with your general definitions. I would like to put forward some
differing points of view on the implementation, however.
Multiple values:
I feel that the various multiply valued coordinates you have referred to are of
several distinct types. I would argue that the "representative" or "midpoint"
value is the principal coordinate value. This one should always exist, and it
is this which must be monotonic (or at least ordered) if it is one-dimensional.
GDT distinguishes three kinds of subsidiary coordinates: (1) Boundary (section
21). We group the upper and lower boundaries into one variable for tidiness and
ease of access. (2) Component (section 18). These are for cases where the
coordinate values are tuples, such as for the hybrid pressure-sigma vertical
coordinate. However, an ordinary principal coordinate value must still be
provided, for ordering the axis. (3) Associate (section 19). Associate values
are additional information, or extra ways of labelling the points, such as your
"lev_label".
I think these are all truly different. Moreover, component and associated
coordinate values can have boundaries, and associated coordinates can have
components. For this reason, rather than
:coordinates = "lon lat (lev_upper lev_lower lev_midpoint lev_label)";
I think that it would be better to specify
:coordinates = "lon lat lev";
and provide the boundary, component and associate coordinates by attaching them
to lev, the representative or midpoint value. That makes for a simpler and
clearer definition of the coordinate system, and it shows that the other
information really is subsidiary to lev.
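To illustrate what I mean by attaching the subsidiary information to lev, here is a rough sketch. The file is modelled as a plain Python dictionary rather than read with any netCDF library, and the attribute names "bounds", "component" and "associate" are assumptions made for illustration, as are the variable names lev_bounds and lev_label. Starting from the short coordinates list, an application can still recover everything that hangs off each principal coordinate:

```python
# Toy model of a file: variable name -> attributes. The attribute names
# "bounds", "component" and "associate" are hypothetical, for illustration.
variables = {
    "temperature": {"coordinates": "lon lat lev"},
    "lon": {},
    "lat": {},
    "lev": {"bounds": "lev_bounds", "associate": "lev_label"},
    "lev_bounds": {},
    "lev_label": {},
}

SUBSIDIARY_KINDS = ("bounds", "component", "associate")


def subsidiary_coordinates(variables, name):
    """Return {kind: [variable names]} attached to the principal coordinate."""
    attrs = variables[name]
    return {kind: attrs[kind].split() for kind in SUBSIDIARY_KINDS if kind in attrs}


def describe_coordinate_system(variables, data_name):
    """Map each principal coordinate of a data variable to its subsidiaries."""
    principal = variables[data_name]["coordinates"].split()
    return {name: subsidiary_coordinates(variables, name) for name in principal}


system = describe_coordinate_system(variables, "temperature")
```

The data variable names only the principal coordinates, and the boundary and associate variables are reached through lev, which is the sense in which they are subsidiary to it.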
Implicit coordinate system:
For the case where all the coordinate variables are one-dimensional, I am not
convinced that it is worth the complexity of declaring the coordinate system
explicitly. If we have a variable
float temperature(lev,lat,lon);
I think it is fine to rely on the implicit convention. In fact there are two
implicit coordinate systems the application could use, either plain indices for
each of the dimensions, or the one-dimensional coordinate variables lev, lat,
lon. The application must be *able* to handle these implicit conventions, and
all old data will necessarily rely on them. So there is not much to be gained,
perhaps, by explicitly declaring the coordinate system in such a case.
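The implicit convention is simple enough to sketch. The following is only an illustration, again with the file modelled as Python dictionaries: the function name implicit_coordinates and the data layout are my own assumptions. For each dimension it uses the one-dimensional coordinate variable of the same name if there is one, and plain indices otherwise:

```python
def implicit_coordinates(dimensions, variables):
    """For each dimension of a data variable, pick the implicit coordinate.

    dimensions: ordered list of (name, length) pairs for the data variable.
    variables:  dict mapping variable name -> (dimension names, values).
    """
    result = {}
    for name, length in dimensions:
        candidate = variables.get(name)
        # A coordinate variable is one-dimensional and named after its dimension.
        if candidate is not None and candidate[0] == (name,):
            result[name] = list(candidate[1])
        else:
            # Fall back to plain indices along this dimension.
            result[name] = list(range(length))
    return result


variables = {
    "lat": (("lat",), [-45.0, 0.0, 45.0]),
    "lon": (("lon",), [0.0, 90.0, 180.0, 270.0]),
}
# float temperature(lev,lat,lon), with no coordinate variable stored for lev
coords = implicit_coordinates([("lev", 2), ("lat", 3), ("lon", 4)], variables)
```

Here lat and lon take their values from the coordinate variables, while lev, which has none in this toy file, falls back to the indices 0 and 1.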
Multidimensional coordinates:
By this I mean coordinate variables depending on more than one dimension - I do
not regard boundaries for one-dimensional coordinates as examples of
multidimensional coordinates. It is in this case that some kind of explicit
declaration is needed. You suggest that coordinate systems could be declared
globally, with individual data variables able to override them. I would suggest
that it would be more convenient for individual data variables always to have
the declaration, for two reasons:
(1) If a variable declares a particular coordinate system, it can be assumed
that the coordinates listed are appropriate for this variable (although you
might want to check this). It is more work to search through all the global
declarations to work out which ones apply to a particular variable.
(2) Various coordinate systems (groups of coordinate variables) might be felt
to be equivalent, even if they involved different coordinate variables. For
example, it is common to use a B-grid or C-grid in climate models. In this
case, the temperatures and velocities will have different lat and lon
coordinate vectors. Or you might have fields of different spatial resolution in
the same file. All the data variables in the file would have something you
would feel to be a "latlon" system, but these systems are different in terms of
coordinate variables.
Although I appreciate what is being suggested here, I am not clear what we gain
from these declarations. Suppose you have declared a "latlon" and a
"stereo_projection" coordinate system. Presumably you will still have to tell
the application that reads the file which system you want to use. This means
that the keywords "latlon" and "stereo_projection" will have to be
standardised if the files are going to be portable. The application which
*generates* the file will have to know what a latlon coordinate system is,
i.e. that it consists of a latitude and a longitude coordinate variable, so that it
can encode the coordinate system in the file. Why is that better than
programming the application which *uses* the file to know that if asked for a
"latlon" system it must find latitude and longitude coordinate variables? If
it knows this, the coordinate system does not have to be declared in the file.
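As a sketch of the alternative, the reading application could assemble a "latlon" system itself. I assume here that latitude and longitude are recognised by units strings such as "degrees_north" and "degrees_east"; the SYSTEMS table, the function find_system and the B-grid-style variable names are hypothetical. The sketch covers only one-dimensional coordinate variables named after their dimensions:

```python
# Assumption: latitude and longitude are recognised by their units strings.
AXIS_UNITS = {
    "latitude": {"degrees_north", "degree_north"},
    "longitude": {"degrees_east", "degree_east"},
}
# Hypothetical table of what each requested system consists of.
SYSTEMS = {"latlon": ("latitude", "longitude")}


def find_system(system, data_dims, variables):
    """Return {axis kind: coordinate variable name}, or None if not found."""
    found = {}
    for kind in SYSTEMS[system]:
        matches = [
            dim for dim in data_dims
            if variables.get(dim, {}).get("units") in AXIS_UNITS[kind]
        ]
        # Require exactly one candidate per axis kind.
        if len(matches) != 1:
            return None
        found[kind] = matches[0]
    return found


variables = {
    "lat_t": {"units": "degrees_north"},
    "lon_t": {"units": "degrees_east"},
    "lat_u": {"units": "degrees_north"},
    "lon_u": {"units": "degrees_east"},
    "lev": {"units": "hPa"},
}
t_system = find_system("latlon", ("lev", "lat_t", "lon_t"), variables)
u_system = find_system("latlon", ("lev", "lat_u", "lon_u"), variables)
```

The temperature and velocity variables each get their own "latlon" system, built from different coordinate variables, without the file having to declare either one.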
Keyword aliasing:
This is interesting. If anglocentricity is a concern, as I see it could be,
this would be a reasonable way to deal with it. In GDT we propose various other
standardised strings, especially for the quantity. For similar reasons, it
might be sensible to provide equivalent standardised strings in several
languages.
Best wishes,
Jonathan