Re: 2D coordinate variables

John Sheldon (jps@gfdl.gov)
Tue, 3 Jun 1997 15:59:21 -0400 (EDT)

Hi again-

It seems that Brian and I are the sole participants in this exchange,
so perhaps we can summarize this issue and get out of everyone's
mailbox :-)

In fact, there is, in the end, not that much difference in the proposed
mechanisms for specifying what I will call "curvilinear coordinates" (a
term stolen from the Iris Explorer visualization application to indicate
that the position of data points is variable and so must be specified
for each point). The original proposal from John Caron looked like this:

> dimensions:
>         lat = 4 ;
>         lon = 3 ;
> variables:
>         float lat(lat,lon) ;
>         float lon(lat,lon) ;
>         float mydata(lat,lon) ;

This is actually a fairly elegant way to specify a 4x3 grid of lat-lon
locations for the gridpoints, assuming that the downstream application
is smart enough to figure out that the 3 lon's at each lat (and the 4
lat's at each lon) for the 4x3 variable "mydata" are given in the
variables "lat" and "lon".

The alternate approach, using referential attributes, looks like this
(with a alight correction to the earlier specification in order to
make the shape the same):

> dimensions:
>         nj = 4 ;
>         ni = 3 ;
> variables:
>         float mydata(nj,ni) ;
>            mydata:nj = "latarr";
>            mydata:ni = "lonarr";
>         float latarr(ni,nj) ;
>         float lonarr(ni,nj) ;

In both cases, an application first sees that "mydata" is lat x lon,
or 4x3. Then, in the first case, it takes the name of "mydata"'s dims
and looks for variables by those names to get the coordinates and, upon
finding that they are 2D, must infer how the lat's and lon's vary from
point-to-point.

In the second case, the application takes the name of "mydata"'s dims
and looks for *attributes* by those names from which to obtain the
names of the variables from which to get the coordinates.  Upon finding
that they are 2D, it must also infer how the lat's and lon's vary from
point-to-point.

In either case, the handling of coordinates by downstream applications
must be made smarter *in exactly the same way*.  However, in the first
case, they also have to rewritten to undo existing assumptions about
"coordinate variables", or they will choke. Despite the extra level of
indirection in the second approach, I would simply feel more
comfortable limiting the the changes to an enhancement the
application's capabilities, without the need to also "fix" them where
they are not heretofore broken.

Going back to the very original question from Gary Strand (back in
March!), I'd first ask whether you need to use any existing software
for handling/visualizing netCDF data.  If so, I would recommend the
referential attributes approach, or much of that software won't work on
your data.  If not, then the first approach is fine.  But in either
case, I'm not aware of any existing software that will be able to
actually exploit the "curvilinear coordinates" you are describing. (An
upcoming release of FERRET *will* provide a mechanism whereby such a
connection can be made, though it will still choke on the 2D coordinate
variables in the first method.)

I hope this issue is pretty well exhausted by now.  However, it appears
that there may still be a bit of confusion over how "referential
attributes" work, which (it seems to me) is common whenever indirection
is involved.  Because I believe that referential attributes offer so
much potential for enhancing the value of metadata, let me address a
couple of these issues in the hope that others might also appreciate
their value...

> > The point here was that referential attributes provide a *mechanism*
> > for associating variables.
> 
> My point was that u, p and vpt are already associated by virtue of having
> identical dimension lists.  The referential attribute provides no new
> information in example 1.
> 
> I agree that referential attributes provide a mechanism for associating
> variables, but they do not describe the relationships between variables so
> associated. 

In the example from the Unidata web page:

         float u(record, z, x, y);
                 u:z = "vpt, p";
         float p(record, z, x, y);
         float vpt(record, z, x, y);

the fact that "u", "p", and "vpt" are defined on the same grid doesn't
tell you anything more than that.  But the referential attributes do,
indeed, provide additional information! They tell you that coordinates for
points along the z-dimension for variable "u" can be found in variables
"vpt" or "p".  Suppose there were also a variable

         float wspeed(record, z, x, y);

The referential attributes assign special significance to the variables
"vpt" and "p" that is not true of "wspeed".  Thus, there *is*
additional information being provided!

> >  Your objection to the lack of an actual "z" coordinate is
> > solved by specifying the attribute as 'u:z = "vpt, p, alt" ', where a
> > variable "alt" could be provided to document the actual altitude of the
> > points.  The variable "alt" would carry along its own metadata
> > documenting how it was calculated (eg, hypsometric equation, virtual
> > temperature correction, etc...).
> 
> In this example I believe that the existing convention for coordinate
> variables is the method that should be used to define z.

You are missing the very need for making reference to a "z" variable.
With the exception of z-coordinate models, the altitude of a gridpoint
varies in both space and time.  That is, there is no pre-defined
z-coordinate for each layer of the model; you can't, a priori, specify
a vector of heights (eg, 1km, 2km, 3km,...) because the altitude of a
gridpoint is different at every lat-lon and timestep. (Think of a sigma
coordinate model, where the altitude of a point depends on the terrain
height, surface pressure, and temperature and humidity profiles.)

> >   In all cases, one still has to know how to
> > interpret the referenced variable (a bit complicated, but one could,
> > conceivably, infer a relationship base on the shape of the referenced
> > variable).  There are no established conventions here - we're just
> > suggesting some alternatives that fit within the conventions that have
> > already been established WRT netCDF.
> 
> Exactly.  And therein lies the problem.  The conventions on these
> relationships that would work for describing 2D coordinate variables won't
> be the same as the one that would be required to describe what is meant by
> e.g., 'u:z = "vpt, p, alt"'

I disagree - the bulk of the new capabilities required of applications
is exactly the same,  The only difference is how they identify the
variable(s) from which to obtain the coordinates...but I've already
discussed this above...

> > I see where you're going, but this is one solution that does *not* fit
> > within the conventions that have already been established, because
> > there is a long-standing convention that variables with the same name
> > as a dimension are "coordinate variables", and should be one-
> > dimensional vectors containing the coordinates of the points along that
> > dimension.
> 
> The fact that netCDF's current convention for coordinate variables only
> applies to 1D variables makes it easy for an application to recognize that
> e.g., lon(lon,lat) is not a "coordinate variable" in the sense of the
> current convention.

You definitely have a point here. Unfortunately, I think the convention
about variables with the same name as a dimension is so well
established that many applications *expect* that such a variable will
be 1-D, and will choke if it turns out to be otherwise.  (I know this
is the case for FERRET.)

> >   This is not to say that you couldn't write your own
> > application to use such data - all the information you need is, indeed,
> > present.  But you will have some difficulty getting any existing
> > software to read it.
> 
> Any application that is looking for coordinate variables expects the 
> corresponding data to be on a rectilinear grid.  If I need 2D coordinates
> to describe the grid then it is not rectilinear and a package designed
> to plot rectilinear data shouldn't be expected to work.

That's true, but everything we've been talking about represents an
extension to existing capabilities.  Applications would need to be
upgraded to handle the curvilinear coordinates, or they will simply
fall back to treating the coordinates as "1", "2", etc.  The problem is
that, if you started using the proposed approach, they will probably
not even get that far.

> OK.  Here is the unabridged version of Example 6.
> 
> dimensions:
>         lat = 4 ;
>         lon = 3 ;
> variables:
>         float lat(lat,lon) ;
>                 lat:long_name = "Latitude" ;
> 		lat:units = "degrees_north" ;
>         float lon(lat,lon) ;
>                 lon:long_name = "Longitude" ;
> 		lon:units = "degrees_east" ;
>         float mydata(lat,lon) ;
>                 mydata:long_name = "data description" ;
> 		mydata:units = "whatever" ;
> 
> We use the units attributes of lat and lon to tell us that we are working
> on a lat-lon grid (as in COARDS).  That lat and lon are 2D arrays tells us 
> that the grid is not rectilinear.
> 
> By analogy to the current convention for coordinate variables example 6
> would be interpreted as, e.g., the lat-lon coordinates of mydata(1,1)
> are lat(1,1) and lon(1,1).

As I said, this is a perfectly logical approach, even elegant. The only
problems are (a) getting applications to interpret it (which is also
true of the referential attributes approach), and (b) existing
applications are likely to choke on the "2D coordinate variables".

I'd still be interested in hearing from others about this...????

Cheers-

John P. Sheldon 
(jps@gfdl.gov) 
Geophysical Fluid Dynamics Laboratory/NOAA 
Princeton University/Forrestal Campus/Rte. 1 
P.O. Box 308 
Princeton, NJ, USA  08542

(609) 987-5053 office
(609) 987-5063 fax