This is a multi-part message in MIME format.

--------------167E2781446B
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I presume that the recent lull in the firestorm over coordinate systems is due to our patient rumination over the issues, and not the exhaustion of the fuel or the firefighters...:^}

It's been very useful to follow other discussions, and I've been influenced by most of them, whether I've agreed or disagreed, commented on them or not...

Anyway, I have come to some new impasses in my own thinking, so I've decided to expose my current "works in progress", despite the fact that they are not complete or satisfactory. I have new web pages:

New draft proposals at: http://acd.ucar.edu/~caron/proposals.html

Improved version of "motivating examples" at http://acd.ucar.edu/~caron/examples.html.

New draft formal definitions (not very coherent yet) at: http://acd.ucar.edu/~caron/definitions.html.

I will also attach ascii versions below.

Important conclusions I have come to so far:

1) A coordinate system is best thought of as a function, and we should carefully distinguish between its domain and its range.

2) A coordinate system is a list of coordinate functions.

Another is more tentative, and my current drafts are attempts to implement it. It's motivated by asking "what is an axis" in the context of a coordinate system. My tentative conclusion/decision is that in order to capture the semantics we want for coordinate systems:

3) a coordinate function corresponds to our intuitive notion of an axis in the coordinate space.

4) there should be a "natural" mapping from an axis to the real numbers R; this then gives us a mapping from the entire coordinate system range to Rn, the cartesian product space of the real numbers.

If you accept this, a lot of how the implementation has to look follows. The main problem is how to naturally represent multi-valued coordinate functions, like my examples 5 and 11.
GDT spent a lot of time on this issue in their proposals. I'm not happy with the (lack of) simplicity of my solution yet. However it does start to touch on the issues of data display, which we should admit that we want to solve.

From a recent post from Stephen.Walker:

> Here we need to be careful to distinguish between the coordinates which
> locate the data in the physical world, and the coordinates which locate
> a given line element or pixel on a plot. They are not always the same.
> As far as I understand it, most of the discussion about coordinates
> has been about coordinates which locate the data in the real physical world,
> not plot coordinates (which may or may not be related to real-world coordinates
> in a simple way).

Obviously we aren't going to provide coordinates into "display space" (pixels on a display or whatever). However we do want to provide coordinates in "real world" space clearly enough that applications can easily map to display space. So my solution of mapping to Rn is an attempt to do this.

As John Sheldon insists, we still haven't made it yet to the "real world", and so I've proposed a "geodetic" convention (still very incomplete), much indebted to Steve Emmerson's posts, and tried to say what the semantics of that coordinate system are. Since we are mostly an earth science community, I haven't proposed other coordinate systems like Steve does, that would be useful in solid modeling applications. But if some think that's important...

Finally, I'm feeling that we can't go too much further without actual methods. For example, we can't really describe a coordinate transformation from a projection surface to lat/lon without specifying an initialization string to a reference implementation (for example the USGS mapgen package). Similarly with a transformation from a vertical coordinate to km above MSL, or whatever. Now we already have some methods, namely those in the netcdf library itself, and to some extent those in the udunits library.
We all accept those because Unidata is willing to support/port them. It's possible that other software like mapgen could be acceptable reference implementations. Then we can actually use methods, and we won't be so limited by what we can represent in netcdf files. Alternatively we wait for embedded java code in netcdf 5 (or is it a minor upgrade to version 4, Russ? :^).

--------------167E2781446B
Content-Type: text/plain; charset=us-ascii; name="proposals.html"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="proposals.html"

Proposed Conventions for Coordinate Systems in Netcdf

draft 7/29/97 by John Caron

Introduction

This document attempts to define metadata conventions for specifying coordinate systems in netcdf files, and state as precisely as possible what meaning application programs can infer from the metadata. While striving for precision, this document will try to remain intuitive and informal. A more formal, complete, and somewhat more rigorous set of definitions can be found at http://acd.ucar.edu/~caron/definitions.html. Motivating examples can be found at http://acd.ucar.edu/~caron/examples.html. The full discussion in the netcdf user email group is at http://www.unidata.ucar.edu/software/netcdf/coords/ and a summary of other proposals is at http://www.unidata.ucar.edu/software/netcdf/coords/proposals.html .

In order to facilitate implementation, these proposals will mark some of the more advanced features as "level 2". Two conventions are thus proposed: "level 1", and "level 2" that includes all of level 1.

1. General Coordinate Systems

1.1 Definitions.

A coordinate system is best thought of as a function which maps a point from an index domain (really just the set of array indices for the variable) to a location in some user-defined coordinate space. We assume that the maker of the netcdf file is recommending that the user of the file view the data in the specified coordinate system.
A coordinate system is defined by a list of coordinate functions, each one of which corresponds to an independent axis in the coordinate space. The number of coordinate functions is the rank of the coordinate system.

A coordinate function will have a domain consisting of one or more dimensions, and the number of dimensions in its domain is its dimensionality. The value of a coordinate function may be a scalar (point), or a tuple of values with meaning described below. In any case, the values refer to a single coordinate axis, and application programs can assume that an axis (and its coordinate function) has a natural mapping to a single dimension on a display device.

An axis (and therefore the values of a coordinate function) is always considered ordered:

    if the coordinate values are numeric, then through the ordering on real numbers;
    if the coordinate values are string valued and the function is one-dimensional, then through the nominal ordering of its dimension;
    (Level 2) if the coordinate values are string valued but the coordinate function is multi-dimensional, then the ordering may be arbitrarily chosen by the application.

1.2 Specification.

A coordinate system is specified in a netcdf file by an attribute whose name begins with the keyword "coordinates" and whose value is a blank or comma delimited list of coordinate functions. For example:

    :coordinates = "xpos ypos time wavelength";

Users are encouraged to name their coordinate systems, using a period "." as separator, and must do so when they wish to specify more than one coordinate system for the same variable:

    :coordinates.latlon = "lat lon level";
    :coordinates.stereo_projection = "x y z";

Any coordinate system must satisfy the following restrictions:

1) the domain of each coordinate function must be composed of a subset of the dimensions of any variable to which the coordinate system applies.
2) the coordinate system, considered as a vector function, must map each point in its domain to a unique point in its range.

1.3 Scope.

A global attribute whose name begins with the keyword "coordinates" defines a coordinate system for all variables in the file with compatible domains. A variable attribute whose name begins with the keyword "coordinates" defines a coordinate system for that variable and overrides a global attribute of the same name. A variable attribute with a value equal to a blank string or the string "none" undefines any global attribute of the same name for that variable.

1.4 Coordinate Values.

(Level 1) A coordinate function that is a scalar netcdf variable is considered a point along its coordinate axis.

(Level 2) A coordinate function that is multiply valued is specified by a list of scalar netcdf variables enclosed in parentheses:

    :coordinates = "lat lon (lev_upper lev_lower lev_midpoint lev_label)";

Any netcdf variable used in a multiply-valued coordinate function must have exactly one attribute whose name is "component", and whose value is "point", "midpoint", "upper_bound", "lower_bound", or "name". If either the upper or the lower bound exists, both must exist, and the value of the coordinate function is considered to be the range [lower, upper]. A point and a midpoint are synonyms, meaning a representative value, not necessarily any kind of mean of the interval. A coordinate function may specify both a range and a midpoint or point, in which case the point or midpoint must be included in the range. A label is a synonym for the coordinate value. There may be multiple labels in a coordinate function.

1.5 Function Composition.

(Level 2) A coordinate function may be a functional composition, specified with an asterisk (*):

    :coordinates = "lat*latidx lon*lonidx";

The composite function must obey the usual rules for coordinate functions.

1.6 Coordinate Variables.
A variable with the same name as a dimension, with that dimension in its domain, is the coordinate variable for that dimension. It is recommended that coordinate variables be one-dimensional. In this case the value of the variable must be strictly increasing or decreasing with respect to the dimension index (the function is then said to be monotonic).

(Level 2) Multi-dimensional coordinate variables, while allowed, may be misleading in their association with a single dimension, and are considered less desirable. Multi-dimensional coordinate variables are not in general monotonic, but they must satisfy the uniqueness property of any coordinate systems of which they are part.

Coordinate variables define an "implicit" coordinate system for any variable that uses those dimensions. For such a variable, find all dimensions with coordinate variables: the implicit coordinate system is the one composed of that list of coordinate variables. This can be turned off by adding a global or variable attribute with name "coordinates.implicit" and value equal to a blank string or the string "none". It is recommended that when multiple coordinate systems are intended, all be explicitly defined.

1.7 Keyword Aliasing.

In order to minimize the "english-centricity" of these conventions, all keywords can be aliased by defining a global attribute of the form:

    :alias.<keyword> = "my_alias";

The keywords defined by this convention are:

    "coordinates", "implicit", "component", "point", "midpoint", "upper_bound", "lower_bound", "label", "geodetic"

2. Geodetic Coordinate Systems

2.1 Definition and Specification.

A geodetic coordinate system is a subtype of a general coordinate system, whose locations can be placed in relation to the earth. A geodetic coordinate system must specify exactly three coordinate functions: either longitude, latitude and altitude above the surface, or some other set of coordinates which in principle can be transformed into those.
A geodetic coordinate system is specified by an attribute whose name begins with "coordinates.geodetic" and whose value is the list of three coordinate functions, in the order longitude, latitude and altitude:

    :coordinates.geodetic = "lon lat pressure";

When the coordinate system has coordinate functions that are not latitude, longitude and altitude, they should be listed in the order 1) generally east-west, 2) generally north-south, 3) generally up-down. When there is no obvious correspondence to these directions, then the order can be considered a recommendation for a display to consider the first two coordinates as "horizontal" and the third as "vertical" with respect to the earth.

----------------------------------------------------------------------------

Examples

Shorthand "solutions" of the motivating examples:

Example 1,2:

    :coordinates = "lat lon";

Example 3:

    :coordinates.latlon = "lat lon";
    :coordinates.xy = "x y";

Example 4:

    :coordinates.geodetic.hybrid = "lon lat hybrid";
    :coordinates.geodetic.pressure = "lon lat pressure";

Example 5:

    :coordinates = "(lev_bottom lev_top)";
    float lev_bottom(level);
        lev_bottom:component = "lower_bound";
    float lev_top(level);
        lev_top:component = "upper_bound";

Example 6:

    :coordinates = "lev wavelength";
    :coordinates = "rho theta z";

Example 7:

    :coordinates = "lev";
    :coordinates = "lat lon components";

Example 8 will need some notation not yet formally proposed, eg:

    :coordinates = "lat(npoints,) lon(npoints,) lat(,npoints) lon(,npoints)";

Example 9:

    :coordinates.geodetic = "lon lat elevation";

Example 10:

    :coordinates.geodetic = "lon lat pressure";

Example 11:

    :coordinates = "year, day_of_year, second_of_day";
    :coordinates = "generate_time, valid_time";

Example 12:

    :coordinates = "lat*latidx, lon*lonidx, time";

Example 13:

    :coordinates = "lon lat";

In solution one, we would have to allow missing values in coordinates, which seems ok.
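
To make the attribute syntax of sections 1.2, 1.4 and 1.5 concrete, here is a minimal Python sketch of how an application might split a "coordinates" attribute value into its coordinate functions. The function name parse_coordinates and the choice of return types are illustrative assumptions, not part of the proposal, and the not-yet-proposed notation of Example 8 is not handled.

```python
import re


def parse_coordinates(value):
    """Split a "coordinates" attribute value into coordinate functions.

    Hypothetical helper, not part of any netcdf library. Returns a list
    with one entry per coordinate function:
      - a str for a plain variable name, e.g. "lat"
      - a tuple of names for a parenthesized multiply-valued function
      - a list of names for a composition written with "*"
    """
    functions = []
    # A token is either a parenthesized group, or a run of characters
    # containing no blank, comma, or parenthesis.
    for token in re.findall(r"\([^()]*\)|[^\s,()]+", value):
        if token.startswith("("):
            # Multiply-valued function: names inside the parentheses.
            functions.append(tuple(token[1:-1].replace(",", " ").split()))
        elif "*" in token:
            # Functional composition, outermost function first.
            functions.append(token.split("*"))
        else:
            functions.append(token)
    return functions


# The rank of the coordinate system is the number of coordinate functions.
level2 = parse_coordinates("lat lon (lev_upper lev_lower lev_midpoint lev_label)")
rank = len(level2)
```

With this sketch, the value from section 1.4 yields three coordinate functions (rank 3), the last of which is a four-component tuple.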
--------------167E2781446B
Content-Type: text/plain; charset=us-ascii; name="examples.html"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="examples.html"

Coordinate Variables in Netcdf : Motivating Examples

Draft 7/29/97 by John Caron

Here are a number of examples that would be useful to cover in a general coordinate system convention. I will use a shorthand to indicate the form of the coordinate functions, rather than cdl syntax to indicate a proposed convention. The idea is to test any proposal against these examples.

----------------------------------------------------------------------------

1. classic coordinate variables:

    var(lat, lon)
    lat(lat)
    lon(lon)

2. scattered points; assign them a location:

    var(npoints)
    lat(npoints)
    lon(npoints)

3. projective geometry:

    var(lat, lon)
    lat(lat,lon)
    lon(lat,lon)

however, I think we could be clearer about this. var(lat, lon) isn't really correct; it's really a function of (x,y) on a projection surface. So better is:

    var(x, y)
    x(x)
    y(y)
    lat(x,y)
    lon(x,y)

4. hybrid coordinates:

    var(lon, lat, lev)
    lat(lat)
    lon(lon)
    lev(lon, lat, lev)

again, we're being vague; really we are describing two alternative lev coordinates: hybrid(lev) and pressure(lon, lat, lev), so we can rewrite as:

    var(lon, lat, hybrid)
    lon(lon)
    lat(lat)
    hybrid(hybrid)
    pressure(lon, lat, hybrid)

5. edge coordinates, level is some kind of altitude coordinate:

    var(level)
    lev_bottom(level)
    lev_top(level)

6. non-georeferencing coordinates. Up to now, all the examples have been georeferencing coordinates. A different coordinate system that we use in radiative transfer codes is

    var(lev, wavelength)

Steve Emmerson's wire example for a spiral wire in a cylindrical coordinate system (CCS), which is a spatial, but not georeferencing coordinate system:

    temp(s)    // temperature along spiral
    rho(s)     // distance from CCS center axis
    theta(s)   // CCS azimuth
    z(s)       // CCS height

7.
vector valued variables, for example:

    vector(lev, 3)
    velocity(lat,lon,component)
    component(3) = "u", "v", "w"

8. correlations, using a dimension more than once:

    precip(time, npoints)
    precip.correlation(npoints, npoints)
    lat(npoints)
    lon(npoints)

It's worth specifying what coordinate system is intended for the variable precip.correlation. We can do so precisely by specifying the coordinate system as a vector function, and explicitly stating the domains of the coordinate functions:

    Cs(np,np) = (lat(np,), lon(np,), lat(,np), lon(,np)) -> (lat1, lon1, lat2, lon2)

In case it's not obvious, "(np,)" is a shorthand which says "take the first index from the point in (np,np)", and "(,np)" says "take the second index from the point in (np,np)".

9. a balloon or airplane trajectory:

    temperature(time)
    ch4(time)
    lat(time)
    lon(time)
    elevation(time)

10. moving coordinate system. For example, satellite images tracking a tropical hurricane keep the coordinate system centered on the hurricane as it moves:

    pressure(time, x, y)
    lat(time, x, y)
    lon(time, x, y)

11. multiple time coordinates:

    var(time)
    year(time)
    day_of_year(time)
    second_of_day(time)

and the famous NUWG case:

    var(time)
    generate_time(time)
    valid_time(time)

12. sparse data variables. Adapted from Harvey Davies' example:

    soil.temperature(time, land_point)
    land_index(lat, lon)   // if over land = land_point index, else -1
    lat(lat)
    lon(lon)
    time(time)

or another way of getting a similar result:

    soil.temperature(time, land_point)
    latidx(land_point)   // latitude index of ith land point
    lonidx(land_point)   // longitude index of ith land point
    lat(lat)
    lon(lon)
    time(time)

13.
reduced grid

From CSM: the number and spacing of the longitude elements decreases toward the pole:

Solution 1: use maximum number of longitude points, and put "missing data" in the extra elements (note this violates the rule "no missing data in coordinates", but it does seem natural):

    var(nlat, max_lon)
    lat(nlat)
    lon(nlat,max_lon)

Solution 2: To save storage, you could store just the real points, but you lose the 2-D "connectedness" of the coordinate system:

    var(npoints)
    lat(npoints)
    lon(npoints)

--------------15FB59E21CFB
Content-Type: text/plain; charset=us-ascii; name="definitions.html"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="definitions.html"

Coordinate Variables in Netcdf : Definitions

Draft 7/21/97 by John Caron, with help from Brian Eaton and Russ Rew

The following tries to make formal definitions using the language of abstract algebra. A standard reference is Algebra, Saunders MacLane and Garrett Birkhoff, The Macmillan Company, 1967.

----------------------------------------------------------------------------

A dimension, d, is a named range of integers: d = {0, 1, ..., size-1} (or d = {1, 2, ..., size} if you prefer). A dimension is completely specified by the pair (name, size).

An index domain, D, is a set constructed from the cartesian product of one or more dimensions: D = d1 x d2 x ... x dn, where di are dimensions. The points of D are thus tuples of integers.

A projection Dp of D is a cartesian product of a subset of the dimensions {di} that D is constructed from. (So a point in Dp is just a point in D with 0, 1, or more indices missing). We will also call the function p that maps D to Dp a projection of D.

A variable is a function v(D) -> C, where D is an index domain, v denotes the function, and C is the range or codomain. The image of a function is the set of points in C that are the values of the function.
Since we consider here only index domains, which are a finite set of points, the image of a function is also always a finite set of points. In the context of netcdf files, the values of a function have any of the possible data types of a netcdf variable: double, int, string, etc. The number of dimensions in the domain of a function is its dimensionality.

A vector function is an ordered list of scalar functions with the same domain, called component functions. A vector function thus maps points in D to a tuple of values of its component scalar functions. In practice the component functions may have domains that are projections of D. Formally this is done by composing the component function with a projection function: cf_formal = cf_actual * p, where * is functional composition and p is the projection function which maps D to the domain of cf_actual.

An embedding E is an invertible map from a finite set C to Rn, the cartesian n-product of the real numbers R. Each set of real numbers in Rn is called an axis, so that the embedding E(C) -> Rn is a map from C onto n axes.

A coordinate function is a scalar or vector function whose codomain C is embedded into R. An ordered list of coordinate functions can be considered a vector function by replacing any vector coordinate function by its list of component (scalar) functions.

A coordinate system is an ordered list of coordinate functions with the same domain, which is one-to-one as a vector function. A coordinate system thus assigns unique physical values to the points in its domain: it maps an n-tuple of integers in "index space" to a unique m-tuple of reals, strings, etc. in "physical space", called a location. The number of coordinate functions is the rank of the coordinate system, and each is associated with a different axis of Rn.

Formally, we can write a coordinate system as a function Cs(D) -> C, or equivalently Cs(D) = (F1(D), F2(D), ..., Fn(D)) -> (C1, C2, ..., Cn), where Fi(D) -> Ci is the ith coordinate function.
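
As an illustration of the one-to-one requirement, here is a minimal Python sketch that enumerates a small index domain and checks that a list of coordinate functions assigns a distinct location to every point. All names (index_domain, is_coordinate_system, the lat/lon tables) are hypothetical and chosen only for this example; the component functions are written as cf_actual composed with a projection, as described above.

```python
from itertools import product


def index_domain(*sizes):
    """All index tuples of D = d1 x d2 x ... x dn, with di = {0, ..., size-1}."""
    return product(*(range(n) for n in sizes))


def is_coordinate_system(functions, sizes):
    """True if the vector function (F1, ..., Fn) is one-to-one on D."""
    seen = set()
    for point in index_domain(*sizes):
        location = tuple(f(point) for f in functions)
        if location in seen:
            return False  # two index points share a location
        seen.add(location)
    return True


# Illustrative data: var(lat, lon) with classic coordinate variables.
LAT_VALUES = [10.0, 20.0]
LON_VALUES = [0.0, 90.0, 180.0]


def lat(point):
    # cf_actual * p: project D onto the lat dimension, then look up lat.
    return LAT_VALUES[point[0]]


def lon(point):
    # cf_actual * p: project D onto the lon dimension, then look up lon.
    return LON_VALUES[point[1]]


complete = is_coordinate_system([lat, lon], (2, 3))
lat_only = is_coordinate_system([lat], (2, 3))
```

Here (lat, lon) passes the check over the full domain, while lat alone does not, since points that differ only in the lon index map to the same location.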
The values of the coordinate functions are the coordinates of the coordinate system.

For a variable v(Dv) -> Cv, and coordinate system Cs(Ds) -> Cs, Cs may be a coordinate system for variable v when Ds is a projection of Dv. When Ds = Dv, Cs is a complete coordinate system for v, since then Cs assigns a unique location to every value of v.

A spatial coordinate system is a coordinate system whose locations are in 3-dimensional space.

A georeferencing coordinate system is a spatial coordinate system which provides enough information to place its locations in reference to the earth.

A temporal coordinate system is one which provides enough information to place its locations in real, physical time.

--------------15FB59E21CFB--