Unidata Strawman for Storing Earth-Referencing Data

David W. Fulker

Unidata Program Center
University Corporation for Atmospheric Research
P.O. Box 3000
Boulder, CO 80307

Overall Purpose

The Unidata program was established to help university-based atmospheric scientists utilize weather data on their own small to medium-sized computer systems. Unidata systems, which have many components and can be configured in a variety of ways, are employed nationwide by nearly 100 universities. The software was designed to take full advantage of modern computer networking and, as a result, Unidata systems on differing kinds of hardware can share data with one another and can even share task responsibilities. In other words, Unidata systems can engage in true distributed computing, and the protocols employed permit long-distance distribution (across the country, for example) if desired.

There are many interesting facets of distributed computing in heterogeneous, long-haul networks, but we will focus here on just one: data sharing. For scientific purposes, data sharing is made complex by the following factors:

a) scientific data often have multi-dimensional and other complicated structures;

b) a given structure often contains several data types, including characters, integers, floating-point numbers, and octets or bytes;

c) different computers have differing ways of storing data types and structures, so files of such data written on one computer cannot, in general, be read on another;

d) data, as used in computer programs, are most conveniently accessed as named variables and indexed arrays--in contrast, files usually must be accessed as sequences of records or bytes that become internal variables, arrays, and structures via format interpretation;

e) scientific data are often meaningless unless accompanied by a substantial amount of ancillary data, sometimes called attributes, that define such things as parameters represented, units of measure, origins of the data, and times/locations of observations;

f) similarly, scientific data may be organized on a variety of map projections or other transformations that must be well defined in a computational sense--this represents another kind of ancillary information that is particularly important if data organized on differing projections are to be compared with one another.

Thus, in the Unidata context, sharing of data means more than conveying bits from one computer to another. We are attempting systematically to address all of the aforementioned complexities in the software we provide, as articulated in the following sections.

The Unidata Program Center is sponsored by the National Science Foundation and managed by the University Corporation for Atmospheric Research. Mention of a commercial company or product does not constitute an endorsement by the Unidata Program Center.

Sharing Data in Heterogeneous Settings

The university community served by Unidata employs a very large range of computing systems to conduct its scientific activities. The National Center for Atmospheric Research (NCAR) in Boulder, Colorado, maintains Cray and IBM mainframe computers, as well as other kinds of computers, that are used by universities, NCAR, and other researchers. At their home institutions, university personnel use an almost unlimited variety of personal computers and workstations. Although standardization has increased in recent years, it remains the case that these systems have vastly differing ways of storing data in their memories and on their file systems. For example, even a seemingly simple entity such as an unsigned 32-bit integer is stored with its octets in various orders among the computers mentioned above. Even among the computers from a single manufacturer (IBM and DEC are examples) there are such differences.
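The octet-order point can be made concrete with a small sketch. Python and its standard struct module are used here purely for illustration; they are not part of the Unidata software. The fragment lays out one unsigned 32-bit integer in the two most common octet orders and shows that reading the octets under the wrong assumption yields a different value.

```python
import struct

value = 0x01020304  # an arbitrary unsigned 32-bit integer

# The same value laid out in the two most common octet orders.
big_endian = struct.pack(">I", value)
little_endian = struct.pack("<I", value)

# Reading little-endian octets as if they were big-endian gives a wrong value.
misread = struct.unpack(">I", little_endian)[0]

print(big_endian.hex())     # 01020304
print(little_endian.hex())  # 04030201
print(hex(misread))         # 0x4030201
```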

Therefore, sharing data in a heterogeneous setting, as required in Unidata, can be approached in two ways: 1) data can be tagged to indicate their computer of origin, and every computer can be equipped with software to transform all such types to the native mode; or 2) a common (external) data representation can be used for all files, and every computer can be equipped with software to transform data (both ways) between the native mode and the common representation. Simple arithmetic indicates that much less software must be written and supported with approach 2), and that is the approach taken for Unidata.
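The arithmetic can be sketched as follows. The counting model is an assumption made for illustration: with n distinct native formats, approach 1 is taken to need one conversion routine on each machine for every foreign format it must read, and approach 2 one encoder plus one decoder per machine. The function names are hypothetical.

```python
def converters_tagged(n: int) -> int:
    """Approach 1: every machine type reads every other type's native format."""
    return n * (n - 1)


def converters_common(n: int) -> int:
    """Approach 2: every machine type converts to and from one common form."""
    return 2 * n


for n in (3, 5, 10):
    print(n, converters_tagged(n), converters_common(n))
```

Under this model the two approaches cost the same for three formats, and the gap widens quadratically as more kinds of computers are added.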

However, Unidata decided to approach the problem not simply as one of representing data types, but also one of representing data structures. Because scientific data can have vastly differing structures (illustrated, for example, by the differences among grids, images, and lists of point observations) we think it unwise to view the problem as one of identifying some small number of common formats, suitable for many kinds of data. Instead, we characterize the problem as: standardizing the software interfaces by which scientific data are stored and retrieved. An interface suitable for such standardization is now employed in Unidata, and it has proven effective for many kinds of data. We call this the Network Common Data Form or netCDF.

The Network Common Data Form (netCDF)

Through its interface library for the netCDF, the Unidata program has begun to standardize how scientific data are represented, filed and retrieved. As mentioned in the preceding section, Unidata files employ a common data representation, and we provide software for many kinds of computers to transform data (both ways) between the (internal) native mode of the computer and Unidata's common representation. Though we will not describe the representation here (it is based on a standard known as XDR for External Data Representation), it is known to be quite compact and extraordinarily flexible; it can be used for a wide variety of data structures, containing many arrays, for example. In Unidata, this data representation is effectively hidden, because we have focused upon a standardized software interface by which data are stored in and retrieved from netCDF files.

All netCDF files are written and read through a library of Unidata software that may be called from programs written in FORTRAN or C. Data written in one language may be read in the other. This library realizes a very flexible model for a scientific data set (or file). A netCDF file can have any number of dimensions, and the dimensions are named. It may also have any number of named variables and each variable may be a multi-dimensional array employing one or more of the named dimensions. Variables are also typed, and can be integers, floating-point numbers, characters, or bytes. A netCDF file also contains named attributes, and each attribute may be associated with a variable or it may be "global," applying to the entire file.
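A minimal sketch of this data model, written as plain Python data classes, may help fix the terms. The class names, field names, and sample values below are illustrative assumptions; they are not the interface of the netCDF library itself, which is called from FORTRAN or C.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Variable:
    dtype: str                      # e.g. "float", "char", "byte"
    dimensions: Tuple[str, ...]     # names of the dimensions it spans
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class Dataset:
    dimensions: Dict[str, int] = field(default_factory=dict)     # name -> length
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, object] = field(default_factory=dict)  # global attributes

    def shape(self, name: str) -> Tuple[int, ...]:
        """Shape of a variable, looked up through its named dimensions."""
        return tuple(self.dimensions[d] for d in self.variables[name].dimensions)


ds = Dataset(dimensions={"lat": 3, "lon": 3})
ds.variables["WindSpeed"] = Variable("float", ("lat", "lon"), {"units": "m/s"})
print(ds.shape("WindSpeed"))  # (3, 3)
```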

The netCDF library provides a complete set of functions for opening, defining, querying, writing, reading, and closing netCDF files, so that no application program ever needs to know anything about the underlying data representation. The library supports direct access (by name and index) and supports bulk reads and writes (i.e., access to whole and partial arrays) so the efficiency is generally quite high. The effectiveness of the netCDF library has been demonstrated for computers as diverse as Crays, Digital VAXstations and DECstations, HP-Apollos, Macintoshes, NeXts, Suns, and Stardents, as well as IBM PCs, PS/2s, RS/6000s and mainframes.

There are now a number of software applications, both inside and outside Unidata, that employ the netCDF interface for data access, and many successful experiences have been reported. A significant number of organizations and programs--not all related to atmospheric science--are considering the netCDF for formal adoption as a data access standard.

Additional Needs for Standardization

Unidata incorporates a number of "applications" software packages into the systems that are offered to universities. As mentioned previously, many of these (at least as an option) employ the netCDF library as a means for storing and retrieving scientific data. Applications in this category include: Purdue University's Weather Processor (WXP), NASA Goddard's GEMPAK, a UNIX-based image analysis package that will be available next year (but cannot be named because the procurement effort is still under way as of this writing), and several Unidata utilities. Also, translators have been developed so that other software, including the University of Wisconsin's McIDAS and New Mexico Tech's CANDIS, can use certain data that have been stored as netCDF files.

Unidata now intends to integrate several such applications into a truly coherent system, designated the Scientific Data Management (SDM) system, that is more than a collection of capabilities that use the same data access library. The several components handle very different kinds of data: satellite images, conventional surface and upper-air weather observations, and gridded data produced by numerical weather prediction and data assimilation models. A very important distinction among such applications is that they may employ different coordinate systems. Such applications can be particularly challenging to integrate, especially when required to produce properly registered graphical overlays.

In all, there are many integration issues to be faced, such as choosing and setting standards for graphical output, creating consistency among user interfaces, handling information about user context (i.e., default values for parameters and methods for overriding them), and standardizing on the netCDF method for data storage and retrieval. The remainder of this paper focuses on the last of these, specifically on using the netCDF to handle earth-referencing information associated with the diverse kinds of data that are kept in netCDF files. There are two general objectives in this effort:

1) To achieve, for scientific data stored in netCDF files, a level of "self-description" that fully encompasses the underlying coordinate locations to which the data correspond and to do so in ways that support a variety of commonly used reference coordinates, geometric projections, and other kinds of coordinate transformations.

2) To create a software library which automatically calculates standard coordinate locations for data in such netCDF files and which supports mappings between the various kinds of coordinate transformations and projections encompassed in objective 1.

In this way, data that differ widely (for example, in their structures and methods of observation) but that are stored in netCDF files with suitable coordinate referencing information can be used together and compared with less development of specialized software.

For this discussion, we assume that netCDF files are to be sufficiently self-describing that few if any external references, such as station location tables, are required to determine the space-time locations of the data they contain. An important exception is the use of transforms and projections: the netCDF data model provides no way to store functions or algorithms, so transforms and projections of space-time (or other) coordinates will be referenced by name, possibly accompanied by arguments or parameters. Clearly, conventions for naming and parameterizing these transforms, as well as the algorithms for realizing them, must be defined and implemented externally, with the present netCDF paradigm.

Coordinates and netCDF Dimensions

Later in this paper we will suggest a way to use the netCDF capabilities for storing "attributes" as a way to imbed coordinate reference information in a netCDF file. Before suggesting a method for this, we wish to discuss the relationship between coordinates in a space-time sense, and dimensions in the netCDF sense, where they refer to multidimensional arrays or vectors.

As a general rule, a netCDF dimension is one of the two types described below:

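mesh dimension: we consider a netCDF dimension to be a mesh dimension if it is used to index one edge of a mesh or grid in coordinate space.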

list dimension: we consider a netCDF dimension to be a list dimension if it is used to enumerate members of a list, each of which carries its own space-time coordinates.

The two cases are perhaps best described by examples.

This is easy to do because there is a simple, intuitive language called the Common Data Language (CDL), which allows precise definition of netCDF files using text. (A netCDF file can be defined in several ways: by using the CDL, by calling netCDF library functions in FORTRAN or C, or by combinations of these.) The following examples are intended to be self-explanatory without a formal definition for CDL.

Example 1, a Two-Dimensional Grid on Lat-Lon Coordinates

The simplest kind of coordinate referencing occurs when data are organized on a latitude-longitude mesh. A netCDF file containing a grid of surface wind-speed estimates might have two dimensions (named "lat" and "lon", each of length 3), one two-dimensional variable (named "WindSpeed") and two one-dimensional variables (named "lat", and "lon") that are related in the following way: at index pair (i,j), WindSpeed(i,j) represents the estimated wind speed in meters per second at the point whose latitude and longitude are lat(i) and lon(j) respectively.

Note: in CDL, text following "//" is interpreted only as commentary.

netCDF example 1  {	// lat-lon grid illustration

dimensions:		// two dimensions
	lat = 3, lon = 3;

variables:		// variable names, shapes, and types 
	float WindSpeed (lat,lon);
	integer lat(lat), lon(lon);

			// attribute values 
	WindSpeed: units = "m/s";

data:
			//grid coordinate values
	lat = 40,   45,   50;
	lon = -95, -100, -105;

			//data values
WindSpeed = 12., 13., 14., 
	    13., 15., 17.,
	    13., 14., 15.;	}

In this example, "lat" and "lon" are mesh dimensions. We might have defined attributes to indicate the units for the "lat" and "lon" variables, but this would get into topics that will be covered later.

Example 2, a Three-dimensional Grid on a Polar Projection

Supporting a variety of map projections is essential to any system that purports to be of broad scientific value. A netCDF file containing a grid of upper-air temperatures over a polar region might have three dimensions (named "x", "y", and "alt", of lengths 4, 3, and 2), one three-dimensional variable (named "Temperature") and three one-dimensional variables (named "x", "y", and "alt") that are related in the following way: at index triple (i,j,k), Temperature (i,j,k) represents the temperature in Kelvins at height alt(i) meters above the point (at mean sea level) defined by the vector y(j),x(k) on a suitably chosen polar stereographic projection.

netCDF example 2  {	// polar grid illustration

dimensions:		// three dimensions
	alt = 2, y = 3, x = 4;

variables:		// variable names, shapes, and types
	float Temperature(alt,y,x);
	float alt(alt), y(y), x(x);

			// attribute values
	Temperature: units = "K";
data:
			//grid coordinate values
	alt = 1.0, 10.0;
	  y =  .4,   .5,   .6;
	  x =  .3,   .4,   .5,   .6;

			//data values
Temperature = 271., 273., 273., 271.,
	      272., 273., 274., 273.,
	      272., 272., 273., 272.,

	      274., 276., 276., 274.,
	      275., 276., 279., 276.,
	      275., 275., 276., 275.;  }

Except for being three-dimensional, example 2 is very similar to example 1 in its netCDF representation. However, processing and displaying them might be significantly different because of their distinct coordinate projections. Later, we will suggest a way to characterize the projections using netCDF attributes.

These two examples illustrate how compactly coordinate locations can be represented in netCDF files containing multidimensional grids. Instead of storing grid-point coordinates for every node in the grid, it is only necessary to store a single vector for each coordinate direction--the whole grid can be thought of as a tensor product of these individual coordinate vectors. The space savings can be immense: coordinate storage is directly related to the sum of the dimensions rather than the product, as it would be if every grid-point coordinate were stored.
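The saving can be checked with the dimensions of example 2. The sketch below (Python, for illustration only; the variable names are ours) counts stored coordinate values both ways and expands the three coordinate vectors into the full set of grid-point coordinates.

```python
from itertools import product
from math import prod

# Coordinate vectors from example 2.
alt = [1.0, 10.0]
y = [0.4, 0.5, 0.6]
x = [0.3, 0.4, 0.5, 0.6]

lengths = [len(alt), len(y), len(x)]
stored_as_vectors = sum(lengths)          # one vector per coordinate direction
nodes = prod(lengths)                     # number of grid points
stored_per_node = nodes * len(lengths)    # a full coordinate triple at every node

# The full mesh is recovered as the product of the three vectors.
mesh = list(product(alt, y, x))

print(stored_as_vectors, nodes, stored_per_node)  # 9 24 72
print(mesh[0], mesh[-1])
```

Even for this tiny grid, nine stored values stand in for seventy-two; the ratio grows rapidly with grid size.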

All of the dimensions in the preceding two examples (lat, lon, x, y, and alt) are mesh dimensions as defined in the previous section. Two notable characteristics of such mesh dimensions may be observed:

Associated with each mesh dimension is a variable containing (monotonic) values that define (one edge of) a mesh or grid in the coordinate space; let us call this a "mesh coordinate variable."
The name of a mesh coordinate variable is identical to the name of its corresponding mesh dimension.

These characteristics represent conventions or practices that we recommend only for true mesh dimensions. In particular, we advise against using identical variable and dimension names if they do not represent the edge of some well-defined coordinate grid--there are many such cases and, typically, the dimension serves to enumerate a list, with one or more coordinate variables indexed by this "list dimension." A list dimension is illustrated in the following example.

Example 3, a Collection of Surface Observations at a Single Time

A netCDF file containing surface temperature observations might be organized as a series of records or list elements. It could have one dimension (named "list", having unlimited size) and three one-dimensional variables (named "Temp", "lat", and "lon") that are related in the following way: at index i (representing list element i or record i), Temp(i) represents the observed surface temperature in degrees Celsius at the point whose latitude and longitude are indicated by lat(i) and lon(i) respectively.

netCDF example 3  {	// surface temperature observations

dimensions:		// one dimension
	list = unlimited;

variables:		// variable names, shapes, and types
	float Temp(list);
	float lat(list), lon(list);

			// attribute values
	Temp:units = "degrees C";

data:
			// coordinate values (for each list element)
lat =  40.46,  25.40,  41.52,   33.56,   47.27,   39.54;
lon = -73.54, -80.17, -87.37, -118.24, -122.18, -105.07;


			// data values (for each list element)
Temp = 17., 24., 18., 22., 16., 20.;  }

This example is quite different from the first two, because the list dimension enumerates points whose locations are unordered. Furthermore, the dimension does not correspond to a single coordinate, the coordinate names cannot both be identical to the dimension name, and the coordinate values are not monotonic. The storage consumed by coordinate values is significant, there being a position stored for every datum, but this cannot be avoided for arbitrarily located points.

In common with the preceding examples, the coordinate values are stored as variables; let us call them "coordinate variables." (Thus, we have defined coordinate variables and mesh coordinate variables.) The presence of coordinate variables represents a convention or practice that we encourage strongly. It would be possible, for example, to use a set of station identifiers, instead of coordinate locations, and to store the station locations in some other file or in the applications software. We discourage this because the space savings are modest and because it is difficult to keep such external files both current and consistent with old data sets.

Later we shall discuss formal ways to distinguish list dimensions from mesh dimensions as well as means to define their relationships to their corresponding coordinate variables and mesh coordinate variables.

Standard Coordinates and Useful Projections

Meshes and lists encompass the most common ways that scientific data are arranged for storage. Hence, distinguishing list dimensions from mesh dimensions partially achieves our first objective, which was to achieve a level of "self-description" that fully encompasses the underlying coordinate locations to which the data correspond. However, our objective continued: ... and to do so in ways that support a variety of commonly used reference coordinates, geometric projections, and other kinds of coordinate transformations.

Although the goal refers to coordinate references in general, the needs of the Unidata program are limited primarily to coordinate systems for the surface of the earth and the atmosphere. Even in this realm we will consider only a few such systems, though we will attempt not to preclude extensions to other coordinate systems in the future. The reader should understand that these ideas are in the exploratory and formative stages, and have not yet been adopted or implemented, as of this writing.

With these limitations and caveats, let us choose a small (preliminary) list of "standard" coordinates by which geophysical data can be located in time and space. For each of these we will reserve a specific name, and each name will uniquely identify a particular coordinate with an origin, a (positive) direction, and a unit of measure.

Lat--Latitude in degrees north of the equator.

Lon--Longitude in degrees east of Greenwich.

Alt--Height in meters above mean sea level.

Pressure--Atmospheric pressure in hectopascals, commonly used as a height coordinate in the atmosphere, where decreased pressure implies increased height.

Time--Time measured in seconds of coordinated universal time (UT) since 00:00 on 1 January 1970.

DateTime--Date and coordinated universal time (UT) indicated as a character string of the form "yyyy mmm dd hh:mm:ss" where ss may include a decimal fraction.

Let us also reserve a few names for operations that are likely to be needed for specifying projections and other coordinate transformations. The following (preliminary) list defines the (vector-valued) functions and establishes argument bindings for each.

Interp(x,table)--The argument x is an n-vector, each component of which falls in the interval [0,1]; the argument table is an n-dimensional array, representing (scalar or vector) lookup values on an equally spaced mesh which exactly covers the domain of x; and the resultant (scalar or vector) function value is calculated from table by linear interpolation (in each of the n dimensions) to the point represented by x.

Project(x,type,params)--The argument x is an arbitrary two-vector; the argument type is one of several projection types, such as "polar" or "orthographic"; the argument params is a vector of values, such as central latitude and longitude, to completely specify the projection; and the resultant function value is a two-vector representing, as latitude and longitude, the projection of the point x onto the (earth's) sphere according to the specified map projection.

Time(dateTime)--The argument dateTime is a string representing a specific date and coordinated universal time (UT) in the form "yyyy mmm dd hh:mm:ss", where ss may include a decimal fraction; the resultant function value is the difference, measured in seconds, between the specified date/time and 00:00 on 1 January 1970.
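The Interp operation defined above can be sketched in a few lines. The following Python function is only an illustration under stated assumptions: the table is held as nested lists, only scalar lookup values are handled, and the function name and sample table are ours. It applies linear interpolation one dimension at a time, which yields multilinear interpolation on an equally spaced mesh covering [0,1] in each direction.

```python
def interp(x, table):
    """Multilinear interpolation of a nested-list table at x in [0,1]^n.

    Only scalar lookup values are handled in this sketch.
    """
    if not x:                       # all dimensions consumed: table is a scalar
        return table
    n = len(table)
    if n == 1:                      # a single mesh point in this direction
        return interp(x[1:], table[0])
    pos = x[0] * (n - 1)            # position in units of mesh spacing
    lo = min(int(pos), n - 2)       # index of the enclosing cell's lower edge
    frac = pos - lo                 # fractional distance into the cell
    a = interp(x[1:], table[lo])
    b = interp(x[1:], table[lo + 1])
    return (1.0 - frac) * a + frac * b


table = [[0.0, 10.0],
         [20.0, 30.0]]              # a 2 x 2 mesh covering the unit square
print(interp([0.5, 0.5], table))    # 15.0
print(interp([1.0, 0.0], table))    # 20.0
```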

It should be pointed out that these definitions are proposed as conventions, not as changes to the netCDF per se. Later, we will discuss encapsulating the definitions in a separate library to be used in conjunction with the netCDF library. In other words, we suggest implementing the coordinate reference system as a software layer imposed on top of the basic, netCDF layer for data access.
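To suggest what one routine in such a separate layer might look like, the Time operation can be sketched in Python. The function name, the reading of "mmm" as a three-letter English month abbreviation, and the choice of Python are all assumptions made for illustration; nothing here has been adopted or implemented.

```python
from datetime import datetime, timezone

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_from_datetime(text: str) -> float:
    """Seconds between a "yyyy mmm dd hh:mm:ss" string and 00:00 on 1 January 1970."""
    year, mon, day, clock = text.split()
    hour, minute, seconds = clock.split(":")
    # Build the whole-minute instant, then add the (possibly fractional) seconds.
    whole = datetime(int(year), MONTHS.index(mon.lower()) + 1, int(day),
                     int(hour), int(minute), tzinfo=timezone.utc)
    return (whole - EPOCH).total_seconds() + float(seconds)


print(time_from_datetime("1970 Jan 02 00:00:01.5"))  # 86401.5
```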

A Strawman, Using netCDF Attributes

Using the (preliminary) definitions articulated in the previous section, we suggest using the netCDF capabilities for storing "attributes" as a way to imbed coordinate reference information in a netCDF file. In particular, we suggest the addition of a global attribute, named Coord, to which can be assigned a character string that describes the relationship among netCDF dimensions, netCDF coordinate variables, standard projections/transforms, and standardized reference coordinate systems.

The string assigned to the Coord attribute would be built up from a language defined specifically for coordinate referencing, and the language would employ certain reserved names for standard coordinates and for standard transforms, projections, and other useful functions that cannot be stored as data in a netCDF file. We do not, in this paper, propose to formally define such a language, but we will use a loosely defined "coordinate reference" language to illustrate the concept. For the illustration, we will use as reserved names the preliminary coordinate and operation names listed in the preceding section.

We suggest an expression for the netCDF coordinate attribute that has four sections, each characterizing a different aspect of the relationships among dimensions, coordinate variables, projections/transformations, and reference coordinates. In CDL, the form of the attribute specification would be:

:Coord = "MeshEdges {edge1, edge2,...} Coordinates {coordvar1, coordvar2,...} Transforms {operator1, operator2,...} References {standcoord1, standcoord2,...}";

where the expressions in braces, which may be empty, obey rules that are specific to the section context. We illustrate with the rules for the MeshEdges section:

MeshEdges--Each component of the bracketed expression must be the name of a mesh dimension or of its corresponding mesh coordinate variable, and a particular ordering is advised. Using the most closely related space-time coordinate to each edge, the order of the mesh edges should be: Time, Altitude, Latitude, and Longitude. By agreeing on an order (which is independent of the dimension ordering for the various netCDF variables) display programs can provide a natural view of the data without human intervention.

List dimensions are not to be included in the expression, and if the netCDF file contains no mesh dimensions, the expression is empty.

Corresponding rules need to be devised for the other sections, but the principles can be understood without articulating the rules here.
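Because the language is only loosely defined here, any parser is necessarily speculative. The Python sketch below assumes nothing beyond the four-section form given above: it finds each section name in order and extracts the text between its matching braces, allowing nested braces inside a section. The function name is hypothetical, and the sample string simply fills in the form using the lat and lon names of example 1.

```python
SECTIONS = ("MeshEdges", "Coordinates", "Transforms", "References")


def parse_coord(text: str) -> dict:
    """Split a Coord string into its four brace-delimited section bodies."""
    result = {}
    pos = 0
    for name in SECTIONS:
        start = text.index(name, pos) + len(name)
        open_brace = text.index("{", start)
        depth = 0
        for i in range(open_brace, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:          # matching close of this section
                    result[name] = text[open_brace + 1:i].strip()
                    pos = i + 1
                    break
        else:
            raise ValueError("unbalanced braces in section " + name)
    return result


coord = (r"MeshEdges {lat(lat), lon(lon)} Coordinates {lat(), lon()} "
         r"Transforms {} References {Lat = \1, Lon = \2}")
sections = parse_coord(coord)
print(sections["MeshEdges"])   # lat(lat), lon(lon)
print(sections["References"])  # Lat = \1, Lon = \2
```

An empty section, such as Transforms in the sample, comes back as an empty string.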

To illustrate the use of this language for characterizing coordinate reference relationships, we repeat example 1, adding a coordinate attribute.

Example 1 with a Coordinate Attribute

netCDF example 1a  {	// lat-lon grid illustration w/ coord attribute

dimensions:		// two dimensions
	lat = 3, lon = 3;

variables:		// variable names, shapes, and types
	float WindSpeed (lat,lon);
	integer lat (lat), lon (lon);

			// variable-specific attribute
	WindSpeed: units = "m/s";

			// global attribute
:Coord =
"	MeshEdges {lat (lat), lon (lon)}
	Coordinates {lat (), lon ()}
	Transforms {}
	References {Lat = \1, Lon = \2} ";

data:
			//grid coordinate values
	lat = 40, 45, 50;
	lon = -95, -100, -105;

			//data values
WindSpeed = 12., 13., 14.,
	    13., 15., 17.,
	    13., 14., 15.;  }

We interpret the coordinate attribute expression in the following way. The MeshEdges section merely indicates that variables which are dimensioned by lat and/or lon may be interpreted as grids whose coordinates are defined by the lat and/or lon variables. The Coordinates section establishes that all standard coordinate referencing is based on values stored in the lat and lon variables; the notations lat() and lon() serve to distinguish variables from dimensions. The empty Transforms section indicates that no transforms or projections are involved. The References section describes how the two components of the last non-empty section, indicated as \1 and \2, may be mapped directly onto the standard latitudinal and longitudinal earth coordinates.

From this expression we may deduce that every datum having a lat index of i, such as WindSpeed(i,2), falls on the latitude line identified by lat(i). Similarly, WindSpeed(j,k) corresponds to the point on the earth identified by the latitude-longitude pair lat(j),lon(k).
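This deduction amounts to a pair of table lookups, as the following sketch shows using the data of example 1. Python lists and zero-based indices are assumptions of the illustration, as is the function name.

```python
# Mesh coordinate variables and data from example 1.
lat = [40, 45, 50]
lon = [-95, -100, -105]
wind_speed = [[12., 13., 14.],
              [13., 15., 17.],
              [13., 14., 15.]]


def locate(j: int, k: int):
    """Return (Lat, Lon, value) for WindSpeed(j,k) using the mesh coordinate variables."""
    return lat[j], lon[k], wind_speed[j][k]


print(locate(1, 2))  # (45, -105, 17.0)
```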

Redoing example 2 provides a more interesting case because of the polar projection. For brevity, we omit the data declarations.

Example 2 with a Coordinate Attribute

netCDF example 2a {	//polar grid illustrations

dimensions:		// three dimensions
	alt = 2, y = 3, x = 4;

variables:		// variable names, shapes, and types
	float Temperature(alt,y,x);
	float alt(alt), y(y), x(x);

			// attribute values
	Temperature: units = "K";

			// global attribute
:Coord =
"	MeshEdges {alt(alt), y(y), x(x)  }
	Coordinates {alt(), y(), x()  }
	Transforms { \1, Project ({\2, \3}, stereo, (90,0,1))  }
	References {Alt = \1, Lat = \2, Lon = \3  }";  }

We interpret the coordinate attribute expression in the following way. The MeshEdges section indicates that variables which are dimensioned by alt, y, and/or x may be interpreted as grids whose meshes are based on (the tensor product of) the alt, y and x variables. The Coordinates section establishes that standard coordinate referencing is based on values stored in the alt, y, and x variables. The Transforms section describes how coordinate variables from the previous section (y and x, indicated as \2 and \3) are employed in a projection operator; the alt coordinate variable (indicated as \1) is not used in the projection and is simply passed along for future reference. The References section describes how components of the previous section may be mapped directly onto standard altitude, latitude, and longitude coordinates; note that the projection operator in the Transforms section produces a two-vector whose components are referenced as \2, and \3 in the References section.

From this expression it can be deduced how (i,j,k) points can be mapped to standard altitude-latitude-longitude coordinates, but the latitude-longitude components depend on both the second and third dimensions, even though y and x are mesh dimensions. In other words, the underlying mesh is a tensor product only in the projected coordinate system, not in lat-lon space. Note also that the x and y values cannot be chosen arbitrarily--they must reflect projection coordinates precisely as defined for the "Project" operator.
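The dependence of latitude on both mesh dimensions can be seen in a small sketch of an inverse polar stereographic projection. The paper does not define the meaning of the parameter vector (90,0,1); the code below assumes, purely for illustration, a north-polar projection onto a plane tangent at the pole, a central longitude of 0 degrees, a scale factor of 1, and a unit sphere. The function name is hypothetical.

```python
import math


def polar_stereo_to_latlon(x, y, lon0_deg=0.0, scale=1.0):
    """Inverse north-polar stereographic projection on a unit sphere.

    Assumes a projection plane tangent at the north pole; returns degrees.
    """
    rho = math.hypot(x, y)                          # distance from the pole on the plane
    lat = 90.0 - 2.0 * math.degrees(math.atan(rho / (2.0 * scale)))
    lon = lon0_deg + math.degrees(math.atan2(x, -y))
    return lat, lon


# Same y, different x: the latitude changes as well as the longitude.
for x in (0.3, 0.4, 0.5, 0.6):
    lat, lon = polar_stereo_to_latlon(x, 0.4)
    print(f"x={x:.1f} y=0.4 -> lat={lat:.2f} lon={lon:.2f}")
```

Holding y fixed while stepping along x changes both outputs, which is why the mesh is a tensor product only in the projected coordinates.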

One might observe that the projection operator is not absolutely necessary: if we were to define a standard pair of coordinates, say YP and XP, that are always used to represent points in a specific polar stereographic projection (and those matched the projection used for this example) then the Transforms section could be empty, and \2,\3 would map directly to YP,XP. However, the large number of commonly used projections (made worse by the several parameters that can be adjusted for each projection) makes this impractical as a general approach.

Now we turn to a list dimension case, redoing example 3:

Example 3 with a Coordinate Attribute

netCDF example 3a {	// surface temperature observations

dimensions: 		// one dimension
	list = unlimited;

variables:		//variable names, shapes, and types
	float Temp(list);
	float lat(list), lon(list);

			// variable-specific attribute
	Temp: units = "degrees C";

			// global attribute
:Coord =
"	MeshEdges {}
	Coordinates {lat (), lon ()}
	Transforms {}
	References {Lat = \1, Lon = \2}";

data:			// coordinate values (for each list element)
lat =  40.46,  25.40,  41.52,   33.56,   47.27,   39.54;
lon = -73.54, -80.17, -87.37, -118.24, -122.18, -105.07;

			//data values (for each list element)
Temp = 17., 24., 18., 22., 16., 20.; }

Note that the coordinate attribute in the above is very similar to that of example 1a--the differences merely reflect that this example has no mesh dimensions, hence the MeshEdges expression is empty. The distinction is an important one, because software for handling gridded data is typically quite different from software for handling (irregularly located) elements of a list. However, the similarity between these examples is not artificial--it reflects that standard latitude and longitude references can be calculated directly from the available variables, and that no transformations are employed.

Now we turn to a new example. Satellite images represent some of the most challenging data to store in self-describing files, because the earth-registration of such images is quite complicated. In many ways, images are like grids: the scan lines and pixels are stored in an ordered fashion, and the relationship between these image elements and their earth locations can be thought of as a satellite "projection" although its geometry usually changes with time and is more complex than that of a typical map projection.

One way to represent earth-referencing information for a satellite image is to employ external "satellite projection" or "navigation" functions that have a small set of governing parameters. These parameters can be stored as netCDF variables and allowed to vary with scan-line index and other dimensions, as necessary. This approach requires the addition of suitable navigation functions and their parameter (or argument) descriptions to our list of reserved operator names, but otherwise the coordinate attribute expressions are somewhat similar to those of example 2a.

Example 4, Satellite Images with External Navigation

A netCDF file containing hourly infrared satellite images might have three dimensions for the images (say "time," "lin," and "pix") and a fourth dimension ("parms") to reflect the number of parameters that are passed to the navigation function. One mesh coordinate variable (named "time") would be used to store date/time values in a natural way. Unlike previous examples with mesh indices, there would be no coordinate variables associated with the lin and pix dimensions. Instead, the lin and pix indices would be used directly as arguments to the navigation functions.

netCDF example 4 {	// satellite image w/ "external" navigation

dimensions:		// four dimensions
	time = 24, lin = 1001, pix = 1001, parms = 5;

variables:		// variable names, shapes, and types
	byte IRimage(time,lin,pix); // a 3-D byte array for images
	float time (time);	    // a vector of observation times
	float par(time,lin,parms);  // a matrix of navigation vectors

			// global attribute
:Coord =
"	MeshEdges {time (time), lin, pix}
	Coordinates {time (), lin, pix}
	Transforms {\1, Project ({\2, \3}, satellite, par(,,*))}
	References {Time = \1, Lat = \2, Lon = \3}"; }

The MeshEdges section indicates the presence of two mesh dimensions, lin and pix, that do not have associated coordinate variables. As indicated in the Coordinates section, the values of the indices themselves are to serve as coordinate values, and these are passed (as a two-vector) to the Project operator as indicated in the Transforms section. We assume that the Project operator has been properly coded to handle satellite navigation, and that the behavior of the operator is governed by the five-vector designated par(,,*). Note that the navigation parameters vary both with time and line number. As before, the Project operator generates a two-vector representing latitude and longitude as shown in the References section.
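How a Project operator might hand the index pair and the selected parameter vector to a named navigation function can be sketched as follows. This is only an illustration: the registry, the function names, and above all the toy_satellite function and its parameter layout are assumptions made for the sketch. A real satellite navigation function would be far more involved.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical registry of navigation functions, keyed by the name that
# appears in the Transforms section (for example, "satellite").
NavFunc = Callable[[int, int, Sequence[float]], Tuple[float, float]]
NAVIGATION: Dict[str, NavFunc] = {}


def toy_satellite(lin: int, pix: int, par: Sequence[float]) -> Tuple[float, float]:
    """Placeholder navigation, NOT a real satellite model.

    Assumed (illustrative) parameter layout:
    par = (lat0, lon0, dlat_per_line, dlon_per_pixel, unused)
    """
    lat0, lon0, dlat, dlon, _unused = par
    return lat0 + dlat * lin, lon0 + dlon * pix


NAVIGATION["satellite"] = toy_satellite


def project(lin: int, pix: int, name: str, par, t: int) -> Tuple[float, float]:
    """Evaluate Project({lin, pix}, name, par(,,*)) at time index t.

    par is indexed as par[time][lin][parms]; the "*" in par(,,*) selects
    the whole parameter vector for the given time and line.
    """
    vector = par[t][lin]
    return NAVIGATION[name](lin, pix, vector)
```

The point of the sketch is the calling pattern: the time and line indices select a five-vector from par, and the line and pixel indices themselves are passed as the coordinates to be navigated.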

Two Difficult Cases

To avoid the complexities and computational overhead associated with external navigation functions, other ways of representing earth-reference information for a satellite image have been examined. One obvious way is to store the latitude and longitude coordinates for every element of every image. As a rule, the storage required for this scheme makes it impractical.

A substantially more compact approach is to store latitude-longitude tables from which line/pixel locations can be calculated by interpolation. This takes advantage of the gridlike nature of an image and assumes that line/pixel locations vary with reasonable smoothness--if they do vary smoothly, the size of the tables can be kept reasonably small. This is a highly general way to store earth-referencing data for an image, and it has been found to be both satisfactory and computationally efficient. In the following, we illustrate how the netCDF and the coordinate attribute might be used with this approach.

Example 4 with "Lookup Table" Navigation

This example is identical to example 4 except for the use of lat-lon tables in place of external navigation functions. As before, there are no coordinate variables associated with the lin and pix dimensions. Instead, the lin and pix indices themselves would be treated as pointers into the lat-lon tables. To determine the earth location of pixel (i,j) we would interpolate to a corresponding point in the lat-lon tables, calculated under the assumption that the table edges correspond to the image edges.

netCDF example 4a {	// satellite image w/"lookup-table" navigation

dimensions:		// five dimensions
	time = 24, lin = 1001, pix = 1001;    // three for the images
	lint = 10, pixt = 10;		      // two for lat-lon tables


variables:		// variable names, shapes, and types
	byte IRimage (time, lin, pix);	// a 3-D byte array for images
	float time (time);		// a vector of observation times
	float lat (time, lint, pixt), lon (time,lint,pixt);  // lat-lon tables


			// global attribute 
:Coord =
"	MeshEdges {time(time), lin, pix}
	Coordinates {time (), lin, pix}
	Transforms {\1, Interp({\2,\3}, {lat(,*,*),lon(,*,*)})}
	References {Time = \1, Lat = \2, Lon = \3 }"; }

In this illustration we have, in essence, stored a lat-lon pair for every 100th pixel on every 100th scan line, and intermediate locations are calculated by interpolation. We chose the sizes so that table elements coincide exactly with particular lines and pixels, but this is not necessary--the interpolation algorithms would work with any image/table ratios.

There are several things to notice about the coordinate attribute. The MeshEdges section employs two mesh dimensions, lin and pix, that do not have corresponding mesh coordinate variables. As indicated in the Coordinates section, the values of the indices themselves are to serve as coordinate values.

Use of the lat-lon tables is described in the Transforms section. The vector-valued Interp operator takes two indices and two tables as arguments; actually, the two tables are first combined into a vector-valued table and the result is passed to the Interp operator as a single argument. Note the use of "*" to indicate which dimensions of lat and lon define the shape of the table; remember that the lat-lon tables vary with time, so the interpolation must be performed on the tables with the time index properly fixed.
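One plausible reading of the Interp operator is bilinear interpolation, sketched below in Python. The paper does not prescribe an interpolation scheme, so the choice of bilinear interpolation and the function name interp_latlon are assumptions. The sketch takes two-dimensional tables (the time index already fixed) and assumes, as stated above, that the table edges correspond to the image edges.

```python
def interp_latlon(lat_tab, lon_tab, lin, pix, nlin, npix):
    """Bilinearly interpolate lat-lon tables at image position (lin, pix).

    lat_tab, lon_tab: 2-D tables (lists of rows) for one fixed time index,
                      each with at least 2 rows and 2 columns.
    nlin, npix:       image size in lines and pixels (each at least 2).
    Image index 0 maps to table index 0, and the last image index maps
    to the last table index.
    """
    nrow = len(lat_tab)
    ncol = len(lat_tab[0])
    # Fractional position of the pixel within the table.
    r = lin * (nrow - 1) / (nlin - 1)
    c = pix * (ncol - 1) / (npix - 1)
    # Lower-left corner of the enclosing table cell, clamped so that
    # the last image line/pixel falls in the last cell.
    r0 = min(int(r), nrow - 2)
    c0 = min(int(c), ncol - 2)
    fr = r - r0
    fc = c - c0

    def bilinear(tab):
        top = tab[r0][c0] * (1 - fc) + tab[r0][c0 + 1] * fc
        bottom = tab[r0 + 1][c0] * (1 - fc) + tab[r0 + 1][c0 + 1] * fc
        return top * (1 - fr) + bottom * fr

    return bilinear(lat_tab), bilinear(lon_tab)
```

Because the fractional table position is computed from the ratio of table size to image size, the sketch works with any image/table ratio. It interpolates longitude as an ordinary number, so it does not handle a table that straddles the 180-degree meridian.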

As a final example, we observe that a single file easily can include data that are neither entirely gridded nor entirely characterized as elements of a list:

Example 5, a Collection of Balloon-Borne Measurements at Several Times

If a data set contains temperature measurements from a set of eight balloons that float freely, then its netCDF representation might contain two dimensions (named "platform" and "time") and five variables (named "Temp", "press", "lat", "lon", and "time") that are related in the following way: at index pair (i,j), Temp(i,j) represents the temperature recorded on balloon i at time(j), the height of balloon i is represented by press(i,j), and the (horizontal) position of balloon i at that time is represented by lat(i,j) and lon(i,j).

netCDF example 5 {	// multiple balloon measurements

dimensions:		// two dimensions
	platform = 8, time = unlimited;

variables:		// variable names, types, and shapes
	float Temp (platform, time);
	int press (platform, time);
	float lat (platform, time), lon (platform, time), time (time);
			// global attribute
	:Launch	= "1990 Aug 10 16:01:07";

			// variable-specific attribute
	Temp: units = "K";

			// global attribute

:Coord =
"	MeshEdges {time (time)}
	Coordinates {time (), press (), lat (), lon ()}
	Transforms {\1 + Time (:Launch), \2, \3, \4}
	References {Time = \1, Pressure = \2, Lat = \3, Lon = \4} "; }

The time variable (on the time dimension) represents the only mesh edge in this example. This makes sense because, if we hold the platform index fixed (and ignore the values of the lat and lon variables), we can view Temp as a (one-dimensional) gridded variable on a time mesh. Actually, we could do the same with variables press, lat, and lon, considering them to be time-dependent variables (rather than independent variables). If we know that the pressures are monotonic along the time dimension (e.g., because the balloons are all ascending), then we could consider the variable press, for a fixed platform, to represent mesh coordinates. The coordinate attribute does not change very much:

:Coord =
"	MeshEdges {time (time), press (time)}
	Coordinates {time (), press (), lat (), lon ()}
	Transforms {\1 + Time (:Launch), \2, \3, \4}
	References {Time = \1, Pressure = \2, Lat = \3, Lon = \4} "; }

This is the first example in which a mesh coordinate variable does not carry the same name as its dimension. Indeed, press has more than one dimension, but it can serve to define a mesh edge only in conjunction with its time dimension, i.e., the platform index must be fixed. Observe the notation carefully in the MeshEdges section.

It is a matter of choice whether to name the dimension "press" instead of "time" in the last example. The choice probably depends on how one thinks of the data being collected. However, we do not think same-named variables and dimensions should be used spuriously. To help avoid confusion, we propose the following convention for using netCDF variable and dimension names:

The name of a variable and the name of a dimension in a netCDF file should coincide only if:

a) the shape of the variable includes the same-named dimension (most typically, as its only dimension); and

b) when all other indices (if any exist) are held fixed, the values of the variable are monotonic (increasing or decreasing) along the same-named dimension and represent ordered points along (the edge of) some coordinate mesh.

Such a variable is considered to be a "mesh coordinate variable," the same-named dimension is a "mesh dimension," and together they may be used as a "mesh edge" in the expression for the coordinate attribute.
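The mechanical part of this convention could be checked along the following lines. The function names are hypothetical, and the sketch assumes strict monotonicity (no repeated values), which is somewhat stronger than the wording above. Whether the values really represent points on a coordinate mesh is a matter of meaning that no such check can settle.

```python
def is_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def may_share_name(var_dims, dim_name, slices):
    """Check whether a variable may carry the same name as dimension dim_name.

    var_dims: tuple of dimension names giving the variable's shape.
    dim_name: the dimension whose name the variable would share.
    slices:   1-D sequences of values along dim_name, one for each
              setting of the other indices held fixed.
    """
    shape_ok = dim_name in var_dims
    return shape_ok and all(is_monotonic(s) for s in slices)
```

For the time variable of example 5, may_share_name(("time",), "time", [values]) holds whenever the stored times are increasing. A variable whose shape does not include the same-named dimension fails the first condition regardless of its values.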

Another interesting point in example 5 is that the Transforms section uses another value from the netCDF file, namely, the global attribute :Launch, which defines the time of launch. In this way, values stored in the time coordinate variable can represent elapsed time since launch and yet these values can also be related with ease to the standard (reserved name) Time coordinate, as indicated in the References section.
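The transform \1 + Time (:Launch) can be sketched in Python as below. The example does not state the units of the elapsed-time values, so the sketch assumes seconds by default and exposes the scale as a parameter; the function name to_reference_time is hypothetical.

```python
from datetime import datetime, timedelta

# Matches the form of the :Launch attribute, "1990 Aug 10 16:01:07".
# Note that %b (abbreviated month name) depends on the current locale.
LAUNCH_FORMAT = "%Y %b %d %H:%M:%S"


def to_reference_time(elapsed, launch, seconds_per_unit=1.0):
    """Convert elapsed times since launch to absolute date/time values.

    elapsed:          values of the time coordinate variable.
    launch:           the :Launch attribute string.
    seconds_per_unit: ASSUMED scale of the elapsed values (1.0 = seconds).
    """
    launch_time = datetime.strptime(launch, LAUNCH_FORMAT)
    return [launch_time + timedelta(seconds=e * seconds_per_unit)
            for e in elapsed]
```

With this reading, an elapsed value of 60 and the launch string of example 5 yield 16:02:07 on 10 August 1990.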

Conclusion

Though considerable refinement would be required, we believe it is possible to create a formal, unambiguous, and useful coordinate reference language that can be assigned to netCDF coordinate attributes in the fashion illustrated above. The purpose for doing so would be to support a "coordinate reference library" to assist in the calculation of earth-reference and other coordinate information for a very broad class of geophysical data, whether observed or modeled.

Such a library probably should provide many services. The most important ones, however, would be to perform mappings back and forth between "netCDF dimension space" and some "reference space," typically representing a space-time coordinate system. Points in "netCDF dimension space" would be identified by vectors of indices for the various mesh and list dimensions of the netCDF file. Points in the "reference space" would be represented as vectors of "standard" coordinates.

If the approach of this paper were to be followed, the coordinate reference library would probably compile the coordinate attribute expression to establish the mappings discussed in the previous paragraph. For netCDF files containing only mesh dimensions, such mappings can represent inverses of one another, except for discretization errors. In contrast, the mappings can occur only in one direction (from netCDF dimension space to reference space) for netCDF files containing only list dimensions.
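For a single mesh dimension, the two mappings and the discretization error between them can be sketched as follows. The function names are hypothetical, and the sketch assumes a mesh coordinate variable whose values increase along its dimension.

```python
from bisect import bisect_left


def index_to_coord(coord_values, index):
    """Map from netCDF dimension space to reference space."""
    return coord_values[index]


def coord_to_index(coord_values, target):
    """Map from reference space to the nearest index.

    Assumes coord_values is increasing; targets outside the mesh are
    mapped to the nearest end point.
    """
    pos = bisect_left(coord_values, target)
    if pos == 0:
        return 0
    if pos == len(coord_values):
        return len(coord_values) - 1
    before = coord_values[pos - 1]
    after = coord_values[pos]
    # Choose whichever neighbor is closer to the target value.
    return pos if (after - target) < (target - before) else pos - 1
```

Going from an index to a coordinate and back recovers the index exactly, whereas going from an arbitrary coordinate to an index and back returns only the nearest mesh point; that difference is the discretization error mentioned above. For a list dimension there is no ordering to search, so only index_to_coord has a counterpart.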

Clearly, the problem is very complex, and we are not sure that this approach is the best one. Unidata is in the process of shifting to the use of object-oriented programming for internal development, and this may lead to approaches that are less complicated, through the use of classes and inheritance. In any case, we hope that this paper lays out some of the issues that must be addressed if a comprehensive solution is to be found--in particular, we believe that none of our examples are artificial and, taken together, they hint at the wide range encompassed by earth-referencing techniques for scientific data.