We've been talking about conventions for netCDF data that will facilitate the development of generic applications for multidimensional scientific data. It occurred to me that maybe we should take a close look at applications that already deal with this type of data in this manner. One such application that I am familiar with is AVS (the Application Visualization System). AVS is essentially a collection of tools that read, write, process, and display data from a broad class that it calls "field" data (it also handles "unstructured cell data" for finite element applications, but since I don't use this I won't discuss it).

Using AVS we have manipulated time series and depth profiles, rendered gridded bathymetric data draped with sidescan imagery, displayed scattered earthquake and CTD data, plotted velocity vectors in 3D from shipboard acoustic Doppler transects, and explored data from a 3D orthogonal curvilinear sigma coordinate model. All of these data were encompassed by the AVS "field" data class. I am proposing that we consider using the field data model as the basis for our oceanographic netCDF files, which will require adopting a few conventions.

First, a description of the "field" data class, which encompasses a broad range of scientific data. To define the nature of a field, AVS needs to know 5 pieces of information:

1. The dimensions of the data space. The data array can have any
   number of dimensions, and the dimensions can be of any size.

2. The number of data components at each coordinate node. Each data
   element in the array can consist of one value or a vector of values.

3. The data representation (i.e. byte, integer, float, etc.).

4. The dimensions of the coordinate space. This is not necessarily
   the same as the number of dimensions of the data space. A drifter
   trajectory, for example, has only 1 data dimension (time, or
   record number), but its location in the water column needs to be
   described by 3 spatial coordinates.

5. The nature of the mapping between the data and coordinate space.
   AVS allows for uniform, rectilinear, or irregular mapping of data
   space to coordinate space.

   Uniform means that the data are equally spaced along each
   dimension, so that the coordinates can be determined from min and
   max extents.

   Rectilinear means that each dimension of data space is mapped to a
   corresponding dimension of coordinate space through a coordinate
   variable vector. This corresponds to the type of data that can be
   processed by PMEL's EPIC format and the UNIDATA netCDF operators.
   In rectilinear mappings, the number of dimensions in data space
   and coordinate space is the same.

   Irregular means that there is no simple mapping between data and
   coordinate space, and the coordinate location of each data point
   is explicitly defined. The number of data and coordinate
   dimensions need not be the same. It is this class that allows AVS
   to handle curvilinear model output, scattered data in x,y,z space
   (like Doppler data, drifter data, and CTD data), and air
   temperature on the surface of the Earth.

Some AVS tools work on any data in field format; some work only with field data that have certain attributes. For example, "print field" works no matter what type of field data it is, while "compute divergence" requires a 2- or 3-vector field on a uniform grid. The point is, with these attributes you can develop tools that work in a generic manner on a WIDE class of common oceanographic data types.

The bad news is, a bare-bones netCDF file only supplies us with 1 of these critical pieces of information: the data representation (e.g. float, int). The good news is that with just two conventions, we could supply all the rest of the information.
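To make the five pieces of information concrete, here is a minimal Python sketch of them as a record; the class and field names are my own illustration, not anything from AVS or netCDF:

```python
from dataclasses import dataclass

# Illustrative sketch of the five pieces of metadata AVS needs to
# characterize a "field". Names here are hypothetical, not an AVS API.
@dataclass
class Field:
    data_dims: tuple   # 1. dimensions of the data space, e.g. (1000,)
    n_components: int  # 2. number of data components at each node
    data_type: str     # 3. data representation: "byte", "int", "float", ...
    coord_dims: int    # 4. dimensionality of the coordinate space
    mapping: str       # 5. "uniform", "rectilinear", or "irregular"

# The drifter trajectory from point 4: one data dimension (time/record
# number), scalar temperature, positions in 3-D space, mapped irregularly.
drifter_temp = Field(data_dims=(1000,), n_components=1,
                     data_type="float", coord_dims=3, mapping="irregular")
```

Note that items 1, 2, 4, and 5 are exactly what a bare netCDF file fails to tell us, which is what the two conventions below are meant to fix.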
*****************
Convention 1: define a dimension named "components"
*****************

In netCDF, we know how many dimensions a variable has, but we don't know which of those dimensions correspond to data space and which correspond to components of a vector. For example, a three-component velocity vector defined on a 2D coordinate grid might be defined:

    dimensions:
        lat = 20, lon = 20, components = 3;
    variables:
        float velocity(lat, lon, components);

A priori, we don't know that "components" doesn't refer to depth, or some other coordinate dimension. Luckily, netCDF uses named dimensions, so all we need to do is adopt a convention that defines a special dimension name that tells the application "this dimension denotes the number of data components at each coordinate node". The design plan for the SIEVE system being developed by the USGS in Reston incorporates this convention, suggesting "components" as the special dimension name. Seems logical to me. By adopting this convention, we pick up two more critical pieces of information, numbers 1 and 2 on the list above: the dimensions of the data space and the number of components at each coordinate node.

************
Convention 2: define a variable attribute called "independent_variables"
************

To define the mapping from data space to coordinate space, we need to specify the coordinate variables on which each data variable depends. In other words, for each dependent variable, we need to supply the independent variables. This could be accomplished by a string attribute which simply lists the independent variables.
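As a minimal sketch of how an application might consume such an attribute (plain Python strings and dicts standing in for a netCDF file; not a real netCDF API), the whitespace-separated list parses trivially:

```python
# Sketch of reading Convention 2's "independent_variables" attribute.
# The dict-based file model here is illustrative only.

def independent_variables(var_name, var_attrs, all_variables):
    """Return the coordinate variable names that var_name depends on,
    as declared by its independent_variables attribute."""
    attr = var_attrs.get(var_name, {}).get("independent_variables", "")
    names = attr.split()  # whitespace-separated variable names
    # Every named independent variable must actually exist in the file:
    missing = [n for n in names if n not in all_variables]
    if missing:
        raise ValueError("unknown independent variables: %s" % missing)
    return names

# Drifter-style usage: temp depends on lat and lon.
attrs = {"temp": {"independent_variables": "lat lon"}}
print(independent_variables("temp", attrs, {"temp", "lat", "lon"}))
# -> ['lat', 'lon']
```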
For example, a temperature record from an ocean surface drifter might be defined:

    dimensions:
        position = 1000;
    variables:
        float temp(position);
            temp:long_name = "Temperature";
            temp:units = "Celsius";
            temp:independent_variables = "lat lon";
        float lon(position);
            lon:long_name = "Longitude";
            lon:units = "degrees";
        float lat(position);
            lat:long_name = "Latitude";
            lat:units = "degrees";

which would be taken to mean that each temperature point corresponds to a 2-space coordinate given by lat and lon.

Actually, due to the power of netCDF, we would only *need* to supply the "independent_variables" attribute for irregular mappings where the number of coordinate dimensions exceeds the number of data dimensions. Rectilinear mappings, and irregular mappings where the data and coordinate dimensions are the same, can be determined from the data and coordinate variables themselves.

Uniform mappings are ugly, since they require origin and coordinate interval info to be supplied. I would propose that data coordinates must be supplied (even if evenly spaced), or else the application would assume data indices as coordinates. Getting complicated with attribute schemes for uniform data just doesn't seem worth it.

Rectilinear mappings would be determined by checking to see if 1D variables exist with the same name as the named dimensions, just as defined in conventions.info. In these cases, the independent_variables attribute would be unnecessary. An example of the rectilinear mapping is the familiar:

    dimensions:
        lat = 10, lon = 10;
    variables:
        float temp(lat, lon);
            temp:long_name = "Temperature";
            temp:units = "Celsius";
        float lon(lon);
            lon:long_name = "Longitude";
            lon:units = "degrees";
        float lat(lat);
            lat:long_name = "Latitude";
            lat:units = "degrees";

If a variable name is also a dimension name but the variable is not 1D, then the mapping is irregular.
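The deduction rules above (1-D coordinate variables matching dimension names imply rectilinear; multidimensional coordinate variables or an independent_variables attribute imply irregular; otherwise fall back to data indices) can be sketched in a few lines of Python. The dict-based file model is my own illustration, not a netCDF API:

```python
# Hypothetical sketch of the mapping-deduction logic described above.
# variables: name -> ordered tuple of dimension names;
# attributes: name -> dict of that variable's attributes.

def classify_mapping(var_name, variables, attributes):
    var_dims = variables[var_name]
    # Coordinate variables named after the variable's dimensions:
    coord_vars = [d for d in var_dims if d in variables and d != var_name]
    if coord_vars and all(len(variables[d]) == 1 for d in coord_vars):
        return "rectilinear"  # a 1-D variable exists per named dimension
    if coord_vars:
        return "irregular"    # coordinate variable exists but is >1-D
    # No named coordinates; check for an independent_variables attribute:
    if "independent_variables" in attributes.get(var_name, {}):
        return "irregular"
    return "index"            # no coordinates at all: assume data indices

# The drifter example: temp(position), with lat(position), lon(position).
variables = {"temp": ("position",), "lat": ("position",), "lon": ("position",)}
attrs = {"temp": {"independent_variables": "lat lon"}}
print(classify_mapping("temp", variables, attrs))   # irregular

# The familiar rectilinear example: temp(lat, lon) with lat(lat), lon(lon).
variables2 = {"temp": ("lat", "lon"), "lat": ("lat",), "lon": ("lon",)}
print(classify_mapping("temp", variables2, {}))     # rectilinear
```

Note how the last rule in the sketch encodes the fallback proposed above: if coordinates are not supplied and no attribute is present, the application assumes data indices as coordinates.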
For example, salinity data from a time-dependent, curvilinear, sigma coordinate numerical model might look like:

    dimensions:
        x = 40, y = 40, z = 10, time = 1000;
    variables:
        float sal(time, z, y, x);
            sal:long_name = "Salinity";
            sal:units = "psu";
        float time(time);
        float z(z, y, x);
        float x(y, x);
        float y(y, x);

The application would find the coordinate variables and deduce that, since the coordinates are greater than 1D, the field must be irregular. It would then assume that since the variable time is 1D, the entire salinity field at a given time index corresponds to the value of the variable time at that index. Similarly, the application would deduce that the z locations hold for all time, and that the x and y locations hold for all depths and all times.

Adopting these two conventions would allow generic applications to be developed which understand a much wider range of data types, many of them common in the oceanographic community.

Comments?

--
Rich Signell             |  rsignell@crusty.er.usgs.gov
U.S. Geological Survey   |  (508) 457-2229  |  FAX (508) 457-2310
Quissett Campus          |  "George promised to be good...
Woods Hole, MA 02543     |   ... but it is easy for little monkeys to forget."