4D dimensions and other conventions...

Richard P. Signell (rsignell@crusty.er.usgs.gov)
Sat, 26 Sep 92 14:54:45 EDT

In response to the discussion between Tim Holt and Ethan Alpert about
what the OarS conventions really mean, I thought I would try to give
a little bit of insight into the rationale behind the OarS
approach and the 4D variable concept.

When we first started thinking about netCDF conventions for our oceanographic
data classes, we looked around to see what other institutions had done,
since one of the major goals of the project was to foster the exchange of data
between scientific groups and the use of each other's software tools.
We found that NOAA's PMEL was developing a data analysis and display system
based on netCDF files that follow certain conventions. Their tools for working
directly with this data included Don Denbo's PlotPlus graphics program, a
simple command language for performing basic manipulations of the data, a
spectral analysis package, and a subroutine library that sits above the
netCDF library and provides a simpler and more powerful interface to
PMEL-style netCDF files.

So what do PMEL's netCDF files look like? Don Denbo, who works closely with
PMEL, decided that software development would be simplified if all the
dependent variables had four dimensions, nominally three space coordinates
and time. Each independent variable must be a vector, but not necessarily
evenly spaced. As has been pointed out, this only works for "gridded" data,
but it is important to realize that in this context "gridded" data can mean
a time series with non-uniform sampling or CTD profiles with non-uniform
depth spacing. It is also important to note that storing a 1D time series in
a 4D representation does not use extra disk space or make the data any less
efficient to retrieve. In other words, a 4D variable with dimensions
400x1x1x1 holds exactly the same 400 values as a 1D variable with
dimension 400.
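To make the storage argument concrete, here is a minimal sketch in Python (chosen purely for illustration). It relies only on the fact that netCDF lays out a variable's values in row-major (C) order, with the last dimension varying fastest. The dimension order (time, depth, lat, lon), the helper name `row_major_offset`, and the sample times are hypothetical. The sketch checks that a 400x1x1x1 variable holds the same 400 values, at the same offsets, as a 400-element 1D variable, and that the time coordinate can be an unevenly spaced vector.

```python
from math import prod


def row_major_offset(index, shape):
    """Return the linear offset of `index` in a row-major (C-order) array of
    `shape`, i.e. the order in which netCDF lays out a variable's values."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {index} out of range for shape {shape}")
        offset = offset * n + i
    return offset


# Hypothetical single-point current-meter record, dimensioned (time, depth, lat, lon).
shape_4d = (400, 1, 1, 1)
shape_1d = (400,)

# Singleton dimensions add no values: both layouts hold 400 numbers.
assert prod(shape_4d) == prod(shape_1d) == 400

# Sample k lands at the same offset in either layout.
for k in range(400):
    assert row_major_offset((k, 0, 0, 0), shape_4d) == row_major_offset((k,), shape_1d)

# The time coordinate is just a vector; its spacing need not be uniform
# (hypothetical sample times in seconds, spaced further apart as k grows).
time_coord = [0.5 * k * k for k in range(shape_4d[0])]
assert len(time_coord) == shape_4d[0]
```

Because the singleton dimensions contribute a factor of 1 to every stride, reading sample k costs the same in either layout.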

Since a number of people in Physical Oceanography at WHOI use PlotPlus, we
decided to use the PMEL conventions for the data classes where it made sense:
time series data, CTD data, image data, and other data sets on
rectilinear grids. Obviously, there are a lot of classes of data that DON'T
fit nicely into PMEL's conventions, such as shipboard data, float data, or
output from a numerical model on a curvilinear or sigma grid. For these
classes, conventions other than PMEL's will be adopted. Rather than trying to
determine in advance what the conventions for these other data classes should
be, we decided that the first person to use netCDF for, say, float data would
basically dictate the convention, and OarS would adopt it (assuming it made
sense).

And just a few words about the other major bugaboo: time conventions. As I
understand it, Unidata is endorsing a time convention that will need to be
parsed by the udunits library, but this functionality has not yet been added
to udunits. In the meantime, we need to do something with our time
information, so in OarS we have decided to support multiple time
conventions. One of these is the PMEL convention, in which two long
variables called time(time) and time2(time) are defined: "time" is the
Julian day (with days starting at midnight, not noon), and "time2" is the
number of milliseconds since 0000 on that Julian day. The advantage of this
convention is that time can be specified with millisecond accuracy over
millions of years. The disadvantage is that it wastes space if all you need
is, say, accuracy to the nearest second over several years. So we also
support time in the form of a single long variable time(time) that counts
forward in seconds or milliseconds from a Gregorian start date specified by
a global variable called base_date.
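The two time conventions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not OarS code: the function names are hypothetical, datetimes are assumed to be naive UTC, and base_date is represented as a Python `datetime` because how it is encoded in the file is not spelled out here. The constant 1721425 converts Python's proleptic Gregorian day ordinal to the Julian Day Number of the same civil date, which matches a "time" that counts days starting at midnight. The last lines work out the range of a 32-bit netCDF long: the day counter in the two-variable form reaches millions of years, while a single long covers only about 68 years in seconds, using 4 bytes per sample instead of 8.

```python
from datetime import datetime, timedelta

# Python's ordinal for 2000-01-01 is 730120; its Julian Day Number is 2451545.
JDN_OFFSET = 1721425
LONG_MAX = 2**31 - 1  # largest value a 32-bit netCDF long can hold


def to_pmel_time(dt):
    """Split a naive UTC datetime into the PMEL pair (time, time2):
    the Julian day (starting at midnight) and milliseconds since 0000 that day."""
    jday = dt.toordinal() + JDN_OFFSET
    msec = ((dt.hour * 60 + dt.minute) * 60 + dt.second) * 1000 + dt.microsecond // 1000
    return jday, msec


def from_pmel_time(jday, msec):
    """Inverse of to_pmel_time, to millisecond precision."""
    return datetime.fromordinal(jday - JDN_OFFSET) + timedelta(milliseconds=msec)


def from_base_date(offsets, base_date, units="seconds"):
    """Decode a single time(time) vector counted forward from base_date."""
    step = {"seconds": timedelta(seconds=1),
            "milliseconds": timedelta(milliseconds=1)}[units]
    return [base_date + n * step for n in offsets]


# Range of a 32-bit long under each convention.
years_of_days = LONG_MAX / 365.25               # about 5.9 million years
years_of_seconds = LONG_MAX / (365.25 * 86400)  # about 68 years
days_of_msec = LONG_MAX / 86_400_000            # about 24.9 days

print(to_pmel_time(datetime(1992, 9, 26, 14, 54, 45)))  # (2448892, 53685000)
```

With the single-variable form, millisecond units therefore only suit records shorter than about 25 days; for multi-year records, seconds are the practical choice.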

So what tools do we have? Right now, the major tool is an efficient
Matlab/netCDF interface, together with a collection of Matlab m-files that
read, analyze, and create data following these conventions. If you would like
to try the Matlab/netCDF interface, take a look at the README file available
on my anonymous ftp site:

ftp crusty.er.usgs.gov (128.128.19.19)
user anonymous
passwd <your email address>
cd pub/mexcdf
get README
bye

--
Rich Signell               |  rsignell@crusty.er.usgs.gov
U.S. Geological Survey     |  (508) 457-2229  |  FAX (508) 457-2310
Quissett Campus            |  "George promised to be good... 
Woods Hole, MA  02543      |  ... but it is easy for little monkeys to forget."