Hello all,
After the flurry of messages yesterday it's almost embarrassing to
contribute further to everyone's mail load ... This message is a
follow-up on messages of September 28 about a straw man profile
for oceanographic netCDF files. Sorry for the delay in getting
back - I took out some time to coordinate this idea with our
Unidata folks.
Attached to this message is a "Draft no. 1" straw man. The
introductory portion of the document suggests a process of using
Internet discussion to progress the straw man into an open and
agreed-upon standard for our netCDF files in oceanography. The
first step it suggests is to expand the preliminary list of
issues and possible resolutions to those issues. Given the
recent discussions about email traffic we may need to revise this
procedure - perhaps develop a private discussion group - but
until we see if we are creating a traffic problem please send
your responses to netcdfgroup.
There is a certain element of inconvenience to creating a
standard at this stage which I want to acknowledge at the outset.
A number of you have already created products using netCDF. Many
of the issues that need to be discussed in creating an open
standard have already been raised and resolved to your individual
satisfaction. I am in that situation myself, as the FERRET
program (described in unidata.ucar.edu anonymous ftp
pub/netcdf/utilities.txt) already supports netCDF. There can be
a certain reluctance to reopening these issues and contributing
to a document that will no doubt conflict in some cases with the
choices we have made.
Yet, the developers who have already created netCDF products and
have thought through the issues are the essential contributors to
creating a quality, inter-operable standard. This is the normal
progression in the creation of open standards - a recent example
being the ANSI C committee where software vendors (et al.) who
already had C implementations worked together to resolve
interoperability problems and extend functionality. And like the
ANSI committees we should not expect quick results (though we can
hope for faster progress than ANSI committees!).
This process is an experiment - open standards often do not come
about easily. I am optimistic it will work but it requires
active input from all interested parties.
                |  NOAA/PMEL                |  ph. (206) 526-6080
Steve Hankin    |  7600 Sand Point Way NE   |  FAX (206) 526-6744
                |  Seattle, WA 98115-0070   |  hankin@xxxxxxxxxxxx
P.S. The profile.oceanography file will also be available via
anonymous ftp at unidata.ucar.edu in pub/netcdf.
================================================================
file: cdfprofile.oceanography
OCEANOGRAPHIC CONVENTIONS for NETCDF
Draft no.1 - 10/7/92
Phase 1 - identifying issues
CONTENTS:
Introduction.
Guidelines for Creating Profiles.
Guidelines for Robust Applications.
Oceanographic Profile Issues.
Introduction.
This file, cdfprofile.oceanography, will document a profile of
conventions to standardize the usage of netCDF for many
oceanographic data applications. The primary goal of this
standardization is to facilitate data interchange. The document
is expected to develop in three phases:
Phase 1 - We will enumerate the issues relevant to
oceanographic data storage with netCDF and provide several
alternative resolutions to issues. In Phase 1 **all**
aspects of this document are open to comments and revision.
Phase 2 - Through email dialog we will discuss the
resolutions of the issues that have been identified working
towards a consensus on each issue. In Phase 2 the basic
layout and strategy of the document will be fixed, however
all technical content will be revisable. New issues will be
added only if there is general agreement that they are vital
to the document. (At this stage the dialog may be shifted
off of netcdfgroup.)
Phase 3 - We will edit the document removing ambiguities and
producing stable, readable text. Issues that become
apparent during implementations must also be resolved at
this time.
The entire process of arriving at a standardized profile should
be open and should reflect the views of all members of the
"oceanographic community" (an admittedly ambiguous term) who wish
to participate. If a consensus cannot be reached on some issues
it may be necessary to formalize a "voting" procedure to resolve
the issues. (These procedures could be defined in Phase 1 of the
document? volunteer?)
The scope of this work is broader than what we can hope fully to
achieve. Some issues may need to be classified as "Beyond the
scope of this document". Again, we should try to reach general
agreement before classifying an issue this way.
Producing a final document that unambiguously describes all of
the issues and resolutions will clearly be a significant piece of
work. This can only be accomplished if we each offer complete
and concise text when we make a contribution. I (Steve Hankin)
will volunteer to serve as the document editor - pulling our
contributions together into a single document and making it
available via email and/or anonymous ftp. Since this is to be an
entirely open process please speak your mind if you know of a
preferable prospect for document editor.
This document presumes the contents of "conventions.info"
(unidata.ucar.edu anonymous ftp directory pub/netcdf) and will
not duplicate what is already described there. As both
conventions.info and profile.oceanography will be evolving in
parallel we will need to coordinate the documents throughout
their evolutions.
Guidelines for Creating Profiles.
In the process of discussing issues and comparing alternative
resolutions an explicit set of "guiding principles" would be an
asset. Such principles include (please extend):
o keep it simple (avoid proliferation of attributes)
o minimize restrictions (don't reduce functionality)
o profile-compliant files should remain intelligible to
applications that know nothing of the profile (where
possible)
Guidelines for Robust Applications.
Application programs will in general be far more restrictive in
scope than the conventions described herein. These application
programs can still perform useful work on many netCDF files that
observe the conventions if they observe the following
"motherhood" rules:
o meta rule: don't crash, don't give up if possible
o Uninterpretable attributes should be ignored
o Variables with unsupported data types should be ignored
o Applications should not assume particular units will be
attached to particular variable names.
o Applications that require recognized variable names should
ignore variable names they do not recognize
o Applications should avoid assumptions about the structure
of the netCDF file:
- dimensions may be defined which are unused
- variables may use dimensions which have no corresponding
coordinates defined
- etc. (expand list)
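
As a rough illustration of the "motherhood" rules above, here is a
minimal Python sketch. It models a netCDF file as plain dictionaries
rather than calling any real netCDF library; the names
SUPPORTED_TYPES, usable_variables and known_attributes are
hypothetical and chosen only for this example.

```python
# Hypothetical in-memory model of a netCDF file: plain dicts, not a real netCDF API.
# The set of supported types is an assumption made for this sketch.
SUPPORTED_TYPES = {"short", "long", "float", "double"}


def usable_variables(variables, dimensions):
    """Return names of variables the application can handle, skipping the rest."""
    usable = []
    for name, var in variables.items():
        # Rule: variables with unsupported data types are ignored, not fatal.
        if var.get("type") not in SUPPORTED_TYPES:
            continue
        # A dimension without a coordinate variable is acceptable, but a
        # dimension that is not defined at all cannot be interpreted.
        if any(dim not in dimensions for dim in var.get("dims", ())):
            continue
        usable.append(name)
    return usable


def known_attributes(attrs, recognized):
    """Rule: uninterpretable attributes are ignored rather than rejected."""
    return {key: value for key, value in attrs.items() if key in recognized}
```

Note that the unused dimension in a file is simply never consulted, so
it causes no failure.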
Oceanographic Profile Issues.
1) Time axis representation
The file "conventions.info" suggests (e.g.)
variables:
double time(nobs);
time:units = "milliseconds since (1992-9-16 10:09:55.3 -600)";
(This will be implemented shortly in the udunits library.)
Should we impose restrictions on data types (double, float,
etc)? How should we standardize the format for the date
string? (is this specified by udunits?)
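
To make the date-string question concrete, here is a hedged Python
sketch of how an application might decode the example units string
above. It does not use udunits; the regular expression, the table of
unit names and the function names are assumptions for illustration.
The trailing "-600" field is returned as raw text because its exact
meaning is one of the things still to be standardized.

```python
import re
from datetime import datetime, timedelta

# Assumed grammar: "<unit> since (<Y-M-D> [<H:M:S>] [<zone>])", parentheses optional.
UNITS_RE = re.compile(
    r"^\s*(?P<unit>\w+)\s+since\s+\(?\s*"
    r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
    r"(?:\s+(?P<hour>\d{1,2}):(?P<minute>\d{2}):(?P<second>\d{2}(?:\.\d+)?))?"
    r"(?:\s+(?P<zone>[+-]?\d+))?\s*\)?\s*$"
)

# Hypothetical table of recognized time units, in seconds.
SECONDS_PER_UNIT = {
    "milliseconds": 0.001,
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}


def parse_time_units(units):
    """Return (seconds_per_unit, base_datetime, raw_zone) or None if not understood."""
    m = UNITS_RE.match(units)
    if m is None or m.group("unit") not in SECONDS_PER_UNIT:
        return None
    base = datetime(
        int(m.group("year")), int(m.group("month")), int(m.group("day")),
        int(m.group("hour") or 0), int(m.group("minute") or 0),
    ) + timedelta(seconds=float(m.group("second") or 0))
    return SECONDS_PER_UNIT[m.group("unit")], base, m.group("zone")


def decode_time(value, units):
    """Convert one axis value to a datetime; the zone field is not applied."""
    parsed = parse_time_units(units)
    if parsed is None:
        return None
    scale, base, _zone = parsed
    return base + timedelta(seconds=value * scale)
```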
2) How to determine the orientation of a coordinate variable
The orientation of coordinate axes can be specified through
a variety of mechanisms: agreed-upon names such as
"lat","lon", etc.; implicit orientations inferred from the
ordering of dimension names within a variable definition;
orientations inferred from the units of the coordinate
variable. None of these mechanisms appear to be adequate
in all cases.
Alternative 1:
Minimal restrictions on the naming of coordinate variables
and choice of units. Applications should apply a
multi-step algorithm to identify orientation as follows:
First - check the units of the coordinate variable:
Do the units imply a unique orientation (e.g. units of
time, "degrees longitude", "layer", etc.) ?
If no, then check the name of the coordinate variable:
Does the variable name match a template (e.g. *depth*,
*lon*, *lat*, *time*, x*, y*, z*, t*, etc.)?
Is this approach too complex? What about cases where the
orientation remains ambiguous?
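
The two-step algorithm of Alternative 1 might look roughly like the
following Python sketch. The unit table, the list of time words and
the single-letter codes X, Y, Z, T are assumptions made for this
example, not part of any agreed convention. The function returns None
when the orientation remains ambiguous.

```python
from fnmatch import fnmatchcase

# Hypothetical unit strings that imply a unique orientation.
UNIT_HINTS = {"degrees longitude": "X", "degrees latitude": "Y", "layer": "Z"}
TIME_UNIT_WORDS = ("milliseconds", "seconds", "minutes", "hours", "days")

# Name templates from the text, most specific first.
NAME_TEMPLATES = [
    ("*lon*", "X"), ("*lat*", "Y"), ("*depth*", "Z"), ("*time*", "T"),
    ("x*", "X"), ("y*", "Y"), ("z*", "Z"), ("t*", "T"),
]


def guess_orientation(name, units=None):
    """Return 'X', 'Y', 'Z', 'T', or None if the orientation stays ambiguous."""
    # First: do the units imply a unique orientation?
    words = (units or "").strip().lower().split()
    if words:
        joined = " ".join(words)
        if joined in UNIT_HINTS:
            return UNIT_HINTS[joined]
        if words[0] in TIME_UNIT_WORDS or "since" in words:
            return "T"
    # If not, does the variable name match a template?
    lowered = name.lower()
    for pattern, orientation in NAME_TEMPLATES:
        if fnmatchcase(lowered, pattern):
            return orientation
    return None
```

The loose templates (x*, y*, z*, t*) will match many unrelated names,
which is one reason the approach may be judged too fragile.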
Alternative 2:
Introduce a variable attribute 'orientation' with a
suitable naming convention for orientation strings (e.g.
"west-east", "south-north")
Should this be an optional attribute that can be applied
when the Alternative 1 technique fails?
3) Indicating Missing Data
Two attributes for missing data have been suggested:
missing_value and _FillValue. The missing_value attribute
has been dropped in netCDF version 2.0. Is there a need to
support both attributes?
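
For discussion, here is a small Python sketch of an application that
tolerates either attribute. The attribute dictionary is a plain dict
standing in for a variable's attributes, and mask_missing is a
hypothetical helper name.

```python
def missing_markers(attrs):
    """Collect the flag values named by either missing-data attribute."""
    return [attrs[key] for key in ("_FillValue", "missing_value") if key in attrs]


def mask_missing(values, attrs):
    """Replace flagged values with None, leaving all other values unchanged."""
    markers = missing_markers(attrs)
    return [None if value in markers else value for value in values]
```

Supporting both costs the reader little, so the question is mostly
about what writers should be asked to produce.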
4) Case-insensitive Names
Should application programs be case-sensitive with respect
to attribute and variable names? Should variable and
attribute names within a single file be required to be case-
insensitive-unique? (This refers to the **names** only;
the values of string attributes such as units would remain
case-sensitive.)
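
Whichever way this is decided, checking a file for names that differ
only by case is straightforward. A minimal Python sketch, with
case_collisions as a hypothetical helper name:

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that become identical when case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Only groups with more than one spelling violate case-insensitive uniqueness.
    return [group for group in groups.values() if len(group) > 1]
```

An empty result means the names in the file are case-insensitive-unique.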
Alternative 1: Case-insensitive. The peculiarities of Unix
and C, while familiar to programmers, are not necessarily
comfortable for users. Publication and conversation are
complicated by case-sensitive names.
Alternative 2: Case-sensitive. Case-insensitivity would
lead to incompatibilities with non-oceanographic netCDF
files. There are conveniences to the use of e.g. "time",
"Time", and "TIME" within the same file.
5) Multiple Time Axes in a File
Is there a need for multiple time axes defined within a
single netCDF file? Or is there a reason to limit files to
a single time axis? (Multiple time axes would conflict with
some time encodings that have been discussed that involve
global variables.)
Alternative 1: Permit multiple time axes (no conflict with
time axes as suggested in conventions.info).
6) Need a global attribute to indicate profile type and revision
There should be a global attribute informing application
programs explicitly what netCDF profile and revision a file
adheres to. This issue needs to be addressed at a level
higher than this oceanographic profile but some
recommendations would be appropriate.
Alternative 1:
:profile = "oceanography";
:profile_version = 1.0;
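
An application might consult these attributes along the following
lines. This Python sketch assumes the attribute names of Alternative 1
and invents the status strings and the function name check_profile. In
keeping with the "don't give up" rule, it reports a status instead of
raising an error.

```python
def check_profile(global_attrs, expected="oceanography", max_version=1.0):
    """Classify a file by its (proposed) profile and profile_version attributes."""
    name = global_attrs.get("profile")
    if name is None:
        return "unknown"        # no profile declared; fall back to heuristics
    if name != expected:
        return "other"          # some other profile; proceed with caution
    version = global_attrs.get("profile_version")
    if not isinstance(version, (int, float)) or version > max_version:
        return "unsupported"    # newer or malformed revision
    return "ok"
```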
7) Standardized (Conventional) Variable Names
The meteorological community has suggested a list of
standardized variable names (see conventions.info). Should
this list be extended to include additional oceanographic
variables? How should these names fit into the
framework of "resources" as described in conventions.info?
(We need input from folks familiar with "resources" in this
context.)
8) Name String Lengths
Should attribute and variable names be further restricted
with respect to length beyond the limit of `MAX_NC_NAME'
described in conventions.info?
Alternative 1: a practical limit of (say) 32 characters
should be imposed. This is consistent with most programming
languages. It simplifies the formatting burdens on
applications. It does not prevent application programs from
supporting longer names.
Alternative 2: Any limit other than the default limit of
MAX_NC_NAME (128) could lead to incompatibilities with non-
oceanographic netCDF files.
9) Multiple coordinate variables of same orientation
Is there a need to support multiple coordinate variables of
the same orientation in a single netCDF file? (such
multiplicity would preclude the use of strict names such as
"lat" to designate geographical coordinate variables though
templates like *lat* would still be possible)
Alternative 1: yes, there is a need (e.g. multiple current
meter arrays with differing deployment depths; in modelling
it is often desirable to compare results computed on
numerous different axes of the same orientation -
restrictions on naming of axes could be very inconvenient)
10) Requiring non-coordinate variables to be 4 dimensional
Is it acceptable to insist that all non-coordinate variables
be represented as 4-dimensional (lat/long/depth/time)
structures? Should there be other restrictions on number of
axes?
Alternative 1: dimensionality should not be restricted to
exactly 4 - the restriction would preclude some data types
and would force misrepresentation of others. Some
restriction on the maximum number of dimensions for a
variable would, however, ease the burden on application
writing.
11) Mandatory ordering of geographical dimensions
Is it acceptable to mandate that if dimensions with
geographical significance are used in defining a variable
they will be ordered as lat-lon-depth-time (i.e. time as
the slowest moving axis)?
Alternative 1: yes with reservations - are there serious
performance penalties?
Alternative 2: no - applications require greater
flexibility than this. Perhaps a standard ordering could
be defined and an attribute introduced that would indicate
permutations. Example:
var:permutation = "TXYZ";
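
If a permutation attribute were adopted, an application could turn it
into an index mapping as in this Python sketch. The standard ordering
"XYZT" used as the default is an assumption for the example only (the
standard ordering itself is still to be agreed), and both function
names are hypothetical.

```python
def axis_permutation(stored, standard="XYZT"):
    """For each stored axis, give its position in the standard ordering."""
    if sorted(stored) != sorted(standard):
        raise ValueError("permutation %r does not rearrange %r" % (stored, standard))
    return [standard.index(axis) for axis in stored]


def stored_shape(standard_shape, stored, standard="XYZT"):
    """Reorder a shape given in standard order into the stored order."""
    return tuple(standard_shape[i] for i in axis_permutation(stored, standard))
```

For example, a variable with standard-order shape (10, 20, 5, 100) and
permutation "TXYZ" would be stored with shape (100, 10, 20, 5).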
12) Coordinate Systems
As mentioned in conventions.info there is work underway at
unidata on this subject leading towards the development,
presumably, of a collection of conventional attributes and a
new Unidata library, `udgeoref'. Is this work sufficient
for oceanographic data? Is this beyond the (initial) scope
of this document?
13) Application-specific attributes
Would it be useful to standardize a collection of attributes
that would coach application programs in areas not directly
related to the data content - for example attributes that
recommend display techniques such as
preferred_display_style="contour"
preferred_display_map="spherical polar"
Candidates? ...
14) Climatological Axes
What is the best method to represent a climatological time
axis?
Alternative 1: attach the (boolean) variable attribute
"periodic" to the time coordinate axis indicating the axis
ends "join" modulo-fashion (this solution is useful for any
periodic axis - also applicable to longitude). What about
the base-date string (see issue 1)?
time: periodic = " ";
Alternative 2: Like alternative 1 but the attribute should
indicate the "branch points" of the periodicity:
time: periodic_values = 0.,365.;
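
Under Alternative 2 the two branch points are enough for an
application to fold any coordinate value back onto the axis. A minimal
Python sketch, assuming periodic_values holds (low, high) and using
the hypothetical function name wrap_periodic:

```python
def wrap_periodic(value, low, high):
    """Fold a coordinate value into the half-open interval [low, high)."""
    period = high - low
    if period <= 0:
        raise ValueError("periodic_values must be increasing")
    return low + (value - low) % period
```

With periodic_values = 0.,365. a time of 370. maps to 5. and a time of
-10. maps to 355.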
15) Use of Boolean Attributes
Issue 14 raises the general question of the appropriateness
of boolean attributes (whose presence or absence indicates a
modal state). There is no explicit mechanism in netCDF for
creating a value-less attribute (see Issue 14 Alternative
1). Should profile.oceanography avoid boolean attributes?
Or is this largely an aesthetic issue of the appearance of
CDL files? Could CDL be extended in a future revision to
support e.g.
time: periodic;
16) Vertical axis orientation
Often oceanographic data is organized with positive down on
vertical axes. What is the best mechanism to indicate this
in a netCDF file? (A similar question arises on latitude
axes which may be south-positive or north-positive.)
Alternative 1: Introduce a (boolean) coordinate variable
attribute "reversed".
Alternative 2: Combine this property together with others
that have been discussed in a new attribute
depth: properties = "reversed, coordinates, vertical";
17) Longitude axis encodings
Longitude encodings are not standardized - they may be
continuous across the dateline or continuous across the
prime meridian; either westward or eastward may be positive;
the range may be -180 to 180 or 0 to 360 or some other
choice. How should netCDF convey this encoding?
Alternative 1: 4 variable attributes applied to the
longitude coordinate variable:
- "reversed" for X positive, westward
- "discontinuity"=value (always give the minimum value)
- one of "Greenwich=value" or "dateline=value"
e.g. To define a longitude axis from 0 to 360, positive
eastward, with zero representing Greenwich
variables:
float lon(lon);
lon:Greenwich=0.;
lon:discontinuity=0.;
Alternative 2: modify Alternative 1 by replacing the
"discontinuity" attribute with
lon:periodic_values = 0., 360.;
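
To test whether these attributes carry enough information, here is a
Python sketch that converts an encoded longitude to degrees east of
Greenwich in the range 0 to 360. It reflects one reading of
Alternative 1, stated here as assumptions: "Greenwich" and "dateline"
give the coordinate value at that meridian, and the presence of
"reversed" means westward is positive. The function name is
hypothetical.

```python
def east_of_greenwich(lon, attrs):
    """Return degrees east of Greenwich in [0, 360), or None if not determinable."""
    if "Greenwich" in attrs:
        offset, reference = 0.0, attrs["Greenwich"]
    elif "dateline" in attrs:
        offset, reference = 180.0, attrs["dateline"]
    else:
        return None  # no reference meridian given; the encoding is ambiguous
    # Assumption: a "reversed" attribute (any value) means positive westward.
    sign = -1.0 if "reversed" in attrs else 1.0
    return (offset + sign * (lon - reference)) % 360.0
```

Note that the "discontinuity" attribute is not needed for this
conversion; it matters only when writing values back in the file's own
range.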
18) Unequally spaced coordinates
Is the location of grid points sufficient information to
fully describe a coordinate axis with irregularly-spaced
points? Or do we need auxiliary machinery to represent the
boundaries between points?
Alternative 1: There are cases that require explicit
boundaries between cells on an axis e.g. data collected in
unequal bins. Is this a special-purpose need beyond the
scope of this document?
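
As a point of reference, this is what an application can do when only
the grid points are stored: guess the boundaries as midpoints. The
Python sketch below is an assumption about typical application
behavior, not a proposal, and midpoint_bounds is a hypothetical name.

```python
def midpoint_bounds(points):
    """Guess cell boundaries for an increasing axis from its grid points."""
    if len(points) < 2:
        raise ValueError("at least two grid points are needed")
    # Interior boundaries sit halfway between neighboring points.
    mids = [(a + b) / 2.0 for a, b in zip(points, points[1:])]
    # End boundaries are extrapolated by half of the adjacent spacing.
    first = points[0] - (points[1] - points[0]) / 2.0
    last = points[-1] + (points[-1] - points[-2]) / 2.0
    return [first] + mids + [last]
```

Points at 0, 10 and 30 yield boundaries -5, 5, 20 and 40. Data
collected in bins whose true edges differ from these midpoints cannot
be recovered this way, which is the case for explicit boundaries.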
19) Huge Data Sets / Multiple Files
Should we provide a standardized mechanism for associating
multiple files in a single "project"? How should it
function? as a time axis distributed among files? as
multiple variables distributed among files? Is this beyond
the scope of this document?
Alternative 1: a "parent" netCDF file with variables and
attributes suitably defined to point to "child" files.
Alternative 2: a file naming convention such as
my_cdf.001, my_cdf.002, my_cdf.003, ...
that will implicitly concatenate netCDF files along their
record (or time?) axis.
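
Under Alternative 2 an application would first have to find and order
the member files. A Python sketch, assuming the numeric-suffix naming
shown above; the pattern and the function name ordered_members are
hypothetical.

```python
import re

# Assumed naming pattern: "<stem>.<digits>", e.g. my_cdf.001
SUFFIX_RE = re.compile(r"^(?P<stem>.+)\.(?P<seq>\d+)$")


def ordered_members(filenames, stem):
    """Return the files belonging to one project, in concatenation order."""
    members = []
    for filename in filenames:
        m = SUFFIX_RE.match(filename)
        if m and m.group("stem") == stem:
            # Sort numerically so that .010 follows .002, whatever the padding.
            members.append((int(m.group("seq")), filename))
    return [filename for _seq, filename in sorted(members)]
```

Files that do not match the pattern are ignored rather than treated as
errors.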
20) Representing Sigma Coordinate Systems
How should variables defined on sigma coordinate grids be
represented? Is this question within the scope of this
document? Will it be covered by the `udgeoref' library?
Alternative 1:
A variable defined on a sigma coordinate system should
possess an attribute "sigma". The coordinate variable
corresponding to the vertical dimension should exist and
have simple enumerated values 1, 2, ..., n. The coordinate
variable should further have an attribute "sigma_positions"
(?better name?) which gives the name of a variable
containing the z coordinates. The z coordinate variable
should be defined on the same dimensions as the original
variable. e.g.
variables:
float u(lat,lon,level,time); // on sigma coords//
u: sigma = " ";
int level(level);
level: sigma_positions = "depths";
float depths(lat,lon,level); // time may be a dimension, too//
depths:units="meters";
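
The chain of lookups implied by Alternative 1 can be sketched in
Python as follows. The file is again modelled as plain dictionaries
rather than through a real netCDF library, and depth_variable_for is a
hypothetical name.

```python
def depth_variable_for(var_name, variables):
    """Find the variable holding z coordinates for a sigma-coordinate variable."""
    var = variables[var_name]
    if "sigma" not in var["attrs"]:
        return None  # not defined on a sigma grid
    for dim in var["dims"]:
        # A coordinate variable shares its name with its dimension.
        coord = variables.get(dim)
        if coord is not None and "sigma_positions" in coord["attrs"]:
            target = coord["attrs"]["sigma_positions"]
            # Ignore a pointer to a variable that is not in the file.
            return target if target in variables else None
    return None
```

For the CDL example above, looking up "u" leads through "level" to
"depths".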
**************************
Real-Time and Shipboard data collection?
What are the special issues?
How to represent a cruise track? (** a requirement? **)
How to store variables with differing sampling intervals?
(Beyond Scope?)
Arctic oceanography
What are the special issues?
Climate research
What are the special issues?
Chemical oceanography
What are the special issues?
Biological Oceanography
What are the special issues?
Compressed data
Are there special cases where compression can fit a general
framework?
Other special topic issues?