FWD: A Straw Man Profile for Oceanographic NetCDF Files

Hello all,

After the flurry of messages yesterday it's almost embarrassing to 
contribute further to everyone's mail load ... This message is a
follow-up on messages of September 28 about a straw man profile
for oceanographic netCDF files.  Sorry for the delay in getting
back - I took out some time to coordinate this idea with our
Unidata folks.

Attached to this message is a "Draft no. 1" straw man.  The
introductory portion of the document suggests a process of using
Internet discussion to progress the straw man into an open and
agreed-upon standard for our netCDF files in oceanography.  The
first step it suggests is to expand the preliminary list of
issues and possible resolutions to those issues.  Given the
recent discussions about email traffic we may need to revise this
procedure - perhaps develop a private discussion group - but
until we see if we are creating a traffic problem please send
your responses to netcdfgroup.

There is a certain element of inconvenience to creating a
standard at this stage which I want to acknowledge at the outset. 
A number of you have already created products using netCDF.  Many
of the issues that need to be discussed in creating an open
standard have already been raised and resolved to your individual
satisfaction.  I am in that situation myself, as the FERRET
program (described in unidata.ucar.edu anonymous ftp
pub/netcdf/utilities.txt) already supports netCDF.  There can be
a certain reluctance to reopening these issues and contributing
to a document that will no doubt conflict in some cases with the
choices we have made.

Yet, the developers who have already created netCDF products and
have thought through the issues are the essential contributors to
creating a quality, inter-operable standard.  This is the normal
progression in the creation of open standards - a recent example
being the ANSI C committee where software vendors (et al.) who
already had C implementations worked together to resolve
interoperability problems and extend functionality.  And like the
ANSI committees we should not expect quick results (though we can
hope for faster progress than ANSI committees!).

This process is an experiment - open standards often do not come
about easily.  I am optimistic it will work but it requires
active input from all interested parties.

                |  NOAA/PMEL               |   ph. (206) 526-6080
Steve Hankin    |  7600 Sand Point Way NE  |   FAX (206) 526-6744
                |  Seattle, WA 98115-0070  |  hankin@xxxxxxxxxxxx

P.S. The profile.oceanography file will also be available via
anonymous ftp at unidata.in pub/netcdf.

================================================================
               file: cdfprofile.oceanography
                                                                         
                                                                         
                                                                         
                   OCEANOGRAPHIC CONVENTIONS for NETCDF
                           Draft no.1 - 10/7/92
                                     
                       Phase 1 - identifying issues

CONTENTS:
Introduction.
Guidelines for Creating Profiles.
Guidelines for Robust Applications.
Oceanographic Profile Issues.


Introduction.

This file, cdfprofile.oceanography, will document a profile of
conventions to standardize the usage of netCDF for many
oceanographic data applications.  The primary goal of this
standardization is to facilitate data interchange.  The document
is expected to develop in three phases:

     Phase 1 - We will enumerate the issues relevant to
     oceanographic data storage with netCDF and provide several
     alternative resolutions to issues.  In Phase 1 **all**
     aspects of this document are open to comments and revision.
     
     Phase 2 - Through email dialog we will discuss the
     resolutions of the issues that have been identified working
     towards a consensus on each issue.  In Phase 2 the basic
     layout and strategy of the document will be fixed, however
     all technical content will be revisable.  New issues will be
     added only if there is general agreement that they are vital
     to the document.  (At this stage the dialog may be shifted
     off of netcdfgroup.)
     
     Phase 3 - We will edit the document removing ambiguities and
     producing stable, readable text.  Issues that become
     apparent during implementations must also be resolved at
     this time.
     
The entire process of arriving at a standardized profile should
be open and should reflect the views of all members of the
"oceanographic community" (an admittedly ambiguous term) who wish
to participate.  If a consensus cannot be reached on some issues
it may be necessary to formalize a "voting" procedure to resolve
the issues.  (These procedures could be defined in Phase 1 of the
document?  volunteer?)

The scope of this work is broader than what we can hope fully to
achieve.  Some issues may need to be classified as "Beyond the
scope of this document".  Again, we should try to reach general
agreement before classifying an issue this way. 

Producing a final document that unambiguously describes all of
the issues and resolutions will clearly be a significant piece of
work.  This can only be accomplished if we each offer complete
and concise text when we make a contribution.  I (Steve Hankin)
will volunteer to serve as the document editor - pulling our
contributions together into a single document and making it
available via email and/or anonymous ftp.  Since this is to be an
entirely open process please speak your mind if you know of a
preferable prospect for document editor.

This document presumes the contents of "conventions.info"
(unidata.ucar.edu anonymous ftp directory pub/netcdf) and will
not duplicate what is already described there.  As both
conventions.info and profile.oceanography will be evolving in
parallel we will need to coordinate the documents throughout
their evolutions.


Guidelines for Creating Profiles.

In the process of discussing issues and comparing alternative
resolutions an explicit set of "guiding principles" would be an
asset.  Such principles include (please extend):

   o keep it simple (avoid proliferation of attributes)

   o minimize restrictions (don't reduce functionality)

   o profile-compliant files should remain intelligible to
     applications that know nothing of the profile (where
     possible)
          

Guidelines for Robust Applications.

Application programs will in general be far more restrictive in
scope than the conventions described herein.  These application
programs can still perform useful work on many netCDF files that
observe the conventions if they observe the following
"motherhood" rules:
    o meta rule: don't crash, don't give up if possible
    o Uninterpretable attributes should be ignored
    o Variables with unsupported data types should be ignored
    o Applications should not assume particular units will be
      attached to particular variable names.
    o Applications that require recognized variable names should
      ignore variable names they do not recognize
    o Applications should avoid assumptions about the structure
      of the netCDF file:
      - dimensions may be defined which are unused
      - variables may use dimensions which have no corresponding
         coordinates defined
      - etc. (expand list)   
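
As one illustration of the "ignore what you cannot interpret"
rules above, here is a minimal Python sketch.  It is not tied to
any netCDF library: the file is described by a plain dictionary,
and the names SUPPORTED_TYPES, KNOWN_ATTRIBUTES and
usable_variables are hypothetical, chosen only for this example.

```python
# Hypothetical capabilities of one restrictive application.
SUPPORTED_TYPES = {"float", "double"}
KNOWN_ATTRIBUTES = {"units", "_FillValue"}


def usable_variables(variables):
    """Keep the variables this application can handle.

    `variables` maps a variable name to a dict with a "type" string
    and an "attributes" dict.  Variables of unsupported type are
    skipped and unrecognized attributes are dropped; nothing raises.
    """
    usable = {}
    for name, info in variables.items():
        if info.get("type") not in SUPPORTED_TYPES:
            continue  # unsupported data type: ignore the variable
        attributes = {
            key: value
            for key, value in info.get("attributes", {}).items()
            if key in KNOWN_ATTRIBUTES  # uninterpretable: ignore
        }
        usable[name] = {"type": info["type"], "attributes": attributes}
    return usable
```

The point of the sketch is only that unknown content is skipped
rather than treated as an error.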
    
Oceanographic Profile Issues.

1) Time axis representation
     The file "conventions.info" suggests (e.g.)
     
          variables:
              double time(nobs);
               time:units = "milliseconds since (1992-9-16
                              10:09:55.3 -600)";
     
     (This will be implemented shortly in the udunits library.) 
     Should we impose restrictions on data types (double, float,
     etc)?  How should we standardize the format for the date
     string? (is this specified by udunits?)
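
     To make the question concrete, here is a rough Python sketch
     of how an application might decode a units string of the form
     shown above.  It does not use udunits.  The accepted pattern,
     the table of unit names, and the reading of "-600" as a zone
     offset of minus 6 hours 0 minutes are all assumptions made
     for this example, not a statement of what udunits accepts.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed pattern for strings such as
# "milliseconds since (1992-9-16 10:09:55.3 -600)".
_UNITS_RE = re.compile(
    r"^\s*(?P<unit>\w+)\s+since\s+\(?\s*"
    r"(?P<y>\d{4})-(?P<mo>\d{1,2})-(?P<d>\d{1,2})\s+"
    r"(?P<h>\d{1,2}):(?P<mi>\d{2}):(?P<s>\d{2}(?:\.\d+)?)"
    r"(?:\s+(?P<tz>[+-]?\d{3,4}))?\s*\)?\s*$"
)

# Illustrative subset of unit names, expressed in seconds.
_SECONDS_PER_UNIT = {
    "milliseconds": 0.001,
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}


def parse_time_units(units):
    """Return (seconds per unit, base date) for a units string."""
    match = _UNITS_RE.match(units)
    if match is None:
        raise ValueError("unrecognized time units: %r" % units)
    unit = match.group("unit").lower()
    if unit not in _SECONDS_PER_UNIT:
        raise ValueError("unsupported time unit: %r" % unit)

    # Assumption: the trailing field is a zone offset written as [h]hmm.
    tz = match.group("tz") or "0"
    sign = -1 if tz.startswith("-") else 1
    tz_hours, tz_minutes = divmod(int(tz.lstrip("+-")), 100)
    offset = sign * timedelta(hours=tz_hours, minutes=tz_minutes)

    seconds = float(match.group("s"))
    whole = int(seconds)
    micro = min(round((seconds - whole) * 1_000_000), 999_999)
    base = datetime(
        int(match.group("y")), int(match.group("mo")), int(match.group("d")),
        int(match.group("h")), int(match.group("mi")), whole, micro,
        tzinfo=timezone(offset),
    )
    return _SECONDS_PER_UNIT[unit], base


def decode_time(value, units):
    """Convert one stored time value to an absolute date."""
    scale, base = parse_time_units(units)
    return base + timedelta(seconds=value * scale)
```

     For example, decode_time(1000.0, "milliseconds since
     (1992-9-16 10:09:55.3 -600)") gives a date one second after
     the base date.  A stricter or looser date format would only
     change the pattern.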
     
2) How to determine the orientation of a coordinate variable
     The orientation of coordinate axes can be specified through
     a variety of mechanisms:  agreed-upon names such as
     "lat","lon", etc.; implicit orientations inferred from the
     ordering of dimension names within a variable definition;
     orientations inferred from the units of the coordinate
     variable.  None of these mechanisms appears to be adequate
     in all cases.
     
     Alternative 1:
     Minimal restrictions on the naming of coordinate variables
     and choice of units.  Applications should apply a
     multi-step algorithm to identify orientation as follows:
     First - check the units of the coordinate variable:
     Do the units imply a unique orientation (e.g. units of
          time, "degrees longitude", "layer", etc.) ?
          If no, then check the name of the coordinate variable:
     Does the variable name match a template (e.g. *depth*,
          *lon*, *lat*, *time*, x*, y*, z*, t*, etc.)?
          
     Is this approach too complex?  What about cases where the
     orientation remains ambiguous?
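
     A rough Python sketch of the two-step check in Alternative 1
     may help judge its complexity.  The unit table, the list of
     time unit words, the X/Y/Z/T labels and the decision to
     compare names without regard to case are assumptions made for
     this example; only the name templates come from the text
     above.

```python
from fnmatch import fnmatchcase

# Illustrative tables only; the draft does not fix these lists.
UNIT_ORIENTATION = {
    "degrees longitude": "X",
    "degrees latitude": "Y",
    "layer": "Z",
}
TIME_UNIT_WORDS = ("milliseconds", "seconds", "minutes", "hours", "days")

# Name templates from Alternative 1, tried in this order.
NAME_TEMPLATES = [
    ("*depth*", "Z"), ("*lon*", "X"), ("*lat*", "Y"), ("*time*", "T"),
    ("x*", "X"), ("y*", "Y"), ("z*", "Z"), ("t*", "T"),
]


def guess_orientation(name, units=None):
    """Return "X", "Y", "Z", "T", or None if still ambiguous."""
    # First - check the units of the coordinate variable.
    if units:
        lowered_units = units.strip().lower()
        if lowered_units in UNIT_ORIENTATION:
            return UNIT_ORIENTATION[lowered_units]
        words = lowered_units.split()
        if words and words[0] in TIME_UNIT_WORDS:
            return "T"
    # If the units decide nothing, check the name against the templates.
    lowered_name = name.lower()
    for pattern, orientation in NAME_TEMPLATES:
        if fnmatchcase(lowered_name, pattern):
            return orientation
    return None
```

     A result of None is exactly the ambiguous case asked about
     above.  Note also that the order of the templates matters,
     since a name can match more than one of them.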
     
     Alternative 2:
     Introduce a variable attribute 'orientation' with a
     suitable naming convention for orientation strings (e.g.
     "west-east", "south-north")
     
     Should this be an optional attribute that can be applied
     when the Alternative 1 technique fails? 
     
3) Indicating Missing Data
     Two attributes for missing data have been suggested:
     missing_value and _FillValue.  The missing_value attribute
     has been dropped in netCDF version 2.0.  Is there a need to
     support both attributes?
     
4) Case-insensitive Names
    Should application programs be case-sensitive with respect
    to attribute and variable names?  Should variable and
    attribute names within a single file be required to be case-
    insensitive-unique?  (This refers to the **names** only; 
    the values of string attributes such as units would remain
    case-sensitive.)
    
    Alternative 1: Case-insensitive.  The peculiarities of Unix
    and C, while familiar to programmers, are not necessarily
    comfortable for users.  Publication and conversation are
    complicated by case-sensitive names.   
    
    Alternative 2: Case-sensitive.  Case-insensitivity would
    lead to incompatibilities with non-oceanographic netCDF
    files.  There are conveniences to the use of e.g. "time",
    "Time", and "TIME" within the same file. 
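
    Whichever alternative is chosen, a "case-insensitive-unique"
    requirement is easy to test.  The Python sketch below is one
    possible check; the function name is hypothetical and the
    names are passed in as a plain list rather than read from a
    file.

```python
from collections import defaultdict


def case_insensitive_collisions(names):
    """Group names that differ only by case.

    Returns a list of groups; an empty list means the names are
    case-insensitive-unique.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

    Given "time", "Time", "TIME" and "lat", the check reports the
    three time names as one colliding group.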
    
5) Multiple Time Axes in a File
    Is there a need for multiple time axes defined within a
    single netCDF file?  Or is there a reason to limit files to
    a single time axis?  (Multiple time axes would conflict with
    some time encodings that have been discussed that involve
    global variables.)
    
    Alternative 1: Permit multiple time axes (no conflict with
    time axes as suggested in conventions.info). 
    
6) Need a global attribute to indicate profile type and revision
     There should be a global attribute informing application
     programs explicitly what netCDF profile and revision a file
     adheres to.  This issue needs to be addressed at a level
     higher than this oceanographic profile but some
     recommendations would be appropriate.
     
     Alternative 1: 
          :profile = "oceanography";
          :profile_version = 1.0;
     
7) Standardized (Conventional) Variable Names
     The meteorological community has suggested a list of
     standardized variable names (see conventions.info).  Should
     this list be extended to include additional oceanographic
     variables?  How should these names fit into the
     framework of "resources" as described in conventions.info?
     (We need input from folks familiar with "resources" in this
     context.)
     
8) Name String Lengths
     Should attribute and variable names be further restricted
     with respect to length beyond the limit of `MAX_NC_NAME'
     described in conventions.info?
     
     Alternative 1: a practical limit of (say) 32 characters
     should be imposed.  This is consistent with most programming
     languages.  It simplifies the formatting burdens on
     applications.  It does not prevent application programs from
     supporting longer names.
     
     Alternative 2: Any limit other than the default limit of
     MAX_NC_NAME (128) could lead to incompatibilities with non-
     oceanographic netCDF files.
     
9) Multiple coordinate variables of same orientation
     Is there a need to support multiple coordinate variables of
     the same orientation in a single netCDF file?  (such
     multiplicity would preclude the use of strict names such as
     "lat" to designate geographical coordinate variables though 
     templates like *lat* would still be possible)
     
     Alternative 1: yes, there is a need  (e.g. multiple current
     meter arrays with differing deployment depths; in modelling
     it is often desirable to compare results computed on
     numerous different axes of the same orientation -
     restrictions on naming of axes could be very inconvenient)
     
10) Requiring non-coordinate variables to be 4 dimensional
    Is it acceptable to insist that all non-coordinate variables
    be represented as 4-dimensional (lat/long/depth/time)
    structures?  Should there be other restrictions on number of
    axes?
    
    Alternative 1: dimensionality should not be restricted to
    exactly 4 - the restriction would preclude some data types
    and would force misrepresentation of others.  Some
    restriction on the maximum number of dimensions for a
    variable would, however, ease the burden on application
    writing.
    
11) Mandatory ordering of geographical dimensions
     Is it acceptable to mandate that if dimensions with
     geographical significance are used in defining a variable
     they will be ordered as lat-lon-depth-time (i.e. time as
     the slowest moving axis)?
     
     Alternative 1: yes with reservations - are there serious
     performance penalties?
     
     Alternative 2: no - applications require greater
     flexibility than this.  Perhaps a standard ordering could
     be defined and an attribute introduced that would indicate 
     permutations.  Example:
     
          var:permutation = "TXYZ";
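
     The meaning of the permutation string is not yet defined.
     The Python sketch below assumes one possible reading: the
     letters X, Y, Z, T stand for lon, lat, depth, time; the
     string lists the axes in the order the variable actually
     stores them; and the standard ordering is lat-lon-depth-time,
     written "YXZT".  All three points are assumptions for the
     sake of illustration.

```python
# Assumed standard ordering: lat-lon-depth-time.
STANDARD_ORDER = "YXZT"


def axes_to_standard(permutation, standard=STANDARD_ORDER):
    """For each standard axis, give its index in the stored order."""
    if sorted(permutation) != sorted(standard):
        raise ValueError("permutation must use each of %s once" % standard)
    return [permutation.index(axis) for axis in standard]


def reorder_shape(shape, permutation):
    """Rearrange a stored shape into the standard axis order."""
    return tuple(shape[i] for i in axes_to_standard(permutation))
```

     Under these assumptions a variable stored as "TXYZ" with
     shape (12, 360, 180, 20) has the standard shape
     (180, 360, 20, 12).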
     
12) Coordinate Systems
    As mentioned in conventions.info there is work underway at
    unidata on this subject leading towards the development,
    presumably, of a collection of conventional attributes and a
    new Unidata library, `udgeoref'.  Is this work sufficient
    for oceanographic data?  Is this beyond the (initial) scope
    of this document?
    
    
13) Application-specific attributes
     Would it be useful to standardize a collection of attributes
     that would coach application programs in areas not directly
     related to the data content - for example attributes that
     recommended display techniques such as
          preferred_display_style="contour"
          preferred_display_map="spherical polar"
     
     Candidates? ... 
     
14) Climatological Axes
     What is the best method to represent a climatological time
     axis?
     
     Alternative 1:  attach the (boolean) variable attribute
     "periodic" to the time coordinate axis indicating the axis
     ends "join" modulo-fashion (this solution is useful for any
     periodic axis - also applicable to longitude).  What about
     the base-date string (see issue 1)?
     
          time: periodic = " ";
     
     Alternative 2: Like alternative 1 but the attribute should
     indicate the "branch points" of the periodicity:
     
          time: periodic_values = 0.,365.;
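
     As a sketch of how an application might use the branch
     points of Alternative 2, the Python function below maps any
     coordinate value into the range they define.  It assumes the
     two values are the lower and upper ends of one period, with
     the upper end equivalent to the lower; the function name is
     hypothetical.

```python
def wrap_periodic(value, periodic_values):
    """Map `value` into [low, high) for a periodic axis."""
    low, high = periodic_values
    period = high - low
    if period <= 0:
        raise ValueError("periodic_values must be increasing")
    return low + (value - low) % period
```

     With periodic_values of 0. and 365., day 370 maps to day 5
     and day -10 maps to day 355.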
     
15) Use of Boolean Attributes
     Issue 14 raises the general question of the appropriateness
     of boolean attributes (whose presence or absence indicates a
     modal state).  There is no explicit mechanism in netCDF for
     creating a value-less attribute (see Issue 14 Alternative
     1).  Should profile.oceanography avoid boolean attributes? 
     Or is this largely an aesthetic issue of the appearance of
     CDL files?  Could CDL be extended in a future revision to
     support e.g.
     
          time: periodic;
     
16) Vertical axis orientation
     Often oceanographic data is organized with positive down on
     vertical axes.  What is the best mechanism to indicate this
     in a netCDF file?  (A similar question arises on latitude
     axes which may be south-positive or north-positive.)
      
     Alternative 1: Introduce a (boolean) coordinate variable
     attribute "reversed".
     
     Alternative 2: Combine this property together with others
     that have been discussed in a new attribute
     
          depth: properties = "reversed, coordinates, vertical";
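
     A combined attribute like this is simple for applications to
     take apart.  The Python sketch below assumes the properties
     are separated by commas and compared without regard to case
     or surrounding blanks; neither point is settled here.

```python
def parse_properties(value):
    """Split a comma-separated properties string into a set."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}
```

     An application would then test for membership, for example
     "reversed" in parse_properties(...), and ignore any property
     it does not recognize.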
     
17) Longitude axis encodings
    Longitude encodings are not standardized - they may be
    continuous across the dateline or continuous across the
    prime meridian; either westward or eastward may be positive;
    the range may be -180 to 180 or 0 to 360 or some other
    choice.  How should netCDF convey this encoding?
    
    Alternative 1: 4 variable attributes applied to the
    longitude coordinate variable:
    - "reversed" for X positive, westward
    - "discontinuity"=value (always give the minimum value) 
    - one of "Greenwich=value" or "dateline=value"
    
    e.g. To define a longitude axis from 0 to 360, positive
    eastward, with zero representing Greenwich
           variables:
             float lon(lon);
              lon:Greenwich=0.;
              lon:discontinuity=0.;
    
    Alternative 2: modify Alternative 1 by replacing the
    "discontinuity" attribute with
              lon:periodic_values = 0., 360.;
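
    To show how an application might consume such attributes,
    here is a Python sketch that converts a stored longitude to
    degrees east of Greenwich in the range 0 to 360.  The exact
    meaning of the attributes is still open; the sketch assumes
    that "Greenwich" or "dateline" gives the stored value at that
    meridian and that "reversed" means westward is positive.  The
    function and parameter names are hypothetical.

```python
def to_east_of_greenwich(lon, greenwich=None, dateline=None,
                         reversed_axis=False):
    """Return `lon` as degrees east of Greenwich in [0, 360)."""
    if (greenwich is None) == (dateline is None):
        raise ValueError("give exactly one of greenwich or dateline")
    if greenwich is not None:
        origin, offset = 0.0, lon - greenwich
    else:
        origin, offset = 180.0, lon - dateline
    if reversed_axis:
        offset = -offset  # stored values increase westward
    return (origin + offset) % 360.0
```

    Under these assumptions -170 on a Greenwich-based axis, 10 on
    a dateline-based axis, and 170 on a reversed Greenwich-based
    axis all describe the same meridian, 190 degrees east.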
    
18) Unequally spaced coordinates
     Is the location of grid points sufficient information to
     fully describe a coordinate axis with irregularly-spaced
     points?  Or do we need auxiliary machinery to represent the
     boundaries between points?
     
     Alternative 1: There are cases that require explicit
     boundaries between cells on an axis e.g. data collected in
     unequal bins.  Is this a special-purpose need beyond the
     scope of this document?
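
     One way to see the problem is to compute the boundaries an
     application would have to guess when only the grid points are
     stored.  The Python sketch below uses midpoints between
     neighbouring points and extends the two end cells
     symmetrically.  This midpoint rule is an assumption of the
     example, not a proposed convention.

```python
def midpoint_bounds(points):
    """Guess cell boundaries for an axis from its grid points."""
    if len(points) < 2:
        raise ValueError("need at least two grid points")
    inner = [(a + b) / 2.0 for a, b in zip(points, points[1:])]
    first = points[0] - (inner[0] - points[0])
    last = points[-1] + (points[-1] - inner[-1])
    return [first] + inner + [last]
```

     Suppose data were collected in bins 0-10, 10-20 and 20-40 and
     stored at the bin centres 5, 15 and 30.  The guessed
     boundaries are 0, 10, 22.5 and 37.5, so the true bins cannot
     be recovered from the grid points alone.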
     
19) Huge Data Sets / Multiple Files
     Should we provide a standardized mechanism for associating
     multiple files in a single "project"?  How should it
     function? as a time axis distributed among files?  as
     multiple variables distributed among files?  Is this beyond
     the scope of this document?
     
     Alternative 1: a "parent" netCDF file with variables and
     attributes suitably defined to point to "child" files.
     
     Alternative 2: a file naming convention such as
          my_cdf.001, my_cdf.002, my_cdf.003, ...
     that will implicitly concatenate netCDF files along their
     record (or time?) axis.  
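
     A Python sketch of how an application might collect the
     members of such a series follows.  It assumes the sequence
     number is everything after the last "." in the file name and
     that files are concatenated in increasing numeric order; the
     function name is hypothetical.

```python
import re

# A stem, a dot, then a purely numeric sequence number.
_SUFFIX_RE = re.compile(r"^(?P<stem>.+)\.(?P<seq>\d+)$")


def ordered_members(filenames, stem):
    """Return the files named stem.NNN, sorted by sequence number."""
    members = []
    for name in filenames:
        match = _SUFFIX_RE.match(name)
        if match and match.group("stem") == stem:
            members.append((int(match.group("seq")), name))
    return [name for _, name in sorted(members)]
```

     Files that do not fit the pattern are ignored rather than
     treated as errors.  Gaps in the numbering are not detected by
     this sketch.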
     
20) Representing Sigma Coordinate Systems
     How should variables defined on sigma coordinate grids be
     represented? Is this question within the scope of this
     document?  Will it be covered by the `udgeoref' library?
     
     Alternative 1:
     A variable defined on a sigma coordinate system should
     possess an attribute "sigma".  The coordinate variable
     corresponding to the vertical dimension should exist and
     have simple enumerated values 1, 2, ..., n.  The coordinate
     variable should further have an attribute "sigma_positions"
     (?better name?) which gives the name of a variable
     containing the z coordinates.  The z coordinate variable
     should be defined on the same dimensions as the original
     variable.  e.g.
     
     variables:
       float   u(lat,lon,level,time); // on sigma coords//
               u: sigma = " ";
       integer level(level);
               level: sigma_positions = "depths";
       float   depths(lat,lon,level); // time may be a
                                      // dimension, too//
               depths:units="meters";
     
**************************

Real-Time and Shipboard data collection?
     What are the special issues?
     How to represent a cruise track?  (** a requirement? **)
     How to store variables with differing sampling intervals? 
     (Beyond Scope?)
     
Arctic oceanography
     What are the special issues?
     
Climate research
     What are the special issues?
     
Chemical oceanography
     What are the special issues?
     
Biological Oceanography
     What are the special issues?
     
Compressed data
     Are there special cases where compression can fit a general
     framework?
     
Other special topic issues?


