Strawman comments

Comments on OCEANOGRAPHIC CONVENTIONS for NETCDF, Draft no.1 - 10/7/92

>This document presumes the contents of "conventions.info"
>(unidata.ucar.edu anonymous ftp directory pub/netcdf) and will
>not duplicate what is already described there.  As both
>conventions.info and profile.oceanography will be evolving in
>parallel we will need to coordinate the documents throughout
>their evolutions.

A few comments about the "conventions.info" file.  The last time I looked 
at conventions.info (3 weeks ago ?) it was in rather bad shape.  
        + Some variables occur several times.  E.g., lat, lon.
        + Many variable names are too short and/or ambiguous.  Here are some
          examples...
        float  Tmin        minimum temperature
        float  Tmax        maximum temperature
        long   meana       mean anomaly
        float  speed       movement speed associated with an echo-object (MDR)

           o "Tmin" and "Tmax" are way to ambiguous.  T doesn't necessarily mean
             temperature, and there is no room for having multiple temperature
             sources and types (ie., water, air, wading pool).

           o "meana" is wide open.  What kind of anomaly?  Gravity?  Food
             quality?  Hem lines?

           o Using a name as general as "speed" for a specific thing like MDR is
             also a no-no.  It leaves no room for storing the speed of moving
             platforms, etc.

See my further comments below about variable name limitations.

> Guidelines for Creating Profiles.
>   o keep it simple (avoid proliferation of attributes)

Yes, yes, yes!  The more complex it is, the fewer people will use
it.  Why convert to a confusing new format when your old one works
fine?

> Guidelines for Robust Applications.
>    o Uninterpretable attributes should be ignored
>    o Variables with unsupported data types should be ignored
>    o Applications that require recognized variable names should
>      ignore variable names they do not recognize
        
In my mind, you are trying to put too much functionality and brains
into the application.  To program an application that can "figure
out" a file is bordering on AI and Expert Systems.  I think what you
fail to address is that at some point in a file's processing life, a
person will look at it and determine what variables to use.  For example, 
if a file had variables lat, lon, and depth, AS WELL AS x, y, and z, what
should the application do?  Obviously a user has to say "use lat,
lon, & depth", or "use x, y, and depth", or "lat, x, and z".

If an app can't recognize a variable, or determine
the structure, it should present the user with a list of information from 
the file and ask, "What do YOU think?"  Also, an app that makes
a conclusion about how to interpret a file should be "over-rideable"
by the user.  Imagine having a "smart" app that's not smart enough,
or too smart for its own good.
    
> Oceanographic Profile Issues.
...
>          variables:
>              double time(nobs);
>               time:units = "milliseconds since (1992-9-16
>                              10:09:55.3 -600)"
>     (This will be implemented shortly in the udunits library.) 
        
Stick with this and let udunits define the format as much as possible.
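
Just as a sketch, a shipboard time axis done that way might look something
like this (the reference timestamp is made up, and the exact units-string
syntax is whatever udunits ends up accepting)...

        dimensions:
                time = unlimited;
        variables:
                double  time (time);            // one record per sample
                time:units = "seconds since 1992-10-07 00:00:00 -8:00";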
     
> 2) How to determine the orientation of a coordinate variable
>     Alternative 1:
>     Minimal restrictions on the naming of coordinate variables
>     and choice of units.  Applications should apply a
>     multi-step algorithm to identify orientation as follows:
>     First - check the units of the coordinate variable:
>     Do the units imply a unique orientation (e.g. units of
>          time, "degrees longitude", "layer", etc.) ?
>          If no, then check the name of the coordinate variable:
>     Does the variable name match a template (e.g. *depth*,
>          *lon*, *lat*, *time*, x*, y*, z*, t*, etc.)?
>     Is this approach too complex?  

No.

>     What about cases where the orientation remains ambiguous?

Have the app ask the user what they think.

>     Alternative 2:
>     Introduce a variable attribute 'orientation' with a
>     suitable naming convention for orientation strings (e.g.
>     "west-east", "south-north")
>     Should this be an optional attribute that can be applied
>     when the Alternative 1 technique fails? 
        
Getting too complex!  Remember: "Keep It Simple".
     
> 4) Case-insensitive Names
>    Should application programs be case-sensitive with respect
>    to attribute and variable names?  
>    Alternative 1: Case-insensitive.  
        
Easier to use.  You won't get tripped up because you queried on "lat", "LAT",
and "Lat", while for some reason the file creator liked "lAT".  Stranger
things have happened.
    
>    Alternative 2: Case-sensitive.  
>    There are conveniences to the use of e.g. "time",
>    "Time", and "TIME" within the same file. 

I think making two variables in a file, one named "time" and the other
"TIME", is bad design.  It's really, really ambiguous, fuzzy, and bad.
Creating a readable CDL file is like writing readable "C" or "FORTRAN"
code.  Who likes FORTRAN with all capital letters and no spaces between
reserved words?  It's HARDTOREADANDMAKEANYSENSEOFIT.


> 6) Need a global attribute to indicate profile type and revision

Alternative 1 looks good.
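
I haven't repeated Alternative 1 here, but I picture a couple of global
attributes along these lines (the attribute names are just my guess at the
idea, nothing official)...

        // global attributes:
        :profile = "oceanography";
        :profile_revision = "Draft 1, 10/7/92";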
     
> 7) Standardized (Conventional) Variable Names
>      The meteorological community has suggested a list of
>      standardized variable names (see conventions.info).  

"conventions.info", as I said above, is too messing for me to base any
kind of standard on.

>      Should this list be extended to include additional oceanographic
>      variables?  

It must be modified before oceanographic data can be properly included.

>      How should these names fit this into the
>      framework of "resources" as described in conventions.info?
>      (We need input from folks familiar with "resources" in this
>      context.)

I would suggest both standardized names and the use of a
"configuration" file that helps one read a file.  The configuration
file could be used to specify which variables to use from a file.  For
example, assume a ship data file like this...

        dimensions:
                time = unlimited;
        variables:
                float   GPS_lat (time);         // position from GPS
                float   GPS_lon (time);
                float   SatNav_lat (time);      // position from Magnavox Sat Nav
                float   SatNav_lon (time);
                float   LORAN_lat (time);       // position from Loran-C
                float   LORAN_lon (time);
                float   GPS_time (time);        // GPS clock time at fix
                float   SatNav_time (time);     // SatNav clock time at fix
                float   LORAN_time (time);      // LORAN clock time at fix
                float   PC_time (time);         // PC acquisition time at fix
                float   SBE_Sea_Surface_Temperature1 (time);    // SeaBird SST #1
                float   SBE_Sea_Surface_Temperature2 (time);    // SeaBird SST #2
                float   SBE_Sea_Conductivity1 (time);           // SeaBird Conductivity #1
                float   SBE_Sea_Conductivity2 (time);           // SeaBird Conductivity #2
                float   Salinity1 (time);                       // from T1 and C1
                float   Salinity2 (time);                       // from T1 and C2
                float   Salinity3 (time);                       // from T2 and C1
                float   Salinity4 (time);                       // from T2 and C2

Writing an app that could interpret this and handle a request like "plot
time vs. salinity", or "plot the ship's track" would be impossible
without user intervention.  In lieu of having the app flat out ask the
user something like...

        Of these latitudes...
                1 = GPS_lat
                2 = SatNav_lat
                3 = LORAN_lat
        Which do you want to use (1-3) ?

for every ambiguous thing (of which there are many), I would suggest
using a configuration file that might look something like this...

        TIME = PC_time
        LAT = GPS_lat
        LON = GPS_lon
        SEA_SURFACE_TEMPERATURE = SBE_Sea_Surface_Temperature1
        SEA_CONDUCTIVITY = SBE_Sea_Conductivity2
        SALINITY = Salinity2

It's a thought.
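
Another thought along the same lines (the attribute names here are made up):
the same mapping could ride along inside the netCDF file itself as global
attributes, so the "configuration" travels with the data...

        // global attributes:
        :TIME                    = "PC_time";
        :LAT                     = "GPS_lat";
        :LON                     = "GPS_lon";
        :SEA_SURFACE_TEMPERATURE = "SBE_Sea_Surface_Temperature1";
        :SEA_CONDUCTIVITY        = "SBE_Sea_Conductivity2";
        :SALINITY                = "Salinity2";

The advantage of a separate configuration file is that a user can change
their mind about which sensor to trust without touching the data file.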

> 8) Name String Lengths
>     Should attribute and variable names be further restricted
>     with respect to length beyond the limit of `MAX_NC_NAME'
>     described in conventions.info?
     
No, no, no!  Don't limit names!!!  Just look at some of the names in the
"conventions.info" file and you can see what happens when you scrimp on
name lengths!  For example, why use "Tmin" for temperature min, when you
could go with "temperature_min", or "minimum_temperature"?  Another bad
example is "SST", which should be "Sea_Surface_Temperature".  Two more
examples are "DIR" (wind direction) and "SPD" (wind speed).  Names like
"wind_direction" and "wind_speed" would be much better, and not as
ambiguous.  Suppose I want to store winds from a moving platform.  I
would use...
        variables:
                float   platform_speed (time);          // platform info
                float   platform_heading (time);
                float   true_wind_speed (time);         // corrected for platform motion
                float   true_wind_heading (time);       // corrected for platform motion
                float   raw_wind_speed (time);          // not corrected
                float   raw_wind_heading (time);        // not corrected

Get wordy! Get descriptive!  Disk space is cheap!

    
> 10) Requiring non-coordinate variables to be 4 dimensional
>     Is it acceptable to insist that all non-coordinate variables
>     be represented as 4-dimensional (lat/long/depth/time)
>    structures?  Should there be other restrictions on number of
>    axes?
>
>    Alternative 1: dimensionality should not be restricted to
>    exactly 4 - the restriction would preclude some data types
>    and would force misrepresentation of others.  Some
>    restriction on the maximum number of dimensions for a
>    variable would, however, ease the burden on application
>    writing.
>    
> 11) Mandatory ordering of geographical dimensions
>      Is it acceptable to mandate that if dimensions with
>      geographical significance are used in defining a variable
>      they will be ordered as lat-lon-depth-time (i.e. time as
>      the slowest moving axis)?
>     
>      Alternative 1: yes with reservations - are there serious
>      performance penalties?
>
>      Alternative 2: no - applications require greater
>      flexibility than this.  Perhaps a standard ordering could
>      be defined and an attribute introduced that would indicate 
>      permutations.  Example:
>     
>          var:permutation = "TXYZ";

It took me a while, but I think what you are presuming here is that when
you make a data request, you will be asking "What was a particular data
value at a given position (X, Y, & Z) and time (T)?" rather than
"What was the data value at time (T), and what was the platform's position
(X, Y, & Z) at that time?"

As a collector of raw data from a ship, I think in terms of the second case.  The
first case just doesn't exist on a single, moving platform.  It exists
when you have created a model, or grid, or have a large number of
sensors in an array.
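
To make the two cases concrete, here are two rough CDL fragments, one for
each case (all names and dimension sizes are made up, and the dimension
ordering is just one possibility)...

        // the first case: a gridded model or sensor array
        dimensions:
                time = unlimited;
                depth = 20;  lat = 80;  lon = 100;      // sizes for illustration
        variables:
                float   u_current (time, depth, lat, lon);

        // the second case: raw data from a single, moving platform
        dimensions:
                time = unlimited;
        variables:
                float   lat (time);
                float   lon (time);
                float   depth (time);
                float   u_current (time);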
     
> 16) Vertical axis orientation
>      Often oceanographic data is organized with positive down on
>      vertical axes.  What is the best mechanism to indicate this
>      in a netCDF file?  (A similar question arises on latitude
>      axes which may be south-positive or north-positive.)

In the last 10 years of processing, I have never seen anything in my
corner of the field where + latitude meant anything but north, and
- meant south.  Same for east and west.  I would suggest that anyone
with data where "- is north" should just re-process their data before
trying to warp an app to handle this case.

>      
>      Alternative 1: Introduce a (boolean) coordinate variable
>      attribute "reversed".
>      
>      Alternative 2: Combine this property together with others
>      that have been discussed in a new attribute
>     
>           depth: properties = "reversed, coordinates, vertical";

Z direction (+ is up, + is down) does seem to flop around a bit.  How 
about an attribute such as...
        z: up = "+";    // or "up" or "positive"
or
        z: down = "-";  // or "down" or "negative"


> 17) Longitude axis encodings
>     Longitudes encodings are not standardized - they may be
>    continuous across the dateline or continuous across the
>    prime meridian; either westward or eastward may be positive;
>    the range may be -180 to 180 or 0 to 360 or some other
>    choice.  How should netCDF convey this encoding?
   
One should distinguish between using longitude values as parts of
queries, and longitudes as stored values.  If you want to query on a particular
value, then true, -170 needs to be translated to +190, and +190 to -170 (or
whatever).  If you just want to store a position (like a drifter position),
then it really doesn't matter.  Any program worth its (sea) salt can
handle -180 to 180 and 0 to 360 when extracting lat/lon data for
plotting or processing.

Kind of a side comment on this next one...

     
> 19) Huge Data Sets / Multiple Files
>     Should we provide a standardized mechanism for associating
>     multiple files in a single "project"?  How should it
>     function? as a time axis distributed among files?  as
>     multiple variables distributed among files?  Is this beyond
>     the scope of this document?
>     
>     Alternative 1: a "parent" netCDF file with variables and
>     attributes suitably defined to point to "child" files.
>
>     Alternative 2: a file naming convention such as
>          my_cdf.001, my_cdf.002, my_cdf.003, ...
>     that will implicitly concatenate netCDF files along their
>     record (or time?) axis.  

How about having some "global" information like this in each file...

        variables:
                char    file_sequence;                  // placeholder variable
                file_sequence:first = "W9205a_001.cdf";
                file_sequence:last  = "W9205a_023.cdf";
                file_sequence:next  = "W9205a_013.cdf";
                file_sequence:prev  = "W9205a_011.cdf";

Obviously, you might not be able to put all the names in right away, since
the sequence may not be known until acquisition and processing are
completed.

A comment on the naming convention idea.  I would point out the "badness"
of naming netCDF files with any extension but ".cdf".  It's much easier
to say...
        % ls *.cdf
which would give you all the cdf files, than...
        % ls *cdf*
which would give you your cdf files, as well as anything else that has
"cdf" in its name, like your processing programs, subdirectories,
etc.
     
> 20) Representing Sigma Coordinate Systems
    
Could someone explain Sigma Coordinate Systems to me?  I plead
ignorance.  Go ahead, laugh at me and drop me down a notch in your
mind.

**************************

> Real-Time and Shipboard data collection?

Ah, I've been waiting for this...

>     What are the special issues?

I think it's important to note that raw cruise data and processed,
gridded models and the like are very different animals.  In my
raw/real-time/shipboard myopic view, I would say a file containing a 
model or grid will be used for queries like "What is the current vector 
at lat=45.36, lon=-126.97, depth=50.0, and time=325.36?", whereas a file
containing cruise data will be used for questions like "What was the
current vector, lat, and lon for the sample at depth=50.0 and time=325.36?"
Very different questions.

One could think of ship data as being 4 dimensional, and you could make
4d queries ("What is the wind speed at a given lat, lon, depth (assume
0), and time?"), but the search would be horrendous -- check through EVERY
sample in the data file, looking for match on lat, lon, depth, and time.
This is assuming that the ship happened to go over the particular point
at the particular time.  I guess I'm saying that you can think of ship
data in a 4d sence, and make 4d queries, but it's not really practical.
Mostly, I would say data is tied around a "1d" coordinate system -- time.
Most all queries are referenced to a given time.  "Where were we at time
T?  What was the temperature at time T?  What was the towed vehicle's
depth at time T?  What was the ship's speed at time T?"

>     How to represent a cruise track?  (** a requirement? **)

I would assume that instead of creating variables like this...
        dimensions:
                time = unlimited;
                lat  = 180;             // grid sizes just for illustration
                lon  = 360;
        variables:
                float   time(time);
                float   lat(lat);
                float   lon(lon);

a cruise track would be...
        dimensions:
                time = unlimited;
        variables:
                float   time(time);
                float   lat(time);
                float   lon(time);

Many of the questions and rules outlined above dealing with determining
variable names and contents also apply here.  Looking for "*lat*" or whatnot
works fine.

As to "** a requirement? **" -- Eah-gahds, YES!  Where do you think
processed data comes from anyway?  It comes from raw data!  The only way
you get a set of ADCP data to build a large scale 3d grid of currents is
by going out in a ship and cruising around and collecting data!  The
only way you know what the water depth is at a given point is by going
out there and "pinging" the bottom!

> Other Topic Issues (relating to Shipboard Data)

One point that I would stress for collecting raw data is that a file
should contain the true raw data, as well as calculated and computed values.
It is a very "dangerous" step for us data collectors to store only
calculated engineering-unit values and not the raw data they came from.
For example, looking at winds in OSU's XMIDAS netCDF file, we store the
following values...
        ship's speed (from 3 nav sources and speed log) in knots
        ship's heading (from 3 nav sources and gyro) in deg
        raw voltage values from wind vane for wind speed (0-5 volt)
        raw voltage values from wind vane for wind heading (0-5 volt)
        uncorrected wind speed (in knots)
        uncorrected wind heading (in deg)
        corrected (for ship's speed and hdg, using speed log) wind speed (in knots)
        corrected (for ship's speed and hdg, using gyro) wind heading (in deg)
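
In CDL terms that boils down to something like this (the variable names
here are made up for illustration, not our actual XMIDAS names)...

        dimensions:
                time = unlimited;                       // one record per 1-minute sample
        variables:
                float   ship_speed (time);              // knots, from nav/speed log
                float   ship_heading (time);            // degrees, from nav/gyro
                float   wind_speed_volts (time);        // raw vane output, 0-5 V
                float   wind_heading_volts (time);      // raw vane output, 0-5 V
                float   wind_speed_uncorrected (time);  // knots
                float   wind_heading_uncorrected (time);// degrees
                float   wind_speed_corrected (time);    // knots, corrected using speed log
                float   wind_heading_corrected (time);  // degrees, corrected using gyro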

This gives the PI who uses the data great leeway to work with the
data.  If the gyro or the speed log wigs out at some point, (s)he is
free to re-calculate the values from GPS or whatever.  If the
calibration coefficients had changed, or some drift occurred in the wind
vane over the period of a cruise, the uncorrected and corrected winds
could be re-computed from the raw voltage values with new calibration
coefficients.  Also, if someone messed up the program that did the
calculations and mistyped a coefficient, that could be
disastrous.  Imagine having someone do a whole study based on the
calculated values one had delivered, only to discover that a mistake had
been made and all temperature values were 2 deg C too high, and thus there
IS no resumption of the El Nino.  (Extreme, I'll admit.)

Our data files are very large, as we store all kinds of data -- raw and
calculated.  We include the raw text strings from the GPS, SatNav
(complete with the word "MAGNAVOX" for you SatNavers), and Loran, as
well as floats that hold a subset of the info in the strings.  If you
really want to know exactly what satellites were used for each GPS fix,
it's there, if you want to go for it.  We also store raw frequency values
and voltages for other instruments, along with the number of samples
taken over the sample period (1 minute), and the min, max, and mean
values observed over the sample period.

I guess we believe that "disk is cheap" and "data is precious".
Also, our "disclaimer" is that our NSF mandate is to collect raw data,
not to process it.  That's the PI's job (in our case).

> Compressed data

I would die for compression, especially if it was automatic and happened
inside the netCDF routines and I never knew it was there.  

=======================================================================

              |   ||        Tim Holt / Marine Technician / RV Wecoma
+--==o_____+-/|--+||        College of Oceanography / Oregon State
.____|  R/V WECOMA  ~-----/ Corvallis, OR USA, 97331-5503       (503)737-4447
+------------------------'  holtt@xxxxxxxxxxxx  
