Re: "Contractions"

Hi again Jonathan-

Well, it is certainly high time I got around to reviewing your mail
from last September (!).  So sorry about the delay.  But I also seem to
have severely underestimated the work it would take to get my head back
into the rather involved topics we were wrestling with at the time.
Anyway, let's see if I can recover the thread...(Unfortunately, I'm
afraid this is going to be another one of those "marathon" Emails!)

For sanity's sake, let's restrict things to the issue of "contractions"
(ie, averages, etc.).  Way back when the GDT proposal first came out, I
questioned (among other things) the fact that "contracted" axes were
assumed to have a dimension of unity, asking how one might store, say,
a time series of January mean temperatures.  Your answer was to split
the time axis into 2 dimensions and "contract" along the "day"
dimension, leaving the "month" dimension to portray the time series.
We subsequently went back and forth with a couple of examples and
compared our individual approaches for documenting the characteristics
of averaged quantities.

However, I must say that the use of 2-dimensional time coordinates,
while a clever and adaptable approach, *is* a bit complicated and
confusing.  I think our users here will find this somewhat hard to
follow (though I plan to present the idea to a select group),
especially since their contact with netCDF, per se, is not as frequent
and intimate as ours :-).  Perhaps more importantly, use of such a
construct will immediately render the file unusable by nearly all the
applications we use here to handle and view netCDF data. Also, I'm not
sure how something like this:

  dimensions:
    months=3;
    days=31;

  variables:
    float Tday(months,days);

would apply if the months were January, February, etc. instead of a
series of Januarys - how would months with fewer than 31 days be
recognized?

Nevertheless, I like the idea of a "contracted" axis (though it may not
be sufficient in all instances - see below).  However, such an axis
will have to allow for a dimension greater than unity (the GDT limit)
if the axis along which the contractions are taking place remains
1-dimensional.  Also, I think it is necessary to document the number of
"items" going into the contraction.  For example, in the case of a
"mean", one would need to know how many values were used to calculate
the current mean if additional values needed to be averaged in.
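To make the bookkeeping concrete, here is a minimal Python sketch, assuming a plain arithmetic mean in which every item carries equal weight; `update_mean` is a hypothetical helper, not part of any proposal. With only the stored mean and its item count, new values can be folded in without going back to the original data:

```python
def update_mean(mean, nitems, new_values):
    """Fold new_values into a mean that was built from nitems equal-weight items.

    Returns the updated (mean, nitems) pair.
    """
    total = mean * nitems + sum(new_values)   # recover the running sum
    count = nitems + len(new_values)
    return total / count, count


# A 28-item mean of 270.0 K extended by three more daily values.
mean, n = update_mean(270.0, 28, [272.0, 268.0, 273.0])
print(n, round(mean, 4))   # 31 270.0968
```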

I've spent some time trying to meld some of your ideas with our ideas
and I'll toss them out to see what you think...

To start, it seems to me that much of the problem in documenting
contractions (most typically "averages") revolves around characterizing
the items comprising the contracted values.  I've broken this down into
"CONTINUITY" and "PLACEMENT" qualities:

  CONTINUITY:
  ----------
    "CONTIGUOUS" :  items comprising the contracted variable are
                    contiguous along the axis undergoing contraction
  or
    "DISJOINT"   :  there are gaps along the contracted axis between the
                    items comprising the contracted value

  PLACEMENT:
  --------
    "UNIFORM"    :  the items comprising the contracted value are evenly
                    spaced along the axis undergoing contraction
  or
    "ARBITRARY"  :  the items comprising the contracted value are *not*
                    evenly spaced along the axis undergoing contraction
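A rough sketch of how the two qualities might be tested mechanically, assuming each item is described by a (start, end) interval for continuity and by a single location for placement; the function names and the tolerance are illustrative assumptions, not part of the proposal:

```python
from typing import List, Sequence, Tuple


def continuity(intervals: List[Tuple[float, float]], tol: float = 1e-9) -> str:
    """'contiguous' if each item ends where the next begins, else 'disjoint'."""
    ordered = sorted(intervals)
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    if any(g < -tol for g in gaps):
        raise ValueError("items overlap along the contracted axis")
    return "contiguous" if all(g <= tol for g in gaps) else "disjoint"


def placement(locations: Sequence[float], tol: float = 1e-9) -> str:
    """'uniform' if item locations are evenly spaced, else 'arbitrary'."""
    ordered = sorted(locations)
    steps = [b - a for a, b in zip(ordered, ordered[1:])]
    if all(abs(s - steps[0]) <= tol for s in steps[1:]):
        return "uniform"
    return "arbitrary"


# Three upper and two lower layers (pressure), as in a flight-level average.
layers = [(100, 200), (200, 300), (300, 400), (700, 800), (800, 850)]
print(continuity(layers), placement([150, 250, 350, 750, 825]))   # disjoint arbitrary
```

Note that under this test any two items count as "uniform", since a single step is trivially even.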

Note that the actual "contracted" variable is already described in
these terms simply by virtue of its coordinate on the contracted axis
and the "bounds" assigned to it. **However**, this only tells you where
the data is along the axis and the distance over which the value can be
taken as "representative".  You may still need to document the total
time-span of the items going into the average.  (Example: January
average temperature, derived from 100 years of data.)


I'll give some examples of each combination shortly, but for now let me
summarize a proposed set of attributes for documenting contractions.
These would most typically be attached to a coordinate variable (but
might have to attach to a data variable in some cases - see below):

"Mandatory" attributes to document contraction:
----------------------------------------------
   bounds   (a la GDT)       = bounds of period over which the value
                                 is representative
   contraction_type          = type of contraction ("mean", "minimum",
                                 "maximum" etc.)
   contraction_nitems        = number of items comprising avg at each
                                 point on axis
   contraction_itemselection = whether the items comprising avgs are
                                 "contiguous" or "disjoint"
   contraction_itemsample    = whether spacing of items comprising avgs
                                 is "uniform" or "arbitrary" along the
                                 contracted axis

Optional attributes:
-------------------
    contraction_span   = variable containing total span of coordinates
                           encompassing items in the avg if different
                           than that represented by "bounds"

  If "contraction_itemsample" = "uniform" (ie, regular):
  -----------------------------------------------------
    contraction_itemstart         =  starting position locating items on
                                       contracted axis
    contraction_itemdelta         =  size of steps between items comprising
                                       the average
    contraction_itemiscontraction =  if present, indicates contraction
                                       type for items

  If "contraction_itemsample" = "arbitrary" (ie, non-regular):
  -----------------------------------------------------------
    contraction_itemlocate        = variable containing location of each
                                       item comprising avgs
    contraction_itemsize          = variable containing "delta" of each
                                       item comprising avgs
                                    (**though I wonder whether an
                                      "_itembounds(,2)" would be better)
    contraction_itemiscontraction = if present, indicates contraction type
                                       for each item

       Obviously, these last 3 would need to be dimensioned large
       enough to hold information about all the items comprising the
       contraction.
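As a sketch of how a reader might check the proposed attribute set on a coordinate variable, assuming the attributes have already been read into a Python dict (the checker is hypothetical, and it does not cover the data-variable form that uses `contraction_axis` and `contraction_bounds`):

```python
# Attribute names as proposed above; this checker is an illustrative sketch.
MANDATORY = (
    "bounds",
    "contraction_type",
    "contraction_nitems",
    "contraction_itemselection",
    "contraction_itemsample",
)

# Extra attributes expected for each kind of item sampling.
BY_SAMPLE = {
    "uniform": ("contraction_itemstart", "contraction_itemdelta"),
    "arbitrary": ("contraction_itemlocate", "contraction_itemsize"),
}


def missing_attributes(attrs):
    """Names of proposed contraction attributes absent from one coordinate variable."""
    missing = [name for name in MANDATORY if name not in attrs]
    sample = attrs.get("contraction_itemsample")
    if sample is not None:
        if sample not in BY_SAMPLE:
            raise ValueError(f"unknown contraction_itemsample: {sample!r}")
        missing += [name for name in BY_SAMPLE[sample] if name not in attrs]
    return missing


case4 = {"bounds": "bounds_avgtime", "contraction_type": "mean",
         "contraction_nitems": "nitems_avgtime",
         "contraction_itemselection": "contiguous",
         "contraction_itemsample": "arbitrary",
         "contraction_itemlocate": "Timeofitem"}
print(missing_attributes(case4))   # ['contraction_itemsize']
```

`contraction_itemiscontraction` is deliberately not required, since it is only meaningful "if present".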


**IF THE AXIS MUST BE USED FOR MORE THAN ONE CONTRACTION and the
   contraction qualities are different, it may be necessary to attach
   the contraction info to the data variables instead. (This would
   especially be a problem if the contracted axis is the UNLIMITED
   dimension, since we can't have both a regular *and* a contracted
   version.)  If so, use the above attributes, PLUS:

    contraction_axis     =  axis along which the contraction was taken
    contraction_bounds   =  array containing bounds of period over which
                                    the avg value is representative

This may not be a significant concern for the time being.  Perhaps
by the time we need to worry about this, netCDF will support more than
one UNLIMITED dimension.
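One possible lookup order for a reader, sketched under the assumption that contraction attributes on the data variable take precedence over those on a coordinate variable (the message does not specify precedence); `find_contraction` and the `variables` dict layout are hypothetical stand-ins for a real netCDF reader:

```python
def find_contraction(data_var, variables):
    """Return (axis, bounds_var) describing the contraction for data_var.

    `variables` maps variable name -> {"dims": [...], "attrs": {...}}.
    Attributes on the data variable win; otherwise each dimension's
    coordinate variable is checked for a `contraction_type` attribute.
    """
    attrs = variables[data_var]["attrs"]
    if "contraction_type" in attrs:
        return attrs["contraction_axis"], attrs["contraction_bounds"]
    for dim in variables[data_var]["dims"]:
        coord = variables.get(dim)
        if coord and "contraction_type" in coord["attrs"]:
            return dim, coord["attrs"]["bounds"]
    return None, None   # no contraction documented
```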


I've tried to define some typical cases which will illustrate various
aspects of this issue:

*********************************************************************
CASE #1 (the easy case):
-------
        **   A time series (12) of monthly average temperature,   **
        **   where each monthly mean is derived from daily means. **

        *************************
          CONTIGUOUS, UNIFORM
        *************************

  dimensions:
     lon = 96;
     lat = 40;
     avgtime = 12;

  variables:
     float lon(lon);
           lon:long_name="longitude";
           lon:units="degrees_E";
     float lat(lat);
           lat:long_name="latitude";
           lat:units="degrees_N";

     float Tavg(avgtime,lat,lon);
           Tavg:long_name="Average monthly temperature";
           Tavg:units="deg_K";

     double avgtime(avgtime);
            avgtime:units="days since 1-1-1990";
            avgtime:calendar="common_year";

            avgtime:associate="bounds_avgtime,nitems_avgtime,Tstart,Tdelta";
                 // ----------------------------
                 // Contiguous, Uniform Sampling
                 // ----------------------------
            avgtime:bounds                    ="bounds_avgtime";
            avgtime:contraction_type          ="mean";
            avgtime:contraction_nitems        ="nitems_avgtime";
            avgtime:contraction_itemselection ="contiguous";
            avgtime:contraction_itemsample    ="uniform";
            avgtime:contraction_itemstart     ="Tstart";
            avgtime:contraction_itemdelta     ="Tdelta";
            avgtime:contraction_itemiscontraction = "mean";

     double bounds_avgtime(avgtime,2);
            bounds_avgtime:long_name="endpoints of time over which the average",
                                     " value is considered representative";
            bounds_avgtime:units="days since 1-1-1990";
            bounds_avgtime:calendar="common_year";

     long nitems_avgtime(avgtime);
          nitems_avgtime:long_name="Number of items in average";

     double Tstart(avgtime);
            Tstart:long_name="starting time of items on contracted axis";
            Tstart:units="days since 1-1-1990";
            Tstart:calendar="common_year";

     double Tdelta(1);
            Tdelta:long_name="size of steps between items",
                             " comprising average";
            Tdelta:units="days";

   data:
      avgtime = 15.5, 45.0, 74.5, ....... ;

      bounds_avgtime = 0.,31.,  31.,59.,  59.,90., ....... ;

      nitems_avgtime = 31, 28, 31, 30, ........;

      Tstart = 0.5, 31.5, 59.5,  ......... ;

      Tdelta = 1. ;
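The Case 1 coordinate data can be generated rather than typed in. This sketch assumes the 365-day calendar implied by `calendar="common_year"`, one daily item per day, and that each `avgtime` value sits at the midpoint of its bounds (an assumption; the proposal does not require midpoints). `monthly_contraction` is a hypothetical helper:

```python
# Month lengths for a 365-day ("common_year") calendar.
MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def monthly_contraction(month_days=MONTH_DAYS, item_delta=1.0):
    """Build avgtime, bounds, nitems and Tstart for monthly means of daily items."""
    avgtime, bounds, nitems, tstart = [], [], [], []
    start = 0.0
    for ndays in month_days:
        end = start + ndays
        bounds.append((start, end))
        avgtime.append((start + end) / 2)        # midpoint of the month
        nitems.append(int(ndays / item_delta))   # one item per item_delta days
        tstart.append(start + item_delta / 2)    # first daily item, centred
        start = end
    return avgtime, bounds, nitems, tstart


avgtime, bounds, nitems, tstart = monthly_contraction()
print(avgtime[:3], nitems[:3], tstart[:3])   # [15.5, 45.0, 74.5] [31, 28, 31] [0.5, 31.5, 59.5]
```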

*********************************************************************
CASE #2 :
-------
        **   A time series of January average temperature, where    **
        **     each monthly mean is derived from daily means.       **

             (contracted data points are widely, but evenly,
              separated in time, with "blank" areas between, but
              the items comprising each avg are still contiguous
              and uniformly spaced in time)

        *************************
          CONTIGUOUS, UNIFORM
        *************************

  dimensions:
     lon = 96;
     lat = 40;
     avgtime = 3;

  variables:
     float Tavg(avgtime,lat,lon);
           Tavg:long_name="Average January temperature";
           Tavg:units="deg_K";

     double avgtime(avgtime);
            avgtime:units="days since 1-1-1990";
            avgtime:calendar="common_year";
            avgtime:associate="bounds_avgtime,nitems_avgtime,Tstart,Tdelta";
                 // ----------------------------
                 // Contiguous, Uniform Sampling
                 // ----------------------------
            avgtime:bounds                    ="bounds_avgtime";
            avgtime:contraction_type          ="mean";
            avgtime:contraction_nitems        ="nitems_avgtime";
            avgtime:contraction_itemselection ="contiguous";
            avgtime:contraction_itemsample    ="uniform";
            avgtime:contraction_itemstart     ="Tstart";
            avgtime:contraction_itemdelta     ="Tdelta";
            avgtime:contraction_itemiscontraction = "mean";

     double bounds_avgtime(avgtime,2);
            bounds_avgtime:long_name="endpoints of time over which the average",
                                     " value is considered representative";
            bounds_avgtime:units="days since 1-1-1990";
            bounds_avgtime:calendar="common_year";

     long nitems_avgtime(avgtime);
          nitems_avgtime:long_name="Number of items in average";

     double Tstart(avgtime);
            Tstart:long_name="starting time of items on contracted axis";
            Tstart:units="days since 1-1-1990";
            Tstart:calendar="common_year";

     double Tdelta(1);
            Tdelta:long_name="size of steps between items",
                             " comprising average";
            Tdelta:units="days";

   data:
      avgtime = 15.5, 380.5, 745.5 ;

      bounds_avgtime = 0.,31.,  365.,396.,  730.,761. ;

      nitems_avgtime = 31, 31, 31;

      Tstart = 0.5, 365.5, 730.5 ;

      Tdelta = 1. ;
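Similarly for Case 2, a sketch assuming 365-day years, daily items, and that each January's `avgtime` is the midpoint of its bounds (all assumptions of this sketch; `january_series` is hypothetical). The coordinate values are widely separated, yet the items inside each average stay contiguous:

```python
def january_series(nyears, year_length=365.0, jan_days=31, item_delta=1.0):
    """avgtime, bounds, nitems and Tstart for a series of January means."""
    avgtime, bounds, tstart = [], [], []
    for year in range(nyears):
        start = year * year_length          # January 1 of this year
        end = start + jan_days
        bounds.append((start, end))
        avgtime.append((start + end) / 2)   # midpoint of the January
        tstart.append(start + item_delta / 2)
    nitems = [int(jan_days / item_delta)] * nyears
    return avgtime, bounds, nitems, tstart


print(january_series(3))
```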

*********************************************************************
CASE #3 :
-------
        **  5-year average of the daily avg Temperature for each **
        **   of January 1,2,3  (ignoring any 2-D location)       **

           (items comprising each avg are widely, though evenly,
             separated in time, with "blank" areas in between)

        ***********************
          DISJOINT, UNIFORM
        ***********************

  dimensions:
     avgtime = 3;

  variables:
     float Tavg(avgtime);
           Tavg:long_name="5-year Average daily temperature";
           Tavg:units="deg_K";

     double avgtime(avgtime);
            avgtime:units="days since 1-1-1990";
            avgtime:calendar="common_year";

            avgtime:associate="bounds_avgtime,tspan_avgtime,nitems_avgtime,",
                              "Tstart,Tdelta";
                 // ----------------------------
                 // Disjoint, Uniform Sampling
                 // ----------------------------
            avgtime:bounds                     ="bounds_avgtime";
            avgtime:contraction_type           ="mean";
            avgtime:contraction_span           ="tspan_avgtime";   // needed now
            avgtime:contraction_nitems         ="nitems_avgtime";
            avgtime:contraction_itemselection  ="disjoint";
            avgtime:contraction_itemsample     ="uniform";
            avgtime:contraction_itemstart      ="Tstart";
            avgtime:contraction_itemdelta      ="Tdelta";
            avgtime:contraction_itemiscontraction = "mean";


     double bounds_avgtime(avgtime,2);
            bounds_avgtime:long_name="endpoints of time over which the average",
                                     " value is considered representative";
            bounds_avgtime:units="days since 1-1-1990";
            bounds_avgtime:calendar="common_year";

     double tspan_avgtime(avgtime,2);
            tspan_avgtime:long_name="endpoints of timespan encompassing",
                                    " items in avg";
            tspan_avgtime:units="days since 1-1-1990";
            tspan_avgtime:calendar="common_year";

     long nitems_avgtime(avgtime);
          nitems_avgtime:long_name="Number of items in average";

     double Tstart(avgtime);
            Tstart:long_name="starting time of items on contracted axis";
            Tstart:units="days since 1-1-1990";
            Tstart:calendar="common_year";

     double Tdelta(1);
            Tdelta:long_name="size of steps between items",
                             " comprising average";
            Tdelta:units="days";

   data:
      avgtime = 0.5, 1.5, 2.5 ;

      bounds_avgtime = 0.,1.,  1.,2.,  2.,3. ;       // January 1,2, and 3

      tspan_avgtime = 0.,1825.,  0.,1825.,  0.,1825. ;   // =5 years...or...,
    // tspan_avgtime = 0.,1461.,  1.,1462.,  2.,1463. ; // <- a stricter bracketing

      nitems_avgtime = 5, 5, 5 ;

      Tstart = 0.5, 1.5, 2.5 ;

      Tdelta = 365. ;
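The arithmetic behind the Case 3 item times and the tighter span can be checked with a small sketch, counting January 1 as day 0 and ignoring leap years as `common_year` does; the helper names are hypothetical:

```python
YEAR = 365.0  # days per year under a 365-day ("common_year") calendar


def same_day_items(day, nyears):
    """Centre times (days since the start) of one calendar day in each of nyears years."""
    return [year * YEAR + day + 0.5 for year in range(nyears)]


def strict_span(day, nyears, item_width=1.0):
    """Tightest (start, end) enclosing every item for that calendar day."""
    return day, (nyears - 1) * YEAR + day + item_width


for day in range(3):   # January 1, 2 and 3
    print(same_day_items(day, 5), strict_span(day, 5))
```

The last item for a given day starts four years (1460 days) after the first and is one day wide, which fixes the upper end of the tight span.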



*********************************************************************
CASE #4 :
-------
        **        Average of irregularly-measured surface       **
        **    temperature for periods of <50% and >50% clouds   **

            (items comprising avg are irregularly spaced in time,
                 but there are no "blank" areas in between)

        **************************
          CONTIGUOUS, ARBITRARY
        **************************

  dimensions:
     avgtime = 3;
     items = 4;  // allow for up to 4 measurements during any
                 // cloudy/non-cloudy period

  variables:
     float Tavg(avgtime);
           Tavg:long_name="Average temperature";
           Tavg:units="deg_K";

     double avgtime(avgtime);
            avgtime:units="hours since 1-1-1990";
            avgtime:calendar="common_year";
            avgtime:associate="bounds_avgtime,nitems_avgtime,",
                              "Timeofitem,Dtitem,Itemiscontraction";
                 // ----------------------------
                 // Contiguous, Arbitrary Sampling
                 // ----------------------------
            avgtime:bounds                    ="bounds_avgtime";
            avgtime:contraction_type          ="mean";
            avgtime:contraction_nitems        ="nitems_avgtime";
            avgtime:contraction_itemselection ="contiguous";
            avgtime:contraction_itemsample    ="arbitrary";
            avgtime:contraction_itemlocate    ="Timeofitem";
            avgtime:contraction_itemsize      ="Dtitem";
            avgtime:contraction_itemiscontraction = "Itemiscontraction";

     double bounds_avgtime(avgtime,2);
            bounds_avgtime:long_name="endpoints of time over which the average",
                                     " value is considered representative";
            bounds_avgtime:units="hours since 1-1-1990";
            bounds_avgtime:calendar="common_year";

     long nitems_avgtime(avgtime);
          nitems_avgtime:long_name="Number of items in average";

     double Timeofitem(avgtime,items);
            Timeofitem:long_name="time of items comprising avg";
            Timeofitem:units="hours since 1-1-1990";
            Timeofitem:calendar="common_year";

     double Dtitem(avgtime,items);
            Dtitem:long_name="delta-t of items comprising avg";
            Dtitem:units="hours";

     char Itemiscontraction(avgtime,items,4);
          Itemiscontraction:long_name="contraction type for items",
                                      " comprising contraction";

   data:
      avgtime = 1.5, 4., 7. ;

      bounds_avgtime = 0.,3.,  3.,5.,  5.,9. ;

      nitems_avgtime = 2, 4, 3;

      Timeofitem = 0.5, 2.0,    _,   _,
                   3.5, 4.0, 4.25, 4.5,
                   5.5, 7.0,  8.0,  _  ;

      Dtitem = 1.0,    2.0,    _,     _,
               0.75, 0.375, 0.25, 0.625,
               1.25,  1.25,  1.5,     _  ;

      Itemiscontraction = "", "mean",   _,   _,  // meas. @ 2.0 hr was an avg
                          "",     "",  "",  "",
                          "",     "",  "",  _ ;
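The message does not say how `Dtitem` enters the average. One plausible reading, assumed here, is that each item is weighted by its delta-t, and that for a contiguous selection the delta-t values in a row should add up to the width of the matching `bounds_avgtime` interval. A sketch of both ideas, with `None` standing in for the `_` fill value; the function names are hypothetical:

```python
import math

FILL = None  # stand-in for the "_" fill value in the CDL listing


def weighted_mean(values, dts):
    """Mean of the items, each weighted by its delta-t (assumed weighting)."""
    pairs = [(v, dt) for v, dt in zip(values, dts) if v is not FILL and dt is not FILL]
    return sum(v * dt for v, dt in pairs) / sum(dt for _, dt in pairs)


def deltas_fill_bounds(dts, lower, upper, tol=1e-9):
    """True if the delta-t values add up to the width of the [lower, upper] period."""
    return math.isclose(sum(dt for dt in dts if dt is not FILL), upper - lower, abs_tol=tol)


# First averaging period: two items, delta-t 1.0 h and 2.0 h, bounds 0 to 3 h.
print(deltas_fill_bounds([1.0, 2.0, FILL, FILL], 0.0, 3.0))   # True
```

A row whose deltas overshoot its bounds would imply overlapping items, which contradicts a "contiguous" selection.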


*********************************************************************
CASE #5
-------
        **    An average "Flight-level Humidity", calculated as avg  **
        **       of %RH in 3 upper-level and 2 lower-level layers    **

             (items comprising avg are irregularly spaced in pressure,
               with "blank" areas in between items comprising avg)

        **************************
          DISJOINT, ARBITRARY
        **************************

  dimensions:
     pressure = 1;
     items = 5;          // 5 layers comprise the "avg"

  variables:
     float HLhumavg(pressure);
           HLhumavg:long_name="Average humidity over 3 high and",
                              " 2 low layers";
           HLhumavg:units="percent";

     float pressure(pressure);
            pressure:units="hPa";
            pressure:associate="bounds_pressure,nitems_pressure,",
                               "Presofitem,DPitem";
                 // ----------------------------
                 // Disjoint, Arbitrary Sampling
                 // ----------------------------
            pressure:bounds                    ="bounds_pressure";
            pressure:contraction_type          ="mean";
            pressure:contraction_nitems        ="nitems_pressure";
            pressure:contraction_itemselection ="disjoint";
            pressure:contraction_itemsample    ="arbitrary";
             pressure:contraction_itemlocate     = "Presofitem";
             pressure:contraction_itemsize       = "DPitem";

     float bounds_pressure(pressure,2);
            bounds_pressure:long_name="endpoints of pressure over which",
                                      " the average value is considered",
                                      " representative";
            bounds_pressure:units="hPa";

     long nitems_pressure(pressure);
          nitems_pressure:long_name="Number of items in average";

     float Presofitem(pressure,items);
            Presofitem:long_name="pressure of items comprising avg";
            Presofitem:units="hPa";

     float DPitem(pressure,items);
            DPitem:long_name="delta-P of items comprising avg";
            DPitem:units="hPa";

   data:
      pressure = 500. ;                 //
                                        // questionable what this means,
      bounds_pressure = 100., 850. ;    //   but what is the alternative?
                                        //
      nitems_pressure = 5;

      Presofitem = 150., 250., 350.,
                   750., 825.  ;

      DPitem = 100., 100., 100., 100., 50.  ;
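Assuming each `Presofitem` value marks the centre of a layer whose thickness is the matching `DPitem` (the numbers are consistent with this, but it is an assumption), the layer edges and the uncovered gap that make Case 5 "disjoint" follow directly; the helper names are hypothetical:

```python
def layer_bounds(centres, widths):
    """(top, bottom) pressure of each layer, assuming the value sits mid-layer."""
    return [(c - w / 2, c + w / 2) for c, w in zip(centres, widths)]


def gaps(bounds):
    """Pressure ranges between layers that no layer covers."""
    ordered = sorted(bounds)
    return [(a[1], b[0]) for a, b in zip(ordered, ordered[1:]) if b[0] > a[1]]


layers = layer_bounds([150., 250., 350., 750., 825.], [100., 100., 100., 100., 50.])
print(gaps(layers))   # [(400.0, 700.0)]
```

The outermost layer edges (100 and 850) reproduce `bounds_pressure`, while the single gap is what separates the upper and lower groups.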


*********************************************************************
CASE #6
-------
           **   Two time-series of daily average Temperatures, one of  **
           **      them a moving average over previous 5 days         **

          *************************
            CONTIGUOUS, UNIFORM
          *************************
                ** BUT **
        1. because the original and contracted variables both use the
            UNLIMITED dimension, the contraction info cannot be attached
            to a coordinate variable, so it must be attached to the data
            variable itself
        2. the contracted variable can provide a pointer to the
            original variable from which it was derived.

  dimensions:
     lon = 180;
     lat = 89;
     pres = 12;
     time = UNLIMITED;  // (20 currently)

  variables:
     float lon(lon);
           lon:long_name="longitude";
           lon:units="degrees_E";
     float lat(lat);
           lat:long_name="latitude";
           lat:units="degrees_N";

     double time(time);
            time:units="days since 1-1-1990";
            time:calendar="common_year";

     float Temp(time,pres,lat,lon);
           Temp:long_name="Temperature";
           Temp:units="deg_K";

     float Tavg(time,pres,lat,lon);
           Tavg:long_name="5-day moving average Temperature";
           Tavg:units="deg_K";

            Tavg:associate="bounds_Tavg,nitems_Tavg,Tstart,Tdelta";
                 // ----------------------------
                 // Contiguous, Uniform Sampling
                 // ----------------------------
            Tavg:contraction_type          ="mean";
              Tavg:contraction_axis     ="time";
              Tavg:contraction_bounds   ="bounds_Tavg";
            Tavg:contraction_nitems        ="nitems_Tavg";
            Tavg:contraction_itemselection ="contiguous";
            Tavg:contraction_itemsample    ="uniform";
            Tavg:contraction_itemstart     ="Tstart";
            Tavg:contraction_itemdelta     ="Tdelta";
            Tavg:contraction_itemiscontraction = "mean";
            Tavg:contraction_refvar        ="Temp";

     double bounds_Tavg(time,2);
            bounds_Tavg:long_name="endpoints of time over which ",
                                  "the average value is considered ",
                                  "representative";
            bounds_Tavg:units="days since 1-1-1990";
            bounds_Tavg:calendar="common_year";

     long nitems_Tavg(time);
          nitems_Tavg:long_name="Number of items in average";

     double Tstart(time);
            Tstart:long_name="starting time of items on ",
                                  "contracted axis";
            Tstart:units="days since 1-1-1990";
            Tstart:calendar="common_year";

     double Tdelta(1);
            Tdelta:long_name="size of steps between items ",
                                  "comprising average";
            Tdelta:units="days";

   data:
      lon  = 1., 3., 5., ...... 359. ;
      lat  = -88., -86., ...... 88.;
      time = 0.5, 1.5, 2.5, ....... ;

      bounds_Tavg = _, _,
                    _, _,
                    _, _,
                    _, _,
                    0.,5.,  1.,6.,  2.,7., ....... ;

      nitems_Tavg = _, _, _, _, 5, 5, 5, ........;

      Tstart = _, _, _, _, 0.5, 1.5, 2.5,  ......... ;

      Tdelta = 1. ;
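For Case 6, the per-step metadata of a trailing window can be computed alongside the averages themselves. This sketch assumes daily items starting at day 0 and a window that, like the listed `bounds_Tavg` values, ends at the close of the current day; steps without a full window get `None` in place of the `_` fill value. Function names are hypothetical:

```python
def trailing_mean(series, window):
    """Trailing moving average; None where fewer than `window` values exist."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)                      # not enough history yet
        else:
            chunk = series[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


def trailing_metadata(ntimes, window, t0=0.0, dt=1.0):
    """Per-step bounds, nitems and Tstart for a trailing window of daily items."""
    bounds, nitems, tstart = [], [], []
    for i in range(ntimes):
        if i + 1 < window:
            bounds.append(None)
            nitems.append(None)
            tstart.append(None)
        else:
            first = i + 1 - window                # index of the oldest item
            bounds.append((t0 + first * dt, t0 + (i + 1) * dt))
            nitems.append(window)
            tstart.append(t0 + (first + 0.5) * dt)
    return bounds, nitems, tstart


bounds, nitems, tstart = trailing_metadata(7, 5)
print(bounds[4:], tstart[4:])   # [(0.0, 5.0), (1.0, 6.0), (2.0, 7.0)] [0.5, 1.5, 2.5]
```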

*********************************************************************
*********************************************************************

Whew! I am still trying to come up with other examples to try out this
approach, so please let me know if you think of any.

Bye for now-
John Sheldon

(jps@xxxxxxxx)
Geophysical Fluid Dynamics Laboratory/NOAA
Princeton University/Forrestal Campus/Rte. 1
P.O. Box 308
Princeton, NJ, USA  08542
http://www.gfdl.gov
---
    "No good deed goes unpunished."
---
