A late reply to John Sheldon's comments on multidimensional and contracted
time-axes, as proposed by GDT:
> First, the easy case (by way of example):
> -----
> ** A time series (3) of January average temperature, where each **
> ** monthly mean is derived from daily means. **
>
> Our local approach uses up to 6 additional quantities to store the
> required information. The first 3 of these are used in all cases.
> The last 3 can be used to describe the items comprising the average.
>
> dimensions:
> time = 3;
> day = 31;
>
> variables:
> float Tavg(time);
> Tavg:long_name="Average monthly temperature" ;
> Tavg:units="deg_K"
> Tavg:average_info="T1, T2, nitems, time_of_item, \
> item_is_avg, dt_item";
> double time(time);
> time:units="days since 1-1-1990";
> time:calendar="common_year";
>
> double T1(time);
> T1:long_name="starting time of average";
> T1:units="days since 1-1-1990";
> T1:calendar="common_year";
>
> double T2(time);
> T2:long_name="ending time of average";
> T2:units="days since 1-1-1990";
> T2:calendar="common_year";
>
> long nitems(time);
> nitems:long_name="Number of items in average";
>
> float time_of_item(day,time);
> time_of_item:long_name="time of individual items comprising
> average"
> time_of_item:units="days since 1-1-1990";
>
> short item_is_avg(day,time);
> item_is_avg:long_name="flag indicating whether item in average is
> itself an average";
>
> double dt_item(day,time);
> dt_item:long_name="length of time over which the items comprising
> average are representative";
> dt_item:units="days";
>
> data:
> time = 15.5, 380.5, 745.5 ;
>
> T1 = 0., 365., 730. ;
> T2 = 31., 396., 761. ;
>
> nitems = 31, 31, 31;
>
> time_of_item = 0.5, 1.5, 2.5, ... 30.5,
> 365.5, 366.5, 367.5, ... 395.5,
> 730.5, 731.5, 732.5, ... 760.5 ;
>
> item_is_avg = 1, 1, 1, ... 1,
> 1, 1, 1, ... 1,
> 1, 1, 1, ... 1 ;
>
> dt_item = 1., 1., 1., ... 1.,
> 1., 1., 1., ... 1.,
> 1., 1., 1., ... 1. ;
>
>
> This works fine, because each mean is taken over a continuous span of
> time (ie, all of a January). "T1" and "T2" bracket the period. The
> "time" value is only somewhat arbitrary. (It seems logical to me
> that it be the midpoint of the averaging period, but I've heard
> others argue for assigning it a time equal to the starting or ending
> time of each period.) It is flexible enough to handle disparate
> items included in the average. And, "time" stays 1-D.
>
> * How would you handle this case using "contraction" and "wrt"?
The simplest way you could represent this using the conventions of GDT would
be without the information about the items making up the average. In this case,
as with your scheme, time is one-dimensional. T1 and T2 are recorded as
boundary coordinates, and time as the main coordinate. I agree with you that
it is logical that time should be the midpoints, but it is arbitrary.
dimensions:
time = 3;
variables:
float Tavg(time);
Tavg:quantity="temperature";
Tavg:units="deg_K";
double time(time);
time:quantity="time";
time:subcell="cell"; // indicates these are not instantaneous values
time:units="days since 1-1-1990";
time:bounds="bounds_time";
double bounds_time(2,time);
data:
time = 15.5, 380.5, 745.5 ;
bounds_time=0., 365., 730.,
31., 396., 761.;
To arrive at the idea of a contracted axis, consider the original 3*31 days
organised into two dimensions of time. The first dimension is over the "major"
time interval of months, the second over the "minor" interval of days within
the month. This gives:
dimensions:
months=3;
days=31;
variables:
float Tday(months,days);
double months(months);
months:quantity="time";
months:units="days since 1-1-1990";
float days(days);
days:quantity="time";
days:subcell="cell";
days:units="days";
days:wrt="months";
days:bounds="bounds_days";
float bounds_days(2,days);
data:
months=0., 365., 730.;
days= 0.5, 1.5, 2.5, ..., 30.5;
bounds_days=0.0, 1.0, 2.0, ..., 30.0,
1.0, 2.0, 3.0, ..., 31.0;
The way to interpret the time coordinates here is to add the offset times
(marked with wrt) to the absolute times. Thus, the element Tday[1][2] has
a time coordinate 365.0+2.5, with boundaries 365.0+2.0 and 365.0+3.0.
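
As a rough illustration of this rule, here is a minimal Python sketch. It is not part of GDT: the function name absolute_time and the use of plain lists are assumptions made for the example. It adds each "wrt" offset, and its bounds, to the absolute coordinate of the axis it refers to.

```python
# Coordinates copied from the CDL example (units: days since 1-1-1990).
months = [0.0, 365.0, 730.0]                        # absolute times, major axis
days = [i + 0.5 for i in range(31)]                 # offsets wrt months
bounds_days = [[float(i) for i in range(31)],       # lower bounds of each day
               [float(i + 1) for i in range(31)]]   # upper bounds of each day


def absolute_time(m, d):
    """Return (centre, lower, upper) in absolute days for element [m][d]."""
    base = months[m]
    return (base + days[d], base + bounds_days[0][d], base + bounds_days[1][d])


print(absolute_time(1, 2))  # (367.5, 367.0, 368.0)
```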
Now we contract the days axis, to produce
dimensions:
months=3;
con_days=1;
variables:
float Tday(months,con_days);
double months(months);
months:quantity="time";
months:units="days since 1-1-1990";
float con_days(con_days);
con_days:quantity="time";
con_days:subcell="cell";
con_days:units="days";
con_days:wrt="months";
con_days:bounds="bounds_con_days";
con_days:contraction="mean";
con_days:interval=1.0;
float bounds_con_days(2,con_days);
data:
months=0., 365., 730.;
con_days= 15.5;
bounds_con_days=0.0, 31.0;
(Here I have departed slightly from GDT, by showing an "interval" attribute
instead of "max_interval" and "min_interval". This is because I have a further
suggestion to make below.)
The contracted axis, with a dimension of unity, tells us that the data value
for each of the three months was derived by averaging values applying to times
separated by 1 day and covering a period of 31 days. The subcell attribute
tells us, further, that these values were initially representative of their
time cells, rather than instantaneous measurements.
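
The contraction step itself can be sketched in Python as follows. This is only an illustration under stated assumptions: contract_mean is a hypothetical helper, the daily temperatures are invented, and taking the midpoint of the outer bounds as the coordinate is one arbitrary choice among several.

```python
def contract_mean(values, lower, upper):
    """Contract one axis by averaging.

    values: the data along the axis being contracted
    lower, upper: the cell bounds of each value on that axis
    Returns (mean, coordinate, (lower_bound, upper_bound)) for the single
    cell of the contracted axis.
    """
    mean = sum(values) / len(values)
    lo, hi = min(lower), max(upper)
    return mean, (lo + hi) / 2.0, (lo, hi)


# Hypothetical daily means for one January (illustrative numbers only).
tday = [270.0 + 0.1 * i for i in range(31)]
lower = [float(i) for i in range(31)]
upper = [float(i + 1) for i in range(31)]

mean, coord, bounds = contract_mean(tday, lower, upper)
print(coord, bounds)  # 15.5 (0.0, 31.0)
```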
It is possible that this last piece of information is not sufficiently
precise. Suppose the original daily values were daily maxima. In this case we
consider a notional sub-daily time axis, containing an indefinitely large
number of times within the daily cycle. This axis is then contracted by finding
the *maximum* value rather than the mean. The sub-daily interval is not defined
or needed. We thus record the information that the monthly value is the mean of
31 daily maxima by appending a third time axis with bounds of 0.0 and 1.0 day,
contraction="max", wrt="con_days".
This might seem a bit excessive. I am not entirely sure whether this
information might in fact be better recorded by allowing subcell="max" instead.
However, I think it would be sensible to include an extra contracted axis if
there *was* an interval you wanted to record. For example, suppose you wanted
to record that the daily value was the maximum of pressure measurements made at
3-h intervals through the day.
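
A sketch of that last case in Python, with one contraction per time axis applied from the innermost axis outwards. The pressure readings are invented, and the axis name con_hours is hypothetical; only the attribute names contraction, interval and wrt come from the discussion above.

```python
# Hypothetical 3-hourly pressure readings: 31 days x 8 readings per day.
# The numbers are made up purely to have something to contract.
readings = [[1000.0 + day + 0.5 * ((hour * 3) % 7) for hour in range(8)]
            for day in range(31)]

# Innermost contraction: maximum over the 3-h axis (interval 0.125 day).
daily_max = [max(day_values) for day_values in readings]

# Next contraction: mean over the days axis (interval 1 day).
monthly_mean_of_max = sum(daily_max) / len(daily_max)

# The metadata a reader would need to reconstruct this description,
# listed from the outer contracted axis to the inner one.
contractions = [
    {"axis": "con_days", "contraction": "mean", "interval": 1.0},
    {"axis": "con_hours", "contraction": "max", "interval": 0.125,
     "wrt": "con_days"},
]

print(monthly_mean_of_max)  # 1018.0
```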
> NOW, the hard case (again, by way of example):
> ---
> ** 5-year average of the daily avg Temperature for each **
> ** of January 1,2,3 (ignoring any 2-D location) **
>
> ... The principal
> problems are in specifying the "time" coordinate to assign to each
> point, and how to specify the boundaries of the period over which the
> average was taken. And these are only problems because we've
> picked items out of a continuous stream and processed only them.
>
> The decomposition of the time axis into 2 dimensions using the "wrt"
> approach seems to solve the latter problem to a large extent (at the
> expense of added complexity (IMHO:-) and the necessity of dealing
> with a 2-D time axis). The starting and ending "times" of the
> average are (effectively) "1990" and "1994". But we still have the
> problem of what "time" coordinate value to assign to the data...
>
> **
> ** What we lack is a way to express the fact the we have "extracted"
> ** certain points out of a continuum and averaged only those points.
> ** ie, the average was not truly *along* the "time" axis!
> **
> ** Again, there are 2 problems associated with this type of average:
> ** 1. *where* to "locate" the data along the contracted axis;
> ** 2. how to document the span of coordinate values over which
> ** the average was taken (since part of the total span
> ** isn't actually used in the calculation)
> **
These are hard questions, which have tormented me too! I cannot deny, of
course, that the contracted multidimensional time axes are complex; I hope I can
persuade you that the complexity is worthwhile. The approach aims principally
to deal with point (2). To repeat in words what GDT suggest: There are two time
axes. The first is a contracted years axis, with boundaries of 1st Jan 1990 and
1st Jan 1994, interval of 1 year, contraction of "mean". The second is a
3-element days axis, wrt the contracted years axis, coordinates 0.5, 1.5, 2.5
days, lower boundaries 0.0, 1.0, 2.0 days, upper boundaries 1.0, 2.0, 3.0
days. This means that the second data value, for instance, represents a period
of 1 day, and was obtained by averaging corresponding periods spaced a year
apart. The first of these periods is definitely located as from 1st Jan 1990 +
1.0 days to 1st Jan 1990 + 2.0 days, with a representative value of 1st Jan
1990 + 1.5 days. The last, similarly, is in 1994.
I think this gives enough information to enable one automatically to produce a
description of what this value applies to. I would label it "00:00 2nd Jan -
00:00 3rd Jan, meaned over 1990-1994". Although this can be represented by a
brief phrase, the GDT scheme is not limited to cycles that can be easily
related to the calendar. We could use exactly the same method to describe a
value which applied to an average of periods of 23.93 h spaced 365.3 days
apart, for example.
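
To show that such a label can be generated mechanically, here is a minimal Python sketch for the calendar-aligned case. The function name describe and its argument layout are assumptions made for the example, and the wording of the label is only one possibility.

```python
from datetime import datetime, timedelta


def describe(first_start, last_start, lower_days, upper_days):
    """Build a label for one cell of the days axis.

    first_start, last_start: bounds of the contracted years axis
    lower_days, upper_days: cell bounds in days, wrt the years axis
    """
    begin = first_start + timedelta(days=lower_days)
    end = first_start + timedelta(days=upper_days)
    fmt = "%H:%M %d %b"
    return "{} - {}, meaned over {}-{}".format(
        begin.strftime(fmt), end.strftime(fmt),
        first_start.year, last_start.year)


# Second data value: bounds of 1.0 and 2.0 days wrt 1st Jan 1990.
label = describe(datetime(1990, 1, 1), datetime(1994, 1, 1), 1.0, 2.0)
print(label)  # 00:00 02 Jan - 00:00 03 Jan, meaned over 1990-1994
```

For cycles that do not line up with the calendar, the same bounds are still available, but a phrase in terms of dates would no longer be a fair summary.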
The answer to your question (1) is not really well defined. The best answer I
can give to where to "locate" the value in time is the label I suggest above,
which is a translation of all the available information. It doesn't really
belong anywhere in particular on the contracted axis alone. However, if I had
to produce a single time coordinate, for the sake of plotting, I would probably
go for 1 Jan 1990 + 1.5 days. I would label the point just "12:00 2 Jan" on the
plot, if possible, omitting the year.
An advantage of the multidimensional approach is that you can have as many of
these axes as you like without straining the scheme. At the end of the
discussion of the easy case I gave a three-dimensional example. I think this is
appealingly flexible. It is easy, for example, to label an average as applying
to 10:00-12:00 on all days in JJA in a range of years. Since each contraction
has a separate contraction attribute, it is possible to record that a quantity
is a maximum over a number of years of the March mean of daily minima, for
example.
Unlike yours, our scheme does not indicate how many points there were before
averaging, or what their coordinates were. This is because our aim was to
provide enough metadata to distinguish quantities which are likely to need to
be distinguished. We did not try to include all possibly useful information. I
think it is *unlikely* that you would have two different quantities, both being
means for an average 2nd Jan, one for the years (1990,1991,1992,1993,1994) and
the other for (1990,1991,1994). These could not be distinguished in GDT, but
they could in your scheme, which records the number and values of the original
coordinates.
Perhaps you feel that this distinction may need to be drawn? You have
further examples of this:
>
> a) mean zonal wind within the two principal storm tracks
> (longitude=160-240E and 280-350E)
>
> b) mean combined cloudiness for the two layers 200-400mb and
> 700-850mb
I agree that you might well want to record such information in the metadata.
GDT does not handle this, although we did think about it. My preferred
extension to GDT would be to allow a contracted axis to carry an attribute
"expand", naming a variable which provides the coordinates of the original
uncontracted axis, including boundaries if appropriate. In case (a), for
instance, we might have
dimensions:
con_lon=1;
lon=2;
variables:
float uwind(con_lon);
float con_lon(con_lon);
con_lon:expand="lon";
float lon(lon);
lon:bounds="bounds_lon";
float bounds_lon(2,lon);
data:
con_lon = 180.0 ; // a purely nominal value
lon = 200.0, 315.0;
bounds_lon = 160.0, 280.0,
240.0, 350.0;
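
How a reader might follow the proposed "expand" attribute can be sketched in Python. The dictionary layout below is a toy in-memory picture of the CDL, assumed for illustration; it is not a netCDF API, and original_cells is a hypothetical helper.

```python
# A toy in-memory picture of the CDL example; not a netCDF API.
variables = {
    "con_lon": {"data": [180.0], "attrs": {"expand": "lon"}},
    "lon": {"data": [200.0, 315.0], "attrs": {"bounds": "bounds_lon"}},
    "bounds_lon": {"data": [[160.0, 280.0], [240.0, 350.0]], "attrs": {}},
}


def original_cells(name):
    """Follow 'expand' then 'bounds' to list (lower, centre, upper) cells."""
    expanded = variables[variables[name]["attrs"]["expand"]]
    lower, upper = variables[expanded["attrs"]["bounds"]]["data"]
    return list(zip(lower, expanded["data"], upper))


print(original_cells("con_lon"))
# [(160.0, 200.0, 240.0), (280.0, 315.0, 350.0)]
```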
Best wishes,
Jonathan