Dear John

Happy New Year! Thanks for your lengthy contribution to our discussion about representing contracted axes for time coordinates. You raise some useful points about what information we need to record.

The main difference between your approach and ours is that you have chosen to attach information about the contracted axis to another axis, whereas we proposed to store it as a singleton axis of its own. For instance, in the first example of a timeseries of 12 monthly average temperatures derived from daily means,

* you have Tavg(avgtime,lat,lon), where avgtime is a 12-element time axis, and has attributes describing how each month was derived from days, whereas

* we have Tavg(day,month,lat,lon), where month has a dimension of 12 and day a dimension of 1, the latter recording the information about the contraction of daily means into a monthly mean.

One reason for adopting your approach is that you say that existing software would not be able to handle our approach, with its two (or more) time axes. That would undeniably be an obstacle to adopting our scheme - what is the problem, in fact (generically)?

Apart from this problem, you argue that our approach is more difficult to understand. I would like to argue that it has important advantages of flexibility and consistency.

* It is flexible because it can easily be extended. In your system, you can describe a timeseries of 12 monthly means, each derived from daily means; and you can describe a timeseries of climatological daily means for particular days of the year, each derived from a mean of corresponding days from several years. But I do not see how you can describe a combination of these two e.g. the climatological maximum daily mean for each month of the year, i.e. (1) for each day, calculate the mean; (2) within each month, find the maximum daily mean; (3) compute the mean of these maxima for corresponding months over several years.
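To make the order of the three contractions concrete, here is a minimal sketch in plain Python. It is only an illustration of steps (1) to (3), not part of either proposal: the function name, the dictionary layout keyed by (year, month, day), and all the sample values are invented for the purpose.

```python
from statistics import mean

# Hypothetical sub-daily samples keyed by (year, month, day).
# All values are invented purely for illustration.
samples = {
    (1990, 1, 1): [1.0, 3.0],
    (1990, 1, 2): [4.0, 6.0],
    (1990, 2, 1): [0.0, 2.0],
    (1990, 2, 2): [2.0, 2.0],
    (1991, 1, 1): [6.0, 8.0],
    (1991, 1, 2): [1.0, 1.0],
    (1991, 2, 1): [3.0, 5.0],
    (1991, 2, 2): [0.0, 0.0],
}


def climatological_max_daily_mean(samples):
    """Apply the three contractions in order and return {month: value}."""
    # (1) Contract time-within-day: the mean of each day's samples.
    daily_mean = {key: mean(values) for key, values in samples.items()}

    # (2) Contract day-within-month: the maximum daily mean in each
    #     (year, month).
    monthly_max = {}
    for (year, month, _day), value in daily_mean.items():
        key = (year, month)
        monthly_max[key] = max(monthly_max.get(key, value), value)

    # (3) Contract year: the mean of those maxima for corresponding months.
    by_month = {}
    for (_year, month), value in monthly_max.items():
        by_month.setdefault(month, []).append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


result = climatological_max_daily_mean(samples)
```

Each of the three steps removes one level of the time decomposition, and each would need its own record of what was contracted and how.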
Such a quantity is represented in our scheme by three time axes (day,month,year), where both day and year are contracted singleton axes. I think your inclusion of the attribute contraction_itemiscontraction indicates that you recognise a need for contractions within contractions; but I would argue that information of the same detail and kind needs to be recorded for each one, and it is simplest to use the same structures to do it.

* It is consistent because it is the same as the approach we propose for contracted spatial axes. In our scheme, from a two-dimensional variable (lat,lon) with lat=72, lon=96, we derive a zonal-mean field (lat,con_lon), with con_lon=1 as a contracted axis recording information about the range and spacing of longitudes over which the average is formed. This is exactly the same as our treatment of a time contraction. In your example 5, you have treated a contraction of a pressure axis in just this way, leaving a contracted singleton axis. But in your treatment of time contractions, you don't do this: you record the contraction using attributes on the remaining uncontracted time axis, not using a contracted singleton axis.

From several of your mails, it is clear that you regard our scheme of multidimensional time axes as difficult to understand and to process. In fact, I didn't think of it as "multidimensional" in the first place; I regard it as more of a "decomposition". Anyway, perhaps I could propose an alternative which might be a bit easier to come to grips with.

I suspect that one of the main reasons why it seems complicated is that we have to add up the coordinates in the various time axes. The reason why we do this is that it's more general. It allows us to describe, for instance, a mean of five 17-day periods spaced at 63-day intervals. But maybe this is unnecessary.

The reason why time needs special handling is that it has two natural cycles (seasonal and diurnal) and we frequently want to contract over these.
In addition, the double-contraction example above shows that the within-month "cycle", while not natural (I think the lunar cycle is only a cultural link), is fairly common in climatology. But my 17- and 63-day example here does not refer to any of these cycles, and so is no more likely than wanting to make periodic means of arbitrary length in some other coordinate, such as longitude. We have not made special provision for such means. Maybe we should in fact make our current handling of multidimensional time into something of more general application for arbitrary linear axes.

In that case, as regards time specifically, we can restrict our attention to those natural cycles. It is then convenient for us just to "decompose" it into: year, day-within-year, and time-within-day. Day-within-year may further be decomposed into month and day-within-month, if necessary. These quantities can be put back together straightforwardly. Would you be any happier with dimensions (year,mmdd,hhmmss) than a "three-dimensional time axis"? In terms of what you can do with them, there is no real difference; the difference lies in the representation of the coordinates and hence how you combine them.

With this scheme,

(a) 12 monthly means derived from daily means would have a singleton year axis (not a contraction, presuming they came from a single year), a 12-element mmdd axis, and a singleton hhmmss axis, the result of contracting an axis with separate elements for each day. I feel that once the axis is contracted it is not necessary to say how many days were in each month.
(b) the climatological daily means for particular days of the year, each derived from a mean of corresponding days from several years, would be described by a contracted year axis and an uncontracted mmdd axis.
(c) the climatological maximum daily mean for each month of the year would have a contracted year axis, an uncontracted month axis, a contracted day-within-month axis (the "maximum" contraction), and perhaps a contracted hhmmss axis showing how each daily value was obtained.

When we made our GDT proposal, we were thinking principally about data exchange. Our criterion for what metadata to include was therefore to suggest the minimum necessary to distinguish quantities which one might want to give to another climate centre e.g. for CMIP. For this reason, we chose not to keep much information about what the coordinates were before the contraction. Our proposal only records the range and the minimum and maximum spacing of these coordinates. We thought that, for instance, it would be sufficient to describe a quantity as a vertical average of relative humidity between 100 mbar and 850 mbar made from levels having separations of between 50 mbar and 100 mbar (say). We considered it unlikely that in a single dataset one would have two different sets of levels which upon contraction would have just the same description in these terms; hence we decided there was no need to record any more information for the sake of distinguishing variables.

If you broaden the uses of the convention, and want to adopt it as a general-purpose data format, I agree that you might sometimes want to record more information. I think your categories of continuity and placement are sensible, and I would be in favour of including optional attributes of these names, with the possible values you suggest of "contiguous" vs "disjoint" and "uniform" vs "arbitrary".

However, if you actually want to record the original uncontracted coordinates, as you do in your examples 4 and 5, I still prefer the suggestion I made last time of using a separate axis from the contracted axis, and pointing to it with an attribute (e.g. "expand") of the contracted axis.
In my view, the contraction is unaffected, and should still be represented as a singleton axis. What is different is that additional information is being supplied about the axis before it was contracted. I prefer this approach because it avoids defining a new kind of attribute for this purpose, when one can reuse all the definitions we already have for supplying axis coordinates, together with their bounds and perhaps components and so on. The axis named by the expand attribute of the contracted axis would be identical in all attributes to the axis before it was contracted.

I would argue also against encoding the uncontracted coordinates, in the "uniform" case, in terms of a starting value and step. This is indeed very tempting. Why don't we do it, then, for ordinary axes? Many of our axes have evenly spaced coordinates, and we record them explicitly in our coordinate variables, when we could instead do it with a starting value and a step. I think the reason we don't do it is that it complicates the code. We cannot avoid having to handle the case of arbitrary coordinates. If we decide to support start-and-step as well, we either write two separate blocks of code, or we expand start-and-step into a vector of coordinates before processing it further. In either case, extra programming is needed. This is an overhead, and it's not worth incurring because the amount of space saved by encoding a coordinate variable in this way is trivial. In most cases, the space required by coordinate variables is dwarfed by that of data variables. I feel that it is simpler and better to store all coordinate variables explicitly, and I think the same arguments apply to the record of the coordinates before contraction, if you want to store them. This approach also avoids defining more kinds of attributes.

In cases 4 and 6, you have introduced another idea GDT did not consider.
Here you wish to record an operation of averaging the data points in groups of varying size according to the value of some other data variable (cloud cover). I think it is unlikely that you would need to supply this kind of information to distinguish between quantities, so I would suggest that we don't need to deal with it for the moment in a convention. I tend to feel that this is rather a specialised operation which does not comfortably fit into the general framework of contractions, as we proposed them, since it does not result in a singleton axis.

Your representation of this operation also has the slight awkwardness of having to guess a maximum size for these groups, and needing to supply null values to fill up a two-dimensional array. To avoid this, I'd like to suggest an alternative way of doing it for your consideration:

    dimensions:
      time=10;
      avgtime=3;

    variables:
      double time(time);
        time:units="hours since 1-1-1990";
      long timeindex(time);
      double avgtime(avgtime);
        avgtime:units="hours since 1-1-1990";
      float Tavg(avgtime);
        Tavg:quantity="temperature";
        Tavg:units="deg_K";
        Tavg:contraction="mean";
        Tavg:group="timeindex";
        Tavg:expand="time";

    data:
      time=0.5,2.0, 3.5,4.0,4.25,4.5, 5.5,7.0,8.0;
      timeindex=0,0, 1,1,1,1, 2,2,2,2;

This scheme proposes that the presence of a "group" attribute means that the "contraction" has not reduced an axis to size one, but has first divided it up into groups and then contracted each group. The variable named by the group attribute shows the allocation of the original axis into groups. As before, the variable named by the expand attribute gives the uncontracted coordinates. I think of this as reminiscent of SQL, contrasting plain "select avg(time)" with "select avg(time) group by timeindex", where time and timeindex are two columns of a table.

Best wishes

Jonathan
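P.S. The grouped contraction can be sketched in a few lines of plain Python, which may make the "group by" analogy clearer. This is only an illustration under stated assumptions: the function name contract_by_group and the uncontracted temperature values are hypothetical, and only the timeindex allocation is taken from the example above.

```python
from statistics import mean


def contract_by_group(values, group_index):
    """Mean of `values` within each group named by `group_index`.

    Mirrors "select avg(value) group by group_index": the axis is first
    divided into groups, then each group is contracted separately.
    """
    if len(values) != len(group_index):
        raise ValueError("values and group_index must have the same length")
    groups = {}
    for value, group in zip(values, group_index):
        groups.setdefault(group, []).append(value)
    # One result per group, in increasing order of group number.
    return [mean(groups[group]) for group in sorted(groups)]


# Allocation of the ten original times into three groups, as in the example.
timeindex = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

# Hypothetical uncontracted temperatures in deg_K, invented for illustration.
temps = [280.0, 282.0, 285.0, 287.0, 286.0, 286.0, 290.0, 291.0, 292.0, 291.0]

tavg = contract_by_group(temps, timeindex)
```

The result has one element per group, matching the avgtime dimension of 3, and no maximum group size or null padding is needed.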