Hi again Jonathan-

Sorry for the long delay in getting back to you. However, with the exception
of a couple of topics needing some clarification, I thought that we were in
pretty good agreement on just about all of the issues we covered in our
(rather lengthy) comments/replies. Unfortunately, I seem to be continuing in
the tradition of lengthy replies...sorry!

First, let me dispense with a few of the more minor issues and then get into
some more "thorny" topics...

> > Take a vertical average (mean) over pressure from some pressure level
> > (say, 50mb) down to the earth's surface. Now, since the surface pressure
> > varies two-dimensionally, it seems that a dimension, being 1-D, will not
> > be adequate to store the information about the integration bounds. [JS]
>
> I agree, this is a problem. You really want to be able to store a contracted
> pressure coordinate with "bounds_pressure=surface,50 mb;". One way - rather
> contrived - I can see to do this is suggested by the use of a hybrid
> vertical pressure-sigma coordinate, where the pressure at level i is given
> as "pressure=pressure_level(i)+sigma_level(i)*surface_pressure". Your upper
> level is a pure pressure level, with pressure_level=50 mb, sigma_level=0.
> Your lower level is a pure sigma level with pressure_level=0 mb,
> sigma_level=1.
> This could be recorded in our conventions using a component attribute, thus:
>
>    dimensions:
>       con_eta=1;
>
>    variables:
>       float ke(con_eta,lat,lon);
>       float con_eta(con_eta);
>          con_eta:component="pressure_level,sigma_level";
>          con_eta:quantity="hybrid sigma-pressure";
>          con_eta:contraction="mean";
>       float pressure_level(con_eta);
>          pressure_level:quantity="pressure";
>          pressure_level:units="Pa";
>          pressure_level:bounds="bounds_pressure_level";
>       float bounds_pressure_level(2,con_eta);
>       float sigma_level(con_eta);
>          sigma_level:quantity="sigma";
>          sigma_level:bounds="bounds_sigma_level";
>       float bounds_sigma_level(2,con_eta);
>
>    data:
>       bounds_pressure_level=0,50;
>       bounds_sigma_level=1,0;
>
> This approach does require the application to know how to work out the
> pressure, if it needs to, from the pressure_level and the sigma_level.

Wow! That was very clever! But also very complicated! And specific to the
case where the limits are sigma=0 to 1. I myself can't think of any other way
to do it, so I suspect I would likely take the coward's way out and just add
information to the "long_name" or a "comment" attribute describing the
average. I'll have to think about it some more...

> Section 24: Time axes
>
> > Suppose I have an idealized model that is not associated with any
> > calendar - it just ticks off hour after hour. How would I be allowed to
> > specify time in this case? [JS]
>
> I think you specify it as a pure interval of time, with units="hours".
> There is no problem with this, because if it is really not associated with
> any calendar you will not want to add it to a date, which is when the
> problems arise.

The reason I raised this issue was that you seemed to propose that the
absence of a "reference" time implied a time *interval* (section 28). Again,
this is no big deal, since all "time" is really an interval from some
reference time - in this case the beginning of a model integration - but it
is possible that someone might misinterpret it somehow.
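Just to illustrate the point about *when* a calendar enters: a pure-interval
time axis is calendar-free right up until you add it to a reference date. A
minimal Python sketch (the reference date here is an arbitrary example of
mine, not anything from the conventions):

```python
from datetime import datetime, timedelta

# A pure interval axis, units="hours" since the start of the integration;
# no calendar is needed to interpret these values on their own.
t_hours = [0.0, 6.0, 12.0]

# A calendar only becomes an issue when the intervals are added to a
# reference date -- here datetime commits us to a (proleptic Gregorian)
# calendar, which an idealized model may not actually have.
ref = datetime(1990, 1, 1)          # arbitrary example reference date
dates = [ref + timedelta(hours=h) for h in t_hours]
```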
> Section 26: Non-Gregorian Calendars
>
> > UDUNITS already has "common_year" defined to indicate a calendar with
> > 365 days. [JS]
>
> We used "noleap" for compatibility with LATS.

I'd rather drive the conventions with UDUNITS than with LATS.

> Section 27: Unimonth calendar
>
> The problem we are trying to address here is that of applications trying to
> convert between different representations of time, where
> ...
> This may help a bit with John Sheldon's issue of comparing calendars,
> although it is still a thorny problem.

I don't think it will help with comparing calendars. One still cannot
compare, say, observed data for December 31 to a 360-days-per-year model,
because the latter simply has no "December 31".

> Section 32: Missing values in a data variable
>
> > I think that the data should be checked against the "missing_value"
> > *before* unpacking. [JS]
>
> Yes, you may well be correct. Thanks.

The problem then becomes: what do you put in the array of unpacked data if
you find a missing value in the packed data? We store a global attribute to
hold this value (say, -1.E30). In the absence of this global attribute, we
simply stuff in a fill-value, which is OK, but you lose the distinction
between intentionally and unintentionally missing data. In any case, we tell
the calling routine what float values we used in both cases.

----------------

Now, the tough stuff...

Mostly, this is an issue of storing averages (at least it is for us). But one
can't really address this without at least some consideration of coordinate
systems and multi-dimensional coordinates. Without getting too deeply into
the details, let me take a minute to summarize my thoughts at the moment.

( John Caron's recent mail ("songs about coordinate systems and buildings")
came in while I was composing this. I hate to make this mail even *longer*
>:-{, so I'll address that mail separately, though it is obviously related to
what I'm discussing here.
)

----

Re: multidimensional coords

 - This is not our (ie, GFDL's) top priority now
 - Our main concern is that existing applications don't break if we want to
   use them someday and also want to abide by the conventions
 - We *do* use these in one significant instance: trajectories, eg:
       Q(i)  w/  {X(i),Y(i),Z(i),t(i)}
   but this is a relatively straightforward, non-controversial case

----

Re: Coordinates, in general:

I think we should have some way to:

 - specify the coordinate system occupied by a variable (Cartesian,
   cylindrical, spherical, etc.)
 - specify coordinate combinations, a la Gary Granger's proposal. Note that
   this is not intended to be exclusive; ie, the user can try other
   combinations, if appropriate.
 - preserve the author's power to mandate the coordinate system (ie, it
   can't be purely the choice of the viewer or application)

Perhaps some combination of coordinate-system and axis-combination
specification could be devised, such as in the following (abbreviated)
example:

   dimensions:
      i=100;

   variables:
      float QC(i);
         QC:coordinates="cartesian_1";    // point to a "map", a la Granger
      float QG1(i);
         QG1:coordinates="geodetic_1";    // first part of string
      float QG2(i);                       //   indicates system type
         QG2:coordinates="geodetic_2";
      float QY(i);
         QY:coordinates="cylindrical_1";

      :cartesian_1="X=x_dist, Y=y_dist, Z=z_dist";   // X,Y,Z are keywords
      :geodetic_1="latitude=lat1, longitude=lon1, altitude=pressure";
      :geodetic_2="latitude=lat1, longitude=lon1, altitude=height";
      :cylindrical_1="rho=rho1, theta=theta1, z=cylz";

      float x_dist(i);
      float y_dist(i);
      float z_dist(i);
      float lat1(i);
      float lon1(i);
      float pressure(i);
         pressure:positive="down";
      float height(i);
         height:positive="up";
      float rho1(i);
      float theta1(i);
      float cylz(i);

You can see that I like Gary Granger's idea of defining a "coordinate map",
then having variables reference it. I find this preferable to possibly having
to define the same coordinate mapping several times, once for each variable
in a subset of variables.
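As a sanity check that such map attributes would be easy for applications to
consume, here is a rough Python sketch of parsing one (the function name is
mine, and the string syntax is just the one proposed in the example above):

```python
def parse_coord_map(map_name, map_string):
    """Split a coordinate-map attribute such as
    geodetic_1 = "latitude=lat1, longitude=lon1, altitude=pressure"
    into the system type (prefix of the map name) and a
    {keyword: coordinate-variable} dictionary."""
    system = map_name.rsplit("_", 1)[0]        # "geodetic_1" -> "geodetic"
    pairs = (item.split("=") for item in map_string.split(","))
    return system, {k.strip(): v.strip() for k, v in pairs}

system, coords = parse_coord_map(
    "geodetic_1", "latitude=lat1, longitude=lon1, altitude=pressure")
# system identifies the basic coordinate system; coords says which
# variable supplies each keyword (eg, coords["altitude"] is "pressure")
```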
(If all the variables use the same coordinates, this is obviously not a
problem, as one could just make "coordinates" a global attribute.) The prefix
of the "coordinate map" name tells what the basic coordinate system is, and
the user-defined suffix distinguishes between multiple maps using the same
coordinate system. The benefit of the keywords in the map definitions is
clarity, and it also allows for a case where all variables in the file employ
fewer than all three dimensions (four if we include time).

There was some discussion a while back about the choice of which coordinate
to use being up to the user or application. Note that the choice of using
vertical coordinate "pressure" or "height" above (for example) is *not*
arbitrary - one could be flat-out *wrong*! The values in "pressure" are not
necessarily coincident with those in "height"; ie, they are not
interchangeable representations of "altitude". Only the originator of the
data knows for sure.

Now, all this gets a lot more complicated when mapping multiple dimensions
into multiple coordinates - I need to put some more thought into how that
would work, but I think it's feasible.

----------------------

Re: Averaged quantities:  *******!!

 - This is a current (urgent!) concern for us here at GFDL
 - I do not much like the idea of a time axis being 2-D (as is needed to use
   the "wrt" attribute)
 - Because the "contraction" attribute is proposed to refer to a dimension of
   length 1, it will not help in the case of a series of time-means (eg, a
   time series of monthly average temperature)
 - I can, with our current local conventions, document a certain class of
   averages now, without "wrt" (example below)
 - But, I have trouble with the type of average that "extracts" subsets of
   points along the time-line (example below)

**NOTE: this is a general problem along *any* axis! (Examples below...)
First, the easy case (by way of example):
-----
**  A time series (3) of January average temperature, where each  **
**  monthly mean is derived from daily means.                     **

Our local approach uses up to 6 additional quantities to store the required
information. The first 3 of these are used in all cases; the last 3 can be
used to describe the items comprising the average.

   dimensions:
      time = 3;
      day  = 31;

   variables:
      float Tavg(time);
         Tavg:long_name="Average monthly temperature";
         Tavg:units="deg_K";
         Tavg:average_info="T1, T2, nitems, time_of_item, \
                            item_is_avg, dt_item";
      double time(time);
         time:units="days since 1-1-1990";
         time:calendar="common_year";
      double T1(time);
         T1:long_name="starting time of average";
         T1:units="days since 1-1-1990";
         T1:calendar="common_year";
      double T2(time);
         T2:long_name="ending time of average";
         T2:units="days since 1-1-1990";
         T2:calendar="common_year";
      long nitems(time);
         nitems:long_name="Number of items in average";
      float time_of_item(day,time);
         time_of_item:long_name="time of individual items comprising average";
         time_of_item:units="days since 1-1-1990";
      short item_is_avg(day,time);
         item_is_avg:long_name="flag indicating whether item in average is itself an average";
      double dt_item(day,time);
         dt_item:long_name="length of time over which the items comprising average are representative";
         dt_item:units="days";

   data:
      time = 15.5, 380.5, 745.5 ;
      T1   = 0., 365., 730. ;
      T2   = 31., 396., 761. ;
      nitems = 31, 31, 31;
      time_of_item =   0.5,   1.5,   2.5, ...  30.5,
                     365.5, 366.5, 367.5, ... 395.5,
                     730.5, 731.5, 732.5, ... 760.5 ;
      item_is_avg = 1, 1, 1, ... 1,
                    1, 1, 1, ... 1,
                    1, 1, 1, ... 1 ;
      dt_item = 1., 1., 1., ... 1.,
                1., 1., 1., ... 1.,
                1., 1., 1., ... 1. ;

This works fine, because each mean is taken over a continuous span of time
(ie, all of a January). "T1" and "T2" bracket the period. The "time" value is
only somewhat arbitrary.
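For what it's worth, the sample values above do satisfy the midpoint
interpretation; a trivial check in plain Python (variable names mine):

```python
# "time" as the midpoint of each averaging period [T1, T2],
# using the sample data from the easy-case example:
T1 = [0.0, 365.0, 730.0]
T2 = [31.0, 396.0, 761.0]
time = [(a + b) / 2.0 for a, b in zip(T1, T2)]
# time -> [15.5, 380.5, 745.5], matching the "time" data above
```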
(It seems logical to me that it be the midpoint of the averaging period, but
I've heard others argue for assigning it a time equal to the starting or
ending time of each period.) It is flexible enough to handle disparate items
included in the average. And, "time" stays 1-D.

* How would you handle this case using "contraction" and "wrt"?

-------

NOW, the hard case (again, by way of example):
---
**  5-year average of the daily avg Temperature for each  **
**  of January 1,2,3 (ignoring any 2-D location)          **

(This case was inspired by *your* example in your response to my question
about the first situation.) Here's what I would do with our local approach:

   dimensions:
      time = 3;
      nyr  = 5;

   variables:
      float Tavg(time);
         Tavg:long_name="5-year average of daily average temperature";
         Tavg:units="deg_K";
         Tavg:average_info="T1, T2, nitems, time_of_item, \
                            item_is_avg, dt_item";
      double time(time);
         time:units="days since 1-1-1990";
         time:calendar="common_year";
      float T1(time);
         T1:long_name="starting time of average";
         T1:units="days since 1-1-1990";
         T1:calendar="common_year";
      float T2(time);
         T2:long_name="ending time of average";
         T2:units="days since 1-1-1990";
         T2:calendar="common_year";
      long nitems(time);
         nitems:long_name="Number of items in average";
      float time_of_item(nyr,time);
         time_of_item:long_name="time of individual items comprising average";
         time_of_item:units="days since 1-1-1990";
      short item_is_avg(nyr,time);
         item_is_avg:long_name="flag indicating whether item in average is itself an average";
      double dt_item(nyr,time);
         dt_item:long_name="length of time over which the items comprising average are representative";
         dt_item:units="days";

   data:
      time = 0.5, 1.5, 2.5 ;          // where do you place these in time?
                                      //  - personal choice? case-driven?
      T1 = 0., 1., 2. ;               // How do you express the time interval
      T2 = 1826., 1827., 1828. ;      // over which the average was taken?

      // or ??:
      //    T1 = 0., 1., 2. ;
      //    T2 = 1., 2., 3. ;

      nitems = 5, 5, 5;
      time_of_item = 0.5, 365.5, 730.5, 1095.5, 1460.5,
                     1.5, 366.5, 731.5, 1096.5, 1461.5,
                     2.5, 367.5, 732.5, 1097.5, 1462.5 ;
      item_is_avg = 1, 1, 1, 1, 1,
                    1, 1, 1, 1, 1,
                    1, 1, 1, 1, 1 ;
      dt_item = 1., 1., 1., 1., 1.,
                1., 1., 1., 1., 1.,
                1., 1., 1., 1., 1. ;

Admittedly, this is not a very robust solution. But the principal problems
are in specifying the "time" coordinate to assign to each point, and in
specifying the boundaries of the period over which the average was taken. And
these are only problems because we've picked items out of a continuous stream
and processed only them.

The decomposition of the time axis into 2 dimensions using the "wrt" approach
seems to solve the latter problem to a large extent (at the expense of added
complexity (IMHO:-) and the necessity of dealing with a 2-D time axis). The
starting and ending "times" of the average are (effectively) "1990" and
"1994". But we still have the problem of what "time" coordinate value to
assign to the data. Your example indicated that you thought it should be the
starting end-point of the averaging period.

**
**  What we lack is a way to express the fact that we have "extracted"
**  certain points out of a continuum and averaged only those points;
**  ie, the average was not truly *along* the "time" axis!
**
**  Again, there are 2 problems associated with this type of average:
**    1. *where* to "locate" the data along the contracted axis;
**    2. how to document the span of coordinate values over which
**       the average was taken (since part of the total span
**       isn't actually used in the calculation)
**

There probably is no "right" answer to those questions, but perhaps we could
clarify things simply by adding a descriptive attribute to the "time"
coordinate variable.
If the time span represented in the average were continuous (as in the
earlier example), something like the following would suffice:

   time:domain_type="continuous";   // any "span" of times includes
                                    // all points in between

This says that the points along the "time" axis describe a continuum, so that
specifying a T1 and T2 for an average implicitly includes all the "times"
in between.

In a case (like the "hard" example above) where data is "extracted" in some
periodic fashion out of the continuous time-line and "contracted", we could
have:

   time:domain_type="regular";

Here, the "times" are disjoint, but regular. We would use the variables
"time_of_item", "item_is_avg", and "dt_item" to record the location along the
time axis of the extracted/contracted data, the delta-t for which each item
is considered representative, and whether or not the data values comprising
the average were, themselves, averages (eg, taking an average of the daily
average temperature for January 1 over the years 1990-1994).

Now, we can specify "T1(1)=0., T2(1)=1826." and know that while these are the
endpoints of the time period for which the January 1 average was constructed,
the average does *not* include all data between 0 < time < 1826. We can look
at "time_of_item", "item_is_avg", and "dt_item" to get the necessary
characteristics of the quantity.

It might be possible to simplify this a bit if we specify:

   time:domain_type="regular: <start>, <interval>, <width>";

so that the average can be characterized by specifying a starting coordinate
for the first point, an interval between points, and the width of the "time"
window at each time.

If the "extracted" data were not regular, or did not have the same delta-t at
each sample window, we could use:

   time:domain_type="irregular";

I still need to think about the most concise way to merge all this and
rationalize it with your "bounds", "subcell" and "contraction" concepts. I
would be very interested in hearing your thoughts.
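To make the "regular" form concrete, here is a small sketch of how an
application might expand a domain_type="regular: <start>, <interval>, <width>"
specification into explicit sample windows (the function name is mine, and
this only illustrates my proposal above, not anything in the draft
conventions):

```python
def regular_windows(start, interval, width, n):
    """Expand a domain_type="regular: <start>, <interval>, <width>"
    specification into n explicit (window_start, window_end) pairs."""
    return [(start + i * interval, start + i * interval + width)
            for i in range(n)]

# The "hard" example: January 1 daily means for 5 common years --
# start=0, interval=365 days, width=1 day:
windows = regular_windows(0.0, 365.0, 1.0, 5)
# -> [(0.0, 1.0), (365.0, 366.0), (730.0, 731.0),
#     (1095.0, 1096.0), (1460.0, 1461.0)]
```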
---

BTW (though I'm sure this is obvious to you also), this problem is not
necessarily limited to the time axis. Some examples (simple and contrived) of
averages of "extracted" data:

   a) mean zonal wind within the two principal storm tracks
      (longitude=160-240E and 280-350E)

   b) mean combined cloudiness for the two layers 200-400mb and 700-850mb

Cheers-
John