Hello again,

There's been so much to talk about in the past few days that I'm not sure I know how to respond. First, I think we all agree that conventions are needed and that these conventions should support the "possibility of writing applications that access generic netCDF files and operate on them." So, what do we need to do to support this? I contend that conventions.info, in its current form, is somewhat inadequate, for a few reasons which I intend to cover.

I'd like to start by looking at some examples based on what Tim Holt stated. Before I start I should state that I am a "computer tweak?". Although I've never heard this term before, I'm sure it could be applied to me. Tim writes:

    What it comes down to for the average tech/PI with raw data is this --
    "I want to make a graph of time vs temperature", "I want to plot the
    tracklines from the cruise", or "Where were we when we took water
    sample 38, and what was the flow-through temp and conductivity?"

I think discussing what a "generic application" needs to know about the data in order to accomplish these tasks will highlight some areas where conventions will do a lot of good, and some areas where conventions of the wrong type may inhibit the production of general tools. So let's look at each of these examples and what's involved in implementing them from the application's perspective.

"I want to make a graph of time vs temperature"

Seems simple enough. A generic application may know about the concept of time, but it really doesn't need to know that the dependent variable is temperature. In fact, all the application really needs is an array that represents coordinates in the X direction, an array that represents coordinates in the Y direction, and an indication of which one is the independent variable. With these arrays it can determine the ranges of the values in the data and set up a window->viewport mapping for transforming the data onto a location on the screen. Not really much of a problem, except: how does the application know which of the possibly many variables in the file are the appropriate ones to use to make this plot? And what happens when the data is not stored in two simple arrays? Whose responsibility is it to state how variables in the netCDF file should be selected and ordered to produce the two arrays needed for this task?

For example, a data set could be collected that contains temperature, pressure and humidity. The following are a couple of the many possible ways to put this data into a netCDF file.

netcdf file1 {
dimensions:
        values = 5 ;
        time = UNLIMITED ;
variables:
        float dataset1(time, values) ;
                dataset1:index0 = "temperature" ;
                dataset1:index1 = "pressure" ;
                dataset1:index2 = "humidity" ;
                dataset1:index3 = "latitude" ;
                dataset1:index4 = "longitude" ;
        long time(time) ;
}

netcdf file2 {
dimensions:
        values = 3 ;
        latlon = 2 ;
        time = UNLIMITED ;
variables:
        float dataset2(time, values) ;
                dataset2:index0 = "temperature" ;
                dataset2:index1 = "pressure" ;
                dataset2:index2 = "humidity" ;
        long time(time) ;
        float location(time, latlon) ;
                location:index0 = "latitude" ;
                location:index1 = "longitude" ;
}

The reasons why someone would want to organize their data in either fashion are inconsequential; they may be related to how the instrument measuring the data works. In these two examples, file1 uses two netCDF dimensions and two variables while file2 uses three netCDF dimensions and three variables to represent the same data. So now I ask the question again: how is the application supposed to know what it means to plot time vs temperature?
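Just to make the missing information concrete, here's a sketch of a third way of writing the same data, this time with a couple of purely hypothetical attributes ("role" and "coordinates") that would tell an application which variables are coordinates and which are data. These attribute names are invented for illustration only; they are not in conventions.info and I'm not proposing them as the answer, just pointing at the class of information a convention would have to carry:

netcdf file3 {
dimensions:
        time = UNLIMITED ;
variables:
        long time(time) ;
                // hypothetical: mark time as an independent (coordinate) variable
                time:role = "independent" ;
        float temperature(time) ;
                // hypothetical: name the coordinate variable this data is sampled along
                temperature:role = "dependent" ;
                temperature:coordinates = "time" ;
        float pressure(time) ;
                pressure:role = "dependent" ;
                pressure:coordinates = "time" ;
        float humidity(time) ;
                humidity:role = "dependent" ;
                humidity:coordinates = "time" ;
}

With something like this an application could walk the variables, find the one the user asked for, follow its "coordinates" attribute to the time array, and build the two arrays it needs without knowing anything about temperature. Without it, the application is reduced to guessing from names and dimension shapes, which is exactly the problem with file1 and file2 above.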
These are VERY VERY simple examples. The complexity of "understanding" the organization of the data, from simply looking at the organization of the variables and dimensions in a file, grows as higher-dimensional datasets are looked at. The number of permutations in the organization of a dataset grows as the dimensionality of the data grows.

"I want to plot the tracklines from the cruise"

What information is needed by the application in this case? The application needs to know which variables in the netCDF file are "latitude" and "longitude" and that the data is in fact geographic data. It then needs to determine the extents of the latitude and longitude variables so it can select the appropriate map projection. Again, as in the previous example, this data could exist in the netCDF file in various organizations.

"Where were we when we took water sample 38, and what was the flow-through temp and conductivity?"

This type of request, if made directly to a generic application, would require the application to "know" what "sample 38", "flow-through", "temp" and "conductivity" are, where they're stored, and how to access and display them. This certainly seems to be outside the reasonable scope of capability of a generic application.

As can be seen, there are several things that a self-describing netCDF file cannot possibly describe to an application. IMHO the primary problem is a lack of standard organizations of data, or a lack of a mechanism for communicating the organization of the data. By organization I mean: what are the geometries of the data (1D, 2D, 3D, ...), what set of variables and dimensions make up a single data set, is the set a certain class of data (rectilinear grid, scattered, line, irregular grid, mesh, ...), and does a given variable represent an independent or dependent variable? I maintain that these are the types of information for which conventions are needed in order to realize "applications that access generic netCDF files and operate on them."

The current conventions.info document only standardizes names. Although that is important for allowing humans to understand the data, it is inadequate for communicating to the application how the data is organized. Understanding the organization is what allows the application to determine which methods could be used to visualize the data. If the intention of standardizing names is to allow applications to "understand" the data based on the names of variables in a netCDF file, it won't work as well as standardizing data representations (organizations). Why? Because many types of data from different disciplines can be classified and visualized based on the geometry information (coordinate system) of the data, which does not depend on the names or type of data, but on the structure. Using names like "sfc_t" for surface temperature does nothing to communicate the organization of the data or allow an application to infer a visualization method, unless the application has been configured to "understand" all of the names in the conventions.info document. This is completely unnecessary, because most data fit into simple classes (organizations, structures) of data.

Consider a boat moving around on the surface of the ocean collecting data. The structure or class of these data sets can be classified as a "2D Random data set." Why 2D? Because in each case there are 2 coordinates (lat, lon) that define the location of the sample point. Why Random? Because there are no functional relationships between the coordinate pairs.
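Here's a sketch of what that classification could look like inside a file. Again, the attribute names ("data_class", "coordinate_variables", "dependent_variables") are invented purely for illustration; the point is the kind of grouping information, not these particular names:

netcdf cruise {
dimensions:
        obs = UNLIMITED ;
variables:
        float lat(obs) ;
        float lon(obs) ;
        float temperature(obs) ;
        float conductivity(obs) ;

// global attributes (hypothetical names, for illustration only):
        :data_class = "2D Random" ;
        :coordinate_variables = "lat lon" ;
        :dependent_variables = "temperature conductivity" ;
}

An application that understood nothing more than the class "2D Random" and which variables are the coordinates could plot the trackline or color the sample points by any of the dependent variables, without ever knowing what conductivity is.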
Similar abstractions can be made for gridded data and other classes of data. I feel very strongly that these are the areas that need to be standardized: not names, but structures. Until there is a method of grouping variables in a netCDF file such that the geometric properties of the data can be inferred, a generic visualization application is really impossible.

-ethan

--
Ethan Alpert   internet: ethan@ncar.ucar.edu | Standard Disclaimer:
Scientific Visualization Group,              | I represent myself only.
Scientific Computing Division                |-------------------------------
National Center for Atmospheric Research, PO BOX 3000, Boulder Co, 80307-3000