GALEON and Other Outreach Progress 

Ben Domenico
October 13, 2008

Standard Interfaces for Data System Interoperability


Interactions with the Unidata proposal review panel helped clarify the philosophy and strategy for pursuing interoperability for our data systems via standard interfaces.  For the core community, we support components for an end-to-end system: IDD/LDM for real-time "push" data delivery, the decoder suite for format transformations, the TDS for "pull" data access from remote servers, and client applications for analysis and display.  For other communities who use different client applications, we provide access to data via standard interfaces, e.g., the netCDF API, OPeNDAP, and WCS.  They can thus use any client they wish, as long as it can access the data via these interfaces.  There are many such clients in use: the IDV, PMEL Ferret, ITT VIS IDL, Matlab, and ESRI arcGIS, to name a few.  Providing data via these standard interfaces thus makes the data available to a wide variety of communities using a broad array of analysis and display tools, without imposing on the UPC the burden of supporting those tools.
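
As a hedged illustration of what "any client via standard interfaces" means in practice, the sketch below uses the Python netCDF4 library (assuming a build with OPeNDAP support) to read a remote dataset exactly as if it were a local file; the server URL and variable name are hypothetical stand-ins:

    # Minimal sketch: reading a remote dataset through the netCDF API over
    # OPeNDAP.  The URL and variable name below are assumptions; substitute
    # a real THREDDS/OPeNDAP endpoint.
    from netCDF4 import Dataset

    url = "http://motherlode.ucar.edu/thredds/dodsC/model/example.nc"  # hypothetical
    ds = Dataset(url)                   # open remote dataset via OPeNDAP
    temp = ds.variables["Temperature"]  # assumed 4D variable (time, level, y, x)
    subset = temp[0, 0, :, :]           # only the requested slice crosses the network
    print(subset.shape)
    ds.close()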

Work on the OGC Oceans Interoperability Experiment is winding down.  The final report has been drafted and is under review by the Oceans IE team.

There have been two main accomplishments for the OGC GALEON IE since the last report; they are covered in the sections that follow.

Possible NSF Proposals

CoAHP Proposal led by NCAR GIS Initiative

Olga Wilhelmi continues to revise the proposal according to the guidance of NSF's Doug James.  As you may recall, the original proposal was for a project that would bring the data systems of Unidata and the CUAHSI HIS together to serve interdisciplinary research.  NSF encouraged us to continue with an expanded proposal that would bring in the ecology community, but recommended that the effort start with a workshop including representatives of all three disciplines.  Olga has now 1) strengthened the ecology component (and workshop agenda) and 2) provided a list of potential participants from each community.  The proposal for the workshop has yet to be reviewed, however.

Datanet Proposal led by SDSC

Initial steps have been taken on a pre-proposal for the NSF Datanet opportunity, led by SDSC.  There would be a thematic focus on transforming scientific climate data into forms usable by a broader audience, e.g., decision makers and the general public.  Unidata's role would relate mainly to providing data and tools for real-time access to datasets related to extreme weather events.  Our role would be similar to one of our roles in LEAD, namely, specialized support for the use of our tools in the context of the project, informing our community about the project, and disseminating the components of the system to our community where appropriate.  For example, some of our sites may be interested in implementing the systems that transform the research data into forms useful in an educational setting.

Data System Standards

OGC Technical Committee Meeting Highlights

This is a brief summary of topics of interest to Unidata and the GALEON project from the June OGC Technical Committee meetings.
 
Relating to CF-netCDF, there was discussion of the new encoding format document, a draft of which is nearly ready to be submitted as a WCS standard extension specification.  There is still some concern that this encoding format specification will be too closely coupled to the actual WCS protocol specification, and a suggestion was made that coverage encoding format documents be submitted as "best practices" documents rather than standard extensions.  However, subsequent discussions via email and at the meeting led to the conclusion that a "best practices" approach would leave the coupling with the standard too loose.  So the operative plan now is to continue on the previous path and submit the CF-netCDF encoding specification as a WCS standard extension as soon as possible.  A much expanded draft of that extension standard is under review by the GALEON participants.

There was also interest in the fact that we are attempting to map a variety of scientific data types (e.g., the Unidata CDM scientific data types, the BADC Climate Science Modelling Language scientific feature types) onto coverages as understood by ISO.  This would include collections of points, stations, soundings, trajectories, radar scans, swaths, etc. that are not "gridded" and hence have not traditionally been thought of as coverages.  The ISO 19123 coverage definition does, however, include collections of discrete points.  For the netCDF community, the first order of business is to extend and adapt the CF conventions to encompass these data types fully.
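
To make the data-type question concrete, here is a minimal sketch of how a collection of station time series might be encoded in netCDF with CF-style metadata.  The conventions for these non-gridded types were still being worked out at the time, so attribute names such as "featureType" should be read as one plausible encoding, not a settled standard:

    # Illustrative sketch: a small collection of station time series in
    # netCDF.  Attribute names for non-gridded feature types were still
    # under discussion; treat this as one plausible encoding.
    from netCDF4 import Dataset
    import numpy as np

    nc = Dataset("stations.nc", "w")
    nc.Conventions = "CF-1.x (draft point/station conventions)"
    nc.featureType = "timeSeries"   # proposed attribute, not yet standardized

    nc.createDimension("station", 3)
    nc.createDimension("time", 24)

    lat = nc.createVariable("lat", "f4", ("station",))
    lat.units = "degrees_north"
    lon = nc.createVariable("lon", "f4", ("station",))
    lon.units = "degrees_east"
    time = nc.createVariable("time", "f8", ("time",))
    time.units = "hours since 2008-10-01 00:00:00"

    temp = nc.createVariable("temperature", "f4", ("station", "time"))
    temp.units = "K"
    temp.coordinates = "lat lon"    # makes the spatial reference explicit

    lat[:] = [40.0, 41.5, 39.2]
    lon[:] = [-105.3, -104.8, -106.1]
    time[:] = np.arange(24)
    temp[:] = 280.0 + np.random.rand(3, 24)
    nc.close()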
 
In terms of catalogs, there were discussions with the ESRI reps, who indicated that there may actually be some facilities for tying CSW catalog information to WCS access in the new arcGIS 9.3.  But none of the people at the meeting had enough experience with these OGC interfaces in that release to say exactly how that can be done; it's something I have to follow up on.  It was confirmed that the 9.3 WCS client cannot access CF-netCDF encoded information, so our initial experiments with the beta release are in fact accessing GeoTIFFs from THREDDS Data Servers.  In discussing it, we concluded that this is not as much of a mystery as it seemed initially: the native netCDF read/write (from local disk) capability in arcGIS 9.2 actually only brings in 2D "slices" at a time, and there is no way to impose that restriction in the current WCS implementations.
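
For reference, the GeoTIFF experiments boil down to WCS 1.0.0 GetCoverage requests of roughly the following shape.  This is a hedged sketch: the server URL, coverage name, and bounding box are hypothetical, while the parameter names come from the WCS 1.0.0 specification:

    # Sketch of a WCS 1.0.0 GetCoverage request against a THREDDS Data
    # Server, asking for a GeoTIFF rendering of one coverage.
    import urllib.parse, urllib.request

    base = "http://motherlode.ucar.edu/thredds/wcs/model/example.nc"  # hypothetical
    params = {
        "service": "WCS",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": "Temperature",   # assumed coverage name
        "crs": "EPSG:4326",
        "bbox": "-110,35,-100,45",   # minx,miny,maxx,maxy (lon/lat)
        "width": "200",
        "height": "200",
        "format": "GeoTIFF",
    }
    url = base + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp, open("slice.tif", "wb") as out:
        out.write(resp.read())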
 
Lorenzo Bigagli gave an excellent summary of the ESSI (Earth and Space Science Informatics) sessions at the recent EGU conference.  He also indicated that a new release of the Gi-GO catalog and data access client is in the works.
 
The discussions of Google's KML were focused mainly on mass-market display of data in Google Earth and Google Maps.  Much of the interaction was dominated by commercial vendors interested in wide exposure to large segments of the general public, so the emphasis is on the display of data and not so much on the analysis tools needed by the research community.  Obviously there is considerable interest in the academic community but, in terms of supporting infrastructure, what might be of most use is some sort of service that would automate the conversion of netCDF data slices into KML for display.
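
Such a conversion service might look something like the sketch below, which renders one 2D slice as an image and wraps it in a KML GroundOverlay.  The file, variable, and coordinate names are assumptions:

    # Sketch of a netCDF-to-KML conversion step: render a 2D slice to PNG
    # and reference it from a KML GroundOverlay for Google Earth.
    from netCDF4 import Dataset
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    ds = Dataset("example.nc")                   # hypothetical input file
    data = ds.variables["Temperature"][0, :, :]  # first time step (assumed layout)
    lat = ds.variables["lat"][:]
    lon = ds.variables["lon"][:]

    plt.figure(frameon=False)
    plt.axes([0, 0, 1, 1])
    plt.axis("off")
    plt.imshow(data, origin="lower")
    plt.savefig("slice.png", transparent=True)

    kml = """<?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2">
      <GroundOverlay>
        <name>Temperature slice</name>
        <Icon><href>slice.png</href></Icon>
        <LatLonBox>
          <north>{n}</north><south>{s}</south>
          <east>{e}</east><west>{w}</west>
        </LatLonBox>
      </GroundOverlay>
    </kml>""".format(n=lat.max(), s=lat.min(), e=lon.max(), w=lon.min())

    open("slice.kml", "w").write(kml)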

The meeting presentations are available at:

 http://portal.opengeospatial.org/index.php?m=projects&a=view&project_id=82&tab=2&artifact_id=27720

Recent GALEON Activity

The GALEON Interoperability Experiment has been very active recently -- mainly via a set of email exchanges.  The main topic areas under active discussion are:

1. WCS-netCDF extension standard.
The main issue here is how to incorporate an OPeNDAP option.  I sense agreement that there should be an OPeNDAP option in WCS, but it could be done either as a small addition to the proposed CF-netCDF extension standard or as a separate extension spec that leverages and builds on the CF-netCDF proposal (a purely illustrative sketch of the two forms follows this list).
To me, this is still the top-priority issue, and it would be good if it could be resolved so the resulting WCS extension standard(s?) can be formally submitted to the WCS 1.2 SWG before the next OGC TC meeting, which is the first week in December.

2. Non-gridded coverages.
The issue here is what we do in both the CF community AND the OGC community about the types of data that do not fit the current WCS definition of regular grids.  This includes the types of datasets (and collections thereof) that have been listed and discussed as CDM (Common Data Model) scientific data types and as CSML scientific feature types.  I put this as the second-highest priority because it encompasses important work for both communities:
-- the CF community must extend the conventions to include these data types -- with special care to make Coordinate Reference System (CRS) information explicit -- whereas
-- the OGC community has to come to grips with the fact that these data types can be seen as features and/or coverages and that some harmonization -- or at least better understanding -- is needed among alternative delivery protocols (WFS, WCS, SOS, WMS?).
I will attempt to spawn separate discussions of these issues because the CF work should be undertaken sooner rather than later, and the OGC issues are on the agenda for the December TC meeting in a special joint session the afternoon of Monday, Dec. 1.

3. WCS Core and Extensions.
A couple of topics have come up here.  One is the more general question of whether there is anything in the current WCS 1.2 draft that prevents the GALEON community from addressing its standardization needs in the extension standard(s) for CF-netCDF (with an OPeNDAP option either as part of the CF-netCDF extension or as a separate extension standard built on the CF-netCDF foundation).  The second issue that came up in the email discussion is more specific: namely, whether having the WCS 1.2 core include only 2D coverages is an obstacle to the service of 4D FES (metocean) data to the traditional GIS community.  In this regard, it has been noted that, while the current draft WCS 1.2 core spec allows clients to be compliant even though they cannot work with 3D or 4D datasets, forcing those clients to deal with a 4D WCS would not mean that they would be able to analyze and display those datasets.  On the other hand, any client developer whose user community is interested in GALEON 4D datasets would implement the CF-netCDF extension.  I plan to start a separate email discussion on this topic to give people a chance to respond to my obvious prejudice.

4. The Need for Catalog Services.
This was really only touched on in the discussion thread, but I think it warrants an item in this list because the discussion of a wider variety of data types, along with the possibility of multiple access protocols (WMS, WCS, SOS), places additional emphasis on the importance of standards-based catalog services (CS-W), which was one of the issues that GALEON phase 1 brought out.  Getting CS-W discovery systems working together with WCS (or other OGC) access systems is a substantial interoperability challenge when the clients and servers are developed independently.  This item is mainly just a reminder that more thought and experimentation is needed in this area.

5. Collect a set of GALEON use cases to guide the evolution of relevant OGC standards.  This is being done on the GALEON wiki.
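
As promised in item 1, here is a purely illustrative sketch of the two forms an OPeNDAP option might take in a WCS GetCoverage exchange.  Neither form is standardized; the endpoint, format token, and response element are invented for discussion:

    # Purely illustrative: two possible shapes for an OPeNDAP option in WCS.
    # Nothing here is standardized; values are invented for discussion.
    import urllib.parse

    base = "http://server.example.edu/wcs"   # hypothetical endpoint
    common = {"service": "WCS", "request": "GetCoverage",
              "coverage": "Temperature"}     # version/CRS params omitted

    # (a) As a small addition to the CF-netCDF extension: the client asks
    #     for an OPeNDAP "encoding" via an invented format token, and the
    #     server returns a DAP URL rather than the coverage bytes.
    opt_a = dict(common, format="application/x-dods")

    # (b) As a separate extension built on CF-netCDF: same request, but the
    #     response document carries a reference such as
    #     <Reference href="http://server.example.edu/opendap/example.nc"/>
    print(base + "?" + urllib.parse.urlencode(opt_a))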

ESRI User Meeting Highlights

There was a major emphasis on "climate change" workshops, training sessions, general presentations, etc. this year, but nearly all of this turned out to have a different focus than what I had anticipated.  While there were a few sessions on climate research and education topics, the vast majority of the discussion was about approaches to reducing and mitigating the effects of CO2 generation.  Some topics were at least vaguely related to data systems of the sort we work with, e.g., climate data for wind farm siting or solar radiation data for solar energy collection.  But most of the discussion was about demographic, infrastructure, land use, biological, governmental zoning, transportation, and other such geographic data systems.  There were some surprising and fascinating ideas, like planning transportation systems to encourage fewer left turns and more right turns.

There were several talks related to the NWS NDFD (National Digital Forecast Database), which apparently is heavily used in the GIS world.  One interesting item is that the NDFD is moving to a 2.5 km resolution for their data.  It was not clear (even after a few questions) exactly how they get to forecasts with that fine a resolution, but it seems to depend a lot on human input.  Another significant change is that watches and warnings are now for polygon-delimited areas rather than by county, which was always very inaccurate.  The most directly relevant presentation for us was by Eoin Howlett, who reported on his work with Roy Mendelssohn of the Pacific Fisheries Environmental Laboratory.  This system enables access to THREDDS/OPeNDAP services directly from within the ESRI arcGIS applications.  In essence, this EDC (Environmental Data Connector) makes ESRI products OPeNDAP clients, using the Python scripting support in version 9 of arcGIS and the Pydap library.
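
The EDC's internals weren't shown, but the Pydap access it builds on looks roughly like this sketch (the URL and variable name are assumptions):

    # Minimal Pydap sketch of the kind of OPeNDAP access the EDC builds on.
    from pydap.client import open_url

    ds = open_url("http://motherlode.ucar.edu/thredds/dodsC/model/example.nc")
    temp = ds["Temperature"]   # assumed variable name
    print(temp.shape)          # metadata only; no data transferred yet
    slab = temp[0, 0, :, :]    # slicing triggers a constrained DAP request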

There were also quite a few interactions with the ESRI staff involved in implementing OGC/ISO interfaces.  In particular, arcGIS 9.3 has a plug-in for access to OGC CS-W (Catalog Services for the Web) that should be able to connect to the CS-W service that GMU has created for enabling access via standard protocols to THREDDS catalogs.  Teddy Matinde of ESRI and I had some real success accessing the GMU CS-W server for THREDDS datasets.  The search system turned up a number of North American Mesoscale (NAM) model datasets on motherlode when we searched for "NAM".  However, the search "hits" we got did not include a direct pointer to the WCS access point, so we were not able to access the data after the successful searches.  To access the dataset via WCS, the ESRI client needs a URL pointing directly to the WCS service.

The key to making this happen at all is the installation of the GIS Portal Toolkit plugin into arcGIS.  In a follow-up after the meetings, I finally got this toolkit installed properly on my own computer and was able to do the same sort of searches we did at the meetings.  Working with our George Mason partners, we discovered that, although the arcGIS search interface looks like a simple free-text search (a la Google), it really only searches the titles of datasets, so it does not find anything if the search term is in a field other than the title.

AccessData Workshop

This year's AccessData Workshop was hosted by Unidata.  It was held downtown in Portland, Oregon -- the first one in an urban setting rather than a resort or retreat type of venue -- and Tina did a masterful job organizing it.  The total number of attendees was 57, with 8 of them from UCAR.  The AccessData (originally called DLESE Data Services) workshops provide an opportunity for data providers, software tool specialists, scientific researchers, curriculum developers, and educators to interact with one another in a variety of sessions, all working toward the common goal of facilitating the use of data in education.  In addition to keynote presentations, hands-on lab sessions (Tool Time), and a Demo Session/Share Fair, attendees are grouped into teams that include the full range of roles represented at the meeting.  Team members work together to develop an educational module, drawing upon the expertise of individuals in each role on the team.  This practical exercise enables team members to learn from each other about the needs, practices, and expectations of the other groups.  Jeff Weber enthralled the participants with one of the keynote presentations, entitled "Data, data everywhere, but not a bit to display."  He also provided a Tool Time session on the use of the Unidata IDV.


For more details on the workshop, there's a web page at: http://serc.carleton.edu/usingdata/accessdata/index.html

ACCESS Geoscience Project Extension

Our NASA ACCESS Geoscience grant was extended with a modest addition of funds to carry through this calendar year.  As noted above, there has been (limited) success with the arcGIS client finding datasets on THREDDS servers via the standard CS-W interface on the George Mason server, which harvests metadata from THREDDS sites.  In addition, we've been successful with the Gi-GO client from the University of Florence.  While arcGIS only searches the Title field, Gi-GO searches both Title and Abstract, so it finds more THREDDS datasets.  But neither of them searches in other fields such as the list of variables in the datasets; hence a search for "vorticity" would not get any hits unless that word shows up in the Title or Abstract.  What we are learning, however, is that building interoperable data search systems that are practical and useful ain't easy.  These experiments appear to be the first in which independently developed clients and servers have achieved successful searches at all.  As noted, one difficulty is that the user interface on the clients tends to be a simple free-text search, but the underlying server capability is more of a precise database query system for which one has to specify which field(s) are being searched.  This mismatch causes many of the difficulties.  The granularity of the searched objects is another key issue.  THREDDS servers support a hierarchy of catalogs of catalogs, so, if one finds a high-level catalog, it must be possible to "drill down" to individual datasets, but the clients we are working with do not have this capability yet.  We are working with the Gi-GO team to make it work for their client.  This would mean we could use the Gi-GO client to find datasets on THREDDS servers via the CS-W protocol and then download the dataset via the standard WCS protocol.
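
To make the field-targeting mismatch concrete, here is a hedged sketch of the kind of CS-W 2.0.2 GetRecords request underlying these searches.  The endpoint is hypothetical; note that the filter targets a single queryable (dc:title), which is exactly why a term appearing only in another field returns no hits:

    # Sketch of a CS-W 2.0.2 GetRecords POST.  The endpoint is hypothetical;
    # the request body follows the CSW 2.0.2 / OGC Filter schemas.
    import urllib.request

    endpoint = "http://geobrain.example.edu/csw"   # hypothetical CS-W endpoint
    body = """<?xml version="1.0" encoding="UTF-8"?>
    <csw:GetRecords xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
                    xmlns:ogc="http://www.opengis.net/ogc"
                    service="CSW" version="2.0.2" resultType="results">
      <csw:Query typeNames="csw:Record">
        <csw:ElementSetName>summary</csw:ElementSetName>
        <csw:Constraint version="1.1.0">
          <ogc:Filter>
            <ogc:PropertyIsLike wildCard="%" singleChar="_" escapeChar="\\">
              <ogc:PropertyName>dc:title</ogc:PropertyName>
              <!-- only titles match; "vorticity" in a variable list would not -->
              <ogc:Literal>%NAM%</ogc:Literal>
            </ogc:PropertyIsLike>
          </ogc:Filter>
        </csw:Constraint>
      </csw:Query>
    </csw:GetRecords>"""

    req = urllib.request.Request(endpoint, data=body.encode("utf-8"),
                                 headers={"Content-Type": "application/xml"})
    print(urllib.request.urlopen(req).read()[:500])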