(DRAFT)

THREDDS Technical Task Force Workshop Summary

6-8 May 2002

Boulder, Colorado

 

Participants

Participant Paragraphs

Before the workshop, participants were asked to provide a one-paragraph description of their affiliation and related THREDDS activities. Given the diversity of the THREDDS partners, these descriptions were valuable to all participants. During the workshop, each participant gave a five-minute presentation on how their system could be integrated into THREDDS.

Agenda

Opening Remarks (Domenico)

Workshop Goals:

  • Develop a concrete plan for Unidata and partners for the project
  • Clarify what has been accomplished thus far
  • Provide an opportunity for each group to briefly describe how their projects fit into THREDDS
  • Identify problem areas and unresolved issues
  • Set priorities
  • Overview Digital Library Connections

Workshop Themes:

  • Data provider tools
    • Provider centers offer data in: Real time; archived; third party
  • Application developer tools
  • Discovery Centers/Digital Libraries
  • Metadata issues
  • Open GIS in THREDDS

John Caron: Technical Summary

Participants at the workshop include:

Data providers have collections of datasets and are willing to make them available on-line. Clients are software that accesses the data. Discovery centers provide browse and search services for multiple data collections. Third-party providers create logical dataset collections and additional metadata.

Types of data: archived (static catalogs); real-time (catalogs polled/notify); or dynamically generated by request.

THREDDS' present technology focus is on acquiring real data (not just pictures of data), creating a framework for loosely coupled systems, developing "human in the loop" automation tools, and metadata standards. Future development will include making choices about communications mechanisms. Phase One development, which is drawing to a close, has focused on data catalog creation. Developers want feedback from providers using the tools presently in place.

Granularity issues at the catalog level affect the number and size of catalogs and how they are included in Discovery Centers. Catalog updating frequency has also been an area of concern.

Phase Two will focus on catalog servers and augmented metadata for discovery centers. Phase Three's focus will be data semantics, tools that allow data classification, and creating a collaborative "knowledge building environment" (KBE).

Some of the issues facing developers include:

Data Inventory Catalogs

The catalogs are hierarchical collections of datasets requiring minimal metadata to keep barriers to entry low.
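The idea of a hierarchical collection with minimal required metadata can be illustrated with a small sketch. This is not the THREDDS catalog format: the class name `Dataset`, its fields, and the example names and URLs are all assumptions made for illustration, with only a name required per entry.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Dataset:
    """One catalog node: a named dataset, or a named collection of datasets."""
    name: str                          # the only required metadata in this sketch
    access_url: Optional[str] = None   # optional; set on entries that point at data
    children: List["Dataset"] = field(default_factory=list)

    def leaves(self) -> Iterator["Dataset"]:
        """Yield every entry in the hierarchy that has no children."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


# Hypothetical hierarchy; names and URLs are placeholders, not real datasets.
catalog = Dataset("Example catalog", children=[
    Dataset("Model output", children=[
        Dataset("run-0000", access_url="http://example.invalid/run-0000.nc"),
        Dataset("run-1200", access_url="http://example.invalid/run-1200.nc"),
    ]),
    Dataset("Radar", children=[Dataset("level-3")]),
])

leaf_names = [d.name for d in catalog.leaves()]
```

Because only the name is mandatory, a provider could list a dataset before any richer metadata exists and add fields later.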

There is no THREDDS data object model. THREDDS focuses on metadata. Other long-term technical goals are to use existing and emerging standards for efficient handling of large datasets while keeping things as simple and clean as possible. THREDDS client software is in Java and may eventually be ported to C.

Ethan Davis: THREDDS Catalog Generator

Goal: to automate catalog generation as much as possible

Because catalog generation is tedious when more than a handful of datasets are involved, a THREDDS goal is to automate the generation as much as possible.

A first-generation catalog generator creating a Unidata model data catalog is currently running on UCAR computer "motherlode." While functional, it is difficult to maintain. A Java application is currently being developed that scans local directories and can generate THREDDS catalogs or an aggregation server config file. It can also create catalogs from GrADS servers. Current weaknesses: it requires human setup, it can only scan local files, and it does not "know" anything about the data.
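A minimal sketch of the directory-scanning approach follows, written in Python for brevity rather than in Java. It is not the generator described above: the function name `scan_to_catalog`, the element names `catalog`, `collection` and `dataset`, and the `.nc` suffix filter are assumptions for illustration and do not follow the THREDDS catalog schema. Like the tool described, it inspects only local file names and knows nothing about the data inside the files.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def scan_to_catalog(root_dir: str, suffix: str = ".nc") -> ET.Element:
    """Walk root_dir and mirror its directory layout as nested XML elements."""
    root_dir = os.path.abspath(root_dir)
    catalog = ET.Element("catalog", name=os.path.basename(root_dir))
    nodes = {root_dir: catalog}  # directory path -> its XML element
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # make the traversal order deterministic
        parent = nodes[dirpath]
        for d in dirnames:
            nodes[os.path.join(dirpath, d)] = ET.SubElement(
                parent, "collection", name=d)
        for f in sorted(filenames):
            if f.endswith(suffix):
                rel = os.path.relpath(os.path.join(dirpath, f), root_dir)
                ET.SubElement(parent, "dataset", name=f,
                              path=rel.replace(os.sep, "/"))
    return catalog


# Small demonstration on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "model"))
    for rel in ("model/run1.nc", "model/notes.txt", "obs.nc"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    demo = scan_to_catalog(tmp)
    demo_paths = [e.get("path") for e in demo.iter("dataset")]
    demo_xml = ET.tostring(demo, encoding="unicode")
```

The suffix filter stands in for the "human setup" step: someone still has to tell the scanner which files count as datasets.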

For the short term, plans are to expand the directives language, do some cleanup, and improve the handling of GDS 1.2 XML catalogs. Long-term plans include building a DODS server crawler, building a user interface (to build XML input files and create additional metadata), and determining how XML schemas will impact catalog generation efforts.

Robb Kambic: Dynamic Catalog Generator for NEXRAD Real-time Dataset

The problem with creating metadata for a real-time dataset is that the dataset changes so rapidly that the metadata soon represents the data inaccurately. To solve this problem, the Dynamic Catalog Generator is invoked on command to generate metadata by scanning directory structures in real time to create catalogs. One real-time dataset, the NEXRAD radar feed, generates 2.8 million products/week or about 5 products/sec. The radar feed was used as a prototype to demonstrate that the Dynamic Catalog Generator could handle this kind of problem.
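The on-demand idea can be sketched as follows: rather than storing a listing that goes stale, the directory is scanned each time a catalog is requested. This is an illustrative sketch only, not the Dynamic Catalog Generator itself; the function name `catalog_on_request`, the returned dictionary layout, and the product file names are assumptions. The last line reproduces the feed rate quoted above.

```python
import os
import tempfile
import time


def catalog_on_request(directory: str) -> dict:
    """Scan `directory` at call time and return a fresh listing of its files."""
    with os.scandir(directory) as entries:
        products = sorted(e.name for e in entries if e.is_file())
    return {"generated_at": time.time(), "products": products}


# Simulate a feed directory that changes between two requests.
with tempfile.TemporaryDirectory() as feed_dir:
    open(os.path.join(feed_dir, "product_001"), "w").close()
    first = catalog_on_request(feed_dir)
    open(os.path.join(feed_dir, "product_002"), "w").close()  # new product arrives
    second = catalog_on_request(feed_dir)

# Rate quoted in the text: 2.8 million products per week, expressed per second.
products_per_second = 2_800_000 / (7 * 24 * 60 * 60)
```

Each request pays the cost of a scan, which is the trade-off for a listing that matches the directory at the moment it is asked for.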

Other high-volume real-time datasets are being considered as candidates for the Dynamic Catalog Generator. These datasets may present different problems from the radar dataset: for example, METAR datasets have reports embedded in bulletins, and FSL's MADIS has reports in NetCDF files.

Discussion of Data Provider Issues

Discussion of Discovery System

Application Developer Tools

Discussion:

Discovery Centers

Discussion:

Metadata, Open GIS/ISO standards - Stefano Nativi, Univ of Florence, provided an overview of standards

Breakout Groups

Random breakout groups formed to review topics and issues.


THREDDS Collaboration Tools - Chris Klaus

Following a brief demo of NSDL's WIKI site, the group agreed to try it out for collaboration purposes.

ACTION: Chris will follow up with the group with instructions for accessing the software.


All participants had the opportunity to articulate recommended steps to better integrate their projects with THREDDS and to suggest tasks for Unidata-THREDDS to pursue. These next steps are included in Participant Input and Discussion.

Conclusions from the Workshop:


Linda Miller - lmiller@unidata.ucar.edu
External Liaison, Unidata
University Corporation for Atmospheric Research
P.O. Box 3000
Boulder, CO 80307-3000
303 497-8646 fax: 303-497-8690