(DRAFT)

THREDDS Technical Task Force Workshop Summary

6-8 May 2002

Boulder, Colorado

Workshop participants were asked to provide a one-paragraph description of affiliation, and of related THREDDS activities, prior to the workshop. These were valuable to all of the participants because of the diversity of the THREDDS partners. During the course of the workshop, time was allocated for the partners to present thoughts on how their system could be integrated into THREDDS. This consisted of five-minute presentations by each participant.

Agenda

Opening Remarks (Domenico)

Workshop Goals:

Develop a concrete plan for Unidata and partners for the project
Clarify what has been accomplished thus far
Provide an opportunity for each group to briefly describe how their projects fit into THREDDS
Identify problem areas and unresolved issues
Set priorities
Overview Digital Library Connections

Workshop Themes:

Data provider tools
- Provider centers offer data in: Real time; archived; third party
Application developer tools
Discovery Centers/Digital Libraries
Metadata issues
Open GIS in THREDDS

John Caron: Technical Summary

Participants at the workshop include:

Providers/producers: Archived; Realtime (unique); 3rd-party collections. (NOAA/NGDC; NOAA/PMEL; ARM/Argonne, UAH/ITSC; CRAFT/CAPS; UnivOK; FNMOC/GODAE; Lamont/IRI; NOAA/NOMADS; UnivWiscMadison/SSEC; NOAA/CDC; NCAR)
Data Clients: software to read the data and documents to access it (LAS; ESMI; INGRID; MetApps)
Data Centers: Discovery center/Digital Library (GCMD; DLESE; NSDL)
Technologies: ESML; COARDS NetCDF, DODS/GrADS; ADDE, IDL

Data providers have collections of datasets and are willing to make them available on-line. Clients are software that accesses the data. Discovery centers provide browse and search services for multiple data collections. Third-party providers create logical dataset collections and additional metadata.

Types of data: archived (static catalogs); realtime (catalogs polled/notify); or dynamically generated by request.

THREDDS' present technology focus is on acquiring real data (not just pictures of data), creating a framework for loosely coupled systems, developing "human in the loop" automation tools, and metadata standards. Future development will include making choices about communications mechanisms. Phase One development, which is drawing to a close, has focussed on data catalog creation. Developers want feedback from providers using the tools presently in place.

Granularity issues at the catalog level affect the number and size of catalogs and how they are included in Discovery Centers. Catalog updating frequency has also been an area of concern.

Phase Two will focus on catalog servers and augmented metadata for discovery centers. Phase Three's focus will be data semantics, tools that allow data classification, and creating a collaborative "knowledge building environment" (KBE).

Some of the issues facing developers include:

Real-time versus archived data (seamless transition)
Quality Control
URL permanence / data permanence
THREDDS' role vis-a-vis data models
Use of databases at data provider sites
Granularity from the client viewpoint
Interoperability with existing Metadata descriptions
Relationship between THREDDS standard quantities and existing systems controlled vocabularies
Communication mechanisms
Multiple "redundant" servers for certain products
Descriptive metadata or data model

Data Inventory Catalogs

The catalogs are hierarchical collections of datasets requiring minimal metadata to keep barriers to entry low.
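
As a rough illustration of the idea (not the THREDDS catalog format itself), a hierarchical collection with minimal required metadata could be modelled as follows. The class name `CatalogNode`, its fields, and the example URLs are assumptions made for this sketch; only a name is required, which mirrors the goal of keeping barriers to entry low.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class CatalogNode:
    """Hypothetical catalog entry: a collection (has children) or a dataset (has a URL)."""
    name: str                      # the only required metadata
    url: Optional[str] = None      # access URL, set only on leaf datasets
    children: List["CatalogNode"] = field(default_factory=list)

    def datasets(self) -> Iterator["CatalogNode"]:
        """Yield every leaf dataset below this node, depth first."""
        if not self.children:
            if self.url is not None:
                yield self
            return
        for child in self.children:
            yield from child.datasets()


# Example hierarchy; all names and URLs are invented for illustration.
catalog = CatalogNode("Example provider", children=[
    CatalogNode("Model output", children=[
        CatalogNode("Run A", url="http://example.org/data/run_a.nc"),
        CatalogNode("Run B", url="http://example.org/data/run_b.nc"),
    ]),
    CatalogNode("Radar", children=[
        CatalogNode("Site 1", url="http://example.org/data/site1.raw"),
    ]),
])

dataset_names = [d.name for d in catalog.datasets()]
```

A client or discovery center would only need to walk the tree to list the datasets; richer metadata could be attached later without changing the hierarchy.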

There is no THREDDS data object model; THREDDS focuses on metadata. Other long-term technical goals are to use existing and emerging standards for efficient handling of large datasets, keeping things as simple and clean as possible. THREDDS client software is in Java and may eventually be ported to C.

Ethan Davis: THREDDS Catalog Generator

Goal: to automate catalog generation as much as possible

Because catalog generation is tedious when more than a handful of datasets are involved, a THREDDS goal is to automate the generation as much as possible.

A first-generation catalog generator creating a Unidata model data catalog is currently running on UCAR computer "motherlode." While functional, it is difficult to maintain. A Java application is currently being developed that scans local directories and can generate THREDDS catalogs or an aggregation server config file. It can also create catalogs from GrADS servers. Current weaknesses: it requires human setup, can only scan local files, and does not "know" anything about the data.
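
The directory-scanning approach can be sketched roughly as below. This is written in Python for brevity and is not the Java generator described above; the element names (`catalog`, `collection`, `dataset`), the function `scan_to_catalog`, and the base URL are all hypothetical and do not follow the actual THREDDS catalog schema. Like the tool described, the sketch only sees local files and knows nothing about their contents.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def scan_to_catalog(root_dir: str, base_url: str) -> ET.Element:
    """Build a nested XML catalog that mirrors the directory tree under root_dir."""

    def build(path: str, rel: str) -> ET.Element:
        node = ET.Element("collection", name=os.path.basename(path) or path)
        for entry in sorted(os.listdir(path)):
            full = os.path.join(path, entry)
            rel_entry = f"{rel}/{entry}" if rel else entry
            if os.path.isdir(full):
                node.append(build(full, rel_entry))      # sub-collection
            else:
                ET.SubElement(node, "dataset", name=entry,
                              url=f"{base_url}/{rel_entry}")
        return node

    catalog = ET.Element("catalog")
    catalog.append(build(root_dir, ""))
    return catalog


# Demonstration on a throwaway directory tree (file names are invented).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "model"))
    for rel in ("model/run_a.nc", "model/run_b.nc", "readme.txt"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    catalog = scan_to_catalog(tmp, "http://example.org/data")
    dataset_urls = [d.get("url") for d in catalog.iter("dataset")]
    catalog_xml = ET.tostring(catalog, encoding="unicode")
```

The human setup mentioned above corresponds here to choosing `root_dir` and `base_url`; anything beyond file names would have to come from reading the data or from additional metadata supplied by the provider.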

For the short term, plans are to expand the directives language, do some cleanup, and improve the handling of GDS 1.2 XML catalogs. Long-term plans include building a DODS server crawler, building a user interface (to build XML input files and create additional metadata), and determining how XML schemas will impact catalog generation efforts.

Robb Kambic: Dynamic Catalog Generator for NEXRAD Real-time Dataset

The problem with creating metadata for a real-time dataset is that the dataset changes so rapidly that stored metadata soon represents the data inaccurately. To solve this problem, the Dynamic Catalog Generator is invoked on command and generates metadata by scanning directory structures in real time to create catalogs. One real-time dataset, the NEXRAD radar feed, generates 2.8 million products/week or about 5 products/sec. The radar feed was used as a prototype to demonstrate that the Dynamic Catalog Generator could handle this kind of problem.
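
The quoted rate can be checked with simple arithmetic. The sketch below uses only the 2.8 million products/week figure given above; the variable names are illustrative.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60          # 604,800 seconds

products_per_week = 2.8e6                    # figure quoted in the text
products_per_second = products_per_week / SECONDS_PER_WEEK

# Average spacing between products, in seconds.
mean_interval_s = 1.0 / products_per_second
```

This gives roughly 4.6 products per second, consistent with the "about 5" quoted, or one new product about every 0.2 seconds on average. At that rate a catalog written to disk is stale almost immediately, which is the motivation for generating it on request.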

Other high-volume real-time datasets are being considered as candidates for the Dynamic Catalog Generator. These datasets may present different problems from the radar dataset: for example, the METAR datasets have reports embedded in bulletins, and FSL's MADIS has reports in NetCDF files.

Discussion of Data Provider Issues

granularity is an issue for users
- in situ data
- phased array
- MODIS

structure of catalog: data provider vs data user model
- multiple models for users: for scientists; for educators, etc.
- encapsulate multiple models in metadata or in client
- hierarchy vs ontology (e.g., mass store needs one view, data provider needs another, data user needs another, ...)
- annotation
- ontology building/google indexing
- XML provides representation for communication - separate from backend storage
THREDDS needs to have something between the provider and the data consumer, e.g., middleware
- options could include converting user view into data provider view
- client is smart client (DODS)

Discussion of Discovery System

discovery system work should be coordinated with Digital Libraries (browsing approach is good)
should consider Knowledge Discovery, e.g., semantic indexing.

Application Developer Tools

Don Murray provided a short demonstration of the Integrated Data Viewer (IDV) from MetApps using THREDDS catalog services to access remote datasets
- use with Web Start, e.g., embedding applications for digital libraries, uses DODS 2 and 3D data
- to provide data for IDV required to have DODS server
- could create catalog, e.g., COARDS
THREDDS Java library (Caron)

Discussion:

Consider a command line client side for THREDDS catalog access (Ferret and IDL have this feature)
Third party ancillary information correct metadata in data file
Is the THREDDS protocol (SML, etc) a metadata representation of the data?

Discovery Centers

John Weatherly, DLESE, provided a brief overview of DLESE activities
- datasets available through DLESE focus on educational resource documents now
  - ready for direct data cataloging in 2-4 years
  - include metadata record for each resource
  - uses the OAI transfer protocol (pull method), which provides a means for harvesting metadata
- human catalogs
  - use XML
- want to integrate dataset tools
- NSDL provides annotation services (educators from K-College)
  - use ADEPT
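
To make the "pull" idea concrete: in OAI-style harvesting, the harvester periodically issues HTTP requests to a repository and receives metadata records in return. The sketch below only builds such a request URL. The `verb`, `metadataPrefix`, and `from` parameters follow the OAI protocol as commonly documented, but the endpoint `http://example.org/oai` and the function name are hypothetical, and nothing here reflects DLESE's actual configuration.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: str = "") -> str:
    """Build a ListRecords request URL; the harvester pulls, the repository answers."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date           # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, for illustration only.
request_url = list_records_url("http://example.org/oai", from_date="2002-05-01")
```

Because the harvester initiates every exchange, a data provider would only need to expose its metadata at a stable URL; it would not need to know which discovery centers are collecting it.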

Discussion:

NASA/GCMD has over 11,000 datasets indexed; uses DIF and a 4-tier parameter hierarchy; has an HTTP/RMI API for programmatic access
ESIP Federation has search tool based on Z39
Alexandria Gazetteer provides georeferenced datasets
need a crosswalk between FGDC and Dublin Core
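
A crosswalk of the kind called for in the last item is essentially a field-to-field mapping between two metadata vocabularies. The sketch below is a minimal, incomplete illustration, assuming FGDC short element names on one side and Dublin Core element names on the other; the particular pairings, the function name, and the sample record are assumptions for this example and not an agreed or official crosswalk.

```python
# Illustrative, incomplete mapping from FGDC short names to Dublin Core elements.
# These pairings are an assumption for this sketch, not an agreed crosswalk.
FGDC_TO_DC = {
    "title": "title",
    "origin": "creator",
    "pubdate": "date",
    "abstract": "description",
    "themekey": "subject",
    "publish": "publisher",
}


def to_dublin_core(fgdc_record):
    """Translate mapped fields; report fields that have no Dublin Core target."""
    dc, unmapped = {}, []
    for key, value in fgdc_record.items():
        if key in FGDC_TO_DC:
            dc.setdefault(FGDC_TO_DC[key], []).append(value)
        else:
            unmapped.append(key)
    return dc, unmapped


# Invented sample record for illustration.
record = {
    "title": "Example gridded analysis",
    "origin": "Example Data Center",
    "themekey": "temperature",
    "westbc": "-110.0",
}
dc_record, unmapped_fields = to_dublin_core(record)
```

Returning the unmapped fields makes the information loss visible: the richer vocabulary carries elements (here a bounding coordinate) that the simpler one cannot express without further conventions.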

Metadata, Open GIS/ISO standards - Stefano Nativi, Univ of Florence, provided an overview of standards

THREDDS is trying to integrate the workings of two communities: digital libraries and systems for geographic information (not just GIS). The two communities require different sets of metadata. How can THREDDS wed these two metadata requirements:
- contain both sets of metadata
- extend one set to include the other
- develop higher level metadata model that captures both DL and GI metadata
- can we convert between OpenGIS Web Catalog/Services and THREDDS catalogs

Breakout Groups

Random breakout groups formed to review topics and issues.

Breakout points

THREDDS Collaboration Tools - Chris Klaus

Following a brief demo of NSDL's WIKI site, the group agreed to try it out for collaboration purposes.

ACTION: Chris will follow up with the group with instructions for accessing the software.

All participants had the opportunity to articulate recommended steps to be taken to better integrate their projects with THREDDS and suggest what tasks they would recommend for the Unidata-THREDDS to pursue. These next steps are included in Participant Input and Discussion.

Conclusions from the Workshop:

THREDDS should focus on widespread adoption of catalogs. The Data Providers present were generally enthusiastic about getting their sites to use catalogs. Suggested improvements to catalog-generating tools were made.
THREDDS should come to agreement with DODS developers to adopt a common catalog format. A number of important suggestions were made, and further work will be done in the THREDDS mailgroup.
THREDDS should for now concentrate on metadata standards and discovery rather than data models. Use existing standards when possible.
Join or at least keep track of OpenGIS/ISO work.
Data Providers want flexible, minimal metadata; Data Clients want maximal, standardized metadata.
THREDDS will look to DLESE for cataloging educational resources, and to GCMD for dataset search and discovery.
LAS and GrADS could be important "thin-clients" for THREDDS.
Develop a "command-line tool" for use with non-Java applications.

Linda Miller - lmiller@unidata.ucar.edu
External Liaison, Unidata
University Corporation for Atmospheric Research
P.O. Box 3000
Boulder, CO 80307-3000
303 497-8646 fax: 303-497-8690