(DRAFT)
THREDDS Technical Task Force Workshop Summary
6-8 May 2002
Boulder, Colorado
Participants
Participant Paragraphs
Workshop participants were asked to provide a one-paragraph
description of affiliation, and of related THREDDS activities, prior to the
workshop. These were valuable to all of the participants because of the diversity
of the THREDDS partners. During the course of the workshop, time was allocated
for the partners to present thoughts on how their system could be integrated
into THREDDS. This consisted of five-minute presentations by each participant.
Agenda
Opening Remarks (Domenico)
Workshop Goals:
- Develop a concrete plan for Unidata and partners for the project
- Clarify what has been accomplished thus far
- Provide an opportunity for each group to briefly describe how their
projects fit into THREDDS
- Identify problem areas and unresolved issues
- Set priorities
- Provide an overview of digital library connections
Workshop Themes:
- Data provider tools
- Provider centers offer data in: Real time; archived; third party
- Application developer tools
- Discovery Centers/Digital Libraries
- Metadata issues
- Open GIS in THREDDS
Participants at the workshop include:
- Providers/producers: Archived; Realtime (unique); 3rd-party collections.
(NOAA/NGDC; NOAA/PMEL; ARM/Argonne, UAH/ITSC; CRAFT/CAPS; UnivOK; FNMOC/GODAE;
Lamont/IRI; NOAA/NOMADS; UnivWiscMadison/SSEC; NOAA/CDC; NCAR)
- Data Clients: software to read the data and documents to access it
(LAS; ESMI; INGRID; MetApps)
- Data Centers: Discovery center/Digital Library (GCMD; DLESE; NSDL)
- Technologies: ESML; COARDS NetCDF, DODS/GrADS; ADDE, IDL
Data providers have collections of datasets and are willing to make them
available on-line. Clients are software that accesses the data. Discovery
centers provide browse and search services for multiple data collections.
Third-party providers create logical dataset collections and additional metadata.
Data may be archived (static catalogs), real-time (catalogs updated by polling
or notification), or dynamically generated on request.
THREDDS' present technology focus is on acquiring real data (not just pictures
of data), creating a framework for loosely coupled systems, developing "human
in the loop" automation tools, and metadata standards. Future development
will include making choices about communications mechanisms. Phase One development,
which is drawing to a close, has focused on data catalog creation. Developers
want feedback from providers using the tools presently in place.
Granularity issues at the catalog level affect the number and size of catalogs
and how they are included in Discovery Centers. Catalog updating frequency
has also been an area of concern.
Phase Two will focus on catalog servers and augmented metadata for discovery
centers. Phase Three's focus will be data semantics, tools that allow data
classification, and creating a collaborative "knowledge building environment"
(KBE).
Some of the issues facing developers include:
Data Inventory Catalogs
The catalogs are hierarchical collections of datasets requiring minimal metadata
to keep barriers to entry low.
There is no THREDDS data object model; THREDDS focuses on metadata. Other long-term
technical goals are to use existing and emerging standards for efficient handling
of large datasets, keeping things as simple and clean as possible. THREDDS client
software is written in Java and may eventually be ported to C.
Goal: to automate catalog generation as much as possible
Because catalog generation is tedious when more than a handful of datasets
are involved, a THREDDS goal is to automate the generation as much as possible.
A first-generation catalog generator, which creates a catalog of Unidata model
data, is currently running on the UCAR computer "motherlode." While functional,
it is difficult to maintain. A Java application is currently being developed
that scans local directories and can generate THREDDS catalogs or an aggregation
server config file. It can also create catalogs from GrADS servers. Its current
weaknesses are that it requires human setup, can scan only local files, and does
not "know" anything about the data.
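The directory-scanning approach described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than Java, and the element and attribute names (catalog, collection, dataset, name, path) are placeholders chosen for this example, not the actual THREDDS catalog schema. Like the generator described above, it looks only at local file names and knows nothing about the data inside the files.

```python
import os
import xml.etree.ElementTree as ET


def build_catalog(root_dir, catalog_name="Example catalog"):
    """Walk root_dir and return an XML element that mirrors its directory tree.

    Element and attribute names are hypothetical, not the THREDDS schema.
    """
    catalog = ET.Element("catalog", {"name": catalog_name})
    # Map each directory path to the XML element that represents it.
    nodes = {os.path.abspath(root_dir): catalog}
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # sorting in place makes the traversal order deterministic
        parent = nodes[os.path.abspath(dirpath)]
        # Each subdirectory becomes a nested collection.
        for d in dirnames:
            child_path = os.path.abspath(os.path.join(dirpath, d))
            nodes[child_path] = ET.SubElement(parent, "collection", {"name": d})
        # Each file becomes a dataset entry; only its name and path are recorded.
        for f in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, f), root_dir)
            ET.SubElement(parent, "dataset",
                          {"name": f, "path": rel.replace(os.sep, "/")})
    return catalog
```

Calling ET.tostring(build_catalog(some_directory), encoding="unicode") returns the catalog as an XML string; some_directory stands for any local data directory.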
For the short term, plans are to expand the directives language, do some cleanup,
and improve the handling of GDS 1.2 XML catalogs. Long-term plans include building
a DODS server crawler, building a user interface (to build XML input files and
create additional metadata), and determining how XML schemas will impact catalog
generation efforts.
The problem with creating metadata for a real-time dataset is that the dataset
changes so rapidly that the metadata quickly becomes inaccurate. To solve
this problem, the Dynamic Catalog Generator is invoked on command and generates
metadata by scanning directory structures in real time to create catalogs. One
real-time dataset, the NEXRAD radar feed, generates 2.8 million products/week,
or about 5 products/sec. The radar feed was used as a prototype to demonstrate
that the Dynamic Catalog Generator could handle this kind of problem.
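The rate quoted for the radar feed can be checked with a few lines of arithmetic, which also show how quickly a static catalog would fall behind. This is only a back-of-the-envelope sketch: it assumes products arrive at a uniform average rate, and the ten-minute interval is an arbitrary example value.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds

products_per_week = 2_800_000  # figure quoted for the NEXRAD radar feed
products_per_second = products_per_week / SECONDS_PER_WEEK


def new_products(elapsed_seconds):
    """Products expected to arrive in elapsed_seconds at the average rate."""
    return products_per_second * elapsed_seconds


print(round(products_per_second, 2))  # 4.63, i.e. "about 5 products/sec"
print(round(new_products(600)))       # 2778 products missing from a 10-minute-old catalog
```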
Other high-volume real-time datasets are being considered as candidates for
the Dynamic Catalog Generator. These datasets may present different problems
from the radar dataset: METAR datasets, for example, have reports embedded
in bulletins, and FSL's MADIS has reports in NetCDF files.
Discussion of Data Provider Issues
- granularity is an issue for users
- structure of catalog: data provider vs data user model
- multiple models for users: for scientists; for educators, etc.
- encapsulate multiple models in metadata or in client
- hierarchy vs ontology (e.g., mass store needs one view, data provider
needs another, data user needs another, ...)
- annotation
- ontology building/google indexing
- XML provides representation for communication - separate from backend
storage
- THREDDS needs to have something between the provider and the data consumer,
e.g., middleware
- options could include converting the user view into the data provider view
- client is smart client (DODS)
Discussion of Discovery System
- discovery system work should be coordinated with Digital Libraries (browsing
approach is good)
- should consider Knowledge Discovery, e.g., semantic indexing.
Application Developer Tools
- Don Murray provided a short demonstration of the Integrated Data Viewer (IDV)
from MetApps using THREDDS catalog services to access remote datasets
- use with Web Start, e.g., embedding applications for digital libraries,
uses DODS 2 and 3D data
- providing data to the IDV requires a DODS server
- could create catalog, e.g., COARDS
- THREDDS Java library (Caron)
Discussion:
- Consider a command line client side for THREDDS catalog access (Ferret and
IDL have this feature)
- Third-party ancillary information could correct metadata in the data file
- Is the THREDDS protocol (SML, etc) a metadata representation of the data?
Discovery Centers
- John Weatherly, DLESE, provided a brief overview of DLESE activities
- datasets available through DLESE focus on educational resource documents
now
- ready for direct data cataloging in 2-4 years
- include metadata record for each resource
- uses the OAI transfer protocol pull method, which provides a means for
harvesting metadata
- human catalogs
- want to integrate dataset tools
- NSDL provides annotation services (educators from K-College)
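The OAI pull method mentioned above works by having a harvester issue HTTP requests to a repository and collect the metadata records returned. A minimal sketch of building such a request follows. The verb and parameter names come from the OAI protocol for metadata harvesting as published in its 2.0 specification; the base URL in the usage note is a placeholder, and nothing here is taken from DLESE's actual configuration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI harvesting request URL for the ListRecords verb.

    base_url is whatever endpoint the repository publishes (hypothetical here);
    from_date optionally limits the harvest to records changed since that date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)
```

For example, list_records_url("http://example.org/oai", from_date="2002-05-06") yields a URL a harvester could fetch to pull only the records added or changed since the start of the workshop.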
Discussion:
- NASA/GCMD has over 11,000 datasets indexed; it uses DIF and a 4-tier parameter
hierarchy, and has an HTTP/RMI API for programmatic access
- ESIP Federation has search tool based on Z39
- Alexandria Gazetteer provides georeferenced datasets
- need a crosswalk between FGDC and Dublin Core
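A crosswalk of the kind called for in the last item is essentially a table that maps fields in one metadata standard to their nearest equivalents in another. The sketch below shows the idea for a handful of fields. The FGDC short names and Dublin Core element names are real, but the pairing is illustrative only and is not an official or complete crosswalk; fields with no mapping are reported rather than silently dropped.

```python
# Illustrative subset only: FGDC short tag -> Dublin Core element.
FGDC_TO_DC = {
    "title": "title",
    "origin": "creator",        # FGDC originator
    "abstract": "description",
    "pubdate": "date",
    "themekey": "subject",      # FGDC theme keyword
}


def crosswalk(fgdc_record):
    """Convert a flat FGDC-style record (dict) to Dublin Core.

    Returns the converted record and a list of FGDC fields left unmapped.
    """
    dc, unmapped = {}, []
    for field, value in fgdc_record.items():
        target = FGDC_TO_DC.get(field)
        if target is None:
            unmapped.append(field)
        else:
            dc[target] = value
    return dc, unmapped
```

Returning the unmapped fields makes the information loss visible, which matters because a crosswalk from a richer standard to a simpler one cannot carry everything across.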
Metadata, Open GIS/ISO standards - Stefano Nativi, Univ of Florence, provided an overview of standards
- THREDDS is trying to integrate the workings of two communities: digital
libraries and systems for geographic information (not just GIS). The two communities
require different sets of metadata. How can THREDDS wed these two sets of metadata
requirements? Options:
- contain both sets of metadata
- extend one set to include the other
- develop a higher-level metadata model that captures both DL and GI metadata
- can we convert between OpenGIS Web Catalog/Services and THREDDS catalogs?
Breakout Groups
Random breakout groups were formed to review topics and issues.
THREDDS Collaboration Tools - Chris Klaus
Following a brief demo of NSDL's WIKI site, the group agreed to try it out
for collaboration purposes.
ACTION: Chris will follow up with the group with instructions
for accessing the software.
All participants had the opportunity to articulate recommended steps to be
taken to better integrate their projects with THREDDS and to suggest tasks
they would recommend for Unidata-THREDDS to pursue. These next steps are
included in Participant Input and Discussion.
Conclusions from the Workshop:
- THREDDS should focus on widespread adoption of catalogs. The Data Providers
present were generally enthusiastic about getting their sites to use catalogs.
Suggested improvements to catalog-generating tools were made.
- THREDDS should come to agreement with DODS developers to adopt a common
catalog format. A number of important suggestions were made, and further work
will be done in the THREDDS mailgroup.
- THREDDS should for now concentrate on metadata standards and discovery rather
than data models. Use existing standards when possible.
- Join or at least keep track of OpenGIS/ISO work.
- Data Providers want flexible, minimal metadata; Data Clients want maximal,
standardized metadata.
- THREDDS will look to DLESE for cataloging educational resources, and to
GCMD for dataset search and discovery.
- LAS and GrADS could be important "thin-clients" for THREDDS.
- Develop a "command-line tool" for use with non-Java applications.
Linda Miller -
lmiller@unidata.ucar.edu
External Liaison, Unidata
University Corporation for Atmospheric Research
P.O. Box 3000
Boulder, CO 80307-3000
303 497-8646 fax: 303-497-8690