THREDDS Technical Status Report
Overview
THREDDS fundamentally
provides middleware services to bridge the gap between data providers and data
consumers. This has been done by developing
tools and services for both data providers and consumers, as well as services
that sit between providers and consumers. We are also involved in developing
and enhancing some of the underlying data access software tools, libraries and
protocols themselves, as well as influencing how data providers and clients use
them.
Accomplishments for Current Funding Cycle
Dataset Inventory
Catalogs are XML documents that
allow a data provider to simply list available on-line datasets. The catalog
creator can group datasets into a simple hierarchical classification scheme, which
makes a catalog into a “logical data directory”. At a minimum, the catalog
specifies the “human readable” dataset name, and how to access it. The catalog also provides a place to add
arbitrary metadata about the dataset. We are focusing on enhancing selected
datasets by adding space and time bounding boxes, standard names, and data type
information. Catalogs can be static XML files, or dynamically generated by Web
servers to track continuously changing datasets.
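As a rough illustration of the "logical data directory" idea, the following Python sketch walks a small hierarchical catalog and lists each dataset's human-readable name and access URL. The element names, attribute names, and URLs are assumptions made for this example; they are not taken from the actual THREDDS catalog schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document; element and attribute names are illustrative only.
CATALOG_XML = """
<catalog name="Example catalog">
  <dataset name="Model output">
    <dataset name="Global forecast, 12Z run" access="http://example.org/data/global_12z.nc"/>
    <dataset name="Regional forecast, 12Z run" access="http://example.org/data/regional_12z.nc"/>
  </dataset>
  <dataset name="Surface observations" access="http://example.org/data/surface.nc"/>
</catalog>
"""


def list_datasets(element, path=()):
    """Yield (hierarchy path, access URL) for every dataset that has an access URL."""
    for child in element.findall("dataset"):
        child_path = path + (child.get("name"),)
        if child.get("access") is not None:
            yield child_path, child.get("access")
        # Recurse so that nested groupings are walked like a directory tree.
        yield from list_datasets(child, child_path)


root = ET.fromstring(CATALOG_XML)
entries = list(list_datasets(root))
for path, url in entries:
    print(" / ".join(path), "->", url)
```

Grouping elements without an access URL act purely as directory levels, so only accessible datasets appear in the listing.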
THREDDS Servers are data servers that have Dataset Inventory Catalogs
associated with them. The primary focus of THREDDS has been developing these
servers in collaboration with our data provider partners. Current servers include
ones at IRI/LDEO (
The THREDDS/IDD Server
makes much of the real-time data coming in on the Unidata IDD available on a
THREDDS server. This includes the NCEP model data, satellite data from NOAAPORT
and the Unidata/Wisconsin data streams, NEXRAD Radar, Profiler data from
NOAA/FSL, as well as METAR, upper air, buoy, SAO and SHEF hydrology station
data. The THREDDS/IDD Server will become part of an enhanced LDM that will be
available to the Unidata community of 150 IDD users.
We have worked extensively
with OpenDAP/DODS developers, and the next version of
OpenDAP servers will have
integrated THREDDS Catalogs. We have also developed the THREDDS OpenDAP Aggregation Server, an OpenDAP data server that aggregates OpenDAP
datasets, serves netCDF datasets, and has THREDDS catalogs already
integrated. This means that the next
generation of OpenDAP servers will automatically be
THREDDS servers. The Live Access
Server from NOAA/PMEL is a Web server that provides access to and
visualization of scientific data. It is currently being modified to provide
THREDDS catalogs for its data.
Another key THREDDS component
for data providers is the Catalog Generator, which scans file
directories and generates THREDDS catalogs automatically. This is a highly
configurable tool that gives users control over the arrangement and naming of
their datasets, adding metadata, extracting information from the datasets, etc.
The Catalog Validator provides XML and
semantic validation of Catalogs, as well as verification of the datasets
themselves.
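The directory scan performed by a tool like the Catalog Generator can be sketched as follows. This is a minimal illustration, not the Catalog Generator itself: the function names, the `.nc` suffix filter, the base URL, and the element and attribute names are all assumptions made for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def generate_catalog(root_dir, base_url, suffix=".nc"):
    """Build a catalog element that mirrors the directory tree under root_dir."""
    catalog = ET.Element("catalog", name=os.path.basename(root_dir))
    _add_directory(catalog, root_dir, root_dir, base_url, suffix)
    return catalog


def _add_directory(parent, directory, root_dir, base_url, suffix):
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        if os.path.isdir(full):
            # Each subdirectory becomes a grouping level in the catalog.
            group = ET.SubElement(parent, "dataset", name=entry)
            _add_directory(group, full, root_dir, base_url, suffix)
        elif entry.endswith(suffix):
            # Each matching file becomes a dataset named after the file stem.
            rel = os.path.relpath(full, root_dir).replace(os.sep, "/")
            ET.SubElement(parent, "dataset",
                          name=entry[:-len(suffix)],
                          access=base_url + "/" + rel)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "model"))
    for rel in ("model/run_00z.nc", "model/run_12z.nc", "model/notes.txt", "surface.nc"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()
    catalog = generate_catalog(tmp, "http://example.org/data")
    generated_urls = [d.get("access") for d in catalog.iter("dataset") if d.get("access")]

print(generated_urls)
```

A configurable generator would add hooks at the two commented points, for example to rename datasets or to attach metadata extracted from each file.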
The ADDE Cataloger
is a middleware service that constructs Catalogs for ADDE/Mcidas
data servers. It provides “virtual dataset” services, for example, a dataset
named “latest” or “last 3 hours”, along with a resolver
service to translate a virtual dataset into a list of actual
datasets available on the ADDE server. This level of indirection is
important for real-time and very large datasets, in
order to provide users with the ability to choose datasets of the right
granularity.
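The resolver idea can be illustrated with a short Python sketch that maps the virtual names "latest" and "last 3 hours" onto a server inventory. The inventory contents, the `resolve` function, and the fixed reference time are hypothetical; the actual ADDE Cataloger interface is not shown in this report.

```python
from datetime import datetime, timedelta

# Hypothetical inventory of (dataset id, observation time) pairs held by a server.
INVENTORY = [
    ("sat_1200", datetime(2003, 5, 1, 12, 0)),
    ("sat_1300", datetime(2003, 5, 1, 13, 0)),
    ("sat_1400", datetime(2003, 5, 1, 14, 0)),
    ("sat_1500", datetime(2003, 5, 1, 15, 0)),
    ("sat_1600", datetime(2003, 5, 1, 16, 0)),
]


def resolve(virtual_name, inventory, now):
    """Translate a virtual dataset name into a list of actual dataset ids."""
    ordered = sorted(inventory, key=lambda item: item[1])
    if virtual_name == "latest":
        return [ordered[-1][0]] if ordered else []
    if virtual_name == "last 3 hours":
        cutoff = now - timedelta(hours=3)
        return [name for name, time in ordered if cutoff <= time <= now]
    raise ValueError(f"unknown virtual dataset: {virtual_name!r}")


now = datetime(2003, 5, 1, 16, 30)
latest = resolve("latest", INVENTORY, now)
recent = resolve("last 3 hours", INVENTORY, now)
print(latest, recent)
```

Because the client only ever asks for the virtual name, the server is free to add or expire actual datasets without any change on the client side.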
Dataset Query
Capability XML documents are used
by middleware services such as the ADDE Cataloger and the THREDDS/IDD Server to
specify in a succinct way what datasets are available from a data server. These
allow data providers to specify the set of orthogonal choices (for example:
station, field, time) that an end-user should make to select from a large
and/or real-time collection of datasets. They also allow data clients to know how to
present appropriate choices to their users in a user interface, without knowing
anything specific about the server.
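A minimal sketch of how a client might use such a capability description is shown below. The dictionary stands in for a parsed Dataset Query Capability document; its axes, values, and the query-string form are assumptions for illustration rather than the actual document format.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical capability description: one list of allowed values per orthogonal axis.
CAPABILITY = {
    "station": ["DEN", "BOU"],
    "field": ["temperature", "wind"],
    "time": ["latest", "last 3 hours"],
}


def build_query(capability, **selection):
    """Check that exactly one allowed value is chosen per axis and encode the query."""
    extra = set(selection) - set(capability)
    if extra:
        raise ValueError(f"unknown axes: {sorted(extra)}")
    pairs = []
    for axis, allowed in capability.items():
        if axis not in selection:
            raise ValueError(f"missing choice for axis {axis!r}")
        if selection[axis] not in allowed:
            raise ValueError(f"{selection[axis]!r} is not offered for axis {axis!r}")
        pairs.append((axis, selection[axis]))
    return urlencode(pairs)


# A generic client can enumerate every valid combination without server-specific code.
all_choices = list(product(*CAPABILITY.values()))
query = build_query(CAPABILITY, station="DEN", field="wind", time="latest")
print(len(all_choices), query)
```

The client code contains nothing specific to stations or fields; the axes come entirely from the capability description, which is the point of the design.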
Catalogs are read by the Dataset
Searcher, which provides a programmatic interface for searching by
space and time bounding boxes, standard names, data type and server type.
People can also search for datasets through a web interface. This is a prototype
system that will be developed further in the future.
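The kind of search described above can be sketched as a filter over catalog records. The record fields, the sample records, and the `search` function are hypothetical and do not reflect the Dataset Searcher's actual interface. The bounding-box test assumes boxes that do not cross the dateline.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DatasetRecord:
    name: str
    data_type: str
    west: float
    south: float
    east: float
    north: float
    start: datetime
    end: datetime


# Hypothetical records, as they might be extracted from enhanced catalogs.
RECORDS = [
    DatasetRecord("Global model", "Grid", -180, -90, 180, 90,
                  datetime(2003, 5, 1), datetime(2003, 5, 3)),
    DatasetRecord("CONUS radar", "Radial", -125, 25, -65, 50,
                  datetime(2003, 5, 1), datetime(2003, 5, 1, 23, 59)),
    DatasetRecord("Europe surface obs", "Station", -10, 35, 30, 70,
                  datetime(2003, 4, 1), datetime(2003, 4, 30)),
]


def search(records, bbox=None, time_range=None, data_type=None):
    """Return names of records that overlap the given box and time range and match the type."""
    hits = []
    for r in records:
        if bbox is not None:
            west, south, east, north = bbox
            # Two boxes overlap when they overlap in both longitude and latitude.
            if r.west > east or r.east < west or r.south > north or r.north < south:
                continue
        if time_range is not None:
            start, end = time_range
            if r.start > end or r.end < start:
                continue
        if data_type is not None and r.data_type != data_type:
            continue
        hits.append(r.name)
    return hits


by_space = search(RECORDS, bbox=(-110, 30, -100, 40))
by_space_and_time = search(RECORDS, bbox=(-110, 30, -100, 40),
                           time_range=(datetime(2003, 5, 2), datetime(2003, 5, 2, 12)))
by_type = search(RECORDS, data_type="Station")
print(by_space, by_space_and_time, by_type)
```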
The THREDDS Dataset
Exporter creates “resource records” appropriate to add to Digital
Libraries such as DLESE, NSDL and GCMD. This prototype system uses special
metadata records that are added to the datasets in a catalog, which specify the
additional information needed by the DL, such as Dublin Core or DIF formats.
The Dataset Exporter uses the Open Archives Initiative (OAI) protocol to send
these records into the DLESE and NSDL databases. (Q: How does it send into
GCMD?)
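As an illustration of the export step, the sketch below maps a dataset's catalog metadata onto a few Dublin Core elements. The input dictionary, the `record` wrapper element, and the choice of fields are assumptions; the OAI transport and the DIF format are not shown.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(dataset):
    """Build a simple resource record from a dataset's metadata dictionary."""
    record = ET.Element("record")  # hypothetical wrapper element
    for field in ("title", "identifier", "description"):
        if field in dataset:
            ET.SubElement(record, f"{{{DC_NS}}}{field}").text = dataset[field]
    return record


# Hypothetical dataset metadata, as it might be read from a catalog entry.
dataset = {
    "title": "Global forecast, 12Z run",
    "identifier": "http://example.org/data/global_12z.nc",
    "description": "Example gridded model output",
}
record = to_dublin_core(dataset)
print(ET.tostring(record, encoding="unicode"))
```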
THREDDS clients are application programs that know how to read
THREDDS Catalogs and know how to read data using some or all of the THREDDS
data server types, such as OpenDAP, ADDE, netCDF,
etc. The Integrated Data Viewer
(IDV), also developed at Unidata, is a full-featured analysis program
capable of advanced 3D visualization based on the VisAD
library. VGEE is an educational content development system built
on top of the IDV. New Media Studios is another educational
content development framework which uses Macromedia Director and IDL, and is
now in the process of being made THREDDS capable. The THREDDS Data Viewer
is a tool for debugging data servers and prototyping client software, using the
Java client library user interface components and catalog and
data access APIs. We expect to use this library to THREDDS-enable the OpenDAP Data Connector software
and other Java clients.
A key to successful use of
scientific datasets is providing use metadata, especially georeferencing
metadata, which allows client software to manipulate and visualize datasets,
and to overlay and compare data from different sources. We have helped develop
and promulgate georeferencing metadata conventions for netCDF datasets, such as
the CF Conventions for model data. We have also developed extensions to
the netCDF data model and implemented libraries which automatically
recognize and extract georeferencing information in many of the important
netCDF and OpenDAP datasets.
We have also developed
extensions to the NetCDF Markup
Language (NcML) that allow metadata to be
added, deleted or changed in netCDF and OpenDAP
datasets, as well as to subset or aggregate netCDF files. This capability has
been added to the OpenDAP aggregation server,
providing a powerful tool for third-party metadata augmentation, which
is in addition to the ability to add metadata into the Inventory Catalogs.
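The effect of third-party metadata augmentation can be sketched as an overlay applied to a dataset's attributes without touching the original. The overlay document below is a simplified stand-in invented for this example and is not actual NcML syntax; the attribute names and values are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes as they might be read from an existing dataset (hypothetical values).
ORIGINAL_ATTRIBUTES = {
    "title": "model run",
    "units": "K",
    "history": "created by converter",
}

# Simplified overlay document; element names are illustrative, not real NcML.
OVERLAY_XML = """
<overlay>
  <attribute name="title" value="Global forecast, 12Z run"/>
  <attribute name="Conventions" value="CF-1.0"/>
  <remove name="history"/>
</overlay>
"""


def apply_overlay(attributes, overlay_xml):
    """Return a new attribute dictionary with the overlay's edits applied."""
    result = dict(attributes)  # copy, so the original metadata is left untouched
    for edit in ET.fromstring(overlay_xml):
        if edit.tag == "attribute":
            # Adds the attribute if it is absent, changes it if it is present.
            result[edit.get("name")] = edit.get("value")
        elif edit.tag == "remove":
            result.pop(edit.get("name"), None)
    return result


augmented = apply_overlay(ORIGINAL_ATTRIBUTES, OVERLAY_XML)
print(augmented)
```

Keeping the edits in a separate document is what allows a third party to correct or enrich metadata for data it does not own.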