LEAD Two-Year Site Review Held in July
During the first part of this summer, the LEAD project focused heavily
on its two-year site review. This major review,
held at NCSA in July, involved presentations to NSF representatives and
a panel of reviewers.
Unidata's major efforts in preparation for this meeting included:
- significant technical project management, including
coordination of the overall LEAD effort for requirements and
architecture definition and prototype build-out
- providing material for the Annual Report, discussing past
accomplishments and future plans
- soliciting input and providing functional requirements for the
Architecture and Implementation Plan document, and
- organizing and running meetings to support these efforts.
The panel seemed pleased with LEAD's efforts. Indeed, one outcome
is that NSF is providing additional funding for two students: one at
CAPS to study dynamically adaptive Numerical Weather
Prediction (NWP) and one at Indiana University to
study issues pertaining to streaming data. The panel strongly
recommended that LEAD focus on the use of streaming data and dynamic
workflows.
Unidata LEAD Test Bed
The LEAD test bed is being built out to a 40-terabyte storage array
that will be a primary storage repository for the LEAD effort and the
Unidata community, housing large quantities of IDD data and
LEAD-generated data products.
Output from the ARPS Data Assimilation System (ADAS), in netCDF format,
is now being transferred via FTP to the test bed, where it will be
cataloged and made publicly available. Additionally, the compute
infrastructure is being built out to support NWP by running WRF, and
data assimilation by running ADAS, on the test bed. Leveraging this
effort, we are also working on getting ADAS to assimilate IDD data to
establish the initial conditions for the WRF runs occurring on the
test bed.
Lastly, we are working in collaboration with Millersville University
(MU), Howard University (HU), and the University of Alabama in
Huntsville to establish a three-part ensemble regional forecast. The
Algorithm Development and Mining system (ADAM) determines the location
of the forecast, WRF runs at MU, HU, and UPC, and the results are
stored and cataloged on the UPC test bed.
Steered Forecasts
As a first step toward dynamic adaptivity, regional forecasts are now
being generated four times daily using the Weather Research and
Forecasting (WRF) model running in multiprocessor mode on the UPC LEAD
test bed. The location of the regional model runs is steered
dynamically: an algorithm processes NAM precipitation forecasts and
determines the location of the highest 24-hour cumulative predicted
precipitation, and the resulting center latitude and longitude are
provided in the IDD. The results, along with parallel runs of the
Workstation Eta, are being served via OPeNDAP and cataloged using
THREDDS. The top-level THREDDS catalog is found at:
http://lead.unidata.ucar.edu:8080/thredds/topcatalog.xml.
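The steering step described above can be illustrated with a minimal sketch. It is not the algorithm running on the test bed: the function names, the plain nested-list grids, and the sample values are all assumptions made for illustration. It sums per-interval precipitation grids covering a 24-hour window and returns the latitude and longitude of the grid point with the largest total.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]  # one 2-D precipitation field, rows = latitudes


def accumulate(fields: Sequence[Grid]) -> List[List[float]]:
    """Sum per-interval precipitation grids into one cumulative grid."""
    rows, cols = len(fields[0]), len(fields[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for field in fields:
        for i in range(rows):
            for j in range(cols):
                total[i][j] += field[i][j]
    return total


def steering_center(
    fields: Sequence[Grid], lats: Sequence[float], lons: Sequence[float]
) -> Tuple[float, float]:
    """Return (lat, lon) of the grid point with the highest cumulative value."""
    total = accumulate(fields)
    best_i, best_j = max(
        ((i, j) for i in range(len(lats)) for j in range(len(lons))),
        key=lambda ij: total[ij[0]][ij[1]],
    )
    return lats[best_i], lons[best_j]


# Hypothetical 2 x 2 grid with two forecast intervals (values are made up).
intervals = [
    [[0.0, 1.0], [2.0, 0.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
lats = [35.0, 40.0]
lons = [-100.0, -95.0]
center = steering_center(intervals, lats, lons)  # (35.0, -95.0)
```

In this sketch the returned pair would play the role of the center latitude and longitude used to position the regional WRF domain.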
THREDDS Data Repository
LEAD orchestrations need a large, robust, and reliable storage back end
with speedy access in order to stage data and store both intermediate
and final results. Along similar lines, it became apparent
that the Unidata community could benefit from a storage repository that
allowed users to store and retrieve data that would otherwise be lost
due to scouring.
To this end, the Unidata LEAD team is designing and building the
THREDDS Data Repository (TDR).
The TDR is a modular framework for a repository that will:
- locate storage
- move data to that storage
- generate a unique ID, a handle to the data
- register the data in a name resolver that maps data handles to
one or more physical locations
- generate metadata if none is provided
- crosswalk the metadata to another schema, if desired
- update one or more catalogs, if desired.
The goal of the framework is to support a variety of implementations of
the modules. This way we hope to provide good functionality for both
ends of the user spectrum: LEAD at one end, and a single Unidata
community site at the other. For example, we hope to be able to
support storage implemented via a mass storage system as well as a UNIX
disk on a local area network.
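The idea of swappable modules can be sketched as follows. This is an illustrative outline under stated assumptions, not the TDR design: the class names (Storage, LocalDiskStorage, InMemoryStorage, NameResolver), the ingest function, and the "mem://" location scheme are all hypothetical. The in-memory class merely stands in for a second backend such as a mass storage system, and the crosswalk step is omitted.

```python
import uuid
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Optional, Sequence


class Storage(ABC):
    """Hypothetical storage module: accepts bytes, reports where they went."""

    @abstractmethod
    def store(self, name: str, payload: bytes) -> str:
        """Store the payload and return a location string."""


class LocalDiskStorage(Storage):
    """Stores data under a directory on a local (UNIX) file system."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, name: str, payload: bytes) -> str:
        target = self.root / name
        target.write_bytes(payload)
        return target.resolve().as_uri()


class InMemoryStorage(Storage):
    """Stand-in for a remote or mass storage backend (assumption)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def store(self, name: str, payload: bytes) -> str:
        self.blobs[name] = payload
        return f"mem://{name}"


class NameResolver:
    """Maps a data handle to one or more physical locations."""

    def __init__(self) -> None:
        self.table: Dict[str, List[str]] = {}

    def register(self, handle: str, location: str) -> None:
        self.table.setdefault(handle, []).append(location)

    def resolve(self, handle: str) -> List[str]:
        return list(self.table.get(handle, []))


def ingest(
    name: str,
    payload: bytes,
    storages: Sequence[Storage],
    resolver: NameResolver,
    catalog: Dict[str, dict],
    metadata: Optional[dict] = None,
) -> str:
    """Move data to storage, create a handle, register it, and catalog it."""
    handle = uuid.uuid4().hex  # unique ID that serves as the data handle
    for storage in storages:
        resolver.register(handle, storage.store(f"{handle}-{name}", payload))
    if metadata is None:  # generate minimal metadata if none is provided
        metadata = {"name": name, "size_bytes": len(payload)}
    catalog[handle] = metadata  # stand-in for updating a catalog
    return handle
```

Because ingest depends only on the Storage interface, the same call could in principle serve a single-site installation with one local disk or a larger deployment with several backends.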
Unidata community users will be able to install this repository
on their local file system. It will use THREDDS catalogs to
support browsing and querying. Where possible it will use the
Common Data Model to retrieve data. We see this as a complement
to the recently released THREDDS Data Server (TDS).
TDR development is following an agile model. We will be making
frequent small releases. Initially the interface will support
three functions: putData, getDataURL (which returns a URL to the
data), and getData (which copies the data out of the repository to
another location). Later, the framework will be expanded
to handle aggregation and subsetting.
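To make the three initial functions concrete, here is a minimal sketch of how such an interface might look for a repository backed by a local directory. It is an assumption-laden illustration, not the TDR API: the class name SimpleRepository is hypothetical, and the method names put_data, get_data_url, and get_data are Python-style renderings of putData, getDataURL, and getData.

```python
import shutil
import uuid
from pathlib import Path
from typing import Dict


class SimpleRepository:
    """Hypothetical local-disk repository exposing three operations."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # A plain dict acts as the name resolver: handle -> stored file.
        self._table: Dict[str, Path] = {}

    def put_data(self, source: Path) -> str:
        """Copy a file into the repository and return its unique handle."""
        handle = uuid.uuid4().hex
        target = self.root / f"{handle}_{Path(source).name}"
        shutil.copy2(source, target)
        self._table[handle] = target
        return handle

    def get_data_url(self, handle: str) -> str:
        """Return a URL (here a file:// URI) pointing at the stored data."""
        return self._table[handle].resolve().as_uri()

    def get_data(self, handle: str, destination: Path) -> Path:
        """Copy the stored data out of the repository to another location."""
        return Path(shutil.copy2(self._table[handle], destination))
```

A caller would pass a file path to put_data, keep the returned handle, and later use that handle either to obtain a URL or to copy the data elsewhere. An unknown handle raises KeyError in this sketch; a real service would presumably report that case more gracefully.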
We are maintaining an evolving web page to describe the effort and also
to communicate with other LEAD team members:
http://www.unidata.ucar.edu/projects/LEAD/ThreddsDataRepository.html.
We are targeting the end of September to release Iteration 1 of the
repository, a very simple implementation that will store and retrieve a
file to a UNIX disk, generate a unique ID, use a simple table as a name
resolver, and copy existing THREDDS metadata to a THREDDS catalog for the
repository.
This will also involve development of code to crosswalk from the
THREDDS schema to the LEAD schema.
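A metadata crosswalk of the kind mentioned above can be thought of as a field-by-field translation between two schemas. The sketch below shows only that general idea. The field names on both sides and the FIELD_MAP table are hypothetical placeholders, not the actual THREDDS or LEAD schema elements, and real metadata would be XML rather than flat dictionaries.

```python
from typing import Any, Dict

# Hypothetical mapping from source field names to target field names.
# These names are placeholders chosen for illustration only.
FIELD_MAP: Dict[str, str] = {
    "name": "title",
    "documentation": "abstract",
    "keyword": "theme_keyword",
}


def crosswalk(
    record: Dict[str, Any],
    field_map: Dict[str, str],
    keep_unmapped: bool = False,
) -> Dict[str, Any]:
    """Translate a metadata record from one schema's field names to another's."""
    translated: Dict[str, Any] = {}
    for key, value in record.items():
        if key in field_map:
            translated[field_map[key]] = value
        elif keep_unmapped:
            # Optionally carry over fields with no counterpart, unchanged.
            translated[key] = value
    return translated


source_record = {"name": "WRF run", "keyword": "precipitation", "authority": "x"}
target_record = crosswalk(source_record, FIELD_MAP)
# {"title": "WRF run", "theme_keyword": "precipitation"}
```

Keeping the mapping in a table rather than in code is one possible design choice: supporting another target schema would then mean supplying a different table.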
In order to interface with a variety of module implementations, this
effort also requires some degree of understanding of existing relevant
technology. Thus we are surveying technologies such as Storage
Resource Broker (SRB), Storage Resource Manager (SRM), Replica Location
Service (RLS), and Data Replication Service (DRS) to understand their
functionality and interfaces. We are also working with NCSA to
understand their data-moving application, Trebuchet, and to understand
the issues involved in using a mass storage system. We expect to
identify other relevant technologies along the way.
Publications
Abstracts submitted to AMS:
- Data Access and Storage in the LEAD Cyberinfrastructure, by Anne
Wilson, Doug Lindholm, and Tom Baltzer
- An Architecture for the LEAD Data Repository, by Doug Lindholm, Anne
Wilson, and Tom Baltzer
- Toward dynamic adaptivity: steering the WRF model on the Unidata LEAD
test bed, by Tom Baltzer, Steven R. Chiswell, Ben Domenico, and Mohan
Ramamurthy
- EarlyLEAD: A Non-Grid Application of LEAD Capability, by David
Fitzgerald, Rahul Ramachandran, Ben Domenico, Richard Clark, Thomas
Baltzer, Sen Chiao, and Everette Joseph
Abstracts submitted to AGU:
- Storing, Browsing, Querying, and Sharing Data: the THREDDS Data
Repository (TDR), by Anne Wilson, Doug Lindholm, and Tom Baltzer
- Facilitating Interdisciplinary Geosciences and Societal Impacts
Research and Education via Dynamically Adaptive, Interoperable Data and
Forecast Systems, by Jeff Weber, Ben Domenico, Steve Chiswell, and Tom
Baltzer