LEAD at Unidata
Overview
LEAD is an NSF Large ITR project involving nine institutions. Its aim is to create an integrated, scalable framework in which meteorological analysis tools, forecast models, and data repositories can operate as dynamically adaptive, on-demand, grid-enabled systems. For more information see http://portal.leadproject.org/.
The two key goals for LEAD are:
1) To democratize the availability of advanced weather technologies for research and education, lowering the barrier to entry, empowering application in a grid context, increasing the realism of how technologies are applied and facilitating rapid understanding, experiment design and execution of complex end-to-end weather analysis and prediction systems.
2) To improve our understanding of and ability to detect, analyze and predict mesoscale atmospheric phenomena by interacting with the weather in a dynamically adaptive manner.
The LEAD effort at Unidata includes:
Status Update
Unidata Policy Committee Meeting
Near Term LEAD Goals
LEAD has identified two aggressive goals for the spring of 2007. The first goal is to provide support for the WxChallenge Collegiate Forecast Contest, a collegiate weather forecasting competition. The second goal is to support the CAPS Spring Experiment which itself has three primary thrusts. The first thrust involves launching ensemble forecasts to study areas of deep convection. Ensemble forecasts allow for specifying uncertainty in model initial conditions and quantifying uncertainty in model output. The second thrust provides for dynamic forecasts that are triggered by the receipt of tornado watches and warnings. Finally, using the LEAD portal, forecasters will be able to determine domains for and launch forecasts on demand.
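As a rough illustration of how an ensemble quantifies uncertainty in model output, the sketch below computes the mean and spread (population standard deviation) of several member forecasts at each grid point. The function name `ensemble_summary` and the temperature values are made up for illustration; they are not taken from the CAPS Spring Experiment or from any LEAD software.

```python
from statistics import mean, pstdev


def ensemble_summary(members):
    """Return a (mean, spread) pair per grid point for equal-length members."""
    if not members:
        raise ValueError("at least one ensemble member is required")
    n_points = len(members[0])
    if any(len(m) != n_points for m in members):
        raise ValueError("all members must cover the same grid points")
    # zip(*members) groups the values that all members predict for one point.
    return [(mean(values), pstdev(values)) for values in zip(*members)]


# Illustrative (invented) temperatures in deg C at three grid points,
# one row per ensemble member.
members = [
    [20.0, 22.0, 25.0],
    [21.0, 22.0, 27.0],
    [19.0, 22.0, 23.0],
    [20.0, 22.0, 29.0],
]

for point, (avg, spread) in enumerate(ensemble_summary(members)):
    print(f"point {point}: mean={avg:.2f} spread={spread:.2f}")
```

A small spread means the members agree at that point, while a large spread flags a location where the forecast is less certain.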
Beta Users Program
Unidata LEAD Test Bed Status
We are maintaining a rolling archive of at least 120 days of each of the seven LEAD canonical datasets, and in some cases more (see Data Description for LEAD 7 Datasets). The archive also has at least 120 days of the remaining IDD feeds. In addition, we are maintaining a smaller archive of ADAS and steered WRF model output. The current data volume is close to 24 terabytes. A volume this large is impractical to back up, so the data is at risk of loss. Data volumes also continue to create technical problems, and the UPC LEAD team is working with the THREDDS group to investigate them and identify the best solution. Given the data volumes and complexities, the UPC LEAD test bed has been an excellent test case for studying the scalability of many of Unidata's technologies and strategies.
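The rolling-archive idea can be sketched as a simple retention check: anything dated before the start of the 120-day window becomes a candidate for removal. This is only a minimal sketch. The function name `expired`, the file names, and the inventory layout are hypothetical and do not describe the scripts actually used on the test bed.

```python
from datetime import date, timedelta

RETENTION_DAYS = 120  # minimum retention period stated in the text


def expired(files_by_date, today, retention_days=RETENTION_DAYS):
    """Return the names whose data date falls before the rolling window."""
    cutoff = today - timedelta(days=retention_days)
    # A file dated exactly on the cutoff is still inside the window.
    return sorted(name for name, day in files_by_date.items() if day < cutoff)


today = date(2007, 3, 1)
inventory = {
    # Hypothetical file names mapped to the date of the data they hold.
    "nexrad2_20061001.raw": date(2006, 10, 1),
    "nexrad2_20061115.raw": date(2006, 11, 15),
    "nexrad2_20070228.raw": date(2007, 2, 28),
}

print(expired(inventory, today))
```

Because the stated policy is "at least 120 days", a real purge job could safely use a larger window for selected datasets by passing a different `retention_days`.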
The test bed is using the latest TDS technologies, which include the ability to work with native GRIB files as well as RADAR Level II and Level III formats. This facilitates comparisons of WRF model predictions to RADAR observations. The TDS also lets users download a file directly via HTTP, or subset a dataset and download the result as a CF convention netCDF file (a feature we are encouraging our colleagues in LEAD to make greater use of). In addition, it can catalog the gridftp availability of files and provide a WCS interface to gridded data files. All these capabilities are useful and desirable in the LEAD context. The UPC test bed is integrated into the LEAD workflow system to provide initial and boundary conditions for real-time and retrospective steered WRF predictions. Until recently it was also being used to store model output. Our partners at Indiana University have set up a TDS on their system to store this output as well as intermediate products used by the workflow system.
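To make the subsetting capability more concrete, the sketch below assembles a subset request as a URL with a query string. Everything specific here is an assumption: the host `example.invalid`, the dataset path, the variable names, and the query parameter names (`var`, `north`, `south`, `east`, `west`, `time_start`, `time_end`, `accept`), which follow the style of the TDS NetCDF Subset Service and should be checked against the documentation for the server version actually deployed.

```python
from urllib.parse import urlencode


def subset_url(base_url, dataset_path, variables, bbox, time_range):
    """Build a subset request URL; the parameter names are assumptions."""
    north, south, east, west = bbox
    query = urlencode({
        "var": ",".join(variables),      # comma-separated variable list
        "north": north,
        "south": south,
        "east": east,
        "west": west,
        "time_start": time_range[0],
        "time_end": time_range[1],
        "accept": "netcdf",              # ask for a netCDF file in return
    })
    return f"{base_url.rstrip('/')}/{dataset_path.lstrip('/')}?{query}"


url = subset_url(
    "http://example.invalid/thredds/ncss/",   # hypothetical server
    "/grid/model/run.grib",                   # hypothetical dataset path
    ["Temperature", "Pressure"],              # hypothetical variable names
    (45, 35, -95, -105),                      # north, south, east, west
    ("2007-03-01T00:00:00Z", "2007-03-01T12:00:00Z"),
)
print(url)
```

Requesting only the variables, region, and times needed keeps transfers small, which matters given the data volumes described above.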
There are two TDS catalogs that provide access to the data. The primary catalog provides complete access to all the IDD data. The other catalog is the operational LEAD top catalog. This catalog does not yet include radar data because the volumes are too great for the current LEAD software. The LEAD team is working on a strategy for handling them.
Recently, all of the test bed nodes were upgraded to a gigabit internet connection. This addresses a gridftp overload problem that was revealed at the LEAD Lab day at the Unidata workshop last summer.
TDR
Synopsis
The THREDDS Data Repository (TDR) is a repository space to store data and other items and their associated metadata. Users can upload data and metadata to the repository. The TDR will locate space to put the data, move the data into the repository, and generate catalogs containing both externally provided and internally generated metadata.
The TDR is integrated with the THREDDS Data Server (TDS), so all TDS functionality for serving data is available for items stored in the repository. The TDR complements TDS technology by providing a means to populate a repository of data that can be served via the TDS. TDR requirements are influencing TDS design and development by providing new use cases involving dynamically generated catalogs and catalog "editing" capabilities to support maintenance of catalogs and metadata.
Also, like the TDS, while the UPC does provide a TDR for use by designated projects, the TDR is intended to be deployed by other institutions so that they may create and administer their own repositories.
TDR Use Cases
At this time TDR development is being steered by two use cases. The Next Generation Case Study project is a case study repository in which archive designers interactively arrange and store items related to a case study, such as data, notes, images, IDV bundles, etc., and make these studies available to their community.
Also, the LEAD project needs storage and access for items relevant to a user's experiment. These include items involved in running an orchestration, such as input, output, and intermediate files, as well as items that a user wants to publish.
These use cases have in common the need for a repository space that: provides data storage, can be structured by the client, provides integrated metadata management, and can serve the data.
The Current State of the TDR
In the Unidata TDR deployment, the repository is subdivided on a per project basis, currently the Next Generation Case Study project and the LEAD project. Each project has a separate partition in the repository space. Each project also has a different front end to the repository in the form of a servlet interface. Within a project, clients can store data and create catalogs in a hierarchically structured manner of their own design. A client can add or remove nodes within their space.
The Next Generation Case Study project requires an interactive interface. Users communicate with the server via a web input form. This form provides a means to specify a data source, enter metadata, and also provide information about structuring the storage space. Users can add or delete nodes in this space via this input form. Once stored, the data is browsable and retrievable via the TDS.
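The client-structured storage space described above, in which a client adds and removes nodes within a per-project partition, can be pictured as a small tree. The sketch below is an in-memory illustration only. The class and method names (`Node`, `ProjectSpace`, `add_node`, `remove_node`, `listing`) and the example node names are hypothetical and are not the TDR's actual servlet interface.

```python
class Node:
    """One node in a client-defined repository hierarchy."""

    def __init__(self, name, metadata=None):
        self.name = name
        self.metadata = dict(metadata or {})
        self.children = {}


class ProjectSpace:
    """A per-project partition whose layout is chosen by the client."""

    def __init__(self, project):
        self.root = Node(project)

    def _walk(self, path):
        node = self.root
        for part in path:
            node = node.children[part]  # raises KeyError if a part is missing
        return node

    def add_node(self, path, metadata=None):
        """Create the node at path; its parent must already exist."""
        *parent, name = path
        parent_node = self._walk(parent)
        if name in parent_node.children:
            raise ValueError(f"node already exists: {'/'.join(path)}")
        parent_node.children[name] = Node(name, metadata)

    def remove_node(self, path):
        """Remove the node at path together with everything beneath it."""
        *parent, name = path
        del self._walk(parent).children[name]

    def listing(self):
        """Return the slash-joined path of every node, depth first."""
        paths = []

        def visit(node, prefix):
            path = f"{prefix}/{node.name}" if prefix else node.name
            paths.append(path)
            for child in sorted(node.children.values(), key=lambda n: n.name):
                visit(child, path)

        visit(self.root, "")
        return paths


space = ProjectSpace("casestudy")                      # hypothetical project
space.add_node(["storm-2007"], {"title": "Example case"})
space.add_node(["storm-2007", "radar"])
space.add_node(["storm-2007", "notes"])
space.remove_node(["storm-2007", "notes"])
print(space.listing())
```

Keeping metadata on each node mirrors the requirement that storage and metadata management be integrated, so a catalog could in principle be generated by walking the same tree.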
The LEAD orchestration system is based on a Service Oriented Architecture (SOA). Thus the LEAD interface to the TDR must provide a Web API. Early versions will provide a simple http interface, but later versions will likely need to provide a SOAP interface and a WSDL service description. This interface is under development. More details about its design are given below.
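Since the interface is still under development, the following is only a sketch of the intended exchange as this report describes it: the orchestration system submits data with its metadata, receives a handle, and later asks which access options are available for that handle. It uses an in-memory stand-in rather than real HTTP or SOAP calls, and the names `RepositorySketch`, `store`, and `access_options`, along with the example URL and metadata keys, are hypothetical.

```python
import uuid


class RepositorySketch:
    """In-memory stand-in for a handle-based repository interface."""

    def __init__(self):
        self._items = {}

    def store(self, source_url, metadata, protocols=("http",)):
        """Register an item and return an opaque handle for later access."""
        handle = uuid.uuid4().hex
        self._items[handle] = {
            "source": source_url,
            "metadata": dict(metadata),
            "protocols": tuple(protocols),
        }
        return handle

    def access_options(self, handle):
        """List the retrieval protocols available for a stored item."""
        return list(self._items[handle]["protocols"])


repo = RepositorySketch()
handle = repo.store(
    "http://example.invalid/output/wrf_run.nc",      # hypothetical source
    {"experiment": "demo", "producer": "workflow"},  # hypothetical metadata
    protocols=("http", "gridftp"),
)

options = repo.access_options(handle)
# A client could prefer gridftp when offered and fall back to http otherwise.
chosen = "gridftp" if "gridftp" in options else "http"
print(handle, options, chosen)
```

Returning an opaque handle rather than a storage path would leave the repository free to decide where data is placed, which fits the statement that the TDR locates space for uploaded data itself.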
Features of the TDR include:
A prototype of this server is available on the LEAD test bed.
TDR Next Tasks
Tomcat security will be implemented to authenticate and authorize repository writers. This will prevent unauthorized users from uploading material to the server. There are no current plans to authenticate readers.
The LEAD interface to the TDR will be expanded. A prototype client will be built in order to explore and test this programmatic interface. LEAD inputs to the server will include metadata generated by the orchestration system, plus user information such as certificates required to perform Grid operations. The TDR may generate additional metadata. The TDR will return a handle to the data that the orchestration system can use for later access. The orchestration system could then query the TDR for data access options, for example choosing gridFTP retrieval.
Crosswalk
The THREDDS to LEAD crosswalk has been updated to generate valid LEAD metadata and continues to provide LEAD metadata for the community datasets offered by and used in LEAD.
The team has discussed updating the crosswalk to handle large data volumes such as radar data. A simple data- and host-specific solution has been outlined that will provide updates to a continuously maintained list of available radar data. The development of this software is a necessary step in the integration of radar data into LEAD.
LEAD Target Audience
The LEAD team held its semi-annual All Hands meeting in San Antonio, TX this January, in conjunction with the AMS Annual Meeting. At that meeting, the team agreed that its best target audience will initially be undergraduate and early graduate students in meteorology and their professors. This community has the most to gain from the capabilities LEAD is creating at this time, and it has caught on to the promise of LEAD, as demonstrated by the comments received at the Triennial Unidata Workshop.
Presentations
Demonstrations of LEAD capabilities were given in the UCAR Office of Programs booth, and two papers from Unidata were presented by UPC staff at the American Geophysical Union (AGU) Fall Meeting in San Francisco, CA.
Visitors
Mark Govette of NOAA's Earth Systems Research Laboratory visited Unidata in advance of the External Advisory Panel meeting on which he sat in place of Steven Koch. Valentine Anantharaj from the GeoResources Institute at Mississippi State visited Unidata for discussions about LEAD and Grid computing initiatives as well as to attend the Unidata Workshop on TDS and NetCDF-Java.