LEAD at Unidata
Overview
LEAD is an NSF Large ITR project involving nine institutions. Its goal is to create an integrated, scalable framework in which meteorological analysis tools, forecast models, and data repositories can operate as dynamically adaptive, on-demand, grid-enabled systems. For more information, see http://lead.ou.edu/.
The LEAD effort at Unidata includes:
Status Update
Unidata Workshop
July 13 was LEAD day at the Unidata User’s Workshop. The goal was to let users experience LEAD software and to gather their feedback. Because LEAD has two orchestration systems, Siege and the Experiment Builder, preparation for the workshop involved testing both systems to support 25 simultaneous launches of steered WRF forecasts. This (and other workshop sessions) required staging a lab of 50 computers with internet access and other requisite software.
The day was the first large-scale test of Siege and the second for the Experiment Builder. Computational results varied: some runs failed at various stages (mainly because of the previously untested heavy load), while others completed successfully. Several system weaknesses were revealed and are now being addressed. Nonetheless, most users were pleased and impressed by the promise of launching steered WRF forecasts simply and easily. Users provided feedback via evaluation forms, a summary of which is in this workshop write-up. See the following links for additional information:
Beta Test Program
One outcome of the workshop is the identification of a list of additional persons interested in being beta testers for LEAD. (Some testers have been previously identified at Unidata, Millersville, and OU.) Unidata is taking the lead on organizing a beta test program that would allow this group access to the software and provide support in return for their testing of the system and provision of feedback.
Unidata LEAD Test Bed Status
We are maintaining an archive of at least 120 days of each of the seven LEAD canonical datasets, and in some cases more. The archive also has 120 days of the remaining IDD feeds. In addition, we are maintaining a smaller archive of ADAS and WRF steered products. The current data volume is close to thirteen terabytes. Most of this data is being backed up, and a strategy is currently being implemented to handle the remainder. Data volumes have grown to the point of creating significant technical problems. We are working with the THREDDS groups to study these problems and come up with the best solution. This is an excellent test case for many of our technologies and strategies.
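The 120-day retention window described above can be illustrated with a minimal sketch. This is not the test bed's actual archive management code; the function name, file names, and dates are hypothetical and serve only to show how a rolling window separates files to keep from files eligible for removal.

```python
from datetime import date, timedelta

# Minimum retention window stated in the report; everything else is illustrative.
RETENTION_DAYS = 120


def split_by_retention(file_dates, today, retention_days=RETENTION_DAYS):
    """Split files into (keep, expired) lists based on the date of their data.

    file_dates maps a file name to the date of the data it holds.
    A file is kept if its date is on or after the cutoff date.
    """
    cutoff = today - timedelta(days=retention_days)
    keep = sorted(name for name, d in file_dates.items() if d >= cutoff)
    expired = sorted(name for name, d in file_dates.items() if d < cutoff)
    return keep, expired


# Hypothetical file names and dates, for illustration only.
example = {
    "model_20060101.grib": date(2006, 1, 1),
    "model_20060601.grib": date(2006, 6, 1),
    "model_20060801.grib": date(2006, 8, 1),
}
keep, expired = split_by_retention(example, today=date(2006, 9, 1))
```

Because the report says some datasets are held longer than 120 days, a real policy would presumably set the window per dataset rather than use one global constant.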
The test bed is using the latest TDS technologies, which include the ability to work with native GRIB files as well as Level II and Level III radar data. This accomplishment facilitates comparisons of mesoscale model runs to reality. TDS can also serve a file directly over HTTP, subset a dataset and return the result as a CF-convention netCDF file, catalog the gridftp availability of files, and provide a WCS interface to gridded data files. All these capabilities are useful and desirable in the LEAD context.
There are two TDS catalogs that provide access to the data. The top catalog provides complete access to all the data. The other catalog is the operational LEAD top catalog. This catalog does not yet include radar data because the volumes are too great for the current LEAD software; a strategy for handling them is still needed.
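As an illustration of the subsetting capability, the sketch below builds a request URL for a spatial and temporal subset returned as netCDF. It is an assumption-laden example, not a description of the test bed's configuration: the host and dataset path are hypothetical, and the query parameter names (var, north, south, east, west, time_start, time_end, accept) follow the NetCDF Subset Service conventions of later TDS releases, which may differ from the version running on the test bed.

```python
from urllib.parse import urlencode


def build_subset_url(base_url, variables, bbox, time_start, time_end):
    """Build a subset request URL for a gridded dataset.

    bbox is (north, south, east, west) in degrees.
    Parameter names are assumed from later TDS NetCDF Subset Service
    conventions and should be checked against the server in use.
    """
    north, south, east, west = bbox
    query = urlencode({
        "var": ",".join(variables),
        "north": north,
        "south": south,
        "east": east,
        "west": west,
        "time_start": time_start,
        "time_end": time_end,
        "accept": "netcdf",  # ask for a netCDF file in the response
    })
    return f"{base_url}?{query}"


# Hypothetical server, dataset path, and variable name, for illustration only.
url = build_subset_url(
    "https://tds.example.org/thredds/ncss/grid/hypothetical/dataset.grib",
    variables=["Temperature"],
    bbox=(45.0, 30.0, -90.0, -110.0),
    time_start="2006-07-13T00:00:00Z",
    time_end="2006-07-13T12:00:00Z",
)
```

The resulting URL could be fetched with any HTTP client; the sketch stops at URL construction so that it makes no assumptions about a reachable server.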
Most of the test bed hosts have been upgraded to a gigabit internet connection. This addresses a gridftp overload problem that was revealed at the LEAD Lab day at the Unidata workshop. The hardware has been purchased to upgrade the remaining machine.
The primary LEAD systems have been put on redundant power so data ingest and availability will not be affected if a circuit is lost.
Work has begun on a dedicated, high-performance computing cluster within the LEAD environment. This cluster has four processors at the moment, but will be expanded as jobs are migrated and hardware becomes available. The compute cluster will initially be used for ongoing work on the steered WRF capabilities. In the past, this experimental work led, among other things, to our ability to work with NCSA to create the Siege workflow solution demonstrated at the UPC Workshop. This new cluster will be used as the basis for evolving that capability. Additionally, it will be available to the LEAD project for use in the portal-based workflow solution.
In light of the lessons learned to date, the Unidata LEAD test bed is undergoing a review and is expected to be largely reorganized. As part of this effort, accompanying documentation is expected to be developed that describes the hardware, software, and operation of the systems involved.
TDR
The TDR can currently move data to a remote host, including third-party transfers using gridftp and scp, via a web interface as well as programmatically. Preliminary metadata is being generated. Talks are underway to integrate the TDR with the LEAD myLEAD cataloging system. Another use case for the TDR is the IDV case study project, which will use the TDR as the case study archive. The requirements for these use cases are steering TDR development.
Crosswalk
The crosswalk was enhanced to generate file-level metadata, so it now provides temporal information and multiple access methods for each file. This is the version currently in use by the LEAD Geo Gui query tool provided by the LEAD portal.
In testing this enhancement, it was found that the crosswalk could not handle the volumes of individual radar data files. It will be necessary to address this problem.
Miscellaneous
The LEAD annual report, which was due at the end of July, required an assessment of the past year’s budget and progress as well as plans for the upcoming year.
Two abstracts from Unidata have been submitted to the IIPS conference at the next annual AMS meeting. These are:
Stephano Nativi
from Italian National Research Council and the