LEAD at Unidata

Overview

 

LEAD is an NSF Large ITR project involving nine institutions. Its goal is to create an integrated, scalable framework in which meteorological analysis tools, forecast models, and data repositories can operate as dynamically adaptive, on-demand, grid-enabled systems.  For more information, see http://lead.ou.edu/.

 

The LEAD effort at Unidata includes:

  • Maintenance of a test bed for software development and deployment, as well as data storage.  This includes:
    • Running automatically steered WRF jobs
    • Provision of a four-month data archive for the seven LEAD canonical datasets (see Data Description for LEAD 7 Datasets)
    • Storage for data and other information generated via LEAD orchestrations
  • Development of the THREDDS Data Repository (TDR), a storage archive that integrates with the THREDDS Data Server (TDS) to provide easy access to data.  This includes:
    • Data movement into and out of an archive
    • Support for a variety of storage media, including mass storage
    • Generation and/or enhancement of metadata
  • Development and maintenance of a crosswalk that translates THREDDS metadata into LEAD metadata
  • Installation and testing of existing assimilation packages and forecast models on the Unidata LEAD test bed, as well as on hosts at other institutions, such as supercomputers at NCSA
  • Ensuring integration across relevant Unidata technologies, especially LDM, TDS, and IDV
  • Providing an interface between the Unidata community and LEAD, and leveraging our community-building skills to help LEAD develop its own community
  • Providing expertise in successful software development and deployment to help LEAD succeed

 

 

 

Status Update August 25, 2006

 

Unidata Workshop

 

July 13 was LEAD day at the Unidata User’s Workshop.  The goal was to let users experience LEAD software and to gather their feedback.  Because LEAD has two orchestration systems, Siege and the Experiment Builder, preparation for the workshop involved testing both systems to allow 25 simultaneous launches of steered WRF forecasts.  This session, like other workshop sessions, required staging a lab of 50 computers with internet access and the requisite software.

 

The day was the first large-scale test of Siege and the second for the Experiment Builder.  Computational results varied: some runs failed at various stages, mainly because of the previously untested heavy load, while others completed successfully.  Several system weaknesses were revealed and are now being addressed.  Nonetheless, most users were pleased and impressed by the promise of launching steered WRF forecasts simply and easily.  Users provided feedback via evaluation forms, a summary of which appears in the workshop write-up.

 

 

Beta Test Program

 

One outcome of the workshop was a list of additional people interested in becoming LEAD beta testers.  (Some testers had already been identified at Unidata, Millersville, and OU.)  Unidata is taking the lead in organizing a beta test program that would give this group access to the software and to support, in return for testing the system and providing feedback.

 

Unidata LEAD Test Bed Status

 

We are maintaining an archive of at least 120 days, and in some cases more, of each of the seven LEAD canonical datasets.  The archive also holds 120 days of the remaining IDD feeds.  In addition, we are maintaining a smaller archive of ADAS and steered WRF products.  The current data volume is close to thirteen terabytes.  Most of this data is being backed up, and a strategy to handle the remainder is being implemented.  Data volumes have grown to the point of creating significant technical problems, and we are working with the THREDDS groups to study these problems and find the best solution.  This is an excellent test case for many of our technologies and strategies.
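
The 120-day retention window can be illustrated with a small sketch. The function name, the file names, and the dates below are hypothetical and chosen only for illustration; the scripts actually used on the test bed are not described here.

```python
from datetime import datetime, timedelta

RETENTION_DAYS = 120  # minimum archive depth stated in this report


def expired(file_times, now, retention_days=RETENTION_DAYS):
    """Return the names whose timestamp falls outside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(name for name, stamp in file_times.items() if stamp < cutoff)


# Hypothetical file names and dates, for illustration only.
sample = {
    "radar_20060401.raw": datetime(2006, 4, 1),
    "metar_20060601.nc": datetime(2006, 6, 1),
    "model_20060820.grib": datetime(2006, 8, 20),
}

# Only files older than the cutoff become candidates for removal.
candidates = expired(sample, now=datetime(2006, 8, 25))
```

Because the archive keeps "at least" 120 days, a real policy would treat this list as candidates only; some datasets are retained longer.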

 

The test bed is using the latest TDS technologies, which include the ability to work with native GRIB files as well as Level II and Level III radar data.  This accomplishment facilitates comparisons of mesoscale model runs to reality.  The TDS can also serve a file directly over HTTP, subset a dataset and return the result as a CF-convention netCDF file, catalog the GridFTP availability of files, and provide a WCS interface to gridded data files.  All these capabilities are useful and desirable in the LEAD context.
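
As an illustration of the WCS interface mentioned above, the following sketch builds a WCS 1.0.0 GetCoverage request in key-value form. The host name, dataset path, coverage name, and output format are assumptions made for the example and do not refer to the actual test bed; a given server may also require parameters not shown here.

```python
from urllib.parse import urlencode


def wcs_get_coverage_url(endpoint, coverage, bbox, time, fmt):
    """Build a WCS 1.0.0 GetCoverage request URL in key-value form."""
    params = {
        "service": "WCS",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": coverage,
        "bbox": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "time": time,
        "format": fmt,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and coverage name, for illustration only.
url = wcs_get_coverage_url(
    "http://tds.example.edu/thredds/wcs/model/sample.grib",
    coverage="Temperature",
    bbox=(-105.0, 35.0, -95.0, 45.0),
    time="2006-08-25T00:00:00Z",
    fmt="NetCDF3",
)
```

The bounding box and time parameters are what let a client request only the region and forecast time of interest rather than the whole file.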

 

Two TDS catalogs provide access to the data.  The top catalog provides complete access to all the data.  The other is the operational LEAD top catalog.  The operational catalog does not yet include radar data, because the volumes are too great for the current LEAD software; a strategy for handling them is still needed.
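
The difference between the two catalogs can be sketched by reading a THREDDS catalog document and listing its datasets, optionally skipping a collection. The catalog content below is a made-up miniature, not the test bed catalog, and the function name and `skip` parameter are hypothetical; only the InvCatalog 1.0 namespace and the `dataset`, `name`, and `urlPath` names come from the THREDDS catalog format.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
NS = {"t": THREDDS_NS}

# Hypothetical miniature catalog, for illustration only.
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Hypothetical top catalog">
  <dataset name="Model output">
    <dataset name="Sample model run" urlPath="model/sample.grib"/>
  </dataset>
  <dataset name="Radar">
    <dataset name="Sample Level II volume" urlPath="radar/sample.raw"/>
  </dataset>
</catalog>
"""


def leaf_datasets(catalog_xml, skip=()):
    """Return (name, urlPath) pairs, ignoring top-level collections named in skip."""
    root = ET.fromstring(catalog_xml)
    found = []
    for top in root.findall("t:dataset", NS):
        if top.get("name") in skip:
            continue
        # iter() visits the collection itself and every nested dataset.
        for ds in top.iter("{%s}dataset" % THREDDS_NS):
            if ds.get("urlPath"):
                found.append((ds.get("name"), ds.get("urlPath")))
    return found


complete = leaf_datasets(SAMPLE_CATALOG)
operational = leaf_datasets(SAMPLE_CATALOG, skip=("Radar",))
```

Here `complete` plays the role of the full top catalog and `operational` the role of a catalog that leaves radar out.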

 

Most of the test bed hosts have been upgraded to a gigabit internet connection.  This addresses a GridFTP overload problem that was revealed at the LEAD Lab day at the Unidata workshop.  Hardware to upgrade the remaining machine has been purchased.

 

The primary LEAD systems have been put on redundant power, so data ingest and availability will not be affected if a circuit is lost.

 

Work has begun on a dedicated, high-performance computing cluster within the LEAD environment.  The cluster currently has four processors and will be expanded as jobs are migrated and hardware becomes available.  It will initially be used for ongoing work on the steered WRF capabilities.  In the past, this experimental work led, among other things, to our ability to work with NCSA to create the Siege workflow solution demonstrated at the UPC Workshop.  The new cluster will be the basis for evolving that capability.  Additionally, it will be available to the LEAD project for use in the portal-based workflow solution.

 

In light of the lessons learned to date, the Unidata LEAD test bed is under review and is expected to be largely reorganized.  As part of this effort, we expect to develop documentation that describes the hardware, software, and operation of the systems involved.

 

TDR

 

The TDR can currently move data to a remote host, including third-party transfers, using GridFTP and scp.  Transfers can be started through a web interface or programmatically.  Preliminary metadata is being generated.  Talks are underway to integrate the TDR with the LEAD myLEAD cataloging system.  Another use case for the TDR is the IDV case study project, which will use the TDR as the case study archive.  The requirements for these use cases are steering TDR development.
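
To make the two transfer methods concrete, here is a minimal sketch that assembles the command line for an scp copy or a `globus-url-copy` (GridFTP) transfer. It is not the TDR implementation; the function name, host names, and paths are hypothetical, and the commands are only built, not executed.

```python
def transfer_command(method, source_path, dest_host, dest_path, source_host=None):
    """Return an argument list for moving one file to a remote host.

    Paths are assumed to be absolute. If source_host is given with the
    gridftp method, both ends are remote, i.e. a third-party transfer.
    """
    if method == "scp":
        return ["scp", source_path, "%s:%s" % (dest_host, dest_path)]
    if method == "gridftp":
        if source_host is None:
            source_url = "file://" + source_path  # local file
        else:
            source_url = "gsiftp://%s%s" % (source_host, source_path)
        dest_url = "gsiftp://%s%s" % (dest_host, dest_path)
        return ["globus-url-copy", source_url, dest_url]
    raise ValueError("unsupported transfer method: %r" % method)


# Hypothetical hosts and paths, for illustration only.
scp_cmd = transfer_command("scp", "/data/a.nc", "archive.example.edu", "/store/a.nc")
third_party_cmd = transfer_command(
    "gridftp", "/data/a.nc", "archive.example.edu", "/store/a.nc",
    source_host="ingest.example.edu",
)
```

Returning an argument list rather than a shell string keeps the sketch safe to hand to `subprocess.run` later without quoting problems.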

 

Crosswalk

 

The crosswalk has been enhanced to generate file-level metadata, so it now provides temporal information and multiple access methods for each file.  This is the version currently in use by the LEAD Geo Gui query tool provided by the LEAD portal.
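
The idea of a file-level crosswalk can be sketched as follows: one THREDDS dataset element, with its time coverage and access methods, is mapped to a flat record. The output field names are invented for this example and are not the LEAD metadata schema; the sample dataset is likewise hypothetical. Only the THREDDS element and attribute names (`timeCoverage`, `start`, `end`, `access`, `serviceName`, `urlPath`) come from the THREDDS catalog format.

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical file-level dataset entry, for illustration only.
SAMPLE_DATASET = """<dataset xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Sample model run">
  <timeCoverage>
    <start>2006-08-25T00:00:00Z</start>
    <end>2006-08-25T12:00:00Z</end>
  </timeCoverage>
  <access serviceName="http" urlPath="model/sample.grib"/>
  <access serviceName="gridftp" urlPath="model/sample.grib"/>
</dataset>
"""


def crosswalk(dataset_xml):
    """Map one THREDDS dataset element to a flat, made-up target record."""
    ds = ET.fromstring(dataset_xml)
    return {
        "title": ds.get("name"),
        "time_start": ds.findtext("t:timeCoverage/t:start", namespaces=NS),
        "time_end": ds.findtext("t:timeCoverage/t:end", namespaces=NS),
        "access": [
            {"service": a.get("serviceName"), "path": a.get("urlPath")}
            for a in ds.findall("t:access", NS)
        ],
    }


record = crosswalk(SAMPLE_DATASET)
```

Each file yields one record, which is why the number of individual files matters for how much metadata the crosswalk must produce.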

 

Testing of this enhancement showed that the crosswalk could not handle the volume of individual radar data files.  This problem still needs to be addressed.

 

Miscellaneous

 

The LEAD annual report, which was due at the end of July, required an assessment of the past year’s budget and progress as well as plans for the upcoming year.

 

Unidata has submitted two abstracts to the IIPS conference at the next AMS annual meeting:

 

  • “LEAD at the Unidata workshop: demonstrating democratization of NWP capabilities”, by Tom Baltzer, Anne Wilson, Suresh Marru, Albert Rossi, Marcus Christie, Shawn Hampton, Dennis Gannon, Jay Alameda, Mohan Ramamurthy, and Kelvin Droegemeier.

 

  • “The THREDDS data repository (TDR) for storage of LEAD data and metadata”, by Anne Wilson, John Caron, and Tom Baltzer.

 

Stefano Nativi of the Italian National Research Council and the University of Florence visited Unidata this summer.  He is interested in interoperability and in using grid technology to implement his interoperability concepts.  For this reason he is very interested in LEAD, and he held extensive discussions on the topic with Unidata LEAD team members during his visit.