The Unidata Local Data Manager (LDM) package provides a stable mechanism for the dissemination of meteorological data from various sources to the greater Unidata community over the Unidata Internet Data Distribution (IDD) network. The Unidata THREDDS and RAMADDA platforms provide simple methods to access these data. The Department of Meteorology at The Pennsylvania State University (Penn State) has participated as an IDD level-two relay data distribution site since 1998, as a top-level relay of CONDUIT data since 2010, and as a THREDDS/RAMADDA data provider since 2011.
Prior to 2011, our IDD relay service was provided by servers purchased in 2005, each with an Intel Xeon 3.0 GHz processor and 8 GB of RAM. In addition to maintenance issues related to the machines' age, the existing servers were only marginally capable of meeting our needs. Using funds provided by the 2011 Unidata Community Equipment Award program, a trio of quad-core Xeon E5606 2.13 GHz 64-bit machines running the LDM and Linux Virtual Server (LVS) in a virtual machine configuration is now providing IDD relay service at Penn State. In addition, the storage capacity of our THREDDS/RAMADDA data service was expanded to make more historical Bufkit data available to the Unidata community.
Unidata's Role at Penn State and Elsewhere
Unidata's IDD feed and analysis/display software packages, including GEMPAK/NAWIPS, IDV, McIDAS-X, and the upcoming AWIPS II, are considered vital tools for research, instruction, and outreach at Penn State. For example, the GEMPAK/NAWIPS suite of software allows our students to explore current and past weather scenarios as part of upper-level undergraduate meteorology courses. The GEMPAK/NAWIPS software is also used to generate graphics for the popular and publicly available Penn State electronic map wall (e-Wall).
Paul Knight is the Pennsylvania State Climatologist and a senior lecturer in synoptic meteorology at Penn State. "In the state climate office," he says, "we heavily depend on the quality and reliability of the Unidata LDM to keep our data streams current, so that research projects monitoring agricultural weather risks, such as pests and fungus, can be tested in real time. The suite of data that are available through Unidata is crucial to teaching new forecasting techniques to the three dozen students involved in the forecasting practicum each fall."
The Unidata LDM feed and GEMPAK/NAWIPS software are also used on our 36-panel electronic display wall in the Penn State Weather Station. (More information on the visualization wall is available here.) Additionally, Penn State serves as an IDD relay for 13 other sites outside the University, helping to keep meteorological data flowing through the university community.
Students, instructors, and faculty use real-time and archived data from the IDD for numerous research initiatives. As an example, researchers and students access historical Bufkit data via the THREDDS data service (http://tds.meteo.psu.edu:8080) or the RAMADDA repository (http://tds.meteo.psu.edu:8080/repository). As the scope of products and the volume of data available via the IDD increase, so does the importance of Unidata products to our educational, outreach, and research programs.
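As a rough illustration of how such data can be pulled programmatically, the short Python sketch below downloads a single file through the TDS HTTP file service. The dataset path and file name are hypothetical placeholders; the actual catalog layout should be browsed on the server itself.

    # Sketch: fetch one archived Bufkit profile from the Penn State TDS
    # over its HTTP file service.  The dataset path below is a
    # hypothetical placeholder -- browse the catalog for real paths.
    from urllib.request import urlretrieve

    TDS_FILES = "http://tds.meteo.psu.edu:8080/thredds/fileServer"
    dataset = "bufkit/2011/gfs3_kunv.buf"          # hypothetical path
    urlretrieve(f"{TDS_FILES}/{dataset}", "gfs3_kunv.buf")
    print("saved gfs3_kunv.buf")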
System Configuration and Rationale
The hardware upgrade made possible by the Unidata Community Equipment Grant consists of three servers, each configured with an Intel Xeon E5606 quad-core processor running at 2.13 GHz, 24 GB of memory, and a 450 GB 10,000 RPM hard drive. To make full use of these servers, we installed a virtual host hypervisor on each physical server and built virtual machines (VMs) on top of it. The result is a high-availability Linux Virtual Server (LVS) environment.
In the LVS environment, a director virtual machine fields incoming LDM connection requests for data and passes them on to "real" servers, which handle the actual data transfer (see Virtual Server via Direct Routing for details). To increase reliability, two directors are implemented in a failover configuration; if one director goes down, the other immediately takes over. In our virtual environment, the first and second physical servers are each configured with one director and two real servers, and the third physical server is configured with two real servers. Each director VM has 1 GB of dedicated memory (to prevent swapping) and a dedicated processor. Each real server has 9.5 GB of dedicated memory and one dedicated processor. The remaining 4 GB of memory and one processor are shared by the VMs as needed.
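For readers unfamiliar with LVS, the sketch below shows roughly what a direct-routing virtual service for LDM traffic (port 388) looks like when built with the standard ipvsadm tool. The addresses and the weighted least-connection scheduler are placeholder examples, not a transcript of our configuration; production deployments usually generate these rules with a high-availability tool rather than by hand.

    # Sketch: an LVS direct-routing (LVS-DR) virtual service for LDM
    # connections on port 388, using placeholder addresses.
    ipvsadm -A -t 192.0.2.10:388 -s wlc                       # virtual IP, weighted least-connection
    ipvsadm -a -t 192.0.2.10:388 -r 192.0.2.21:388 -g -w 1    # real server 1, direct routing
    ipvsadm -a -t 192.0.2.10:388 -r 192.0.2.22:388 -g -w 1    # real server 2, direct routing
    # ...one "-a" line per real server (six in our case).
    # For LVS-DR, each real server must also carry the virtual IP on a
    # loopback alias with ARP replies for that address suppressed.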
To move data among the machines, one real server acts as the primary ingest machine, obtaining all LDM data from the Internet. The other real server instances obtain their feeds either from this primary ingest machine or from a peer on the same physical machine. Should the primary ingest machine malfunction, another real server is configured as a failover and takes over the ingest role.
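In LDM terms, this feed topology is expressed with REQUEST entries in each real server's ldmd.conf. The fragment below is a generic sketch with hypothetical host names, requesting everything (ANY) for brevity; requesting the same feed from two upstream hosts gives the LDM redundant paths, so a surviving host keeps the data flowing if the other becomes unreachable.

    # Sketch of a real server's ldmd.conf feed requests (host names are
    # hypothetical placeholders, not our actual machines).
    REQUEST ANY ".*" ingest-primary.example.edu     # primary ingest VM
    REQUEST ANY ".*" ingest-backup.example.edu      # failover ingest VM
    # Downstream sites permitted to request data from this relay:
    ALLOW   ANY ^ldm\.downstream-site\.edu$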
In operation, one director (the other is idle unless the first fails) fields all LDM connection requests and forwards each to a real server using an LVS load-balancing algorithm. As connections increase, each of the six virtual real servers on the three physical machines is assigned connections in a load-balanced manner, spreading out the load across all the virtual real servers. If the first physical server, which hosts the active director, fails, the second physical server's standby director takes over. The net loss is the two virtual real servers on the failed machine, but four real servers remain on the two surviving physical servers. The result is a high-availability system.
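The director takeover itself is typically handled by a VRRP-style heartbeat between the two director VMs, with the standby claiming the virtual service address when the master stops advertising. The fragment below is a generic keepalived sketch of that idea, with placeholder interface name, router ID, and virtual IP; it illustrates the mechanism rather than documenting our exact setup.

    # Generic keepalived VRRP sketch for a director pair (placeholder
    # values throughout).  The standby director runs the same block with
    # state BACKUP and a lower priority; if the master stops advertising,
    # the standby claims the virtual IP and begins forwarding LDM requests.
    vrrp_instance LDM_DIRECTORS {
        state MASTER            # BACKUP on the standby director
        interface eth0
        virtual_router_id 51
        priority 150            # e.g. 100 on the standby
        advert_int 1
        virtual_ipaddress {
            192.0.2.10          # the LDM virtual service address
        }
    }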
In our previous environment, which did not use virtualization, five machines were required to operate two directors and three real servers. In the new environment, only three machines are required, reducing hardware costs. In addition, sharing resources through the hypervisor lets us get more out of the hardware we have.
Caveats and Future Expansion
In the current configuration, we are making the most of our hardware by running two directors and six real servers on three physical machines. The physical machines each have 24 GB of memory, so each real server (two per physical machine) has less than 12 GB of memory to work with. However, the LDM queue must be about 14 GB or larger to hold one hour of LDM data during peak data flow. Since the queue is larger than the available memory, some of it must be swapped out to disk. During normal operation this disk swapping is not a problem, but an extended network outage that forced downstream sites to reach back to the tail of the queue would increase the potential for thrashing and degraded behavior. If this becomes a problem in practice, the configuration could be modified to run only one real server per physical machine, making nearly 24 GB of memory available to it. Alternatively, the systems could be upgraded by adding a second processor and an additional 24 GB of memory. Unless the problem actually occurs, however, we will maintain the current configuration and monitor its operation.
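The arithmetic behind that concern is simple; the short Python sketch below just restates the figures quoted above (a roughly 14 GB queue against the 9.5 GB of memory dedicated to each real-server VM).

    # Rough check of the queue-vs-memory trade-off using the figures
    # quoted in this article.
    queue_gb  = 14.0    # ~1 hour of peak IDD data in the LDM queue
    vm_mem_gb = 9.5     # memory dedicated to each real-server VM
    shortfall = queue_gb - vm_mem_gb
    print(f"~{shortfall:.1f} GB ({shortfall / queue_gb:.0%}) of the queue "
          "must reside on disk rather than in memory")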
About the Bufkit Historical Data
With the addition of 12 TB of storage to our existing THREDDS/RAMADDA data server, we have been able to retrieve and restore Bufkit data back through 2007. Our archives extend back through 2004, and the process of restoring these data to the THREDDS/RAMADDA server continues.
For more information on the Unidata Community Equipment Awards program, see the Equipment Awards page.