The University of Colorado at Boulder's Cooperative Institute for Research in Environmental Sciences (CIRES) Climate Diagnostics Center (CDC) maintains a repository of climate datasets that is used daily by researchers and educators at CIRES and around the world. These datasets are being used to answer questions about the Earth's climate system, such as the cause and nature of extreme climate events like the 2010 Russian Heat Wave. We received funds from the 2011 Unidata Community Equipment Awards program to purchase a new server to enhance and expand our existing THREDDS Data Server (TDS) capabilities and establish a RAMADDA server at the CDC in order to provide end-to-end data services that facilitate research and education in the climate sciences.
Equipment Configuration
Using the Equipment Award funds, we purchased a Supermicro A+ Server 2022-URF 16-core server with 96GB of memory to run both TDS and RAMADDA. We chose to run the servers in virtual machines rather than purchasing separate hardware for each server. Using a virtual machine configuration provides extra security and allows us to better deal with changing CPU and memory requirements for the TDS and RAMADDA. If either server needs more resources, they can easily be allocated to the virtual machine without buying more hardware.
CDC had several objectives in acquiring the new equipment. The chief objective was to improve the performance of our existing TDS by moving it to its own server; the previous server was taxed by the combined load of the TDS and the other processes running on it. A secondary objective was to run the TDS using a newer version of the Apache Tomcat Java servlet container. The previous server ran an older version of Tomcat that could not be upgraded and that had a known problem causing the access URLs automatically generated by the TDS to be incorrect. With the new server, we were able to move the TDS to its own virtual machine, which solved the performance issue, and upgrade to Tomcat version 6, which solved the URL problem. The new system became operational on August 8, 2011 and is available at http://www.esrl.noaa.gov/psd/thredds/.
Wide Use
Since the upgrade, CDC has been making an average of approximately 33 GB per month available via the TDS. Our most popular datasets are the NCEP/NCAR Reanalysis I, the Twentieth Century Reanalysis (V2), the NCEP/DOE Reanalysis 2, and the NASA GPCP Precipitation dataset. Investigators from institutions in over 30 countries use the data our TDS provides, with the biggest users being Woods Hole Oceanographic Institution, NOAA's PMEL, and the University of Amsterdam. Some institutions, including PMEL and the Integrated Climate Data Centre at Hamburg, Germany, make use of data from CDC in their Live Access Server (LAS) applications via TDS-supplied OPeNDAP URLs.
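As a rough illustration of what a TDS-supplied OPeNDAP URL looks like, the Python sketch below builds a dataset URL and a subset request. It rests on assumptions not stated in this report: that the OPeNDAP service is mounted at the usual TDS "dodsC" path under the server address given above, and that the dataset path and variable name (both hypothetical here) exist in the catalog. The bracketed index ranges follow the standard DAP2 start:stride:stop constraint syntax.

```python
from urllib.parse import quote

# Server address taken from this report; the "dodsC" service path is the
# customary TDS default and is an assumption for this particular server.
TDS_BASE = "http://www.esrl.noaa.gov/psd/thredds"


def opendap_url(dataset_path, base=TDS_BASE):
    """Return the OPeNDAP access URL for a dataset path in the catalog."""
    return f"{base.rstrip('/')}/dodsC/{quote(dataset_path.lstrip('/'))}"


def subset_request(dataset_path, variable, index_ranges,
                   response="ascii", base=TDS_BASE):
    """Return a DAP2 request URL for a hyperslab of one variable.

    index_ranges holds one (start, stride, stop) tuple per dimension,
    using zero-based indices with an inclusive stop, as DAP2 specifies.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
    )
    return f"{opendap_url(dataset_path, base)}.{response}?{variable}{hyperslab}"


if __name__ == "__main__":
    # Hypothetical dataset path and variable name, for illustration only.
    path = "Datasets/example/air.mon.mean.nc"
    print(opendap_url(path))
    # First 12 time steps at a single grid point.
    print(subset_request(path, "air", [(0, 1, 11), (0, 1, 0), (0, 1, 0)]))
```

A client such as LAS or the IDV would normally be given only the dataset URL and would form the subset requests itself; some HTTP clients may also require the square brackets to be percent-encoded.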
Another objective in acquiring the new equipment was to run a publicly accessible RAMADDA server locally at CDC. We have been running an internal RAMADDA server on the new hardware since fall 2011. For the publicly accessible server, we plan to run a read-only version of RAMADDA that shares the same database as the internal server, starting in May 2012 (http://www.esrl.noaa.gov/psd/repository/). We use RAMADDA's access control features to limit access to some portions of the database to CDC users (for internal collaboration and datasets that cannot be distributed) while making other portions available for external viewing and access. We are in the process of acquiring an additional 22 TB of storage that will be used for sharing climate datasets and will be accessible through RAMADDA.
We also hope to create timeseries aggregations for datasets that are separated into multiple files by time (a layout chosen to avoid the need for extremely large files). We have not yet been successful in doing this with the TDS, and we are working with Unidata to try to solve the problem. In the meantime, we have successfully implemented timeseries aggregation of several of our datasets on our RAMADDA server.
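For readers unfamiliar with how a time aggregation is usually declared for the TDS, the sketch below uses Python to generate a minimal NcML "joinExisting" aggregation that scans a directory of per-period files and joins them along the time dimension. It is a generic illustration of the NcML mechanism, not the configuration used at CDC, and it does not address the problem described above; the directory path, file suffix, and dimension name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the NcML 2.2 schema used by the netCDF-Java library and the TDS.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_join_existing_ncml(scan_dir, suffix=".nc", dim_name="time"):
    """Return NcML text that joins all matching files along dim_name."""
    # Serialize NcML elements in the default namespace (no prefix).
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    aggregation = ET.SubElement(
        root,
        f"{{{NCML_NS}}}aggregation",
        {"dimName": dim_name, "type": "joinExisting"},
    )
    # The scan element asks the server to pick up every file in scan_dir
    # whose name ends with the given suffix.
    ET.SubElement(
        aggregation,
        f"{{{NCML_NS}}}scan",
        {"location": scan_dir, "suffix": suffix},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical directory holding one file per year of a dataset.
    print(build_join_existing_ncml("/data/example/surface/", ".nc"))
```

In a TDS catalog, a fragment like this would typically be embedded in a dataset element so that the joined files appear to clients as a single dataset with one continuous time axis.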
Future Plans
Overall, we believe that, given the number of datasets CDC makes available, improved OPeNDAP access has made the files more useful. (Our access statistics appear to support this view.) In addition to continuing work on the timeseries aggregation issue, our future plans include creating an improved catalog with more human-readable dataset names and making additional types of data (such as satellite and time series data) available via OPeNDAP. Additional services will be added to RAMADDA to provide interactive climate data analysis capabilities. Don Murray will give a presentation on access to climate data using RAMADDA and IDV at the upcoming 2012 Unidata Users Workshop in July.
For more information on the Unidata Community Equipment Awards program, see the Equipment Awards page.