Climate change research is becoming an ever more data-intensive and data-oriented scientific activity. Petabytes of climate data are continuously produced, delivered, accessed, and processed by scientists and researchers at multiple sites around the world. In this context, the Euro-Mediterranean Centre on Climate Change (CMCC) in Italy studies climate change issues at both global and regional (Mediterranean area) scales, from several points of view including numerical models, information and communication technology, and impact studies. The University of Salento is the Information and Communication Technologies (ICT) partner of CMCC; several courses at the University of Salento address topics such as data management, data mining, parallel computing, and distributed systems, which are of interest to CMCC and provide the ICT foundations needed to work in the climate change area as a computational scientist. Moreover, several Bachelor, Master, and Ph.D. theses are related to ICT topics in this challenging geoscience field.
In 2011, the University of Salento became the first European institution to receive a Unidata Community Equipment Award, which allowed us to establish a data management platform for climate change data related to the Mediterranean area at the High Performance Computing Laboratory at the University of Salento, in Lecce, Italy. The project has been carried out in close collaboration with the CMCC.
Student Impacts
Students attending the 2010-2011 "Advanced Data Management" course at the University of Salento learned, through a series of seminars titled "Science data management", about the Common Data Model and the NetCDF Java and C libraries, as well as the ISO 19115 and ISO 19139 standards.
The equipment purchased with this grant has helped to create a strong facility for the HPC Laboratory at the University of Salento, where students have been able to test and learn more about Unidata software such as THREDDS, RAMADDA, IDV, and the NetCDF libraries. This experience has brought several educational and research benefits. In particular, the students:
- Worked on a real environment, starting from the setup of the hardware to the configuration of the services and the implementation and testing of the code.
- Learned a lot about tuning a VM-based environment with multiple machines, cores, services, and data.
- Deployed and managed a multiplexed configuration for the THREDDS installation, analyzing the performance benefits of this architectural choice in terms of load balancing. They used an Apache service (configured with the mod_proxy_ajp module) with several Tomcat service instances running behind it to balance the load from multiple incoming requests.
- Designed and implemented some simple software applications running on climate change datasets, exploiting the C and Java NetCDF libraries.
- Worked on real data, gaining a concrete understanding of the nature of these multidimensional datasets, as well as of concepts such as variables, dimensions, and metadata.
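To make the notions of variable, dimension, and metadata in the last point more concrete, here is a minimal sketch in plain Python. It is an illustration of the general idea only, not the NetCDF API and not code from the course: the `Variable` class, its field names, and the temperature values are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A named array defined over an ordered list of dimensions (illustrative only)."""
    name: str
    dims: tuple                                  # dimension names, slowest-varying first
    shape: tuple                                 # length of each dimension
    attrs: dict = field(default_factory=dict)    # metadata, e.g. units
    data: list = field(default_factory=list)     # values stored flat, row-major

    def flat_index(self, *idx):
        """Map one index per dimension to an offset in the flat data list."""
        if len(idx) != len(self.shape):
            raise ValueError("expected one index per dimension")
        offset = 0
        for i, n in zip(idx, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            offset = offset * n + i
        return offset

    def get(self, *idx):
        """Return the value stored at the given multidimensional index."""
        return self.data[self.flat_index(*idx)]


# Hypothetical variable: 2 time steps on a 3 x 4 latitude/longitude grid.
temperature = Variable(
    name="temperature",
    dims=("time", "lat", "lon"),
    shape=(2, 3, 4),
    attrs={"units": "K", "long_name": "sea surface temperature"},
    data=[280.0 + k for k in range(2 * 3 * 4)],  # made-up values
)

# Value at time=1, lat=2, lon=3, i.e. the last element of the flat list.
print(temperature.get(1, 2, 3))
```

The point of the sketch is that a "variable" is more than an array: it carries the names of its dimensions and descriptive attributes alongside the values, which is what lets generic tools interpret a dataset without outside documentation.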
Students were supported by CMCC personnel, who have a strong background in (i) virtual machine environments (e.g. based on the ESXi software), (ii) THREDDS software installation and configuration, and (iii) the NetCDF format, libraries, and tools. This existing expertise allowed students to quickly set up the virtual machine environment as well as the THREDDS and RAMADDA services, and helped them learn more about the Unidata software from both end-user and administrator points of view.
Equipment Configuration
The hardware purchased with the Unidata equipment grant is an IBM x3630 M3 system equipped with a large amount of RAM, fast disk storage, and good processing power. In particular, the server is configured with two six-core Intel Xeon E5645 CPUs running at 2.40 GHz, 24 GB of RAM, and 8 TB of disk storage in a RAID 5 configuration. The server manages a virtual machine-based environment hosting climate change datasets provided by CMCC and running (multiplexed) THREDDS and RAMADDA installations.
A Dashboard system developed at CMCC and providing monitoring capabilities was also set up in the VM-based environment to monitor the deployed service instances.
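As a rough illustration of what balancing requests across a multiplexed THREDDS installation means, the sketch below dispatches incoming requests to a pool of back-end instances in simple round-robin order. It is a toy model only: it does not reproduce how Apache or mod_proxy_ajp work internally, round-robin is just one possible policy, and the class name, back-end names, and request paths are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Toy front end that hands each request to the next back end in turn."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one back end is required")
        self._backends = list(backends)
        self._next = cycle(self._backends)   # endless iterator over the pool
        self.served = Counter()              # requests handled per back end

    def route(self, request_path):
        """Pick the back end for this request and record the assignment."""
        backend = next(self._next)
        self.served[backend] += 1
        return backend


# Hypothetical pool of three Tomcat instances behind one front end.
balancer = RoundRobinBalancer(["tomcat-1", "tomcat-2", "tomcat-3"])
for i in range(9):
    balancer.route(f"/thredds/catalog/{i}")

# Each instance ends up with the same share of the nine requests.
print(dict(balancer.served))
```

Counting requests per back end, as the `served` counter does here, is also the kind of figure a monitoring view of the service instances would typically display.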
Future Directions
The Unidata equipment is also being used this year (2011-2012), with a stronger focus on the NetCDF C library and on parallel data mining applications exploiting MPI and OpenMP. The multicore platform is being used to implement parallel data access software for NetCDF files.
This year, the students of the new Advanced Data Management course inherited the Unidata environment set up during the previous academic year. They are therefore focusing much more on (i) designing and developing data mining applications that analyze NetCDF files and (ii) understanding the NetCDF storage structure, format, and libraries (C and Java).
This Unidata community equipment grant has been a great and successful experience. Overall, the students were excited and interested to work on such challenging climate data management topics with strong scientific tools, software, and libraries.
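The basic pattern behind this kind of parallel analysis is to split the data into blocks, reduce each block independently, and then combine the partial results. The sketch below shows that decomposition in Python with a thread pool standing in for MPI ranks or OpenMP threads. It is not the students' code and makes no performance claim; the function names and the sample values are assumptions for illustration, and a real application would read its blocks from NetCDF files rather than from a list.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) blocks of near-equal size."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def partial_stats(data, start, stop):
    """Reduce one block to (sum, count, max); an empty block contributes nothing."""
    chunk = data[start:stop]
    if not chunk:
        return (0.0, 0, float("-inf"))
    return (sum(chunk), len(chunk), max(chunk))


def parallel_mean_max(data, workers=4):
    """Compute the mean and maximum of `data` by combining per-block reductions."""
    if not data:
        raise ValueError("data must not be empty")
    blocks = split_range(len(data), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda b: partial_stats(data, *b), blocks))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    peak = max(p[2] for p in partials)
    return total / count, peak


# Made-up series standing in for values read from a dataset.
values = [float(k) for k in range(10)]
print(parallel_mean_max(values, workers=4))
```

Keeping the per-block result as a (sum, count, max) triple rather than a per-block mean is what makes the final combination correct when the blocks have different sizes.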
Posted by David on August 08, 2012 at 02:01 AM MDT