Customer Solution Case Study
Scientists Extend Health-Related Pollution Research with High-Performance Computing
“Because of the speed we gained from using Windows HPC Server 2008, we were able to thoroughly explore sources of air pollutants in a much more fluid way than in the past.”
Dr. Mike Hannigan, Assistant Professor of Mechanical Engineering, University of Colorado at Boulder
University of Colorado at Boulder researchers in the Department of Mechanical Engineering are studying air pollution and correlated mortality rates. Their Python-based data-analysis tools were enabled for high-performance execution using IPython and Windows HPC Server 2008. Researchers chose Windows HPC Server expressly for its ease of use, which has helped them focus on speeding their research rather than struggling with unfamiliar technology.
This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Document published June 2010.
Business Needs
Undergraduate and graduate students in the Department of Mechanical Engineering at the University of Colorado at Boulder (CU-Boulder) engage in stimulating education and research projects that span a broad spectrum of modern applications.
One of the department’s research areas concerns air pollution and public health. As part of the multi-disciplinary, multi-institution Denver Aerosol Sources and Health (DASH) study, researchers collected samples of fine particulate matter for five years and compared the time-series data to daily health data collected from nearby hospitals during the same period. The goal is to reveal correlations between specific pollutants and morbidity and mortality rates.
To ensure confidence in the study’s results by delving into a greater range of possibilities, the researchers needed a faster way to analyze their data. “We wanted to be able to look at this one-of-a-kind data set in lots of ways,” says Dr. Mike Hannigan, Assistant Professor of Mechanical Engineering at CU-Boulder. “We knew that, if we investigated many alternatives along the way, we’d be better able to determine the uncertainties in the results and come up with more conclusive findings.”
Solution
Dr. Hannigan’s team decided to take advantage of high-performance computing (HPC) to advance the DASH study. The department chose to use up to eight nodes, each with eight cores, of a Cray CX1 supercomputer running the Windows HPC Server 2008 operating system. “Developers are often reluctant to run our code in a distributed environment because those environments can be so daunting to deploy and use,” says Josh Hemann, Professional Research Assistant in the CU-Boulder Department of Mechanical Engineering and Statistical Advisor at Rogue Wave Software, a Microsoft Registered Partner. “But seeing that we’d be able to work in the Windows environment with Windows HPC Server 2008 definitely overcame my own and the rest of the research team’s hesitations. We’re Windows users—it’s what we’re accustomed to.”
To start, Hemann developed a set of data-analysis tools in Python using PyIMSL Studio, a key component of which was OpenMP-enabled neural network algorithms for pattern recognition. He also used IPython, an open source software project developed in part by Dr. Brian Granger, Assistant Professor of Physics at California Polytechnic State University.
IPython provides an enhanced interactive shell for the Python language. It also provides a framework for interactive parallel computing that is fully integrated with Windows HPC Server 2008. “Our goal is to make high-performance computing with IPython accessible to as many users as possible,” says Granger. “About half of our users rely on the Windows operating system. We sought to support them, plus open the door to new researchers because the Windows platform makes IPython easy and approachable.”
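The case study does not show the team's code. As a rough sketch of the map-style pattern that interactive parallel frameworks such as IPython expose, the example below evaluates several independent analysis alternatives side by side. It uses Python's standard-library concurrent.futures in place of IPython's parallel API, and the function score_alternative, the parameter n_factors, and the list of alternatives are hypothetical placeholders, not the DASH analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def score_alternative(n_factors):
    """Hypothetical stand-in for one analysis run; NOT the DASH model.

    Returns an (input, toy score) pair so results can be matched to inputs.
    """
    toy_score = sum(i * i for i in range(n_factors * 1000)) % 97
    return n_factors, toy_score


def run_alternatives(alternatives, max_workers=4):
    """Evaluate each alternative independently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_alternative, alternatives))


if __name__ == "__main__":
    for n_factors, score in run_alternatives([3, 4, 5, 6]):
        print(n_factors, score)
```

A thread pool keeps the sketch self-contained but does not speed up CPU-bound Python code. On a cluster, the same map call would instead be dispatched to worker processes on the compute nodes, which is the role the text attributes to IPython and Windows HPC Server 2008.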
In January 2010, Granger and Hemann worked part-time for three weeks to parallelize the DASH application and configure the supercomputer cluster. “Everything went really smoothly,” notes Hemann. After running numerous computations with the Python data-analysis tools on IPython and Windows HPC Server 2008, the CU-Boulder researchers arrived at preliminary results that suggest a correlation between higher rates of hospitalization and pollution from motor vehicles with poor combustion. The same appears to hold true for commercial diesel vehicles. “The U.S. Environmental Protection Agency will be able to look at our research results and determine the best way to control dangerous particulates to improve public health,” says Hannigan.
Benefits
CU-Boulder researchers use Windows HPC Server 2008 to accelerate their research and heighten their confidence in its results, all within a familiar environment.
Ease of Use
Using Windows HPC Server 2008 makes it possible for researchers to take advantage of HPC without a steep learning curve. “The Windows technology doesn’t distract us from what we’re trying to accomplish. We can write code, analyze data, and focus on the science—without being plagued by the details,” says Hemann.
System management, too, presented no barriers. “The cluster is easy to manage,” says Hemann. “I can start and cancel jobs; see which jobs were running on which nodes and how many resources are available; and take advantage of visualization tools, job-history reports, and so on.”
Faster Modeling, More Accurate Results
Researchers benefit from performance gains by using Windows HPC Server 2008 for critical portions of their work. “Although not all of the Python application is parallelized, the key analysis steps are, and they run extremely quickly,” says Hemann. “What took 30 minutes on a laptop computer takes only four minutes using the Cray cluster.”
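To put the quoted timings in perspective, the short calculation below works out the implied speedup and how many analysis runs would fit in a fixed block of time. Only the 30-minute and four-minute figures come from the case study. The eight-hour window and the helper names are assumptions chosen for illustration, and the laptop and the cluster are different hardware, so the ratio is a rough end-to-end comparison, not a measure of parallel efficiency.

```python
def speedup(baseline_minutes, accelerated_minutes):
    """Ratio of the baseline run time to the accelerated run time."""
    return baseline_minutes / accelerated_minutes


def runs_per_window(window_minutes, run_minutes):
    """Whole number of back-to-back runs that fit in a time window."""
    return window_minutes // run_minutes


LAPTOP_MINUTES = 30    # from the case study
CLUSTER_MINUTES = 4    # from the case study
WINDOW_MINUTES = 480   # assumed eight-hour working day, for illustration only

print(speedup(LAPTOP_MINUTES, CLUSTER_MINUTES))          # 7.5
print(runs_per_window(WINDOW_MINUTES, LAPTOP_MINUTES))   # 16
print(runs_per_window(WINDOW_MINUTES, CLUSTER_MINUTES))  # 120
```

Under these assumptions, a day that allowed 16 runs on the laptop allows 120 on the cluster, which is the kind of headroom needed to test many alternative analyses.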
That increased speed has given researchers the opportunity to conduct a more comprehensive investigation of pollution source and public health correlations. “As well as taking us much further in our analyses, our computing power also makes it easier to fully test the robustness of our research, which adds credibility to our results,” says Hannigan. “We wouldn’t have the same confidence in them without using Windows HPC Server 2008.”
New Avenues for Research
With tools such as IPython available for use with Windows HPC Server 2008, more researchers can take advantage of HPC. “Although many of us in the IPython community come from the Linux world, we consider Windows HPC Server 2008 an excellent means by which many types of new users can and will parallelize their software for larger projects,” says Granger. “Before, the parallel parts of IPython didn’t work very well in the Windows environment, but now we can say that it fully supports Windows HPC Server 2008, and that’s a game-changer for us.”