15 Lower Hope Road · Rosebank · 7700 · South Africa
Tel: +27 21 658 2740 · Direct: +27 21 658 2756
Advanced Computer Engineering Laboratory
Parallel Computer Architecture and Programming
Dates: 7-11 September 2009
Venue: CHPC, 15 Lower Hope Road, Rosebank, 7700.
Course Description
This course covers the principles and tradeoffs in the design and programming of parallel computers. Topics include the varieties of parallelism in current hardware (e.g., fast networks, multicore, accelerators such as GPUs), the importance of locality, implicit vs. explicit parallelism, shared memory, cache coherence, synchronization mechanisms (locking, atomicity, transactions, barriers), and parallel programming models (threads, data parallel/streaming, transactions, and nested parallelism). Recent research results from the Pervasive Parallelism Lab will also be presented. A significant parallel programming assignment will be given as homework.
The course will include practical tasks, and a larger project to be completed after the course. Tutoring will be provided by the CHPC ACELab staff.
Course Prerequisites
Working knowledge of C/C++
Suggested Text
John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 4th Edition, Morgan Kaufmann
Research papers
Instructor
Kunle Olukotun
Professor
Departments of Electrical Engineering and Computer Science
Stanford University
Director
Pervasive Parallelism Laboratory
Contact Information
Office: Gates Hall 3A, Room 302
Phone: (650) 725-3713
Fax: (650) 725-6949
Email:
Address:
Department of Electrical Engineering
Stanford University
Gates Hall 3A, Room 302
Stanford, CA 94305-9030 USA
Assistant:
Darlene Hadding
Administrative Associate for Professor Kunle Olukotun
Gates 4A-408, M/C 9040
Phone: (650) 723-1430
Fax: (650) 725-6949
Short Bio
Kunle Olukotun is a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been on the faculty since 1992. He has been a researcher in and proponent of chip multiprocessor technology since the mid-1990s, and is well known for leading the Stanford Hydra research project, which developed one of the first chip multiprocessors with support for thread-level speculation (TLS). Olukotun founded Afara Websystems to develop high-throughput, low-power server systems with chip multiprocessor technology. Afara was acquired by Sun Microsystems; the Afara microprocessor technology, called Niagara, is the basis of systems that have become one of Sun's fastest-ramping products ever. Olukotun is actively involved in research in computer architecture, parallel programming environments, and scalable parallel systems. He currently directs the Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of parallelism in all application areas.
Olukotun is an ACM Fellow and IEEE Fellow. He has authored many papers on CMP design and parallel software and recently completed a book on CMP architecture. Olukotun received his Ph.D. in Computer Engineering from The University of Michigan.
Tentative Course Schedule
Date / Lecture / Subject / Reading
Sept 7 AM / 1 / Introduction, course overview / 4.1, [1][6]
Sept 7 PM / 2 / Parallel Programming
Sept 8 AM / 3 / Parallel Algorithms
Sept 8 PM / 4 / Performance Evaluation / [2][3]
Sept 9 AM / 5 / Symmetric Shared Memory I / 4.2, 4.3
Sept 9 PM / 6 / Symmetric Shared Memory II
Sept 10 AM / 7 / Synchronization and Consistency / 4.5, 4.6, [4][5]
Sept 10 PM / 8 / CMPs, GPUs / [8][9], 4.8
Sept 11 AM / 9 / Beyond Shared Memory / [10][11][13]
Sept 11 PM / 10 / Pervasive Parallelism Lab (PPL)
Research Papers
Parallel Applications
[1] H. Sutter and J. Larus, “Software and the concurrency revolution,” ACM Queue, vol. 3, no. 7, September 2005.
[2] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, “The SPLASH-2 Programs: Characterization and Methodological Considerations,” Proc. 22nd International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 1995.
[3] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” Proc. 17th International Conference on Parallel Architectures and Compilation Techniques, October 2008.
Locking and Memory Consistency
[4] S. V. Adve and K. Gharachorloo, “Shared memory consistency models: a tutorial,” IEEE Computer, vol. 29, no. 12, pp. 66–76, Dec. 1996.
[5] M. D. Hill, “Multiprocessors should support simple memory consistency models,” IEEE Computer, vol. 31, no. 8, pp. 28–34, Aug. 1998.
Chip-Multiprocessors (CMPs)
[6] K. Olukotun and L. Hammond, “The future of microprocessors,” ACM Queue, vol. 3, no. 7, pp. 26–34, September 2005.
[7] L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese, “Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing,” Proc. 27th Annual International Symposium on Computer Architecture (ISCA'00), Vancouver, British Columbia, Canada, pp. 282–293, June 2000.
[8] K. Olukotun, L. Hammond, and J. Laudon, Chapter 2, Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency, Morgan & Claypool, 2007.
Thread-Level Speculation (TLS)
[9] M. J. Garzaran, M. Prvulovic, J. M. Llaberia, V. Vinals, L. Rauchwerger, and J. Torrellas, “Tradeoffs in buffering memory state for thread-level speculation in multiprocessors,” Proc. 9th International Symposium on High-Performance Computer Architecture (HPCA), February 2003.
[10] L. Hammond, B. Hubbert, M. Siu, M. Prabhu, M. Chen, and K. Olukotun, “The Stanford Hydra CMP,” IEEE MICRO, vol. 20, no. 2, pp. 71–83, March-April 2000.
Transactional Memory
[11] B. D. Carlstrom, J. Chung, A. McDonald, H. Chafi, C. Kozyrakis, and K. Olukotun, “The Atomos transactional programming language,” Proc. ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, Ottawa, Canada, June 10–16 2006.
[12] A. McDonald, J. Chung, H. Chafi, C. Cao Minh, B. D. Carlstrom, C. Kozyrakis, and K. Olukotun, “Characterization of TCC on chip-multiprocessors,” Proc. 14th International Conference on Parallel Architecture and Compilation Techniques (PACT 2005), St. Louis, MO, Sept. 17–21 2005.
[13] K. Moore, J. Bobba, M. Moravan, M. Hill, and D. Wood, “LogTM: log-based transactional memory,” International Symposium on High Performance Computer Architecture (HPCA), February 2006.
[14] B. Saha, A. Adl-Tabatabai, R. Hudson, C. Cao Minh, and B. Hertzberg, “McRT-STM: A high-performance software transactional memory system for a multicore runtime,” Proc. Symposium on Principles and Practice of Parallel Programming, June 2006.
An initiative of the Department of Science and Technology
Managed by the Meraka Institute of the CSIR and the University of Cape Town