On Fri, Apr 2, 2010 at 12:09 AM, Charles Weems <> wrote:

Dear Sushil,
Attached is my pass through the first part of the document. I did not edit the rationale for the theoretical material. As chair of a departmental curriculum committee, my reaction to the version you sent is that it would never be approved by my department; it would simply be ignored. It contains too many topics for a BS-level program. Many of these topics would be senior-level elective material, or suitable for an MS. But to call them core to a BS, which is to say that they should be required, is expecting too much. My reading is that doing all of these topics justice would take approximately three semester-long courses, which would be about 20% of the major credits in a BS program.
In addition, some of the areas are too specific. Departments need flexibility in how they choose to cover something; some people may prefer LogP to PRAM, for example. There will be resistance to prescriptive indications of how a general concept should be taught. Finally, some of the terms are dated or tied to particular technologies (e.g., FPGA instead of just saying reconfigurable). There were also topics in the list that are not specific to parallel processing, even if they are prerequisites to some of the other topics (e.g., recursion).
I have tried to reduce the list to one page, and to make the specifics broader and more general. I still think there is too much in it, but at least it is in the realm where I could consider going to my committee and making a case for how we could weave a subset of the topics across various existing courses, and then put the rest into one elective. I added dataflow and pipeline as architectural classes, since they are a distinct kind of parallelism and are used in DSP. Also, the shared memory parallel programming paradigm was very prescriptive about the style being lock-based, so I added transactions plus checkpoint/rollback -- but an alternative would be to just omit all the specifics about how to program shared memory. I think these models are currently in a state of flux, and less specificity may keep this from becoming obsolete in the near future.
Regards,
Chip

TCPP/NSF Curriculum planning workshop on Curriculum Standards for Parallel and Distributed Computing
Washington DC, Feb 5-6, Hilton Arlington
Core Topics
(Body of Knowledge)
for Parallel and Distributed Computing for CS/CE Graduates
Chtchelkanova, Almadena (NSF), Das, Sajal (University of Texas at Arlington, NSF), Das, Chita (Penn State, NSF), Dehne, Frank (Carleton University, Canada), Gouda, Mohamed (University of Texas, Austin, NSF), Gupta, Anshul (IBM T.J. Watson Research Center), Jaja, Joseph (University of Maryland), Kant, Krishna (NSF, Intel), Lumsdaine, Andrew (Indiana University), Padua, David (University of Illinois at Urbana-Champaign), Parashar, Manish (Rutgers, NSF), Prasad, Sushil (Georgia State University), Prasanna, Viktor (University of Southern California), Robert, Yves (INRIA, France), Rosenberg, Arnold (Colorado State University), Sahni, Sartaj (University of Florida), Shirazi, Behrooz (Washington State University), Sussman, Alan (University of Maryland), and Wu, Jie (Temple University)
Contact: Sushil Prasad ()
3/29/2010
Abstract: This is a preliminary draft of core topics that a student graduating with a Bachelor's degree in Computer Science or Computer Engineering is expected to have covered. Additional elective topics are expected. Also included here are the expected minimum skill set and a brief rationale for the topics in algorithms. This is a working document, and its current limited release is intended to engage the various stakeholders for their initial feedback and updates to this document (core vs. elective, additional topics, …?), to be taken up at the follow-up TCPP/NSF Curriculum Meeting on April 19, 2010 in Atlanta (co-located with IPDPS-10).

Core Topics (Body of Knowledge)

Architectures

  • Classes
      • SIMD
      • MIMD
      • Data and control parallel
      • Shared memory (SMP, NUMA, CC-NUMA)
      • Distributed memory
      • Hybrid
      • Heterogeneous/Accelerator (e.g., GPU, FPGA) and reconfigurable
      • Dataflow and pipeline
      • Grid/cloud
  • Memory
      • Memory hierarchies
      • Caches
      • Consistency and atomicity
      • Cache coherence
  • Interconnects
      • Topologies (e.g., diameter, latency vs. bandwidth)
      • Routing (packet and circuit based)
      • Switches
  • Power
      • Power sinks
      • Methods to control power

Algorithms/Theory

  • Parallel and distributed models and complexity
      • Cost of computation: time, space, power, etc.; asymptotics
      • Cost reduction: speedup, space compression, etc.
      • Cost tradeoffs: time vs. space, power vs. time, etc.
      • Scalability in algorithms and architectures
      • Model-based notions:
          • Notions from complexity theory: the PRAM, simulation/emulation, P-completeness, #P-completeness
          • Notions from scheduling: dependencies, task graphs, work, (make)span (see the worked example after this list)
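One possible worked example for these cost notions (a standard greedy-scheduling bound, added only for illustration and not part of the topic list itself): a computation with total work W = 10^6 operations and span (critical-path length) S = 10^3 runs on p = 100 processors in time T_p <= W/p + S = 10^4 + 10^3 = 1.1 x 10^4 steps under a greedy schedule, so the speedup T_1/T_p is at least about 91 of the ideal 100. As p grows, the span term dominates the bound and limits scalability.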

  • Algorithmic Paradigms
      • Divide & conquer (see the sketch after this list)
      • Recursion
      • Scan (parallel prefix) and reduction (map-reduce)
      • Series-parallel composition
      • Stencil-based iteration
      • Graph embedding as an algorithmic tool
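One possible illustration of the divide-and-conquer paradigm (a minimal sketch added for concreteness; it assumes a C compiler with OpenMP task support, and the function and array names are ours, not taken from the topic list):

    #include <omp.h>
    #include <stdio.h>

    /* Divide-and-conquer parallel sum: split the range in half, sum the
     * halves as independent tasks, then combine the partial results. */
    static long sum(const int *a, long lo, long hi) {
        if (hi - lo < 1024) {              /* small ranges: plain sequential loop */
            long s = 0;
            for (long i = lo; i < hi; i++) s += a[i];
            return s;
        }
        long mid = lo + (hi - lo) / 2, left, right;
        #pragma omp task shared(left)
        left = sum(a, lo, mid);            /* left half runs as a separate task */
        right = sum(a, mid, hi);           /* right half runs in the current task */
        #pragma omp taskwait               /* wait for the left half before combining */
        return left + right;
    }

    int main(void) {
        enum { N = 1 << 20 };
        static int a[N];
        for (long i = 0; i < N; i++) a[i] = 1;
        long total;
        #pragma omp parallel
        #pragma omp single                 /* one thread starts the recursion; tasks spread it */
        total = sum(a, 0, N);
        printf("%ld\n", total);            /* expected: 1048576 */
        return 0;
    }

The cutoff to a sequential loop on small ranges is the usual way to keep task-creation overhead in check.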

  • Algorithmic Problems
      • Communication: broadcast, multicast, scatter/gather, gossip (see the sketch after this list)
      • Sorting & selection
      • Leader election
      • Termination detection
      • Graph algorithms: search, path selection
      • Matrix computations: matrix product, linear systems, matrix arithmetic
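One possible illustration of collective communication (a minimal MPI sketch; the block size and variable names are illustrative assumptions):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Scatter a block of data from rank 0, compute local partial sums,
     * and combine them with a reduction back on rank 0. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        enum { PER_RANK = 4 };
        int *full = NULL;
        if (rank == 0) {                          /* only the root holds the full array */
            full = malloc((size_t)PER_RANK * size * sizeof *full);
            for (int i = 0; i < PER_RANK * size; i++) full[i] = i;
        }

        int local[PER_RANK];
        MPI_Scatter(full, PER_RANK, MPI_INT,      /* each rank receives its own block */
                    local, PER_RANK, MPI_INT, 0, MPI_COMM_WORLD);

        int partial = 0, total = 0;
        for (int i = 0; i < PER_RANK; i++) partial += local[i];

        MPI_Reduce(&partial, &total, 1, MPI_INT,  /* sum the partial results on rank 0 */
                   MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) {
            printf("total = %d\n", total);
            free(full);
        }
        MPI_Finalize();
        return 0;
    }

Broadcast (MPI_Bcast) and all-to-all exchanges follow the same collective pattern.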

Programming

  • Paradigms
      • Data parallel
      • Vectors and streams
      • Task parallel
      • Shared memory (threads, atomic operations, mutual exclusion, locks, barriers, race conditions, transactions, checkpoint-rollback) (see the sketch after this list)
      • Distributed memory
      • Hybrid
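One possible illustration of the shared-memory paradigm (a minimal POSIX threads sketch; the counter, thread count, and iteration count are illustrative assumptions):

    #include <pthread.h>
    #include <stdio.h>

    /* Shared counter protected by a mutex: without the lock, the
     * read-modify-write of `counter` would be a race condition. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);      /* mutual exclusion around the update */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        enum { NTHREADS = 4 };
        pthread_t tid[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);     /* wait for all workers to finish */
        printf("counter = %ld\n", counter); /* expected: 400000 */
        return 0;
    }

Removing the lock turns the increment into a race; an atomic fetch-and-add or a transactional section would be the kinds of alternatives the list mentions.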
  • Parallel programming frameworks
      • Libraries/APIs (examples: Java threads, Pthreads, MPI, etc.)
      • Languages (examples: Cilk, UPC, etc.)
      • Language extensions (example: OpenMP; see the sketch after this list)
      • Hybrid frameworks (examples: CUDA, OpenCL)
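One possible illustration of the language-extension style (a minimal OpenMP sketch; the array size and names are illustrative assumptions):

    #include <omp.h>
    #include <stdio.h>

    /* Directive-based data parallelism: the compiler splits the loop across
     * threads, and the reduction clause combines per-thread partial sums. */
    int main(void) {
        enum { N = 1000000 };
        static double x[N];
        for (int i = 0; i < N; i++) x[i] = 1.0;

        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += x[i];

        printf("sum = %.1f using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }

The same loop written against Pthreads or MPI would need explicit thread or process management; the directive form leaves that to the compiler and runtime.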

  • Program development
      • Task and data decomposition: locality, partitioning/tiling, over-decomposition, aggregation
      • Load balancing
      • Scheduling
      • Mapping: static, dynamic (see the sketch after this list)
      • Synchronization: critical sections, asynchronous computation
      • Communication: latency hiding, aggregation of messages, false sharing
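One possible illustration of static vs. dynamic mapping (a minimal OpenMP sketch; the artificial cost function is an assumption made only to create load imbalance):

    #include <omp.h>
    #include <stdio.h>

    /* Illustrative cost function whose running time grows with i,
     * so iterations have very uneven costs. */
    static double work(int i) {
        double s = 0.0;
        for (int k = 0; k < i * 100; k++) s += 1.0 / (k + 1);
        return s;
    }

    int main(void) {
        enum { N = 2000 };
        double total = 0.0;

        /* Static mapping: iterations are divided into fixed blocks up front;
         * cheap, but imbalanced when iteration costs vary widely. */
        #pragma omp parallel for schedule(static) reduction(+:total)
        for (int i = 0; i < N; i++) total += work(i);

        /* Dynamic mapping: idle threads grab small chunks at run time,
         * trading scheduling overhead for better load balance. */
        #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
        for (int i = 0; i < N; i++) total += work(i);

        printf("%f\n", total);
        return 0;
    }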

  • Tools
      • Debuggers and performance analysis
      • Performance monitoring
  • Program analysis
      • Dependences: task graph, span vs. work
      • Synchronization analysis: races, deadlocks, livelocks (see the sketch after this list)
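One possible illustration of what synchronization analysis looks for (a deliberately unsafe POSIX threads sketch; lock and thread names are illustrative assumptions):

    #include <pthread.h>

    /* Two locks acquired in opposite orders by two threads: a classic
     * potential deadlock that synchronization analysis aims to detect. */
    static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

    static void *t1(void *arg) {
        (void)arg;
        pthread_mutex_lock(&A);
        pthread_mutex_lock(&B);   /* t1 holds A, waits for B */
        pthread_mutex_unlock(&B);
        pthread_mutex_unlock(&A);
        return NULL;
    }

    static void *t2(void *arg) {
        (void)arg;
        pthread_mutex_lock(&B);
        pthread_mutex_lock(&A);   /* t2 holds B, waits for A: possible cycle */
        pthread_mutex_unlock(&A);
        pthread_mutex_unlock(&B);
        return NULL;
    }

    int main(void) {
        pthread_t x, y;
        pthread_create(&x, NULL, t1, NULL);
        pthread_create(&y, NULL, t2, NULL);
        pthread_join(x, NULL);
        pthread_join(y, NULL);    /* may never return if the interleaving deadlocks */
        return 0;
    }

Because the two threads acquire the locks in opposite orders, some interleavings deadlock; imposing a single global lock order removes the cycle.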

Minimum skill set for a CS/CE graduate

- An undergraduate student should be able to program parallel and distributed systems efficiently (productivity and performance)
  • Productivity - languages, software engineering, programming methodologies, algorithms, parallel design patterns, tools, libraries
  • Performance - execution time, power, memory, I/O, scalability, throughput
- The student should be aware of the interactions among different tools, algorithms, architectures, and programming models
- The knowledge should remain relevant for the foreseeable future (e.g., multi-core, GPUs, web services, clusters)
Rationale: Algorithms Topics

Parallel/Distributed Models and Complexity: It is essential to provide students with a firm background in the conceptual underpinnings of parallel and distributed computing. Technology changes in this dynamic field at a hitherto unheard-of pace. Students whose education is tied too closely to current technology will be at a disadvantage when technology shifts render their knowledge obsolete. To mention just a few instances of such paradigm-changing shifts: (1) The VLSI revolution of the late 1970s and early 1980s enabled computers with unheard-of levels of parallelism. New problems related to interprocessor communication arose, exposing an aspect of parallel computing that could safely be ignored when levels of parallelism were very modest. (2) As clusters of modest processors (or workstations) joined multiprocessors-in-a-box as parallel computing platforms in the early 1990s, two sea changes occurred: (a) computing platforms became heterogeneous for the first time, and (b) communication delays became impossible to hide via clever latency-hiding strategies. (3) As computational grids became “parallel” computing platforms around the turn of the millennium, the distinction between parallel and distributed computing became fuzzier than ever before.

Fortunately, in spite of the “leaps and bounds” evolution of parallel computing technology, there exists a core of fundamental algorithmic principles. These principles are largely independent of the details of the underlying platform architecture and provide the basis for developing applications on current and future parallel platforms. Students should be taught how to identify and synthesize fundamental ideas and generally applicable algorithmic principles out of the mass of parallel algorithm expertise and practical implementations developed over the last decades.

What, however, should be taught under the rubric “conceptual underpinnings of parallel and distributed computing?” Our choices reflect a combination of “persistent truths” and “conceptual flexibility.” In the former camp, one strives to impart: (1) an understanding of how one reasons rigorously about the expenditure of computational resources; (2) an appreciation of fundamental computational limitations that transcend details of particular models. In the latter camp, one needs to expose the student to a variety of “situation-specific” models, with the hope of endowing the student with the ability to generate new models in response to new technology.

Algorithmic Paradigms: This section acknowledges the folk wisdom that contrasts giving a person a fish and teaching a person how to fish. Algorithmic paradigms lie in the latter camp. We have attempted to enumerate a variety of paradigms whose algorithmic utility has been demonstrated over the course of decades. In some sense, one can view these paradigms as algorithmic analogues of high-level programming languages. The paradigms in our list can be viewed as generating control structures that are readily ported onto a broad range of parallel and distributed computing platforms. Included here are: the well-known divide-and-conquer paradigm, which is as useful in the world of sequential computing as in the world of parallel and distributed computing; the series-parallel paradigm that one encounters, e.g., in multi-threaded computing environments; stencil-based controllers of iterative processes; and graph-embedding techniques that facilitate creating virtual computing platforms that real platforms can then emulate.

Algorithmic problems: Two genres of specific algorithmic problems play such important roles in such a variety of computational situations that we view them as essential to an education about parallel and distributed computing. (1) Several of our entries are auxiliary computations that are useful in a variety of settings. Collective communication primitives are fundamental in myriad applications to the distribution of data and the collection of results. Certain basic functions play important control roles, especially as parallel computing incorporates many of the characteristics of distributed computing. The process of leader election endows processors with computationally meaningful names that are useful in initiating and coordinating activities at possibly remote sites. The essential process of termination detection is easy when parallelism is modest and processors are physically proximate, but it is a significant challenge in more general environments. (2) Several of our entries are independent computations that are usually sub-procedures of a broad range of large-scale computations. Sorting and selection are always near the top of everyone’s list of basic computations.
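One simple way to make leader election concrete (an illustrative MPI sketch of ours, not a method prescribed by this document) is a rank-based election by reduction:

    #include <mpi.h>
    #include <stdio.h>

    /* Trivial leader election on a static group: every process proposes its
     * own rank, the maximum wins, and all processes learn the leader's identity. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, leader;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Allreduce(&rank, &leader, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
        if (rank == leader)
            printf("process %d elected as leader\n", rank);
        MPI_Finalize();
        return 0;
    }

This is trivial only because MPI supplies unique ranks and a static group; leader election in asynchronous or failure-prone settings is considerably harder.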

Algorithms that operate on graphs and matrices can occur in almost every application area.
