University of Connecticut CSE 4500 (228)

Computer Science and Engineering Fall 2008

CSE 228 Parallel Systems

Homework 4

Due date: November 18, 2008

Problems:

  1. [20 pts] Consider a recirculating shuffle-exchange interconnection network (as discussed in class and in the handout on Interconnection Networks) that connects P processors to P memory cells (you can assume that P is a power of 2).
    (a) Draw the network for P = 4.
    (b) Sketch an algorithm for each switch that implements a write to a memory location
    given its address.
    (c) What is the time complexity of the write operation as a function of P?
  2. [20 pts] You are given a P-processor PRAM machine, with a linked list of P elements stored in shared memory, with each processor having a pointer to a distinct linked list element. Each processor also has the pointer to the head of the list element. Some processors consider themselves “busy”, others consider themselves “free”. Develop and present the pseudocode for an algorithm that efficiently computes the number of “free” and “busy” processors, with all processors learning these numbers. Give the analysis in terms of time, work (cost), and speedup (over a sensible sequential algorithm).
  3. [20 pts] A set of N processors is logically arranged in a tree using pointers. Each processor has a data structure, stored in shared memory, which contains a pointer to its parent. The root of the tree points to itself. The problem is for each processor to find (1) its distance from the root, and (2) the pid of the root processor.
    (a) Describe a log-efficient shared-memory parallel algorithm that solves this problem.
    (b) Give a detailed time and work complexity analysis of your algorithm. Under what circumstances is its time smallest? Largest?
  4. [20 pts] Consider the algorithm from the Notes Set 7 (online).
    (a) Analyze time, work (cost), and speed-up complexities when the processors are
    completely synchronous.
    (b) Perform optimality analysis.
    (c) For what ranges of P with respect to N is the algorithm optimal?
    Poly-log efficient? Polynomially-efficient?
  5. [20 pts] Give high-level pseudocode for a polynomially-efficient asynchronous shared-memory algorithm to compute the product of the elements of an n-size array using p processors, where 1 pn.
    (a) Perform work (cost) analysis.
    (b) For what range of p is the work (cost) of your algorithm (log-/poly-) efficient?
    Show analysis.
    (c) For what range of p is the cost of your algorithm optimal? Show analysis.

1