The Art of Multiprocessor Programming

Errata for

The Art of Multiprocessor Programming

Version of 3 January 2011

In many places, inserted text is highlighted in red.

There are 5 new eps files to replace existing figures. Replace Figure 14.3 with lazyskiplist-add.eps. Replace Figure 6.5 with lock-free-universal.eps. Replace Figure 10.12 with LockFreeQueue.eps. Replace Figure 15.3 with SimpleTree.eps. Replace Figure 6.7 with wait-free-universal.eps.

Also, please completely remove Figure 16.8 on page 380 and remove the sentence “Fig. 16.8 shows such a sub-DAG.” In the 5th line from the bottom on page 379.

Table of contents

p. xviadd item after Bibliography (note page number)

Suggested Ways to Teach 494

Preface

p. xvii / Pearlman -> Perlman
p. xx / “all of which are useful in structuring concurrent applications.”

Chapter 1

P3 / Include few words => include a few words
P. 3 / references the end -> references at the end
P3: “The hope is that the reader will understand the issues in a way that will later allow him or her to apply the lessons learned to specific multiprocessor systems.”
P3: line 25: the end -> at the end
p. 3
p. 4 / In Section 1.1 in place of "there are many primes between 1 and 109, but hardly anybetween 9x109 and 1010"should say"there are more primes between 1 and 109 than between 9x109 and 1010"
In Fig. 1.1, line 4 should say
for (int j = (i * block) + 1; j <= (i + 1)* block; j++) {
p. 5 / Fig.1.3 add 2 more spaces at the start of line 6
p. 11
p. 18 / Second paragraph of Mutual Exclusion bullet: “Initially the can is either up or down. Let us say it was down. Then only the pets can go in, and mutual exclusion holds.”
Replace question 6 with:
Use Amdahl’s law to answer the following questions:

Suppose the sequential part of a program accounts for 40% of the program's execution time on a single processor. Find a limit for the overall speedup that can be achieved by running the program on a multiprocessor machine.
Now suppose the sequential part accounts for 30% of the program's computation time. Let sn be the program's speedup on n processes, assuming the rest of the program is perfectly parallelizable. Your boss tells you to double this speedup: the revised program should have speedup s'n > sn/2. You advertize for a programmer to replace the sequential part with an improved version that runs k times faster. What value of k should you require?
Suppose the sequential part can be sped up three-fold, and when we do so, the modified program takes half the time of the original on n processors. What fraction of the overall execution time did the sequential part account for? Express your answer as a function of n.

P 22 / Change “… threads typically” to “…. Threads typically”

Chapter 2

p. 23 / Figure 2.3 should be amended as shown:
public long getAndIncrement() {
lock.lock();
try {
long temp = value;
value = temp + 1;
return temp;
} finally {
lock.unlock();
}
}
P 24 / “some thread make” => “some thread makes”
p. 25 / Pragma 2.3.1: “We explain the reasons in Chapter 3 and Appendix B.”
P24: change in line 3-4 to: “Let us assume, for simplicity, that each thread repeatedly acquires and releases the lock, with other work taking place in the meantime.”
P25: after "Pragma 2.3.1"replace “read A(v == x)”by “read A(x == v)”.
p. 26
p. 35 / In Fig. 2.5, remove the word "volatile" from line 2.
Change
“SupposeA’s token is on Node 0, andB’s token on Node 1 (soAhas the later timestamp). ForA, thelabel()method is trivial: it already has the latest timestamp, so it does nothing. For B, thelabel()method “leapfrogs”A’s node by jumping from 0 to 2.”
to
“SupposeA’s token is on Node 0, andB’s token on Node 1 (soBhas the later timestamp). ForB, thelabel()method is trivial: it already has the latest timestamp, so it does nothing. For A, thelabel()method “leapfrogs”B’s node by jumping from 0 to 2.”
p. 27 / In Fig. 2.6, remove the word "volatile" in lines 3 and 4
p. 27 / “if one thread runs completely before the other” -> “if one thread never tries to get the lock”
p. 29 / Fig 2.7 line 13 “level 1” => “level i”
p.43 / Fig 2.17 add the following line between 6 & 7
… // initialize myIndex
p. 44 / Exercise 19. Instead of 3n should be O(3n) and instead of n2n should say O(n2n).
p. 48 / Fig 3.3 delete line 6
P 48 / Line 4 of Fig 3.3: add space “(T[])new” => “(T[]) new”

Chapter 3

p. 54. / First paragraph: “We use the following shorthand: <p.enq(x) A> -> <p.deq(x) B> means that any sequential execution must order A's enqueue of x at p before B's dequeue of x at p, and so on.”
P 55 / Empty Exception should be EmptyException
P.57 / Top of page, “shorthand notion" should be "shorthand notation".
P 60 / “the cost ... are” => “the cost ... is”
P. 61 / "This method must be synchronized to ensure that only one instance is created, even if several threads observe instance to benull create new instances." Delete "create new instances".
p. 67 / "... head is the index of the next slot from which to remove an item, and tail is the index of the next slot in which to place an item."

Chapter 4

p.77
p.84
p. 88
p. 91 / In the table of Figure 4.5 in the second line MRMW Boolean Regular should be MRSW Boolean Regular.
In Fig.4.12, add a new line between 22 and 23 Line
if (i == me) continue;
In addition, Line 32 should be
a_table [ 0 ][ i ] = value;
Should be “when B’s snapshot was taken before A started its Scan() call”.
In Fig 4.21 Line 20 should be “returnnewCopy[j].snap;
p. 85 / figure 4.13
"Thread 0 reads a_table[0][1] with value t+1."
Should be a_table[0][0]
P 87 / linearizable to) -> linearizable) to
p. 88 / 2nd last line
"before A started its update() call" should be "... scan() call".
p. 90
p. 94 / In Fig 4.19 Line 11 should be “stamp = label;”
In Exercise 40 “ Does Peterson’s two thread mutual exclusion algorithm work if the shared atomic flag registers are replaced by regular registers”.
p. 95. / In Figure 4.23, the comment “N is the total number of threads” is incorrect. N is actually the length of the register.
P95: “If multiple processors call write(), is this register atomic?” Replace “atomic” with “safe.”
P95: “If multiple processors call write(), is this register atomic?” Replace “atomic” with “safe.”

Chapter 5

P 100 / “The reader will notice that since the decide() method of a given consensus object is executed only once by each thread,and that there are a finite number of threads, by definition, a lock-free implementation would also be wait-free and vice versa.”
P101: remove the sentence: “A's moves are showed in black, and B's in gray.”
P104: Caption of figure 5.3: “In the second execution scenario, A moves first, driving the protocol to a 0-valent state s’’. “ replace by: “In the second execution scenario, A moves first, then B executes one operation, driving the protocol to a 0-valent state s’’.”
P 106 / can provide -> can we provide
p. 108 / for a threads A -> for a thread A
P110: “Exercise 72 asks one to justify this claim.”
p 111 / figure 5.11, 3: “Assign23 assign2 = new Assign23(NULL)” should be“Assign23assign23= new Assign23(NULL)”
p 111 / figure 5.11, 5: [indentation should match line 6]
P114: Figure 5.14 line 5 need to add indentation before “propose(value)”.
p 115 / figure 5.14, 5: [indentation]
P 117 / provides a get() -> provide a get()
P117: replace “provides” by “provide” and “a get() method is only a convenience, and as proved in a homework assignment.” By “a get() method is only a convenience, and so it follows that:”
P119: In the text “We consider the compareAndSet() operation mentioned earlier, a synchronization operation supported by several contemporary architectures. (For example, it is called CMPXCHG on the Intel PentiumTM). This method is also known in the literature as compare-and-swap. compareAndSet() takes two arguments: an expected value and an update value.”
Remove “mentioned earlier” and replace “As noted earlier, compareAndset() takes…” by “A compareAndSet() takes…”
P119: and P120 exercise 59 should read: “The SetAgree class, like the Consensus class, provides a decide() method whose call returns a value that was the input of some thread’s decide() call. However, unlike the Consensus class, the values returned by decide() calls are not required to agree. Instead, these calls may return no more than k distinct values. (When k is 1, SetAgree isthe same as consensus.) What is the consensus number of the SetAgree classwhen k >1?”
p 121. / In Exercise 65, Replace “provides the same propose() and decide() method as consensus” by “provides the same decide() method as consensus”.

Chapter 6

P 127 / figure 6.2
"Invocation" should be "Invoc".
p. 128 / figure 6.4 line 21 "current" should be "Node current".
P129: Figure 6.5 file named lock-free-universal.pdf seems that somehow the fonts for the words “next”, “decideNext”, “seq” and “invoc()” are in boldface even though they should be the same as for the word “sentinel.”
P132: (PRINTERS TYPO, appears OK in galley proof file Ch06-P370591 but not in printed book) “The value of before.seq 1 mod n+” should be “The value of before.seq + 1 mod n”
P132 and P133: Changes in red “Now imagine that Thread3 immediately announces its node. Then Thread 7 helps to appendThread3’s node, but again pauses immediately after setting Thread 3’s node’s sequence number to 3, but before adding it to head[]. Now Thread 3 wakes up. It will not enter the main loop because its node’s sequence number is non zero…”
P 131 / figure 6.6 line 23 "current" should be "Node current".
p 131
p 131
p 134
p. 134
p 134
p 137 / In Figure 6.6 line 12
Node help = announce[(before.seq + 1) % n];
In Figure 6.6 line 17
Node after = before.decideNext.decide(prefer);
P132: Figure 6.7 caption: similar changes as in text, thread numbers in red “Notice that Thread 5’s own entry inthe head[] array is not yet set to its announced node. Next, Thread 3 announces its node and
Thread 7 helps to appendThread 3’s node, setting Thread 3’s node’s sequence number
to 3. Now Thread 3 wakes up. It will not enter the main loop because its node’s sequencenumber is non zero….”
In Figure 6.8 line 12
Node help = announce[(before.seq + 1) % n];
figure 6.8 line 23 "current" should be "Node current".
In Figure 6.8 line 17
Node after = before.decideNext.decide(prefer);
“Exercise 77. Propose a way to fix the universal construction of Figure 6.8 to work for objects with nondeterministic sequential specifications.”
Exercise 82 first line: "made to the lock-free protocol" should be "made to the wait-free protocol"
P 135 / announced). thread C => announced). Thread C
p. 132 / The value of before.seq 1 mod n+" => "The value of (before.seq + 1) mod n”

Chapter 7

p.146 / first line of section: "simple TTASLock" should be "TASLock"
p. 148
P149
p. 153 / In Figure7.5 Line 7 should be:
maxDelay = max;
line -8: the the -> the
Fig. 7.9 omit line 6
P153: Fig. 7.9 line 2 should be “2 AtomicReference<Qnode> tail;
p. 149 / the the critical section => the critical section
P. 151 / Replace “shows the CLHLock class's fields, constructor, and QNode class” with “shows the CLHLock class's fields and constructor”
P 151 / its its own => its own
p. 155 / Fig. 7.12 replace “queue” in line 5 with “tail”
p. 155 / Fig 7.13, change line 33 to
// wait until successor fills in the next field
p. 157
p. 157 / First paragraph of Section 7.6, the reference to Pragma 8.2.3 should be to Pragma 8.2.2.
“field is null, the associated thread has either not acquired the lock or has released it” should be “field is null, the associated thread has either not acquired the lock or has not released it yet”
p. 161 / Enqueues that node in the queue (Line 12) => Enqueues that node in the queue (Line 8)
p. 162 / In Fig. 7.21, Line 6 is redundant and can be omitted.
p. 163 / In Fig. 7.23, remove Line 4.
p. 165 / “the thread-local myPred field” =>“the thread-local myNode field”
p. 165 / “node from myPred” =>“node from myNode”
p .174 / Exercise 86 should be: (may require rewrap through page 176, please advise)
First barrier implementation: We have a counter protected by a test-and-test-and-set lock. Each thread locks the counter, increments it, releases the lock, and spins, rereading the counter until it reaches n
Second barrier implementation: We have an n-element array b, all 0. Thread zero sets b[0] to 1. Every thread i, for 0 < i <= n-1, spins until b[i-1] is 1, sets b[i] to 1, and waits until b[i+1] becomes 2, at which point it proceeds to leave the barrier. Thread b[n-1], upon detecting that b[n-1] is 1, sets b[n-1] to 2 and leaves the barrier.
Compare (in ten lines) the behavior of these two implementations on a bus-based cache-coherent architecture. Explain which approach you expect will perform better under low load and high load.
p. 175
p. 175 / In Exercise 86, “Second barrier implementation: We have an n-element boolean array b, all false. Thread zero sets b[0] to true. Every thread i, for 0 < i <= n, spins until b[i-1]
is true, sets b[i] to true, and waits until b[n-1] is true”
In Exercise 89, Remove “Notice that” and begin with “In the HCLHLock lock…”.
p. 175
p. 176 / Fig 7.33 replace Line 3 with
AtomicReference<Qnode> tail = new Qnode();
Exercise 91: “whether any thread is holding a lock (but does not actually acquire the lock)”

Chapter 8

p. 181In the caption for Fig. 8.4, change
However, thread D manages to acquire the critical section lock first,
and so both A and B spin until C leaves the critical section.
to
However, thread D manages to acquire the critical section lock first,
and so both A and B spin until D leaves the critical section.

p 182 / The constructor for LockedQueue should be:
public LockedQueue(int capacity) {
items = (T[])new Object[capacity];
}
p.185 / In Figure 8.9, line 48 should be:
while (readers > 0 || writer) {
P185: instead of “both synchronize on the mutex and condition fields” should be “both synchronize on the lock and condition fields”
p.186. / Figure 8.11 should be:
private class ReadLock implements Lock {
public void lock() {
lock.lock();
try {
while (writer) {
condition.await();
}
readAcquires++;
} finally {
lock.unlock();
}
}
public void unlock() {
lock.lock();
try {
readReleases++;
if (readAcquires == readReleases)
condition.signalAll();
} finally {
lock.unlock();
}
}
}
Figure 8.12 should be
private class WriteLock implements Lock {
public void lock() {
lock.lock();
try {
while (writer) {
condition.await();
}
writer = true;
while (readAcquires != readReleases) {
condition.await();
}
} finally {
lock.unlock();
}
}
public void unlock() {
writer = false;
condition.signalAll();
}
}
P 188 / Fig 8.13 replace lines 11-23 with
public void lock() {
int me = ThreadID.get();
lock.lock();
try {
if (owner == me) {
holdCount++;
return;
}
while (holdCount != 0) {
condition.await();
}
owner = me;
holdCount = 1;
} finally {
lock.unlock();
}
}
P 189
P196: / Line 2-3: initialize => initialize
Heading 9.2, Line 3: “Node<T> class has three fields. The item…”
change to:
“Node<T> class has three fields. \footnote{To be consistent with the Java memory model these fields and their modifications later in the text will need to be volatile, though we ignore this issue here for the sake of brevity.} The item…”
[If a footnote will fit, please renumber remaining footnotes. If easier to add a side-note to avoid rewrap and renumbering, please set that in a box in the left margin]
P 197 / any any ordered set => any ordered set

Chapter 9

p 204 / last line: "harder than in the course-grained" should be "coarse".
P 207 / Fig. 9.13, lines 52,53 change
Entry pred = this.head; // sentinel node;
Entry curr = pred.next;
to
Node pred = this.head; // sentinel node;
Node curr = pred.next;
}
p. 209 / nodes has been -> node has been
P211 / First graph Change
(Figs. 9.16 and 9.17) to (Figs 9.17 and 9.18)
(Fig. 9.18) to (Fig. 9.16)
"as Fig. 9.19 shows" to ""as Fig. 9.20 shows"
P212 / Change "the contains method (Fig. 9.20)" to the contains method (Fig. 9.19)"
p. 213. / [Can the following be done with rewrap up to page 221-222?]
Heading 9.8, starting line 7:
Replace:
Fig. 9.22 shows a Thread A attempting to add node a between nodes predA and currA. It sets a’s next field to currA, and then calls compareAndSet() to set predA’s next field to a. If B wants to remove currB from the list, it might call CompareAndSet() to set predB’s next field to currB’s successor. It is not hard to see that if these two threads try to remove these adjacent nodes concurrently, the list would end up with b not being removed. A similar situation for a pair of concurrent add() and remove() methods is depicted in the upper part of Fig. 9.22.
with:
In Fig. 9.22, part (a) shows a Thread A attempting to remove node node a while Thread B is adding a node b. Suppose A applies compareAndSet() to head.next, while B applies compareAndSet() to a.next. The net effect is that a is correctly deleted but b is not added to the list. In part (b) of the figure, Thread A attempts to remove a, the first node in the list, while B is about to remove b, where a points to b. Suppose A applies compareAndSet() to head.next, while B applies compareAndSet() to a.next. The net effect is to remove a, but not b.
If B wants to remove currB from the list, it might call compareAndSet() to set predB’s next field to currB’s successor. It is not hard to see that if these two threads try to remove these adjacent nodes concurrently, the list would end up with b not being removed. A similar situation for a pair of concurrent add() and remove() methods is depicted in the upper part
of Fig. 9.22.
p.201. / Last paragraph: “except for the initial head sentinel node, acquire the lock for a node only while holding the lock for its predecessor”
p. 204 / [near bottom of page]
Replace
"The same distinctions apply to remove(a) calls. A successful call (a present)
is linearized when the predecessor node is locked (Lines 36 or 42). A
successful call (a absent) is linearized when the node containing the next
higher key is locked (Lines 36 or 42). An unsuccessful call (a present) is linearized when the node containing a is locked."
With:
"The same distinctions apply to remove(a) calls. A successful call (a present)
is linearized when the predecessor node is locked (Lines 36 or 42). An
unsuccessful call (a absent) is linearized when the node containing the next
higher key is locked (Lines 36 or 42)." [delete last sentence in this paragraph]
p210: Figure 9.18 the indentation starting in line 9 is off, pred.lock should be appropriately aligned with the w of the while in line 6 and next lines aligned accordingly.
p211: Remove the paragraph starting with “Logical removals require a small…” and ending with “even if its predecessor is logically or physically deleted.”
p214: Figure 9.22: The LazyList class -> Figure 9.22: The LockFreeList class
p218: 40 curr = curr.next -> 40 curr = curr.next.getReference()
is the almost the same
again as when -> again when
p219: The Lock-free
P 206 / Fig. 9.11 change line 6 from
while (curr.key <= key) {
to
while (curr.key key) {
p.204
p.217
p.217
p. 218
p. 221 / Figure 9.9 caption: ``Because A must lock both head and a, and B must lock both a and b, they are guaranteed to conflict on a, forcing one call to wait for the other.''
In Line 16 "calls attemptMark() to mark currA as logically removed (Line 27)" should be "uses a compareAndSet() to attempt to mark currA as logically removed (Line 27)".
In Line 20 "If the attemptMark() call fails, remove() starts over." should be replaced by "If the compareAndSet() call fails, remove() starts over."
In Figure 9.26, Line 27, replace "27 snip = curr.next.attemptMark(succ,true);"
by "27 snip = curr.next.compareAndSet(succ, succ, false, true);"
p218, Fig 9.27, line 36: false{} -> false
In Exercise 118 “Explain why the following cannot happen…”
p. 217 / "It then calls compareAndSet() (Line 10)" => "It then calls compareAndSet() (Line 11)"
p.216 / paragraph 1 line 4. "The Window class's find() method” => "The find() method”
p. 225 / Fig 10.2 replace line 5 with
volatile head, tail;
p. 226 / Fig 10.3, replace line 24 with
tail.next = tail; tail = e;
Fig. 10.4, line 48 replace “size.getAndincrement()” with “size.getAndDecerement();”
p. 227 / Line 3, Replace “notify waiting enqueuers” with “notify waiting dequeuers”
P. 227 / Fig 10.5 replace 66 with
public volatile Node next;

Chapter 10