STUDY GUIDE – TEST 2 – SPRING 2009

The test will cover:

  1. Relevant portions of chapters 4 (Communication), 5 (Naming), 6 (Synchronization), 7 (Consistency and Replication), and 8 (Fault Tolerance)
  2. Topics covered in lecture/handouts but not so much in the textbook: synchronization in shared memory systems, data-race detection.
  3. Papers: “A Fast File System for UNIX”, “Eraser: A Dynamic Data Race Detector for Multithreaded Programs”, and “RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking”
  4. Homework assignment

Review your notes, the textbook, and the class notes, which are posted on the Web.

Be prepared to answer questions that require analysis as well as memorization! Be able to work problems of the type given in the last homework assignment. Here are some sample questions that will identify areas to concentrate on when you study.

Disclaimer: The guide is an aid to your study; it is not necessarily all-inclusive.

Communication (Ch. 4: May have some overlap with coverage of last test)

  1. What is the difference between reliable and unreliable communication?
  2. What are the three copy operations involved in buffered message passing?
  3. In synchronous message communication, what are the three points at which the sender can synchronize?
  4. What is the difference between transient and persistent communication?
  5. What is a Remote Procedure Call? Be able to describe the steps in an RPC Send or Reply.
  6. What is the difference between synchronous and asynchronous RPC? State the advantages of each.
  7. Why are RPC-based middleware solutions to distributed system communication better than solutions based on low-level message passing? Hint: Consider some of the problems that RPC solves.
  8. Be able to identify the following terms: socket, MPI, persistent communication, transient communication.

Distributed System Principles: Naming, Consistency, Fault Tolerance

  1. Define name resolution
  2. Know what an access point and an address are, and understand their relation to location-independent naming
  3. Know the characteristics of an identifier (in the context of naming)
  4. How do flat naming systems differ from structured naming systems? Give an example of each that we have discussed in class.
  5. How are attribute-based naming systems different from both flat and structured naming systems?
  6. Understand the fundamentals of the Chord algorithm: how keys are matched to nodes (succ functions), purpose of a finger table and how it is used, how Chord satisfies the requirement for location-independent naming.
  7. What are two reasons for data replication?
  8. How are replication and caching related to system scalability, fault tolerance, and performance?
  9. Consistency models: what they are, why they are needed
  10. Consistent ordering of operations as applied to replica updates
  11. Sequential consistency versus “strict”, also called “UNIX” consistency
  12. Causal consistency, be able to explain in terms of causally related events. Be able to relate to the discussion of causality in Chapter 6.
  13. Fault tolerance: definitions of fault tolerance, failure, error, and fault.
  14. Failure types
  15. Process groups, failure masking, and replication.
  16. Simple “voting” procedure to ensure access to the most recent version of a replica.
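The “voting” idea in the last item can be sketched in Python. This is a sketch with assumed replica counts, version numbers, and helper names (none of them from the course materials), following the Gifford-style rule that a read quorum Nr and a write quorum Nw must satisfy Nr + Nw > N:

```python
# Sketch of quorum-based voting (assumed data layout, not from the course):
# with N replicas, choose Nr + Nw > N so every read quorum overlaps every
# write quorum and therefore sees at least one copy of the latest version.

def read_latest(replicas, nr):
    """Contact any nr replicas and return the one with the highest
    version number; the overlap rule guarantees it is the most recent."""
    quorum = replicas[:nr]  # stand-in for "any nr reachable replicas"
    return max(quorum, key=lambda r: r["version"])

N = 5
Nr, Nw = 3, 3  # 3 + 3 > 5, so read and write quorums always intersect
# Suppose the last write (with quorum Nw = 3) updated replicas 0..2 to
# version 3, leaving replicas 3..4 at stale version 2:
replicas = [{"id": i, "version": 3 if i < 3 else 2} for i in range(N)]

print(read_latest(replicas, Nr)["version"])  # prints 3 (the latest version)
```

Because only N − Nw = 2 replicas can be stale, any 3 replicas a reader contacts must include at least one up-to-date copy.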

Synchronization, Mutual Exclusion, Transactions, Data-race Detection

  1. Definitions: mutual exclusion, critical section, data race (as defined in the Eraser paper), other relevant terms
  2. Know the characteristics of the producer/consumer and readers/writers problems. Understand, but do not memorize, the solution to the P/C problem as given in the notes. Ditto for the R/W solution.
  3. Know how the P and V operations on semaphores work. Be able to state the algorithm that was presented in class, and know how to apply it in situations such as the homework problems or class examples.
  4. Review problems from the notes that employ semaphores for mutual exclusion or other forms of synchronization. Be familiar with the kinds of problems that can occur if proper synchronization isn’t used. See examples in the notes such as the missing bank account update. Understand examples such as the following:
    Assume that the two threads in the following code fragment run concurrently. Furthermore, assume that they share global variables X and Y as well as semaphores mutex and diffmutex. The shared objects are initialized as shown. Give the final values of X and Y after both threads have terminated. If more than one set of answers is possible, show them and explain why.
    X = 1; Y = 2; mutex = 1; diffmutex = 0;

    Thread 1:          Thread 2:
    P(mutex);          P(diffmutex);
                       P(mutex);
    Y = X + 5;         X = Y + 5;
    X = X + Y;         Y = X + Y;
    V(mutex);          V(mutex);
    V(diffmutex);
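The fragment can be checked by simulation. This is a Python sketch in which threading.Semaphore stands in for the semaphores (acquire is P, release is V), and the column whose last operation is V(diffmutex) is assumed to be Thread 1:

```python
import threading

# Simulation of the exercise: diffmutex starts at 0, so Thread 2 blocks on
# P(diffmutex) until Thread 1 finishes -- the interleaving is deterministic.
X, Y = 1, 2
mutex = threading.Semaphore(1)      # mutex = 1
diffmutex = threading.Semaphore(0)  # diffmutex = 0

def thread1():
    global X, Y
    mutex.acquire()        # P(mutex)
    Y = X + 5              # Y = 1 + 5 = 6
    X = X + Y              # X = 1 + 6 = 7
    mutex.release()        # V(mutex)
    diffmutex.release()    # V(diffmutex): now Thread 2 may proceed

def thread2():
    global X, Y
    diffmutex.acquire()    # P(diffmutex): waits for Thread 1 to finish
    mutex.acquire()        # P(mutex)
    X = Y + 5              # X = 6 + 5 = 11
    Y = X + Y              # Y = 11 + 6 = 17
    mutex.release()        # V(mutex)

t1 = threading.Thread(target=thread1)
t2 = threading.Thread(target=thread2)
t2.start(); t1.start()
t1.join(); t2.join()
print(X, Y)  # prints 11 17 -- only one outcome is possible
```

Because Thread 2's first operation blocks on a semaphore initialized to 0, only one interleaving can occur, which is the key observation the question is probing.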

  1. What is Lamport’s happened-before relation? Know the three components of the definition, the difference between causally related events and concurrent events, and the difference between the happened-before relation and the total ordering relation.
  2. Be able to determine logical clock values according to Lamport’s method
  3. What is the biggest shortcoming associated with Lamport’s logical clocks?
  4. How do vector clocks address the shortcoming of the Lamport clock approach?
  5. Be able to determine vector clock values and be able to state, from clock values alone, whether a → b, b → a, or a || b. (Did a happen-before b, did b happen-before a, or are the two events concurrent?)
  6. Understand the four distributed mutual exclusion algorithms presented in the text:
    a) simple centralized algorithm
    b) decentralized algorithm, based on distributed hash tables
    c) Ricart-Agrawala distributed algorithm (be sure you understand the fault-tolerance issues here)
    d) token ring algorithm
  7. For the four algorithms given, be able to explain (not just state)
    a) how a process knows when it can enter the critical section
    b) the main problems associated with this algorithm
    c) how fault tolerant the algorithm is (justify your answer); there is no requirement to choose one that is “best”, but be able to discuss tradeoffs
  8. Be able to compare the algorithms in terms of performance, chance of deadlock/starvation, fault tolerance, etc.
  9. Be able to define “transaction”; list, explain, and demonstrate understanding of the four ACID properties; and describe what it means for a transaction to “commit”. (Not covered)
  10. What are some similarities and differences between transactions and critical sections? (Not covered)
  11. What is the main problem addressed by RaceTrack and Eraser? Briefly, how do the two algorithms approach the problem?
  12. What is the difference between dynamic data race detection and static data race detection?
  13. Lockset analysis, as implemented in Eraser, can be used to determine if every shared variable is protected by a mutual exclusion lock. Why do the authors believe their approach is better than approaches based on “happens-before”?
  14. Be able to define the sets “locks-held” and C(v); know how they are used in the Eraser algorithm.
  15. Know what a threadset is (in RaceTrack) and how one is used to determine whether an access is concurrent.
  16. How does RaceTrack improve the efficiency of dynamic data-race detection?
  17. Compare Eraser and RaceTrack with respect to their ability to detect data races: do either or both issue false alarms? Do either or both miss any potential data races?
  18. Make a list of various applications that employ vector timestamps or Lamport logical clock values (we have discussed several). Understand what role the logical clocks play in each case.
  19. Why might a distributed system need an election algorithm?
  20. Be able to describe the Bully algorithm and the Ring algorithm. In each case, how does a processor know that it has been elected coordinator?
  21. How do wireless election algorithms differ from more traditional algorithms in terms of which node is eventually selected as the new coordinator?
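For the vector-clock questions above, the comparison rule can be sketched as code. This is my own helper, not something from the text: a → b iff every component of a's clock is ≤ the corresponding component of b's and the clocks differ; if neither clock dominates the other, the events are concurrent.

```python
def compare(a, b):
    """Classify two vector clocks: '->' if a happened-before b,
    '<-' if b happened-before a, '||' if concurrent, '==' if equal."""
    a_le_b = all(x <= y for x, y in zip(a, b))  # a componentwise <= b
    b_le_a = all(y <= x for x, y in zip(a, b))  # b componentwise <= a
    if a_le_b and b_le_a:
        return "=="
    if a_le_b:
        return "->"
    if b_le_a:
        return "<-"
    return "||"   # neither dominates: causally unrelated (concurrent)

print(compare([2, 1, 0], [3, 1, 2]))  # prints -> (a happened-before b)
print(compare([2, 0, 1], [1, 2, 0]))  # prints || (concurrent events)
```

Note that Lamport scalar clocks cannot make this distinction: equal or incomparable causal relationships collapse into a single total order, which is the shortcoming vector clocks address.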

File Systems

  1. Know the characteristics of a disk (sectors, tracks, cylinders) and the relation between a file block and a disk sector.
  2. What are the three components of a disk access? (seek, rotational delay, data transmission time)
  3. Which of these components would we usually seek to minimize if we want to improve performance (disk read/write times)?
  4. What is the difference between sequential and random (direct) access patterns for files?
  5. What is an i-node?
  6. What are some techniques for improving read/write performance in a file system?
  7. Why did FFS introduce cylinder groups? (Consider both performance and reliability)
  8. Give an argument in favor of large block sizes in file systems and an argument against large block sizes. How did FFS solve this issue?
  9. What are the major issues that must be addressed in a distributed file system? What is network transparency, and how can it be achieved?
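For questions 2–3 above, a back-of-the-envelope calculation (with assumed drive parameters, not values from the course) shows why seek time and rotational delay dominate small transfers:

```python
# Assumed parameters for a hypothetical 7200 RPM disk (illustrative only):
avg_seek_ms = 8.0                     # average seek time (assumed)
rpm = 7200
avg_rot_ms = (60_000 / rpm) / 2       # half a rotation on average ~ 4.17 ms
transfer_mb_per_s = 100.0             # sustained transfer rate (assumed)
block_kb = 4                          # one 4 KB file block

transfer_ms = block_kb / 1024 / transfer_mb_per_s * 1000  # ~ 0.04 ms
total_ms = avg_seek_ms + avg_rot_ms + transfer_ms

print(round(total_ms, 2))  # prints 12.21 -- transfer time is a rounding
                           # error next to seek + rotation, which is why
                           # layouts like FFS cylinder groups aim to
                           # minimize seeks
```

With these numbers, the 4 KB transfer contributes well under 1% of the access time, so doubling the transfer rate barely helps while halving seeks nearly halves the cost.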