Transactions

1.  Transaction Concept

·  A transaction is a set of operations that collectively perform a single logical function.

E.g., transfer of funds from a checking account to a savings account

¨  a single logical function from the customer's standpoint

¨  to achieve this function, two operations are involved: (1) decrease the checking account by an X amount; and (2) increases the savings account by the same X amount

·  A transaction must see a consistent database.

In the funds transfer example, suppose the checking account balance before the execution of the transaction is $1000.00 and the amount to be transferred is $300.00. Then, the checking account balance should remain $1000.00 before it is changed to $700.00. It should not be that before the transaction reads the balance, the balance is $1000.00 and after reading the balance but before changing it to $700.00, it becomes $1200.00 (probably updated by another transaction).

When the transaction is committed, the database must also be consistent. Note that the database may not be consistent during the execution of the transaction.

·  Two main issues to deal with:

-- failures of various kinds, such as hardware failures and system crashes

-- concurrent execution of multiple transactions

2.  ACID Properties

To preserve the integrity of data, the database system must ensure the following properties of the transactions:

·  Atomicity : Either all operations of the transaction are properly reflected in the database or none are.

·  Consistency : Execution of a transaction in isolation preserves the consistency of the database.

·  Isolation : Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executing transactions. That is, for every pair of transactions Ti and Tj , it appears to Ti that either Tj finished execution before Ti started or Tj started execution after Ti finished (even though they may be executing concurrently -- their operations are interleaved)

·  Durability : After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

These properties are often referred to as the ACID properties.

Suppose X is a data item in database. The following are two operations to access item X:

--read(X) : transfer the data item X from the database to a local buffer belonging to the transaction that executed the read operation

--write(X) : transfer the data item X from the local buffer of the transaction that executed the write back to the database

Example: (Fund transfer) Transaction to transfer $50 from account A to account B:

1.  read(A)

2.  A := A - 50

3.  write(A)

4.  read(A)

5.  B := B + 50

6.  write(B)

·  consistency: the sum of balances of account A and account B is unchanged by the execution of the transaction. Note that the database is temporarily inconsistent after step 3 but before step 6.

·  atomicity : two updates are performed at steps 3 and 6. If the transaction fails after step 3 and before step 6, the system should ensure that the updates are not reflected in the database, else an inconsistency will result.

·  durability : once the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures

·  isolation : if between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum of balances of account A and account B will be less than it should be). The database system should ensure that this inconsistent database state be not seen by another transaction.

This can be trivially achieved by executing transactions serially, that is, one after the other. However, running multiple transactions concurrently increase the throughput.

3.  Transaction State

A transaction must be in one of the following states (using the transaction model in book):

·  Active : the initial state; the transaction stays in this state while it is executing

·  Partially committed : after the final statement has been executed

·  Failed : normal execution can no longer proceed

·  Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:

-- restart the transaction, only if no internal logical error

-- kill the transaction

·  Committed, after successful completion

4.  Implementation of Atomicity and Durability

The recovery-management component of a database system implements the support for atomicity and durability. A simple but extremely inefficient scheme is the shadow-database scheme:

¨  assume that only one transaction is active at a time.

¨  A pointer called db_pointer always points to the current consistent copy of the database

¨  All updates are made on a shadow copy of the database and db_pointer is made to point to the updated shadow copy only after the transaction reaches partial commit and all updated pages have been flushed to disk

¨  In case transaction fails, old consistent copy pointed to by db_pointer can be used, and the shadow copy can be deleted.

We will see there are better schemes for ensuring atomicity and durability later.

5.  Concurrent Executions

·  Multiple transactions are allowed to run concurrently in the system. Advantages are:

¨  increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk

¨  reduced average response time for transactions : short transactions need not wait behind long ones

·  concurrency control schemes -- mechanisms to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database

Will consider concurrency-control schemes later.

6.  Schedules

·  A schedule of a set of transactions is an ordering of the instructions of the transactions such that

¨  it consists of all instructions of each transaction of this set

¨  it preserves the order in which the instructions appear in each individual transaction

Example: (a) Let T1 transfer $50 from account A to account B, and T2 transfer 10% of the balance from account A to account B. The following schedule (schedule I) is a serial schedule, in which T1 is followed by T2 .

(b)  Let T1 and T2 be the transactions defined previously. The following schedule (schedule II) is not a serial schedule, but it is equivalent to schedule I

Note that in both schedules I and II, the sum of the balances of A and B is preserved.

(c) The following concurrent schedule (schedule III) does not preserve the value of the sum of the balances of A and B.

7.  Serializability

·  Basic assumption: each transaction preserves database consistency

·  Thus, serial execution of a set of transactions preserves database consistency

·  A (possible concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of

(i)  conflict serializability

(ii)  view serializability

·  To focus on the main ideas, in our discussions, we ignore operations other than read and write instructions, and assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. That is, the simplified schedules consist of only read and write instructions.

Conflict Serializability

·  Instructions Ii and Ij of transactions Ti and Tj respectively conflict if and only if there exists some item Q accessed by both Instructions Ii and Ij and at least one of these instructions wrote Q.

·  (i) Ii = read(Q) and Ij = read(Q) : Ii and Ij don't conflict

(ii) Ii = read(Q) and Ij = write(Q) : Ii and Ij conflict

(iii) Ii = write(Q) and Ij = read(Q) : Ii and Ij conflict

(iv) Ii = write(Q) and Ij = write(Q) : Ii and Ij conflict

·  Intuitively, a conflict between Ii and Ij forces a temporal order between them. If Ii and Ij are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule

·  If a schedule S can be transformed into a schedule S' by a series of swaps of non-conflicting instructions, we say that S and S' are conflict equivalent.

·  We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

·  Example of a schedule that is not conflict serializable:

We are unable to swap instructions in the above schedule to obtain either the serial schedule
<T3, T4 > or the serial schedule <T4, T3 >.

·  Schedule III below can be transformed into Schedule I , a serial schedule where T1 is followed by T2 , by a series of swaps of non-conflicting instructions. Therefore, schedule III is conflict serializable.

View Serializability

·  Let S and S' be two schedules with the same set of transactions. S and S' are view equivalent if the following three conditions are met:

(a) For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must in schedule S' also read the initial value of Q.

(b)  For each data item Q, if transaction Tj executes read(Q) in schedule S, and that value was produced by transaction Ti (if any), then transaction Tj must in schedule S' also read the value of Q that was produced by transaction Ti .

(c)  For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S'.

·  A schedule S is view serializable if it is view equivalent to a serial schedule.

·  Every conflict serializable schedule is also view serializable.

·  Schedule IV below is a schedule which is view serializable but not conflict serializable.

It is view equivalent to the serial schedule T3, T4, T6 .

·  In schedule IV, T4 and T6 each performs a write(Q) without having performed a read(Q) operation. Such a write operation is called a blind writes.

It can be shown that blind writes appear in every view-serializable schedule that is not conflict serializable.

8.  Testing for Serializability

When we want to schedule the operations of a number of transactions, we need to show that in the resulting schedule, each transaction has the isolation property; that is, the resulting schedule is serializable.

(i)  Testing for Conflict Serializability

Consider some schedule of a set of transactons T1, T2, …, Tn. Let S be a schedule of this set of transactions. First, we construct a directed graph G = (V, E), called a precedence graph, from S. The set of vertices, V, consists of all the transactions T1, T2, …, Tn. The set of directed edges, E, consists of all edges Ti ® Tj for which one of the following three conditions hold: for some data item X

(a)  Ti executes write(X) before Tj executes read(X)

(b)  Ti executes read(X) before Tj executes write(X)

(c)  Ti executes write(X) before Tj executes write(X)

If an edge Ti ® Tj appears in the precedence graph, then in any serial schedule S' equivalent to S, Ti must appear before Tj .

Example: The following is a schedule for transactions T1, T2, T3, T4, T5

Precedence Graph

¨  A schedule is conflict serializable if and only if its precedence graph is acyclic (does not have a cycle).

¨  There are cycle-detection algorithms that take time of order n2 , where n is the number of vertices in the graph.

¨  If precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph.

Based on the precedence graph of the preceding page, a serializability order of the transactions is

T5, T1, T3, T2, T4 . Note that the serializability order is not unique. Another possible serializability order is T1, T5, T3, T2, T4

*** Usually, we use the following short hand notation to represent the read(X) operation of transaction Tj : Rj(X) ( or rj(X) ); and for write(X) operation of Tj, use Wj(X) (or wj(X)).

Thus, the schedule of the previous example can be represented as

R2(X), R1(Y), R1(Z), R5(V), R5(W), W5(W), R2(Y), W2(Y), W3(Z), R1(U), R4(Y), W4(Y), R4(Z), W4(Z), R1(U), W1(U)

(b) Test for View Serializability

A modified form of the precedence graph, called a labeled precedence graph, can be used to test for view serializability. This technique is described in the third edition but not in the fourth edition.

9