Advanced Database, Fall 2003 class project

Name: Siu Yau

ReadMe for RepCRec Simulator

0) Running & Compiling the project:

a) Requirements:

The simulator is written in java and distributed as java class files, so a JRE to is required to run it and a java compiler is needed to compile it.

b) Getting the class files.

The source and class files are distributed in a repcrec.jar file. This README and the repcrec.jar file can be found on

c) Running the project

To run the project using interactive line mode, issue:

java -cp repcrec.jar repcrec.RepCRecMain

To run the project using a file as input, use:

java -cp repcrec.jar repcrec.RepCRecMain <filename>

When the system finishes initialization, it will print:

Initialisation completed

to the standard output. In interactive mode, the program is then able to accept inputs, in file mode, the program will start reading from the file.

To terminate the program, enter

exit

to the input stream (either type it to the standard input, or put that in the file).

d) Compiling the project

To check the source code for the project, you need to unjar the jar file:

jar -xvf repcrec.jar

Then clean the class files already present:

rm */*.class

Compile the main function repcrec/RepCRecMain.java:

javac repcrec/RepCRecMain.java

You can then run the project without the -cp flag:

java repcrec.RepCRecMain [filename]

1) Result format:

When an operation is completed, the system will show the transaction name, the statusof the operation and the site that returned the result:

Transaction TRANSACTIONNAME: status - [result] by site SITENUMBER

However, begin(),beginRO(), fail() and recover() will not return a status. Rather, if fail() causes some transaction to fail, it will just print out a message.

Also, blocked operations will not return a status immediately. Rather, the system will return the result after the operation has been completed.

Some example return formats are given here (commands are shown in italics, comments shown in bold, the returned results are in normal font):

begin(T3)

begin(T100)

begin(T120) ; begins don't have results

begin(T130)

R(T3,x8)

Transaction T3: Success - 80 by site 10

W(T100,x12,31415962)

Transaction T100: Success - by site 3

W(T120,x12,12304)

Transaction T120: Killed - by site 3

W(T130,x1,123)

Transaction T130: Success - by site 2

R(T100,x1) ; This operation is blocked by lock conflict.

end(T130)

Transaction T130: Success - by site 3

; a few ticks are needed for the 2-phase commit to complete. Other operations can happen during this time.

Transaction T100: Success - 123 by site 2 ; This result is for the R(T100,x1) operation that had been blocked.

fail(1)

Aborting transaction T100 because it was serviced by site 1 which is now down.

Aborting transaction T3 because it was serviced by site 1 which is now down.

recover(1)

2) dump(var) format:

A data item, when dumped, shows the history of all its committed values.

ITEMNAME: VALUE1STARTTIMESTAMP1, ENDTIMESTAMP1>; VALUE2STARTTIMESTAMP2, ENDTIMESTAMP2;...

Where itemname is the name of the data item, value_x is the value, starttimestampx is the first timestamp when the value is valid for this item, and endtimestampx is the first timestamp when the value is not valid for this item.

For example:

Site 1:

SharedX : Je<24,37>; Je<37,39>; Ya<39,66>; Je<99,107>; I<117,129>; Je<129,2147483647>;

Site 2:

SharedX : Je<24,37>; Je<37,39>; Ya<39,99>; I<117,129>; Je<129,2147483647>;

Site 3:

SharedX : Je<24,37>; Je<37,39>; Ya<39,98>; Je<98,108>; I<117,133>; Je<133,2147483647>;

This is the values of a variable SharedX on three sites. It has been updated on timestep 37, 29, 98 and 129. The "gaps" where the endtimestamp does not correspond to the starttimestamp is due to the site being down before the update. Site 2, for example, missed the update on timestamp 98 altogether (because it was down during the lifetime of the transaction that did the update).

3) dump(site) format:

Dump(site) contains two parts, the first part is variables, the next part is transactions.

a) Variables: all the data item are shown in their dump(var) format, plus their lock
holders:

data item in dump(var) format | WRITELOCKHODER | READLOCKHOLDERS

Where writelockholder is the transaction names that holds a lock on this variable, and readlockholders is a comma-delimited list of transaction names that holds a lock on this variable.

b) Transactions: all transactions alive on a site is shown, along with the variable they are blocked on:

TRANSACTIONNAME (BIRTHDATE) [blocked on <VARIABLENAME>] [Read Only], ...

where birthdate is the timestamp that the transaction was started. The VARIABLENAMElist shows you which variable the transaction is blocked on, if any.

For example:

Site 2:

x2: 20<3,14>; | |

x1: 10<3,2147483647>; | T3 |

x20: 200<3,14>; | |

x18: 180<3,14>; | |

x16: 160<3,14>; | |

x14: 140<3,14>; | |

x8: 80<3,14>; | |

x12: 120<3,14>; | |

x6: 60<3,14>; | |

x11: 110<3,2147483647>; | | T4, T3,

x10: 100<3,14>; | |

x4: 40<3,14>; | |

TRO(19) Read Only , T2(20) blocked on x1, T3(21), T4(27),

This output means there are 4 transactions started on site 2, T3 has write lock on x1 and read lock on x11. T4 has read lock on x11. T2 is blocked on x1, whose write lock is held by T3 whosebirthdate is later (younger) than T2. The shared variables are valid until timestamp 14, thatwas when the site came up after a failure. However, x1 and x11 are available because they are notshared.

The read only transaction, TRO, is started on timestamp 19 and doesn't hold any locks, since it is read only.

A failed site does not respond to dump site requests.

4) Blocked Transactions:

There are two reasons why a transaction is blocked. If a transaction is blocked due to lock conflict, no information will be given to the output unless you use dumpsite() to find out whichsite the transaction is on and that it is blocked on a variable. If a transaction is blocked because an operation cannot be serviced by any sites, that operation will be retried in each subsequent tick, and a message will be printed to the output in each try:

fail(2)

begin(T1)

R(T1,x1)

No sites can service this request: R(T1, x1)

No sites can service this request: R(T1, x1); a blank line input

No sites can service this request: R(T1, x1); a blank line input

recover(2)

Transaction T1: Success - 10 by site 2

5) Architectural overview

The RepCRec protocol is simulated with a variable number of sites, each with its own complete lock, data, and transaction managers. These sites communicate with a central transaction manager via a lock-step mechanism.

a) Sites

Each site (site/Site.java) has its own self-contained data, lock and transaction managers. Other than exposing the first phase of two phase locking, it is no different than a normal DBMS.

The lock managers use a wait-die protocol in which older transactions wait for younger ones, and younger ones abort themselves when in lock conflict with older transactions. The data manager uses shadow copy for transaction’s writes, and flushes them out to the permanent copy when the transaction commits. It also keeps a log for transactions and their shadow copies when they are first-phase committed in the two phase commit protocol.

b) RepCRec TM

The central TM (repcrec/RepCRec.java) translates the user’s operation to operations on each of the sites. In every time-step, the central TM remember the user’s operations, translate the operations to the site’s operations, send out a vector of operations to each of the up sites, and receive a vector of results from the sites. It will then de-multiplex the results to the appropriate users operations. Once the lock-step with the sites is finished, it will go through all the user operations that have finished, and output the result.

Some of the translation that need to take place are translating commits to two-phase commits, translating writes to a write request to each site, translating read to read request to one site which can service the variable.

6) Known bugs/peculiarities:

a) Site-variable Serviceability bug for read only transactions

We keep a list of variables at any one point in time that tells the central TM which sites can service which variables.We use this list is to prevent the TM from sending requests to sites that cannot service that request. However we do not keep a history of serviceability. That means, as long as the variable is serviceable by a site at the current time, the TM assumes its whole history is serviceable. This works with normal transactions, which uses only the latest copies, but will not work for read only transactions. If a read only transaction wants to read a data from a site which didn’t catch a shared variable’s update, it could get a wrong answer or an error telling you the site doesn’t know about the value of the variable at the given timestamp.

It didn’t used to be a problem because we did not used to allow a transaction to be started on a site with a timestamp in which the site is down. That means, if a transaction is started on a site, we know the site was up when the transaction was started. Now, in order to accommodate test cases where all site fail and go back up, we needed to allow “back-dated” transaction start-up (i.e., allow transactions to be started on sites with a timestamp that is earlier than the current time). This will cause problems for read only transactions.

The solution is to keep a whole history of serviceability of each variable-site-timestamp tuple in the central TM, rather than just the variable-site tuple.

We’re not doing that because we’re out of time, and it will be a big fix.
b) As mentioned before, sites that are down does not respond to dump() requests.

c) As mentioned before, begin(), beginRO(), fail(), recover() will not return a message.

d) As mentioned before, operations that blocked will not return a message immediately. It will only return the result when the operation completes.