Performance Evaluation on a Real-Time Database

Suhee Kim (Hoseo University, Korea)

Sang H. Son, John A. Stankovic (University of Virginia, {son, stankovic}@cs.virginia.edu)

March 6, 2002

Abstract

We have implemented an object-oriented real-time database system called BeeHive. Using BeeHive, we evaluate two data-deadline cognizant scheduling policies, called EDDF and EDF-DC, and a baseline EDF policy, each with and without admission control, through extensive experiments. We identify the ranges where the data-deadline cognizant scheduling policies are effective and where admission control plays a significant role. These results represent one of the few sets of experimental results for an implemented real-time database; almost all previous results were obtained via simulation.

1. Introduction

As more and more applications interact in real time, the need for real-time databases is growing. For example, airplane tracking applications require timely updates of airplane locations, and stock trading requires timely updates of stock prices. These applications also need transaction semantics; hence, real-time databases are a good choice.

Most real-time database research has evaluated new ideas via simulation. Moreover, in much of the past work the only real-time issue addressed is that transactions have deadlines [1,2,4,5,6,7,8,9,12,14,16]. In general, though, transactions that process real-time data must use timely and relatively consistent data in order to produce correct results. Data read by a transaction must still be valid when the transaction completes, which imposes another constraint on completion time in addition to the transaction's deadline. This constraint is referred to as the data-deadline. In this paper we describe the implementation of an object-oriented real-time database system, called BeeHive, which embodies a more complete model of a real-time database than systems that consider only transaction deadlines. The BeeHive architecture includes sensor transactions with periodic updates, user transactions with deadlines, sensor data with validity intervals after which the data is not usable, a multiprocessor implementation, and a software architecture with and without admission control. In addition, we evaluate three real-time scheduling policies and admission control on this system rather than via simulation. In this way, actual overheads and OS costs are accounted for and are shown to affect the results. Our performance results show that in systems without admission control, the policies that explicitly address data validity intervals perform extremely well. However, when admission control (which accounts for both CPU and IO) is used, there is less need for data-oriented time cognizant policies. Overall, we identify the ranges where the two data-deadline cognizant scheduling policies are effective and where the admission controller plays a role.

The remainder of this paper is organized as follows. Section 2 presents the BeeHive system architecture. Section 3 describes the transaction model that is considered in the study. Section 4 outlines the transaction scheduling policies that have been implemented and evaluated. Section 5 describes the admission control model. Section 6 analyzes the results of the experiments. Section 7 summarizes the conclusions of the paper.

2. BeeHive System Architecture

BeeHive is a real-time database system which consists of the BeeHive database server, a transaction thread pool, and the BeeKeeper resource manager (see Figure 1). The BeeHive database server is extended from a database called the Scalable Heterogeneous Object Repository (SHORE). SHORE is a database system that was developed by the Computer Science Department at the University of Wisconsin. It provides several core database functionalities including ACID transactions, concurrency control, and disk IO management. The main extensions we added include support for real-time transactions, data validity intervals, and admission control.

In BeeHive, client applications send a transaction request to the database through the BeeKeeper system, which in turn communicates with the BeeHive database server. Based on the semantic requirements of the transaction request, the BeeKeeper first determines whether the transaction request is to be admitted or not and then how the admitted transaction is to be scheduled. Once the transaction starts running, database accesses are supported through remote procedure calls (RPC) to the BeeHive database server.

Figure 1 BeeHive Architecture

2.1 BeeHive Database Server

The BeeHive database server consists of the SHORE database server, a listener thread, an IO transaction thread pool, a real-time IO scheduler, and a cleaner task (see Figure 2). The listener thread accepts client RPC connections and allocates a transaction thread from the pool for each connecting client. Each transaction thread in the pool processes RPC requests for transaction operations from its associated client and passes the necessary information to the SHORE server to perform the work. The SHORE server provides functions such as create, read, write, delete, commit, and abort. The real-time scheduler module periodically prioritizes the IO transaction threads in the pool based on the specified scheduling policies. The cleaner module is in charge of freeing unused memory resources.

Figure 2 BeeHive Database Server

2.2 BeeKeeper Resource Manager

The BeeKeeper is the resource management process. This process executes on processor 1 of the multiprocessor; the database server executes on processor 2. The following are the main components of the BeeKeeper (see Figure 3).

Service Mapper: The Service Mapper performs the mapping of qualitative resource requests into quantitative requests for physical resources. The service mapper generates a uniform internal representation of the multiple (service-dependent) requests from applications. The Mapper passes the transaction to the Admission Controller.

 Admission Controller: The Admission Controller performs the tests to determine if BeeHive has sufficient resources to support the service requirements of an incoming transaction without compromising the service guarantees made to currently active applications. The Admission Controller admits the incoming transaction based on CPU and IO utilizations of the admitted transactions and system response times of the completed transactions.

 Real-Time Scheduler: The scheduler is responsible for managing the priorities of the transaction threads in the transaction thread pool. The scheduler periodically prioritizes the transaction threads based on the specified scheduling policies. Note that these threads run in the BeeKeeper process and are distinct from the threads found in the BeeHive database server process.

 Object Manager: The Object Manager provides the interface between the BeeKeeper and the BeeHive database. The Object Manager provides support for retrieving, adding, removing, and updating objects in the BeeHive database. All transaction requests are processed by the Object Manager. The Object Manager uses a remote procedure call (RPC) to connect to the server. A transaction is encapsulated in a BeeHive object.

 Monitor: The Monitor calculates various statistics such as global resource utilization and system real-time performance, and updates the system state table with this information.

3. Transaction Model

BeeHive is a firm real-time database system: transactions that have missed their deadlines add no value to the system and are aborted. There are two kinds of transactions: user transactions and sensor transactions. User transactions access both temporal and non-temporal data stored in the database and perform some computations, while sensor transactions are responsible for updating temporal objects periodically. All transactions (user and sensor) are predefined. The attributes of a temporal object (X) and a transaction (T) that are used in this study are defined in Table 1.

Figure 3 BeeKeeper Resource Manager

Table 1 Attributes of Temporal Objects and Transactions

Notation / Description
AVI(X) / Absolute data validity interval of X
AVIb(X) / Beginning of AVI(X)
AVIe(X) / End of AVI(X)
A(T) / Arrival time of T
DL(T) / Deadline of T
RDL(T) / Relative deadline of T
DDLt(T) / Data-deadline of T at time t
PRt(T) / Priority of T at time t
EET(T) / Expected execution time of T
ERT(T) / Execution response time of T
CPUtime(T) / CPU time required to execute T
IOtime(T) / IO time required to execute T
UP(T) / Update period of a sensor transaction T
LAVI / Length of absolute data validity interval
SF / Deadline slack factor of user transactions

3.1 Sensor Transactions

Sensor transactions are periodic and write-only transactions that update temporal objects. Each temporal object in the database has a corresponding sensor transaction to update it periodically.

Sensor transaction design varies by its deadline assignment principle and scheduling priority. There are two traditional approaches for maintaining data validity, namely the one-one and half-half principles. The one-one principle was used in the simulation studies by Haritsa et al. [7] and Huang [8]. It sets the period and the relative deadline of a sensor transaction equal to the length of the validity interval of its temporal object. Since, under the one-one principle, the gap between the executions of two consecutive instances of a transaction can be larger than the validity interval of the corresponding object, the data can become invalid. Under the half-half principle, the period and the relative deadline of a sensor transaction are set to one-half of the length of the absolute validity interval of the data [21]. The half-half principle is used in our study because under it temporal data consistency is maintained as long as the corresponding sensor transaction meets its deadline.

3.2 User Transactions

User transactions access temporal and non-temporal objects in the database and perform computations. Two sets of user transactions were developed for the experiments, with 8 kinds of transactions in each set. The database contains airplane and airport objects, and the transactions perform tasks such as reading the locations of planes, updating those locations, and finding all planes within an area of the airport. The relative deadline RDL of a user transaction T is set using the formula RDL(T) = SF * EET(T). Important attributes of a transaction include its deadline DL, expected execution time EET, required CPU time CPUtime, required IO time IOtime, and arrival time A. Table 2 shows the settings for sensor transactions and user transactions.

Table 2 Sensor Transactions and User Transactions

Sensor Transactions / Update frequency / UP(T) = (1/2) * LAVI of the object to be updated, periodic
/ Relative deadline / RDL(T) = (1/2) * LAVI of the object to be updated
/ Deadline / DL(T) = A(T) + RDL(T)
User Transactions / Relative deadline / RDL(T) = SF * EET(T)
/ Deadline / DL(T) = A(T) + RDL(T)
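The deadline settings above, under the half-half principle of Section 3.1, can be sketched as follows (a minimal illustration; the function names are ours, not BeeHive's):

```python
def sensor_deadlines(arrival, lavi):
    """Half-half principle: period and relative deadline are half the
    object's absolute validity interval (LAVI)."""
    period = 0.5 * lavi          # UP(T) = (1/2) * LAVI
    rdl = 0.5 * lavi             # RDL(T) = (1/2) * LAVI
    return period, arrival + rdl # DL(T) = A(T) + RDL(T)

def user_deadline(arrival, eet, sf):
    """User transactions: RDL(T) = SF * EET(T)."""
    return arrival + sf * eet    # DL(T) = A(T) + RDL(T)

period, dl = sensor_deadlines(arrival=10.0, lavi=4.0)  # period 2.0, deadline 12.0
```

If the sensor transaction commits by this deadline, the updated object remains valid until the next instance's deadline, which is the point of choosing half-half.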

4. Real-Time Scheduling Policies

The scheduling algorithm should maximize the number of user transactions that meet their deadlines while maintaining temporal consistency. Sensor transactions are scheduled by the earliest-deadline-first policy and have higher priority than user transactions. The three real-time scheduling policies for user transactions are as follows.

 Earliest Deadline First (EDF)

This is the traditional policy in which the priority of a transaction is its deadline. This policy is not cognizant of the data-deadline in scheduling. A transaction is aborted at commit time if any data object it read is invalid.

PRt(T) = DL(T)

 Earliest Data-Deadline First (EDDF)

This policy assigns the minimum of the data-deadline and the deadline of a transaction T to be the priority of the transaction.

PRt(T) = min(DL(T), DDLt(T))

This policy is cognizant of the data-deadline in scheduling. Whenever DDLt(T) changes, the change is reflected in the scheduling, depending on the values of DL(T) and DDLt(T). The data-deadline of a transaction T at time t, DDLt(T), is defined by the following pseudocode.

Initially, DDLt(T) is undefined.
When transaction T reads a temporal object X at time t:
    if DDLt(T) is undefined:
        DDLt(T) = AVIe(X)
    else:
        DDLt(T) = min(DDLt(T), AVIe(X))
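The data-deadline bookkeeping above can be sketched as a small helper (an illustration, assuming `None` represents "undefined"):

```python
def on_temporal_read(ddl, avi_end):
    """Update DDLt(T) when T reads a temporal object X with AVIe(X) = avi_end.
    ddl is the current data-deadline, or None if still undefined."""
    if ddl is None:
        return avi_end            # first temporal read fixes the data-deadline
    return min(ddl, avi_end)      # later reads can only tighten it

ddl = None
ddl = on_temporal_read(ddl, 120)  # becomes 120
ddl = on_temporal_read(ddl, 95)   # tightened to 95
ddl = on_temporal_read(ddl, 130)  # stays 95
```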

 Earliest Deadline First with Data Validity Check (EDF-DC)

This policy is the same as the EDF policy, except that transaction T is aborted as soon as its data-deadline passes, i.e., as soon as any data it has read becomes invalid. Under the EDF policy, by contrast, the data-deadline of a transaction T is checked only at commit time, and the transaction is aborted then if its data is invalid. In the experiments, the priorities of user transactions are assigned using these three policies, and the performance of the two data-deadline cognizant policies is compared with that of the baseline EDF policy.
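The difference in abort timing between the two EDF variants can be sketched as follows (illustrative names only; `ddl` stands for DDLt(T), or None if the transaction has read no temporal object):

```python
def edf_commit_ok(now, ddl):
    """EDF: data validity is checked only once, at commit time."""
    return ddl is None or now <= ddl

def edf_dc_should_abort(now, ddl):
    """EDF-DC: abort immediately once the data-deadline has passed,
    rather than wasting resources until commit."""
    return ddl is not None and now > ddl
```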

5. Admission Control

When a real-time database system is overloaded, many transactions may miss their deadlines. If these same transactions had not been admitted to the system in the first place, the limited system resources would not have been wasted on them. The experiments are further extended to use admission control, and the ranges where admission control is effective are identified.

5.1 Admission Control Model

We have developed an admission controller for user transactions, which considers both resource utilizations and system response times. All transactions are predefined, and for each transaction the expected execution time (EET = CPUtime + IOtime) is pre-analyzed. By calculating the CPU and IO utilizations of the transactions in the system, the admission controller first determines whether BeeHive can support the service requirement of an incoming user transaction. Once admitted, the transaction is placed in the EDF queue to wait for execution. When its turn for execution comes, the deadline of the transaction is compared with the average execution response time of its type. If it has enough time to meet its deadline, it starts execution; if not, it leaves the system and is counted as having missed its deadline.

 Step 1: Admission Control by CPU and IO Utilizations

The CPU utilization and IO utilization of the transactions in the system are estimated as follows. If there are n transactions in the system and a new transaction Tnew arrives at time t, then

Percentage of CPU utilization = (Σi=1..n CPUtime(Ti)/(DL(Ti) − t) + CPUtime(Tnew)/RDL(Tnew)) * 100

and

Percentage of IO utilization = (Σi=1..n IOtime(Ti)/(DL(Ti) − t) + IOtime(Tnew)/RDL(Tnew)) * 100

If the percentage of CPU utilization < CPU-threshold and the percentage of IO utilization < IO-threshold, then the new transaction is admitted. Otherwise, it is rejected.

CPU-threshold and IO-threshold are constants chosen as the thresholds of CPU and IO utilization in each experiment. To be more precise, we would need to estimate the remaining CPU time and the remaining IO time of each transaction T at time t. As we found through the implementation, these estimates are expensive and difficult to obtain. Instead, we simply use the original CPUtime and IOtime of the transaction T and do not try to account for transactions' remaining requirements. This approximation works well and incurs little overhead.
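The Step 1 test can be sketched as follows, under the assumption that each admitted transaction contributes its pre-analyzed CPUtime (or IOtime) divided by the time remaining to its deadline; the exact denominators in BeeHive's formulas are our reconstruction, not taken from the implementation:

```python
def admit(active, new_txn, now, cpu_threshold, io_threshold):
    """active: list of (cputime, iotime, deadline) for admitted user transactions.
    new_txn: (cputime, iotime, relative_deadline) of the arriving transaction.
    Returns True if both utilization percentages are below their thresholds."""
    cpu_util = sum(c / (dl - now) for c, _, dl in active) + new_txn[0] / new_txn[2]
    io_util = sum(i / (dl - now) for _, i, dl in active) + new_txn[1] / new_txn[2]
    return cpu_util * 100 < cpu_threshold and io_util * 100 < io_threshold
```

As in Section 5.2, sensor transactions would be excluded from `active`, and the thresholds tuned per workload.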

 Step 2: Admission Control by Execution Response Time

The system response time of a transaction is the sum of its waiting time and its execution response time. If the slack time to a transaction's deadline is less than its expected execution time, the transaction is highly likely to miss its deadline after wasting system resources. In this study, when a transaction's turn to execute comes at time t, the slack time to its deadline is compared with the average execution response time of its type.

Let ERT(Ti) denote the execution response time of the i-th completed transaction of a given type. If n transactions of this type have completed their execution by time t, then

the average execution response time is (ERT(T1) + ERT(T2) + ... + ERT(Tn)) / n.

If (DL(T) − t) >= c * (the average execution response time of its type), where c is a constant, then the transaction starts execution. Otherwise, the transaction is aborted. The constant c is chosen by tuning.
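Step 2 can be sketched as follows (the behavior when no transaction of the type has completed yet is our assumption, not specified above):

```python
def start_or_abort(dl, now, completed_erts, c=0.9):
    """Decide whether T starts execution at time `now`.
    completed_erts: ERT values of already-completed transactions of T's type."""
    if not completed_erts:
        return True                      # no history yet: optimistically start
    avg_ert = sum(completed_erts) / len(completed_erts)
    return (dl - now) >= c * avg_ert     # enough slack to likely finish in time
```

With c = 0.9 (the value used in the experiments, Section 5.2), a transaction with 5 time units of slack and an average ERT of 5 would be started, since 5 >= 4.5.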

5.2 Thresholds of the Admission Controller

To calculate the percentages of CPU and IO utilization in Step 1, the BeeKeeper needs to keep track of both user transactions and sensor transactions in the system. In preliminary tests, we observed that calculating these utilizations for sensor transactions slows the system down significantly, causing many sensor transactions to miss their deadlines. Consequently, many user transactions read stale data and miss their data-deadlines. To improve overall system performance, sensor transactions are excluded from the utilization calculations. A fixed pair of values for CPU-threshold and IO-threshold may not work well across workloads, so these values are tuned per workload: for a given workload, a few pairs of threshold values are tested, and a good pair is chosen for the actual experiments. The constant c in Step 2 was chosen to be 0.9 in the experiments.

6. Performance Evaluation

This section presents the experimental setup and the assumptions used in the experiments, and introduces the workload model for transactions (user and sensor transactions). Using diverse workloads, several experiments are performed with and without admission control. We analyze the experimental results and identify the ranges where the data-deadline cognizant scheduling policies are effective and where admission control plays a role.

The primary performance metric used in the experiments is the miss ratio, i.e., the fraction of user transactions that miss their deadlines, a traditional metric for evaluating the performance of real-time database systems.
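For concreteness, the metric written out (expressed here as a percentage):

```python
def miss_ratio(missed, total):
    """Percentage of user transactions that missed their deadlines."""
    return missed / total * 100

# e.g., 25 misses out of 100 user transactions is a 25% miss ratio
```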