
Caching Management of Mobile DBMS

Jenq-Foung Yao
Department of Mathematics and Computer Science
Georgia College & State University
Milledgeville, GA 31061
Phone: (912) 445-1626
Fax: (912) 445-2602

Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Dallas, TX 75275
Phone: (214) 768-3087
Fax: (214) 768-3085

Abstract

Unlike a traditional client-server network, a mobile computing environment has very limited bandwidth on its wireless links. Thus, one design goal of caching management in a mobile computing environment is to reduce the use of wireless links; this is the primary objective of this research. Quota data and private data mechanisms are used in our design so that an MU user is able to query and update data in the local DBMS[1] without cache coherence problems. The effect of the two mechanisms is to increase the hit ratio. An agent on an MU, along with a program on a base station, handles caching management, including prefetching/hoarding, cache use, cache replacement, and cache-miss handling. The simulation results clearly indicate that our approaches improve on previous research.

Keywords: Caching, Mobile Computing, Mobile DBMS, Mobile Unit (MU), Database, Agent, User Profile, Validation Report (VR)


  1. INTRODUCTION

For the past ten years, personal computer technology has been progressing at an astonishing rate. The size of a PC is shrinking, while the capacity of its software and hardware functionality is increasing. Simultaneously, the technologies of cellular communications, satellite services, and wireless LANs are rapidly expanding. These state-of-the-art PC and wireless technologies have brought about a new breed of technology called mobile computing (MC). Several mobile computing examples have been discussed in [7] and [1].

Most people acknowledge that the mobile environment is an extension of distributed systems. Mobile units (MUs) and interfacing devices (base stations that may interact with MUs) are added to the existing distributed systems (see Figure 1). This is a client/server network setting: the servers reside on fixed hosts or base stations, and the clients can be fixed hosts or mobile units. Mobile units are frequently disconnected for periods of time because of expensive wireless connections, bandwidth competition, and limited battery power. To allow users to access resources at all times, no matter which mode they are in, many research issues need to be addressed. Data caching/replication on an MU is one of the important methods that can help resolve this problem.

  2. PREVIOUS WORK

In previous research, caching management has been handled at two different levels: the file system level and the DBMS level. Issues at the file system level have been addressed widely [11] [22] [12] [21] [17]. Some of the file-system-level approaches have produced real systems in daily use, such as Coda [22]. These efforts have some shortcomings, however. The major one is that all of them explicitly exclude a DBMS. In addition, they use optimistic replication control, which allows WRITE operations on different partitions (locations). Committing data in a timely fashion is not important in these systems: data are allowed to have several different versions on different partitions, which are later integrated (and committed). In the academic environments where these approaches were deployed, users rarely write to the same file at the same time.

Most of the previous work on mobile DBMSs [1] [2] [13] [3] [25] concentrated on the time window w and the size of the Invalidation Report (IR). This work reduced wireless link usage to a certain degree. However, these approaches assumed read-only operations on the local cache, which is just one of the ways cache data on an MU can be used. They also uplinked to the fixed network for cache-miss data, which likewise is just one of the possible ways of handling a cache miss.

Only a few researchers at the DBMS level have dealt with the update issue on MUs. Chan, Si, and Leong proposed a mobile caching mechanism based on an object-oriented paradigm [5]. Conceptually, their approach is based on an idea similar to ours: cache the frequently accessed database items on MUs to improve the performance of database queries and the availability of data items for query processing during disconnection. This concept, called hot spot [2], states that frequently accessed data are likely to be accessed again in the future.

The other research that dealt with WRITE operations is [24]. There, virtual resources are pre-allocated on an MU so that the MU has its own potential share of the data. That work is based on a trucking distribution system in which each truck is pre-assigned an amount of load; when a truck has actually loaded goods, it reports the actual load to the database server. Only aggregate quantity data are dealt with. The approach is very similar to that proposed by O'Neil [19]: an escrow transactional method that pre-allocates a fixed portion of an aggregate quantity data item to a transaction prior to its commit. When the time comes to commit the transaction, enough of the data item's value should be available because of the pre-allocation. The whole mechanism of [19] takes place in a centralized DBMS.

3. OBJECTIVES FOR THE RESEARCH

The objectives of this research are to provide solutions for unaddressed issues in the DBMS area. These issues include how to handle WRITE operations on the local cache, different techniques for handling cache misses, how to deal with cache coherence, and so on. We fully address these issues in the next two sections. Performance evaluations based on simulation results are then discussed in Section 6.

4. MODEL OF CACHING MANAGEMENT

A client-agent-server architecture is used in our model. The rationale is that we would like to build a mobile DBMS on top of existing DBMSs. The agent is an interface between a database server and a mobile client; all functionality added to the existing DBMSs is built into the agent. That is, among other things, the agent handles all cache prefetching, cache update, cache-miss handling, and cache coherence. The agent comprises an MU agent on an MU and a VR[2] handler on a base station. Data prefetching can be done either through a wired link or through a wireless link; however, the wired link is always preferred whenever the situation allows. The cache is stored on the local disk of an MU and organized as relations. These relations are deemed part of the local RDBMS and can be queried via the local RDBMS.

4.1. Model Assumptions

Assumptions in our research are as follows:

  1. Only the issues at the DBMS level will be dealt with in this research.
  2. The environment is a typical client-server network. The server is on a fixed network. The client could be a mobile unit or a fixed host; we mainly deal with issues on a mobile unit.
  3. A user is able to use the MU for an extended time frame. Therefore, data can be cached there for long periods.
  4. An MU has an independent local DBMS. This local DBMS and the database server on the fixed network all support the relational models. SQL on the MU can query both the private RDBMS and the RDBMS on the database server. An MU agent serves as a query interface among them.
  5. All the MUs are portable laptop computers. We assume they are as powerful as their desktop counterparts.
  6. Data prefetching can be done either through a wired link or through a wireless link. However, the wired link is always preferred whenever the situation allows. In addition, mass prefetching is preferable when network traffic is low, such as overnight.
  7. The cache is stored on the local disk of an MU and organized as relations. These relations are deemed part of the local RDBMS on the MU and can be queried via the local RDBMS.
  8. We assume that downlink and uplink channels have the same bandwidth capacity, for evaluation purposes.
  9. The granularity of fetched data is a portion of a relation.

4.2. Caching Granularity

The caching granularity is one of the key factors in a caching management system. Most mobile systems at the operating system level use a file or a group of files (a cluster or replica) as the caching granularity. Using a file as the granularity is not an appropriate choice for a DBMS. At the DBMS level, the caching granularity is usually an attribute [5], an object [5], a page [4], a tuple [9], or a semantic region [6]. Attribute caching and tuple caching create undesirable overheads due to the large number of independent cached attributes/tuples. On the other hand, object caching, page caching, and semantic caching reduce these overheads by collecting data into groups of tuples. The static grouping in page caching and object caching lacks the flexibility of attribute/tuple caching. Semantic caching provides that flexibility, permitting dynamic grouping according to the present queries. However, how semantic caching can be implemented in detail is still unclear; for example, who is in charge of the cache update [6].

We propose a portion of a relation as the caching granularity. This portion of the relation contains a group of tuples extracted from the original relation with the relational SELECT (σ) and PROJECT (π) operators. We always preserve the primary key's attributes in a cache relation; therefore, our approach is not exactly that of an updateable snapshot. We call our approach "morsel caching" because we cache a portion of a base relation as a cache relation, and we call this caching granularity a "cache relation". The cache relation is defined in Definition 2. One may view a cache relation as a special case of an updateable snapshot.

Definition 1. Let a database D = {Ri} be a set of base relations. For every base relation Ri, let Ai stand for the set of its attributes, and let Aik be the set of attributes included in the primary key of Ri. A user morsel, UM, is a tuple <UA, UP, UC>, where UC = π_UA(σ_UP(Rj)) for some relation Rj in D; UA ⊆ Aj and Ajk ⊆ UA; and UP = P1 ∨ P2 ∨ P3 ∨ … ∨ Pn, where each Pj is a conjunction of simple predicates, i.e., Pj = bj1 ∧ bj2 ∧ bj3 ∧ … ∧ bjl, with each bjt a simple predicate.

Definition 2. A cache relation, CR, is the smallest relation that contains a set of UC's from Definition 1, all associated with the same base relation.
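As a concrete illustration of Definition 1, the following sketch builds a cache relation for a hypothetical base relation Emp(eid, name, dept, salary) (the schema, names, and predicate are ours, not the paper's), with UA = {eid, name} (note that eid, the primary key, must be retained) and UP = (dept = 'Sales'); SQLite stands in for both DBMSs:

```python
import sqlite3

# Hypothetical base relation Emp(eid, name, dept, salary); eid is the primary key.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp (eid INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
con.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?)", [
    (1, "Ann", "Sales", 50000.0),
    (2, "Bob", "R&D", 62000.0),
    (3, "Cid", "Sales", 48000.0),
])

# User morsel <UA, UP, UC> with UA = {eid, name}, UP = (dept = 'Sales'):
# UC = pi_{eid,name}(sigma_{dept='Sales'}(Emp)), realized as SELECT/WHERE.
con.execute("CREATE TABLE CacheEmp AS SELECT eid, name FROM Emp WHERE dept = 'Sales'")

rows = con.execute("SELECT eid, name FROM CacheEmp ORDER BY eid").fetchall()
```

Since the predicate selects only the two Sales tuples, the cache relation CacheEmp here holds (1, Ann) and (3, Cid).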

The reason we do not include a join when forming a cache relation is to make cache-miss handling and update synchronization much easier.

Note that we use a cache relation as the caching granularity for prefetching and cache replacement. The cache update granularity, however, is only a subset of the attributes of a tuple, because updates take place over a wireless link whose bandwidth is limited. This differs from other approaches, which use the same granularity for prefetching, cache replacement, and cache update. Our approach is thus much more flexible, in that the granularity is not fixed but varies with the occasion.

4.3. User Profile

Several previous approaches use a mechanism called a "hoarding profile". The hoarding profile lets users choose their preferred data to cache. This is the most direct and effective way to cache the data users need. The drawback is that it requires human involvement, and people may not know what they will really use; hence, only sophisticated users are able to provide the most effective cache data. A classic example is Coda [22], which uses "hoard profile" command scripts to update the "hoard database"; data are fetched into the cache based on the hoard database. Another example is Thor [10], which uses an object-oriented query language to describe the hoarding profile; its applications are similar to Coda's.

Algorithm Extract-Cache-Relation:

Extract-Cache-Relation(user-profile) {
    /* This algorithm extracts data from the base relations and inserts them into the cache relations */
    for each entry of the user profile {
        if (cache_flag = 0) then {   /* the corresponding cache relation does not exist */
            create a cache relation;
            set cache_flag = 1;      /* mark the cache relation as cached */
        }
        for each attribute j that is not in the cache relation {
            add a column for attribute j to the cache relation;
        }
        extract data from the base relation and insert the data into the cache relation;
    }
}
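The algorithm above can be sketched as a runnable program. The following is a minimal sketch only, using SQLite for both the server and local DBMSs and an assumed in-memory profile format of our own; the column-extension step for already-existing cache relations is omitted for brevity:

```python
import sqlite3

# Minimal sketch of Extract-Cache-Relation. The profile entry format
# (cache_flag, base, cache, attrs, pred) is a hypothetical stand-in for the
# user-profile entries described in the paper; attrs must include the
# primary-key attributes.
def extract_cache_relation(server, local, profile):
    for entry in profile:
        if entry["cache_flag"] == 0:          # cache relation does not exist yet
            cols = ", ".join(entry["attrs"])
            rows = server.execute(
                f"SELECT {cols} FROM {entry['base']} WHERE {entry['pred']}"
            ).fetchall()
            local.execute(f"CREATE TABLE {entry['cache']} ({cols})")
            marks = ", ".join("?" for _ in entry["attrs"])
            local.executemany(
                f"INSERT INTO {entry['cache']} VALUES ({marks})", rows)
            entry["cache_flag"] = 1           # mark the cache relation as cached

# Hypothetical base relation on the server and a one-entry profile.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE Parts (pid INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
server.executemany("INSERT INTO Parts VALUES (?, ?, ?)",
                   [(1, "bolt", 40), (2, "nut", 5), (3, "cam", 70)])

local = sqlite3.connect(":memory:")
profile = [{"cache_flag": 0, "base": "Parts", "cache": "CParts",
            "attrs": ["pid", "name", "qty"], "pred": "qty > 10"}]
extract_cache_relation(server, local, profile)
```

After the run, the local DBMS holds the cache relation CParts containing only the tuples satisfying the entry's criteria, and the entry's cache flag is set.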

We adapt this concept as part of our caching mechanism. In our model, data are fetched into the cache based on data access frequency and a user hint. A user hint contains the projected needs of a user, as specified in a user profile, and the relations that are presently in use. The user profile is created prior to the very first prefetching: a user creates it by running a simple program that prompts for the necessary information, namely the name of a cache relation, the name of the related base relation, the attributes of the primary key, the attributes of the cache relation, and a set of criteria used to create the cache relation. The program then organizes the entered information in the format shown in Figure 2. A user profile is the input to another program (see Algorithm Extract-Cache-Relation), whose output is a set of cache relations; incorporating a user profile with this program thus extracts portions of base relations into cache relations. Alternatively, the DBA could perform these tasks for users.

Should a user lack a user profile for whatever reason, the cache relations are simply the base relations on the database server, and which base relations are fetched in the prefetching stage is based on relation access frequency. Otherwise, cache relations are fetched onto an MU based on the user profile during the prefetching stage. A cache relation created from a user profile has a different name from its base relation; it is up to the user to name the cache relation in the profile. A cache relation not created from a user profile shares the name of the original relation; in this case, the cache relation is like a replica of the base relation.

The program (Extract-Cache-Relation) is run against the user profile at the very first prefetching; this is a one-time operation. Once the user profile has been created and used to extract cache relations, it can also assist in handling cache misses. When a cache miss occurs, the MU agent looks up the user profile and traces back to the base relation. If there is no entry for the cache relation, the MU agent creates a new entry for the missing relation based on the cache-miss query. If the cache-miss query involves a join, an entry is created for each relation involved in the join; that is, if three relations are joined, three entries are created. The new entries are created in a temporary user profile, against which the MU agent then runs the Extract-Cache-Relation program. Once this is done, the temporary user profile is appended to the user profile. If the user later decides to cache more relations, the procedure is similar to the cache-miss case.

This user profile is kept on the MU that the user is using and, from time to time, is backed up to the database server. Each entry can be used to create a user morsel (see Definition 1 in Section 4.2). There are six parameters per entry. The first is the cache flag bit: ON (1) means the cache relation exists, and OFF (0) means it does not; initially, all cache flags are set to OFF (0). The second is the name of the base relation from which the data will be extracted. The third is a name for the new cache relation, chosen by the user. The fourth is the set of attributes of the primary key of the base relation. The fifth is the set of attributes from the base relation that will be included in the cache relation. The last is the set of criteria used to select a user morsel; these criteria are in the same format as the user morsel's criteria (UP) defined in Definition 1 in Section 4.2.
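The six parameters of a profile entry can be pictured as a simple record. This is one hypothetical representation (the field names are ours; Figure 2 gives the paper's actual format):

```python
from collections import namedtuple

# Hypothetical in-memory shape for one user-profile entry.
ProfileEntry = namedtuple(
    "ProfileEntry",
    ["cache_flag",      # 1 = cache relation exists, 0 = it does not (initially 0)
     "base_relation",   # base relation the data is extracted from
     "cache_relation",  # user-chosen name for the cache relation
     "key_attrs",       # attributes of the base relation's primary key
     "attrs",           # attributes to include in the cache relation
     "criteria"])       # selection predicate U_P, as in Definition 1

entry = ProfileEntry(0, "Emp", "CacheEmp", ("eid",),
                     ("eid", "name", "dept"), "dept = 'Sales'")
```

Note that key_attrs must be a subset of attrs, matching the requirement Ajk ⊆ UA in Definition 1.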

To extract and insert data into the new cache relation with SQL, the MU agent first submits an INSERT operation to the server to insert the data into a temporary base relation with the same name as the cache relation. Note that the data are SELECTed from the corresponding permanent relations and inserted into the temporary relation after it has been CREATEd. This relation is then moved to the local DBMS as a cache relation.

Sometimes an existing cache relation needs to be modified, for example by adding columns for new attributes. This happens when a cache relation must be extended, as when another entry of the user profile attempts to create the same cache relation. We allow only one cache relation to be extracted from a given base relation, so creating a second one is not allowed. The solution is to add the new attributes in that entry of the user profile to the existing cache relation. To add a column for a new attribute to an existing relation, use the SQL ALTER command; after adding the new attributes, use the SQL UPDATE command to fill in their values.
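The ALTER-then-UPDATE extension step can be sketched as follows (hypothetical schema and values; SQLite syntax, with a Python dict standing in for the data fetched from the base relation):

```python
import sqlite3

# An existing cache relation CacheEmp(eid, name) that a later profile entry
# extends with a salary attribute.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE CacheEmp (eid INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO CacheEmp VALUES (?, ?)", [(1, "Ann"), (3, "Cid")])

# Step 1: ALTER adds the new column (NULL for existing tuples).
con.execute("ALTER TABLE CacheEmp ADD COLUMN salary REAL")

# Step 2: UPDATE fills in values for the new attribute; the dict stands in
# for data extracted from the base relation on the server.
salaries = {1: 50000.0, 3: 48000.0}
for eid, sal in salaries.items():
    con.execute("UPDATE CacheEmp SET salary = ? WHERE eid = ?", (sal, eid))

rows = con.execute("SELECT eid, name, salary FROM CacheEmp ORDER BY eid").fetchall()
```

Keying the UPDATE on the primary key is what makes this safe, and is one reason the primary-key attributes are always preserved in a cache relation (Definition 1).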