LEVELS OF DISTRIBUTION TRANSPERENCY
Structure
4.0 Objectives
4.1 Introduction
4.2 Reference Architecture for distributed databases
4.3 Types of data fragmentation
4.3.1 Horizontal Fragmentation
4.3.2 Derived Horizontal Fragmentation
4.3.3 Vertical Fragmentation
4.3.4 Mixed Fragmentation
4.4 Integrity Constraints in Distributed Databases
4.5 Summary
4.0 Objectives: In this unit we will learn the following topics:
- Reference Architecture for Distributed Databases
- Types of Data Fragmentation
- Integrity constraints in Distributed databases
4.1 Introduction:
In this unit we suggest reference architecture for the distributed database. This architecture allows us to determine the different levels of transparency, which are conceptually relevant to understand distributed databases. Also the mapping between the different levels is defined. Here we have used the relational model and relational algebra for this purpose. The distributed access primitives are represented using the SQL statements, as it is user friendly. The emphasis is given to the fact that how the SQL primitives reference the objects which constitute the database. The different types of fragmentation methods are discussed. Also the integrity constraints in the distributed transaction are explained.
4.2 Reference Architecture for Distributed Databases:
Here we have suggested reference architecture for the distributed databases as shown in the fig.4.1.The different levels are conceptually helpful to understand the functioning of the whole system. The various stages of the architecture are as follows.
- Global Schema: defines all the data, which are contained in the distributed database as if it is a centralized system. Here a set of global relations is used.
- Fragmentation Schema: Each global relation is split into several non-overlapping portions that are called as Fragments. The mapping between the global relations and fragments is defined in the Fragmentation Schema. It is a one to many relation such that several fragments correspond to one global relation but only one global relation corresponds to one fragment. They are indicated as Ri, the ith fragment of the global relation R.
- Allocation Schema: The fragments are really the logical portions of the global relation, which are physically dispersed at different sites of the network. This schema defines at which site(s) a fragment is allocated. It is to be noted that depending upon the requirement more than one fragment may be allocated at a site. So this mapping determines whether the system is a Redundant system or a Non redundant system.
- Local Mapping Schema: We have already described the relationships between the objects at the three top levels of this architecture. These three levels are site independent; therefore, they do not depend on the data model of the local DBMSs. At a lower level, it is necessary to map the physical images to the objects that are manipulated by the local DBMSs. This mapping is called a local mapping schema and depends on the type of local DBMS; therefore in a heterogeneous system we have different types of local mappings at different sites.
This architecture provides a very general conceptual framework for understanding distributed databases. The three most important objectives that motivate the features of this architecture are the separation of data fragmentation and allocation, the control of redundancy, and the independence from local DBMSs.
- Separating the concept of data fragmentation from the concept of data allocation:This separation allows us to distinguish two different levels of distribution transparency, namely fragmentation transparency and location transparency. Fragmentation transparency is the highest degree of transparency and consists of the fact that the user or application programmer works on global relations. Location transparency is a lower degree of transparency and requires the user or application programmer to work on fragments instead of global relations; however, he or she does not know where the fragments are located.
- Explicit control of redundancy:The reference architecture provides explicit control of redundancy at the fragment level.
- Independence from local DBMSs: This feature, called local mapping transparency, allows us to study several problems of distributed database management without having to take into account the specific data models of local DBMSs. Another type of transparency, which is strictly related to location transparency, is replication transparency. Replication transparency means that the user is unaware of the replication of fragments.
4.3 Types Of Data Fragmentation:
Two different types fragmentation: Horizontal and Vertical fragmentation can decompose the global relations into fragments. We will first consider these two types of fragmentation separately and then consider the more complex fragmentation, which can be obtained by applying a composition of both.
In all types of fragmentation, a fragment can be defined by an expression in a relational language (we will use relational algebra), which takes global relation as operands and produces the fragment as result. For example, if a global relation contains data about employees, a fragment which contains only data about employees who work at department D1 can be obviously defined by a selection operation on the global relation.
Some rules, which must be followed when defining fragments:
- Completeness condition: All the data of the global relation must be mapped into the fragments; i.e., it must not happen that a data item that belongs to a global relation does not belong to any fragment.
- Reconstruction condition: It must always be possible to reconstruct each global relation from its fragments. The necessity of this condition is obvious in fact, only fragments are stored in the distributed database, and global relation have to be built through this reconstruction operation if necessary.
- Disjoint condition: It is convenient that fragments be disjoint, so that the replication of data can be controlled explicitly at the allocation level.
4.3.1 Horizontal Fragmentation: Horizontal fragmentation consists of partitioning the tuples of a global relation into subsets; this is clearly useful in distributed databases, where each subset can contain data that have common geographical properties. It can be defined by expressing each fragment as a selection operation on the global relation.
Example: let a global relation be
SUPPLIER (SNUM, NAME, CITY)
Then the horizontal fragmentation can be defined in the following way:
SUPPLIER1 = SLCITY =”Mysore” SUPPLIER
SUPPLIER2= SLCITY =”Shimoga” SUPPLIER
Now let us verify whether this fragmentation fulfills the conditions stated earlier.
The completeness condition: If “Mysore” and “Shimoga” are the only possible values of the CITY attribute, then it satisfies this condition.
The reconstruction condition: can be verified easily, because it is always possible to reconstruct the SUPPLIER global relation through the following operation.
SUPPLIER = SUPPLIER1 UN SUPPLIER2
The disjoint ness condition is clearly verified.
Qualification: The predicate, which is used in the selection operation and defines a fragment, is called asQualification. For instance, in the above example the qualifications
q1 : CITY = “Mysore”
q2 : CITY = “Shimoga”
We can generalize from the above example that in order to satisfy the completeness condition, the set of qualifications of all fragments must be complete, at least with respect to the set of allowed values. The reconstruction condition is always satisfied through the union operation, and the disjoint ness condition requires that qualifications be mutually exclusive.
4.3.2 Derived Horizontal Fragmentation: This is a type of fragmentation, which is derived from the horizontal fragmentation of another relation.
Example: Consider a global relation
SUPPLY (SNUM, PNUM, DEPTNUM, QUAN)
where SNUM is a supplier number. If it is required that a fragment has to contain the tuples for suppliers, which are in a given city, and then we have to go for derived fragmentation. A semi-join operation with the fragments SUPLIER1 and SUPLIER2 is needed in order to determine the tuples of SUPPLY, which correspond to the suppliers in a given city. The derived fragmentation of SUPPLY can be therefore defined as follows:
SUPPLY1 = SUPPLY SJSNUM=SNUMSUPPLIER1
SUPPLY2= SUPPLY SJSNUM=SNUMSUPPLIER2
The reconstruction of the global relation SUPPLY can be performed through the union operation as was shown for SUPPLIER.
The completeness of the above fragmentation requires that there be no supplier numbers in the SUPPLY relation, which are not contained also in the SUPPLIER relation. This is a typical, and reasonable, integrity constraint for this database and usually is called as the referential integrity constraint.
The disjoint ness condition is satisfied if a tuple of the SUPPLY relation does not correspond to two tuples of the SUPPLIER relation that belong to two different fragments. In this case this condition is easily verified, because the supplier numbers are unique keys of the SUPPLIER relation.
4.3.3 Vertical Fragmentation:
The Vertical fragmentation of a global relation is the subdivision of its attributes into groups; fragments are obtained by projecting the global relation over each group. This can be useful in distributed databases where each group of attributes can contain data that have common geographical properties. The fragmentation is correct; if each attribute is mapped into at least one attribute of the fragments; moreover, it must be possible to reconstruct the original relation by joining the fragments together.
Example: Consider a global relation
EMP (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
A vertical fragmentation of this relation can be defines as
EMP1=PJ EMPNUM,NAME, MGRNUM, DEPTNUM EMP
EMP2=PJ EMPNUM, SAL,TAX EMP
The reconstruction of relation EMP can be obtained as
EMP=EMP1JN EMPNUM=EMPNUM EMP2
This is because; EMPNUM is a key of EMP.
Let us draw some important points to be noted from this example.
- The purpose of including the key of the global relation into each fragment is to ensure the reconstruction property.
- An alterative way to provide the reconstruction property is to generate tuple identifiers that are used as system-controlled keys. This can be convenient in order to avoid the replication of large keys; moreover, users cannot modify tuple identifiers.
Let us finally consider the problem of fragment disjoint ness. First, we have seen that at least the key should be replicated in all fragments in order to allow reconstruction. In fact, if we include the same attribute in two different vertical fragments, we know exactly that the column that corresponds to this attribute.
For example, consider the following vertical fragmentation of relation EMP:
EMP1 = PJEMPNUM,NAME,MGRNUM,DEPTNUMEMP
EMP2 = PJEMPNUM,NAME,SAL,TAXEMP
The attribute NAME is replicated in both fragments. We can remove this attribute when we reconstruct relation EMP through an additional projection operation.
EMP = EMP1 JNEMPNUM=EMPNUMPJEMPNUM, SAL, TAX EMP2
4.3.4 Mixed Fragmentation: The fragments that are obtained by the above fragmentation operations are relations themselves, so that it is possible to apply the fragmentation operations recursively, provided that the correctness conditions are satisfied each time. The reconstruction can be obtained by applying the reconstruction rules in reverse order.
Example: Consider the same global relation
EMP (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
The following is a mixed fragmentation, which is obtained by applying the vertical fragmentation of the previous example, followed by a horizontal fragmentation on DEPTNUM:
EMP1 = SLDEPTNUM10 PJEMPNUM, NAME, MGRNUM, DEPTNUM EMP
EMP2 = SL10<DEPTNUM20 PJEMPNUM, NAME, MGRNUM, DEPTNUM EMP
EMP3 = SLDEPTNUM>20 PJEMPNUM, NAME, MGRNUM, DEPTNUM EMP
EMP4 = PJEMPNUM, NAME, SAL , TAX EMP
Fig.4.3 The fragmentation tree of relation EMP
The reconstruction of relation EMP is defined by the following expression:
EMP = UN (EMP1, EMP2, EMP3) JNEMPNUM = EMPNUM
PJEMPNUM, SAL, TAX EMP4
A fragmentation tree can conveniently represent mixed fragmentation (as shown in the above figure). In a fragmentation tree, the root corresponds to a global relation, the leaves corresponds to the leaves correspond to the fragments, and the intermediate nodes correspond to the intermediate results of the fragment-defining expressions.
The EXAMPLE_DDB:
The following codes shows the global and fragmentation schemata of EXAMPLE_DDB. Most of the global relations of EXAMPLE_DDB and their fragmentation have been already introduced. A DEPT relation, horizontally fragmented into three fragments on the value of the DEPTNUM attribute, is added.
Global schema
EMP (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
DEPT (DEPTNUM, NAME, AREA, MGRNUM)
SUPPLIER (SNUM, PNUM, DEPTNUM, QNUM)
Fragmentation schema
EMP1 = SLDEPTNUM10 PJEMPNUM, NAME, MGRNUM, DEPTNUM(EMP)
EMP2 = SL10<DEPTNUM20 PJEMPNUM, NAME, MGRNUM, DEPTNUM(EMP)
EMP3 = SLDEPTNUM >20 PJEMPNUM, NAME, MGRNUM, DEPTNUM(EMP)
EMP4 = PJEMPNUM, NAME, SAL, TAX(EMP)
DEPT1= SLDEPTNUM10(DEPT)
DEPT2= SL10<DEPTNUM20(DEPT)
DEPT3= SLDEPTNUM>20(DEPT)
SUPPLIER1 = SLCITY = “SF” (SUPPLIER)
SUPPLIER2 = SLCITY = “LA” (SUPPLIER)
SUPPLY1 = SUPPLY SJSNUM=SNUMSUPPLIER1
SUPPLY2 = SUPPLY SJSNUM=SNUMSUPPLIER2
4.4 Integrity Constraints In Distributed Databases:
When an update performed by a database application violates an integrity constraint, the application is rejected and thus the correctness of data is preserved. A typical example of integrity constraint is referential integrity, which requires that all values of a given attribute of a relation exist also in some other relation. This constraint is particularly useful in distributed databases, for ensuring the correctness of derived fragmentation. For example, since the SUPPLY relation has a fragmentation which is derived from that of SUPPLIER relation by means of a semi-join on the SUPNUM attribute, it is required that all values of SUPNUM in SUPPLY be present also in SUPPLIER.
Integrity constraints can be enforced automatically by adding to application programs some code for testing whether the constraint is violated. If so, the program execution is suspended and all actions already performed by it are cancelled, if necessary.
One of the most serious disadvantages of integrity constraints is the loss in performance that is due to the execution of the integrity tests; this loss is very important in distributed databases. The major problems in applying integrity checking might increase the need of accessing remote sites. It is necessary to consider also integrity checking in the design of the distribution of database.
4.5Summary:
In this unit we have studied reference architecture for distributed database. Also the different types of fragmentation techniques are discussed. We have also seen some demonstration examples. Some ideas about integrity constraints are given.