CSC 312Database Systems12/15/2018
Database Design and Normalization
The goal of relational database design is to generate a set of relation schemas that allow us to
-store information without redundancy
- using tables each concentrate on a coherent subject
-retrieve information easily
- lossless
- dependence-reserve
In certain normal forms
Codd’s Three Normal Forms
1NF: A relation is in 1NF if and only if, in every legal value of that relation, every tuple contains exactly one value for each attribute.
2NF: A relation is in 2NF if and only if it is in 1NF and every non-key attribute is irreducibly dependent on the primary key.
3NF: A relation is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.
[Notes: Definitions of 2NF & 3NF assuming only one candidate key which is used as the primary key.]
Codd’s 3NF definition does not treat the following general case satisfactorily: the case that
- Had two or more candidate keys, such that
- The candidate keys were composite, and
- They overlapped (i.e., had at least one attribute in common).
It was therefore replaced by a stronger definition, due to Boyce and Codd, which catered for this case.
BCNF: A relation is in BCNF if and only if every nontrivial, left-irreducible functional dependency has a candidate key as its determinant.
[The terms functional dependency, determinant, nontrivial, and left-irreducible will be discussed thereafter.]
1NF
Neither of the following two is in 1NF.
Customer_Name / Customer_Street / Customer_City / Account_NumberHayes / Main / Harrison / A-102
Johnson / Alma / Palo Alto / A-101 / A-201
Jones / Main / Harrison / A-217
Lindsay / Park / Pittsfield / A-222
Smith / North / Rye / A-215
Turner / Putnam / Stamford / A-305
This is not even a relation. (One “tuple” has one more attribute than the others.)
Customer_Name / Customer_Street / Customer_City / Account_NumberHayes / Main / Harrison / A-102
Johnson / Alma / Palo Alt / A-101, A-201
Jones / Main / Harrison / A-217
Lindsay / Park / Pittsfield / A-222
Smith / North / Rye / A-215
Turner / Putnam / Stamford / A-305
The value for the Account_Number attribute is not atomic.
Whether a set is atomic or not may vary with different perspectives:
-integer is atomic, if deemed as a number as a whole
-if deemed as a sequence of numbers (i.e. digits), then it is not atomic
Likewise, A-101, CSC312, even 1001 (if deemed as 1000 + 1), may be atomic or not.
Pitfalls of RDB Design
-Repetition of information
- Update anomalies
-Inability to represent certain information
- Insertion anomalies
- Deletion anomalies
The problem is that the relation is not coherent and hence the nonkey attributes are not irreducibly and non-transitively dependent on the primary key.
Functional Dependency
The notion of FD generalizes the notion of superkey. If X and y are both arbitrary subset of relation r(R), we say that X functionally determines Y, or X Y, if and only if X value in r has associated with it precisely one Y value in r. In relational algebraic symbols, if for any two tuples t1 and t2 in r such that t1[X] = t2[X], it is also the case that t1[Y] = t2[Y].
The determinant and the dependent: if X Y holds on R, one may refer to X and Y as the determinant and the dependent, respectively.
Terms
-r satisfies F.
-F holds on R.
Trivial FD: X Y is trivial if Y is a subset of X.
Closure of a Set of Dependencies
Assuming A, B, C, and D are sets of attributes of r(R)
Armstrong’s Axioms:
- reflexivity: B is a subset of A then A B (trivial)
- augmentation: A B then ACBC
- transitivity: A B and B C then A C
These three rules are sound (generates only correct FD’s) and complete (given F can derive F+)
Extended rules
- self-determination: A A
- decomposition: A BC then A B and A C
- union: A B and A C then A BC
- composition: A B and C D then AC BD
- pseudotransitivity: A B and CB D then AC D
Example
Assume we are given with a relation r(A, B, C, D, E, F) and FD’s
A BC
B E
CD EF
Show that the AD F holds for R and thus a member of the closure of the given set.
Closure of a Set of FD’s
Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F.
The set of all functional dependencies logically implied by F is the closure of F.
We denote the closure of F by F+.
See figure 7.6 on page 266 [Silberschatz] for the algorithm.
The computation of F+is not efficient, and there is little use to compute it. We will computer the following closure of a set of attributes in R under F.
Closure of Attribute Sets
We denote the closure of under F as +. With +, we can determine whether an FD X Y is in F+ by checking if Y is in Y+.
See figure 7.7 on page 267 [Silberschatz] for the algorithm.
Test to see if a subset K of R is a superkey: K is a superkey if and only if K+ contains every attribute in R.
Compute + is an alternative way to computing F+, but again computing F+ is not that important.
Canonical Cover (Irreducible Sets of Dependencies)
An extraneous attributeA may appear on either left- or right-side of an FD X Y
-Left: F logically implies (F – {X Y}) U {(X – A) Y}
-Right: F logically implies (F – {X Y}) U {X (Y– A)}
Example
Fc and F logically imply each other, and
-no FD in Fc contains extraneous attributes
-Each left-side of a FD in Fc in unique
Decomposition
Desirable properties
-Lossless
- Decompose R into R1 and R2 such that R1 ^ R2 is a key in either R1 or R2
-Dependency preserving
- With F’ = F1 U F2 U ... U Fn, F’+ = F+
-No repetition of information
1