Database Design and Normalization

CSC 312Database Systems12/15/2018

Database Design and Normalization

The goal of relational database design is to generate a set of relation schemas that allow us to

-store information without redundancy

using tables each concentrate on a coherent subject

-retrieve information easily

lossless
dependence-reserve

In certain normal forms

Codd’s Three Normal Forms

1NF: A relation is in 1NF if and only if, in every legal value of that relation, every tuple contains exactly one value for each attribute.

2NF: A relation is in 2NF if and only if it is in 1NF and every non-key attribute is irreducibly dependent on the primary key.

3NF: A relation is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

[Notes: Definitions of 2NF & 3NF assuming only one candidate key which is used as the primary key.]

Codd’s 3NF definition does not treat the following general case satisfactorily: the case that

Had two or more candidate keys, such that
The candidate keys were composite, and
They overlapped (i.e., had at least one attribute in common).

It was therefore replaced by a stronger definition, due to Boyce and Codd, which catered for this case.

BCNF: A relation is in BCNF if and only if every nontrivial, left-irreducible functional dependency has a candidate key as its determinant.

[The terms functional dependency, determinant, nontrivial, and left-irreducible will be discussed thereafter.]

1NF

Neither of the following two is in 1NF.

Customer_Name / Customer_Street / Customer_City / Account_Number
Hayes / Main / Harrison / A-102
Johnson / Alma / Palo Alto / A-101 / A-201
Jones / Main / Harrison / A-217
Lindsay / Park / Pittsfield / A-222
Smith / North / Rye / A-215
Turner / Putnam / Stamford / A-305

This is not even a relation. (One “tuple” has one more attribute than the others.)

Customer_Name / Customer_Street / Customer_City / Account_Number
Hayes / Main / Harrison / A-102
Johnson / Alma / Palo Alt / A-101, A-201
Jones / Main / Harrison / A-217
Lindsay / Park / Pittsfield / A-222
Smith / North / Rye / A-215
Turner / Putnam / Stamford / A-305

The value for the Account_Number attribute is not atomic.

Whether a set is atomic or not may vary with different perspectives:

-integer is atomic, if deemed as a number as a whole

-if deemed as a sequence of numbers (i.e. digits), then it is not atomic

Likewise, A-101, CSC312, even 1001 (if deemed as 1000 + 1), may be atomic or not.

Pitfalls of RDB Design

-Repetition of information

Update anomalies

-Inability to represent certain information

Insertion anomalies
Deletion anomalies

The problem is that the relation is not coherent and hence the nonkey attributes are not irreducibly and non-transitively dependent on the primary key.

Functional Dependency

The notion of FD generalizes the notion of superkey. If X and y are both arbitrary subset of relation r(R), we say that X functionally determines Y, or X  Y, if and only if X value in r has associated with it precisely one Y value in r. In relational algebraic symbols, if for any two tuples t1 and t2 in r such that t1[X] = t2[X], it is also the case that t1[Y] = t2[Y].

The determinant and the dependent: if X  Y holds on R, one may refer to X and Y as the determinant and the dependent, respectively.

Terms

-r satisfies F.

-F holds on R.

Trivial FD: X  Y is trivial if Y is a subset of X.

Closure of a Set of Dependencies

Assuming A, B, C, and D are sets of attributes of r(R)

Armstrong’s Axioms:

reflexivity: B is a subset of A then A  B (trivial)
augmentation: A  B then ACBC
transitivity: A  B and B  C then A  C

These three rules are sound (generates only correct FD’s) and complete (given F can derive F+)

Extended rules

self-determination: A  A
decomposition: A  BC then A  B and A  C
union: A  B and A  C then A  BC
composition: A  B and C  D then AC  BD
pseudotransitivity: A  B and CB  D then AC  D

Example

Assume we are given with a relation r(A, B, C, D, E, F) and FD’s

A  BC

B  E

CD  EF

Show that the AD  F holds for R and thus a member of the closure of the given set.

Closure of a Set of FD’s

Given a set F set of functional dependencies, there are certain other functional dependencies that are logically implied by F.

The set of all functional dependencies logically implied by F is the closure of F.

We denote the closure of F by F+.

See figure 7.6 on page 266 [Silberschatz] for the algorithm.

The computation of F+is not efficient, and there is little use to compute it. We will computer the following closure of a set  of attributes in R under F.

Closure of Attribute Sets

We denote the closure of  under F as  +. With +, we can determine whether an FD X  Y is in F+ by checking if Y is in Y+.

See figure 7.7 on page 267 [Silberschatz] for the algorithm.

Test to see if a subset K of R is a superkey: K is a superkey if and only if K+ contains every attribute in R.

Compute  + is an alternative way to computing F+, but again computing F+ is not that important.

Canonical Cover (Irreducible Sets of Dependencies)

An extraneous attributeA may appear on either left- or right-side of an FD X  Y

-Left: F logically implies (F – {X  Y}) U {(X – A)  Y}

-Right: F logically implies (F – {X  Y}) U {X  (Y– A)}

Example

Fc and F logically imply each other, and

-no FD in Fc contains extraneous attributes

-Each left-side of a FD in Fc in unique

Decomposition

Desirable properties

-Lossless

Decompose R into R1 and R2 such that R1 ^ R2 is a key in either R1 or R2

-Dependency preserving

With F’ = F1 U F2 U ... U Fn, F’+ = F+

-No repetition of information