The Normalization Process

The normalization process is a set of steps that enable you to identify the existence of potential problems in the design of a relational database. The process also supplies the methods for correcting these problems. The process involves converting tables into various types of normal forms. A table in a particular normal form possesses a desirable collection of properties. There are several notmal forms, the most common being first normal form (1NF), second normal form (2NF), third normal form (3NF) and fourth normal form (4NF). They form a progression in which a table that is in 1NF is better than a table that is NOT in 1NF, a table that is in 2NF is better than a table that is in 1NF, and so on. The goal of this process is to allow you to take a collection of tables, and produce a new collection of tables that represent the same information, but is free of problems. The crucial concepts that are fundamental to the understanding of the normalization process are functional dependence and keys.

A table that contains a repeating group, or multiple entries for a single record, is called an unnormalized relation. The table indicated below is considered to be an unnormalized relation because there are multiple entries for products being ordered.

SENECA STUDENT SUPPLIESORDER FORM
ORDER:40311
ORDER DATE:March 3, 1999
CUSTOMER #:12CUST NAME:School of CS
PRODUCT#PRODUCT DESCRIPTIONQTY ORDERED
304All-purpose gadget7
633Widgit1
684Super gismo4

To begin the process of normalization, it is necessary to list all the attributes of this “order form” in the form of a relationship.

Relational Notation is as follows:

[ ]: contains the list of attributes for the relation

A,B : attribute or attributes that are the Primary Key

{ }: attribute or group of attributes that have more than one value for a single value of a primary key. (we also can use ( ) to enclose repeating groups).

Thus, we would write the unnormalized relation for the Seneca Student Supplies Order Form as follows:

ORDERFORM [Order No, Order Date, Customer No, Customer Name, {Product No, Product

Description, Qty Ordered} ]

Removing the repeating group/multi-valued dependency leads to the creation of tables that are in first normal form (1NF).

Unnormalized ->1st Normal Form

•Unnormalized relation:ORDERFORM [Order No, Order Date, Customer No, Customer Name, {Product No, Product Description, Qty Ordered} ]

•Restate original unnormalized relation without repeating group:

ORDERFORM [ Order No, Order Date, Customer No, Customer Name ]

•Create new relation consisting of key of original relation and attributes within repeating group and add to key to ensure uniqueness:

ORDERLINE [ Order No, Product No, Product Description, Qty Ordered ]

2nd Normal Form

•A relation is in 2NF when the entire primary key is needed to determine the value of each non-key attribute (i.e. relation has no partial dependencies – attributes whose values can be determined by knowing only part of the key)

1NF (1st Normal Form) -> 2NF (2nd Normal Form)

1NF Relations: ORDER ( ORDER_NUM, ORDER_DATE, CUST_NUM, CUST_NAME );

ORDERLINE ( ORDER_NUM, PRODUCT_NUM, PRODUCT_DESC, NUM_ORDERED )

Create new relation(s) consisting of attributes which are partial dependencies with the primary key of the new relation being the part of the primary key which determines the value of these attributes

2NF: PRODUCT (PRODUCT_NUM, PRODUCT_DESC)

Restate original relations without partially dependent attributes

2NF: ORDER ( ORDER_NUM, ORDER_DATE, CUST_NUM, CUST_NAME ); ORDERLINE ( ORDER_NUM, PRODUCT_NUM, NUM_ORDERED )

3Rd Normal Form

•A relation is in 3NF when the primary key and nothing but the primary key can be used to determine the value of each non-key attribute (i.e. relation has no transitive dependencies – attributes whose values can be determined by knowing something other than the key)

2NF -> 3NF

2NF Relations: PRODUCT (PRODUCT_NUM, PRODUCT_DESC); ORDER ( ORDER_NUM, ORDER_DATE, CUST_NUM, CUST_NAME ); ORDERLINE ( ORDER_NUM, PRODUCT_NUM, NUM_ORDERED )

Create new relations consisting of the attributes which are transitively determined by the primary key and make the primary key of these new relation(s) the attribute that actually determines the value of these attributes

3NF: CUSTOMER (CUST_NUM, CUST_NAME )

Restate original relations without transitively dependent attributes

3NF: PRODUCT (PRODUCT_NUM, PRODUCT_DESC); ORDER ( ORDER_NUM, ORDER_DATE, CUST_NUM); ORDERLINE ( ORDER_NUM, PRODUCT_NUM, NUM_ORDERED )

Original relation will now contain a foreign key – a non-key attribute that relates to the primary key of the new relation

Resulting 3NF Relations for Userview

Set of 3NF Relations for the Order Form Userview:

ORDER ( ORDER_NUM, ORDER_DATE, CUST_NUM)

ORDERLINE ( ORDER_NUM, PRODUCT_NUM, NUM_ORDERED )

CUSTOMER (CUST_NUM, CUST_NAME )

PRODUCT (PRODUCT_NUM, PRODUCT_DESC)

1 unnormalized userview will always result in 1 or more relations in 1NF

Each 1NF relation will result in 1 or more 2NF relations

Each 2NF relation will result in 1 or more 3NF relations

You can never lose (ie not include) an attribute – it must be found in one of the relations at each step!

Normalize Remaining Userviews

•Normalization process is then applied to each remaining userview (eg invoice, inventory screen, …)

•A set of 3NF relations is produced for each userview

•Then 3NF relations from each userview are integrated to form one complete set of relations for the application