Normalisation

KEY RULES FOR NORMALISED DATA WHEN SUPPLIED WITH AN ERD

  1. All attributes must be atomic
  2. All attributes must be in only one entity, the only exception to this is a primary key which can be a foreign key in a link entity
  3. All attributes must depend wholly on the primary key (or keys) for the entity

Why Do We Use This Process?

  1. Minimises the duplication of data.
  2. The normalised data enables the data model to be mapped onto a wide variety of different database designs.
  3. The final tables in 3rd normal form (there are three stages to normalisation) provide the flexibility to enable data to be extracted efficiently.

Normalisation can be seen as a bottom-up process. The result of normalisation is a database in which all possible entities have been identified and separated, each being represented by its own table.

All attributes should be atomic, i.e. all fields in each record must contain only a single data value, not a group of values.

Normalisation process

There are 3 stages:

Produce a first normal form (1NF) of the model.

Refining the 1NF into a second normal form (2NF)

Further refining the 2NF into a third formal form (3NF)

To illustrate the normalisation process we will use a student table:

Student Number / Surname / Forename / Number / Sex / Tutor / Tutor
Code / Unit / Assess-or Code / Assessor name / Date
1 / Rajabali / Batul / 812865 / F / Doash,M / MVD / 1(4) / SMB / Bryant,S / 7/10/98
1 / Rajabali / Batul / 812865 / F / Doash,M / MVD / 2(3) / JCB / Bentley,J / 3/12/98
1 / Rajabali / Batul / 812865 / F / Doash,M / MVD / 3(2) / MVD / Doash,M / 23/2/99
2 / Boateng / Ben / 823063 / M / Bryant,S / SMB / 1(4) / SMB / Bryant,J / 25/10/98
2 / Boateng / Ben / 823063 / M / Bryant,S / SMB / 3(2) / MVD / Doash,M / 6/3/99

First normal form – eliminate repeating groups of attributes

A table is in 1NF if it has no repeating ATTRIBUTES or groups of attributes and all data items are atomic.

The problem with the student table is that as each student progresses through the course they will need to be credited with each unit they achieve. This means that for each student, a group of fields (Unit, assessor code and assessor name will need to be repeated as many times as the number of units achieved. Therefore we have a repeating group of attributes.

To remove the repeating groups it is clear that two entities exist, STUDENT and UNIT.

The fields can be separated into two tables:

STUDENT: (Student Number, Surname, Forename, Sex, Tutor, Tutor code, unit id, date achieved)

Student Number / Surname / Forename / Number / Sex / Tutor / Tutor
Code / Unit / Date
1 / Rajabali / Batul / 812865 / F / Doash,M / MVD / 1(4) / 7/10/98
1 / Rajabali / Batul / 812865 / F / Doash,M / MVD / 2(3) / 3/12/98
1 / Rajabali / Batul / 812865 / F / Doash,M / MVD / 3(2) / 23/2/99
2 / Boateng / Ben / 823063 / M / Bryant,S / SMB / 1(4) / 25/10/98
2 / Boateng / Ben / 823063 / M / Bryant,S / SMB / 3(2) / 6/3/99

UNIT: (Unit id, Assessor code, Assessor name)

Unit / Assessor Code / Assessor name
1(4) / SMB / Bryant,S
2(3) / JCB / Bentley,J
3(2) / MVD / Doash,M
1(4) / SMB / Bryant,J
3(2) / MVD / Doash,M

Where Student Number is the primary key for the STUDENT table and Unit id is the primary key for the UNIT table.

Second Normal Form – reducing duplicate data

Although the two entities are now separated, the data model still has problems with duplicate or redundant data.

A table is said to be in 2NF when it is in first normal form and no field that is not part of a composite primary key is dependant on only a portion of the primary key.

In other words all the fields in a table other than the primary key fields must be entirely to do with the primary key.

In the student table all the fields are to do with a student (Name, sex, tutor etc) except the unit id field, this it to do with the units not with the students. Therefore the unit id is not dependant on the primary key (the student number).

If the database were to be implemented in this form then the duplicated data would result in slower updating of data.

To put the ORIGINAL STUDENT table into 2NF, list all the primary keys and their combinations

Student Number

Unit ID

Student Number, Unit ID

Next place each field with its appropriate primary key:

Student Number, Surname, Forename, Sex, Tutor, Tutor Code

Unit ID, Assessor code, Assessor name

Student Number, Unit ID, date achieved

Now give each table an appropriate name:

Student (Student Number, Surname, Forename, Sex, Tutor, Tutor Code)

Unit (Unit Id, Assessor code, Assessor name)

Student Achievement (Student Number, Unit ID, Date Achieved)

Third normal form

A table is in third normal form (3NF) when it is in 2NF and every field, which is not part of the primary key is wholly dependant on that key.

A tutor and an assessor are simply members of staff, therefore in this example; the 3NF involves creating a separate table for tutors:

Student (Student Number, Surname, Forename, Sex, tutor code)

Unit (Unit Id, Assessor code)

Student Achievement (Student Number, Unit ID, Date Achieved)

Staff (Staff code, Staff name)

Stages listed in a table

List all attributes / 1st normal form
Remove repeating groups / 2nd normal form
Reduce duplicate data / 3rd normal form
Check all fields in entities depend wholly on the primary key
Student number
Forename
Surname
Sex
Tutor
Tutor code
Unit
Assessor code
Assessor name
Date / Student
Student number
Forename
Surname
Sex
Tutor
Tutor code
Unit
Date
Unit
Unit id
Assessor code
Assessor name / List primary keys and combinations
1.Student Number
2. Unit ID
3. Student Number, Unit ID
Each becomes an entity
Student
Student number
Forename
Surname
Sex
Tutor
Tutor code
Unit
Unit id
Assessor code
Assessor name
Student Achievement
Student number
Unit id
Date achieved / Student
Student number
Forename
Surname
Sex
Tutor code
Unit
Date
Unit
Unit id
Assessor code
Student Achievement
Student number
Unit id
Date achieved
Staff
Staff code
Staff name

Activity 8

In groups normalise to 3NF the data for the rapid results driving school.

List all attributes / 1st normal form
Remove repeating groups / 2nd normal form
Reduce duplicate data / 3rd normal form
Check all fields in entities depend wholly on the primary key
Pupil id
Pupil first name
Pupil Surname
Pupil address
Pupil Telephone number
Lesson id
Lesson date
Lesson time
Lesson duration
Lesson cost
Instructor id
Instructor first name
Instructor surname
Instructor mobile no / List primary keys and combinations
1.
2.
3.
Each becomes an entity