Part 2

Database Concepts

Case Study 1

[1]

Problems with a File System

redundant data

separate maintenance

varying formats/contents/standards

application program tied to data

little overall planning

Problems with handling

address change

new account/credit report

field length change

Database Approach

[2]

Why Database?

Advantages over Paper-based Methods

Compactness

Speed

Less drudgery

Currency

Centralized control

Advantages of Centralized Control

(via a Database Administrator - DBA)

Redundancy can be reduced

Inconsistency can be avoided

The data can be shared

Standards can be enforced

Security restrictions can be applied

Integrity can be maintained

Conflicting requirements can be balanced

[3]

Advantages / Disadvantages
Economy of scale / Size
Getting more information from same amount of data / Complexity
Data sharing / Cost
Balancing conflicting requirements / Requirements in conflict
Enforcing standards / Having to adhere to standards
Controlled redundancy / Higher impact of a failure
Security / Recovery more difficult
Consistency
Integrity
Flexibility
Responsiveness
Increased organizational productivity
Improved program maintenance
Data independence

Data Dependence

Data Dependence:

Application dictates

organization in secondary storage

access methods

Knowledge of that organization and access method

built into application logic

built into executable code

Impossible to change the storage structure or access methods without affecting the application

Why Data Dependence is Undesirable:

Different applications will need different views of the data

The database administrator must have the freedom to change the storage structure or access methods

Stored Field, Record, File

Stored Field:

smallest named unit of stored data

database will contain many occurrences or instances

classified into various types

Stored Record:

named collection of related stored fields

record occurrence consists of a group of related field occurrences

also classified into various types by the variation in fields contained in the record

Stored File:

named collection of all occurrences of one type of record

Data Independence

The Immunity of Applications to Changes in Storage Structure and Access Methods

Must Protect an Application from Variations in:

representation of numeric data (binary, BCD)

representation of character data (ASCII, EBCDIC)

units for numeric data (dollars, cents)

data coding (red, blue, green; 1, 2, 3; 01, 10, 11)

data materialization (for derived data)

structure of stored records (grouping fields into records)

structure of stored files (single volume, volume set, etc.)

Sample Database - Employees

Overview of Content:

The database contains organization, budget, and scheduling information for a software group that is developing an academic information system

Entities:

Employees - who have

a name

a job title

a manager who, in turn, is an employee

a hire date

an hourly billing rate

(possibly) a dollar annual bonus amount

membership in a department, which in turn has a name, location, and budget

a set of assigned tasks on projects

each task by each employee on each project has a time estimate in hours

each project has a name, description, budget, and due date

Sample Database - Departments and Projects

Entities (continued)

Departments - which have

a department number

a department name

a department location (room number)

an annual dollar budget

employees, who in turn have a name, job description, manager, hire date, hourly rate annual bonus, and a set of assigned tasks (as described above)

Projects - which have

a project name

a project description

a project budget

a project due date

a set of tasks, each of which is to be performed by one or more employees (who in turn have a name, job description, manager, hire date, ...) with a time estimate for each employee for each task

Sample Database - Tasks

Entities (continued)

Tasks - each of which have

the name of the employee working on the task (who in turn has name, job description, ...)

the name of the project that the task is related to (which in turn has name, description, ...)

the name of the task being performed

the time estimate (in hours) of how long an employee will work on a particular type of task for a particular project

Sample Data

(stated in relational form)

Employees - (Table name emp)

Ename / Job / Mgr / Hired / Rate / Bonus / DeptNo
allen / programmer / barger / 09-jun-1991 / 30.00 / 402
barger / supervisor / turner / 23-jan-1993 / 65.00 / 550.00 / 402
jones / programmer / radl / 20-feb-1991 / 35.00 / 401
king / clerk / barger / 22-feb-1991 / 18.00 / 402
martin / programmer / barger / 09-nov-1991 / 25.00 / 402
olson / analyst / radl / 28-apr-1991 / 55.00 / 0.00 / 401
pearson / programmer / radl / 01-may-1991 / 30.00 / 401
radl / supervisor / turner / 03-dec-1992 / 65.00 / 600.00 / 401
rogers / programmer / barger / 08-sep-1992 / 25.00 / 402
smith / programmer / barger / 17-dec-1990 / 35.00 / 402
sturm / clerk / radl / 23-sep-1992 / 18.00 / 401
thomas / analyst / barger / 03-dec-1992 / 50.00 / 0.00 / 402
turner / supervisor / 02-mar-1991 / 75.00 / 1000.00 / 400
vogel / consultant / turner / 17-nov-1991 / 80.00 / 400

Departments (Table name dept)

DeptNo / Dname / Loc / Dbudget
400 / programming / 200 / 150000.00
401 / financial / 200 / 275000.00
402 / academic / 100 / 390000.00
403 / support / 300 / 7000.00

Projects (Table name proj)

Project_id / Description / Pbudget / Due_date
admit / Admissions / 15000.00 / 07-apr-1998
alumni / Alumni development / 7500.00 / 30-jan-1999
billing / Student billing / 11000.00 / 30-jan-1998
budget / Budgeting / 12500.00 / 12-mar-1998
payroll / Payroll / 9000.00 / 15-may-1998
records / Students records / 6000.00 / 11-feb-1998

Tasks (Table name task)

Ename / Project_id / Tname / Hours
allen / admit / debug / 25
allen / admit / implement / 20
allen / billing / debug / 30
allen / billing / implement / 20
barger / admit / manage / 15
barger / alumni / manage / 10
barger / billing / manage / 8
barger / records / manage / 12
jones / billing / implement / 35
jones / budget / implement / 70
jones / payroll / debug / 40
king / admit / clerical / 25
king / alumni / clerical / 9
king / records / clerical / 15
martin / admit / implement / 30
olson / admit / design / 75
olson / alumni / design / 40
olson / billing / design / 20
olson / records / design / 45
pearson / budget / debug / 40
pearson / budget / implement / 60
pearson / payroll / implement / 80
radl / billing / design / 15
radl / billing / manage / 10
radl / budget / manage / 15
radl / payroll / manage / 20
rogers / records / debug / 20
rogers / records / design / 30
rogers / records / implement / 45
smith / alumni / debug / 30
smith / alumni / implement / 90
smith / billing / implement / 40
sturm / billing / clerical / 38
sturm / budget / clerical / 20
sturm / budget / debug / 20
sturm / payroll / clerical / 15
thomas / alumni / design / 5
thomas / billing / design / 45
thomas / budget / design / 40
thomas / payroll / design / 70
turner / billing / manage / 12
turner / budget / design / 45

Alternative Organization of Tasks

Note:

The above information could have been organized by employee, project, or task as follows:

By Employee:

Allen: / Admit / Implement / 20
Debug / 25
Billing / Implement / 20
Debug / 30
Barger: / Admit / Manage / 15
Records / Manage / 12
Billing / Manage / 8
Alumni / Manage / 10
Jones: / etc.

By Project:

Admit: / Design / Olson / 75
Implement / Allen / 20
Martin / 30
Debug / Allen / 25
Manage / Barger / 15
Clerical / King / 25
Records: / etc.

Alternative Organization for Tasks

By Task:

Design: / Olson / Admit / 75
Records / 45
Billing / 20
Alumni / 40
Rogers / Records / 30
Thomas / Billing / 45
Alumni / 5
Payroll / 70
Budget / 40
Radl / Billing / 15
Turner / Budget / 45
Implement / etc.

DBMS Definitions

DBMS - Database Management System

(usually a GDBMS - Generalized DBMS)

DDL - Data Definition Language

Language used to describe the logical database model or “schema”

DML - Data Manipulation Language

Language used to perform database operations such as queries and updates required for an application

PLI - Programming Language Interface

An extension to a conventional 3GL (COBOL, FORTRAN, C, etc.) that allows it to call the DBMS to perform operations (similar to the operations that might be performed in the DML) under control of a programming language

SCHEMA - The overall logical structure (data definition) or model of the stored database

SUB SCHEMA - The user's logical view or model of the database

DBMS Requirements

Data storage, retrieval, and update

(Get data in/Send data out)

Reporting

Ad hoc query handling

Data manipulation

Integrity services

(data type, legal values, format, constraints)

Transaction support

Concurrency control services

(database, file, table, record, field locking)

Recovery services

(backup, journaling, checkpointing)

Authorization services

(encryption, schemas, views, ACL's)

A user-accessible catalog

(data dictionary of files or tables, fields, etc.)

Support for data communications

Application development

Services to promote data independence

(not be aware of data structure, e.g., linked)

Suitable user/machine models

How Large is a Large Database?

Size Classifications:

Trivial

Training, Toys, Examples

Extremely Small

Tiny, In-memory

Very Small

Not worth an “industrial strength” database engine

Small

Lower than average size

Medium

“Sweet spot” in the market – majority of actual implementations

Large

Above average size

Very Large

Problematic size

Extremely Large

State-of-the-art large – only one or two examples in existence – impractical for all but the world’s largest enterprises

Aggregate Database Sizes Over History

Year
Size / 1955 / 1970 / 1985 / 2000 / 2015
Trivial / .2KB / .8KB / 4KB / .02MB / .1MB
Tiny / 1KB / .008MB / 60KB / .5MB / 4MB
V. Small / 5KB / .066MB / .001GB / 12MB / 160MB
Small / 25KB / .6MB / .013GB / .3GB / 7000MB
Medium / 130KB / 5MB / .2GB / 7GB / .3TB
Large / 600KB / 43MB / 3GB / 200GB / 12TB
V. Large / 3300KB / 400MB / 42GB / 5000GB / 530TB
Ex. Large / 17000KB / 3000MB / 620GB / 117000GB / 22000TB

Copyright © 1971-2004 Thomas P. SturmDatabase ConceptsPart 2, Page 1

[1]

Ref: Jan Harrington, Relational Database Management for Microcomputers

[2]

Ref: Jan Harrington, Relational Database Management for Microcomputers

[3]

Ref: C. J. Date, An Introduction to Database Systems