Part 2
Database Concepts
Case Study 1
[1]
Problems with a File System
redundant data
separate maintenance
varying formats/contents/standards
application program tied to data
little overall planning
Problems with handling
address change
new account/credit report
field length change
Database Approach
[2]
Why Database?
Advantages over Paper-based Methods
Compactness
Speed
Less drudgery
Currency
Centralized control
Advantages of Centralized Control
(via a Database Administrator - DBA)
Redundancy can be reduced
Inconsistency can be avoided
The data can be shared
Standards can be enforced
Security restrictions can be applied
Integrity can be maintained
Conflicting requirements can be balanced
[3]
Advantages / DisadvantagesEconomy of scale / Size
Getting more information from same amount of data / Complexity
Data sharing / Cost
Balancing conflicting requirements / Requirements in conflict
Enforcing standards / Having to adhere to standards
Controlled redundancy / Higher impact of a failure
Security / Recovery more difficult
Consistency
Integrity
Flexibility
Responsiveness
Increased organizational productivity
Improved program maintenance
Data independence
Data Dependence
Data Dependence:
Application dictates
organization in secondary storage
access methods
Knowledge of that organization and access method
built into application logic
built into executable code
Impossible to change the storage structure or access methods without affecting the application
Why Data Dependence is Undesirable:
Different applications will need different views of the data
The database administrator must have the freedom to change the storage structure or access methods
Stored Field, Record, File
Stored Field:
smallest named unit of stored data
database will contain many occurrences or instances
classified into various types
Stored Record:
named collection of related stored fields
record occurrence consists of a group of related field occurrences
also classified into various types by the variation in fields contained in the record
Stored File:
named collection of all occurrences of one type of record
Data Independence
The Immunity of Applications to Changes in Storage Structure and Access Methods
Must Protect an Application from Variations in:
representation of numeric data (binary, BCD)
representation of character data (ASCII, EBCDIC)
units for numeric data (dollars, cents)
data coding (red, blue, green; 1, 2, 3; 01, 10, 11)
data materialization (for derived data)
structure of stored records (grouping fields into records)
structure of stored files (single volume, volume set, etc.)
Sample Database - Employees
Overview of Content:
The database contains organization, budget, and scheduling information for a software group that is developing an academic information system
Entities:
Employees - who have
a name
a job title
a manager who, in turn, is an employee
a hire date
an hourly billing rate
(possibly) a dollar annual bonus amount
membership in a department, which in turn has a name, location, and budget
a set of assigned tasks on projects
each task by each employee on each project has a time estimate in hours
each project has a name, description, budget, and due date
Sample Database - Departments and Projects
Entities (continued)
Departments - which have
a department number
a department name
a department location (room number)
an annual dollar budget
employees, who in turn have a name, job description, manager, hire date, hourly rate annual bonus, and a set of assigned tasks (as described above)
Projects - which have
a project name
a project description
a project budget
a project due date
a set of tasks, each of which is to be performed by one or more employees (who in turn have a name, job description, manager, hire date, ...) with a time estimate for each employee for each task
Sample Database - Tasks
Entities (continued)
Tasks - each of which have
the name of the employee working on the task (who in turn has name, job description, ...)
the name of the project that the task is related to (which in turn has name, description, ...)
the name of the task being performed
the time estimate (in hours) of how long an employee will work on a particular type of task for a particular project
Sample Data
(stated in relational form)
Employees - (Table name emp)
Ename / Job / Mgr / Hired / Rate / Bonus / DeptNoallen / programmer / barger / 09-jun-1991 / 30.00 / 402
barger / supervisor / turner / 23-jan-1993 / 65.00 / 550.00 / 402
jones / programmer / radl / 20-feb-1991 / 35.00 / 401
king / clerk / barger / 22-feb-1991 / 18.00 / 402
martin / programmer / barger / 09-nov-1991 / 25.00 / 402
olson / analyst / radl / 28-apr-1991 / 55.00 / 0.00 / 401
pearson / programmer / radl / 01-may-1991 / 30.00 / 401
radl / supervisor / turner / 03-dec-1992 / 65.00 / 600.00 / 401
rogers / programmer / barger / 08-sep-1992 / 25.00 / 402
smith / programmer / barger / 17-dec-1990 / 35.00 / 402
sturm / clerk / radl / 23-sep-1992 / 18.00 / 401
thomas / analyst / barger / 03-dec-1992 / 50.00 / 0.00 / 402
turner / supervisor / 02-mar-1991 / 75.00 / 1000.00 / 400
vogel / consultant / turner / 17-nov-1991 / 80.00 / 400
Departments (Table name dept)
DeptNo / Dname / Loc / Dbudget400 / programming / 200 / 150000.00
401 / financial / 200 / 275000.00
402 / academic / 100 / 390000.00
403 / support / 300 / 7000.00
Projects (Table name proj)
Project_id / Description / Pbudget / Due_dateadmit / Admissions / 15000.00 / 07-apr-1998
alumni / Alumni development / 7500.00 / 30-jan-1999
billing / Student billing / 11000.00 / 30-jan-1998
budget / Budgeting / 12500.00 / 12-mar-1998
payroll / Payroll / 9000.00 / 15-may-1998
records / Students records / 6000.00 / 11-feb-1998
Tasks (Table name task)
Ename / Project_id / Tname / Hoursallen / admit / debug / 25
allen / admit / implement / 20
allen / billing / debug / 30
allen / billing / implement / 20
barger / admit / manage / 15
barger / alumni / manage / 10
barger / billing / manage / 8
barger / records / manage / 12
jones / billing / implement / 35
jones / budget / implement / 70
jones / payroll / debug / 40
king / admit / clerical / 25
king / alumni / clerical / 9
king / records / clerical / 15
martin / admit / implement / 30
olson / admit / design / 75
olson / alumni / design / 40
olson / billing / design / 20
olson / records / design / 45
pearson / budget / debug / 40
pearson / budget / implement / 60
pearson / payroll / implement / 80
radl / billing / design / 15
radl / billing / manage / 10
radl / budget / manage / 15
radl / payroll / manage / 20
rogers / records / debug / 20
rogers / records / design / 30
rogers / records / implement / 45
smith / alumni / debug / 30
smith / alumni / implement / 90
smith / billing / implement / 40
sturm / billing / clerical / 38
sturm / budget / clerical / 20
sturm / budget / debug / 20
sturm / payroll / clerical / 15
thomas / alumni / design / 5
thomas / billing / design / 45
thomas / budget / design / 40
thomas / payroll / design / 70
turner / billing / manage / 12
turner / budget / design / 45
Alternative Organization of Tasks
Note:
The above information could have been organized by employee, project, or task as follows:
By Employee:
Allen: / Admit / Implement / 20Debug / 25
Billing / Implement / 20
Debug / 30
Barger: / Admit / Manage / 15
Records / Manage / 12
Billing / Manage / 8
Alumni / Manage / 10
Jones: / etc.
By Project:
Admit: / Design / Olson / 75Implement / Allen / 20
Martin / 30
Debug / Allen / 25
Manage / Barger / 15
Clerical / King / 25
Records: / etc.
Alternative Organization for Tasks
By Task:
Design: / Olson / Admit / 75Records / 45
Billing / 20
Alumni / 40
Rogers / Records / 30
Thomas / Billing / 45
Alumni / 5
Payroll / 70
Budget / 40
Radl / Billing / 15
Turner / Budget / 45
Implement / etc.
DBMS Definitions
DBMS - Database Management System
(usually a GDBMS - Generalized DBMS)
DDL - Data Definition Language
Language used to describe the logical database model or “schema”
DML - Data Manipulation Language
Language used to perform database operations such as queries and updates required for an application
PLI - Programming Language Interface
An extension to a conventional 3GL (COBOL, FORTRAN, C, etc.) that allows it to call the DBMS to perform operations (similar to the operations that might be performed in the DML) under control of a programming language
SCHEMA - The overall logical structure (data definition) or model of the stored database
SUB SCHEMA - The user's logical view or model of the database
DBMS Requirements
Data storage, retrieval, and update
(Get data in/Send data out)
Reporting
Ad hoc query handling
Data manipulation
Integrity services
(data type, legal values, format, constraints)
Transaction support
Concurrency control services
(database, file, table, record, field locking)
Recovery services
(backup, journaling, checkpointing)
Authorization services
(encryption, schemas, views, ACL's)
A user-accessible catalog
(data dictionary of files or tables, fields, etc.)
Support for data communications
Application development
Services to promote data independence
(not be aware of data structure, e.g., linked)
Suitable user/machine models
How Large is a Large Database?
Size Classifications:
Trivial
Training, Toys, Examples
Extremely Small
Tiny, In-memory
Very Small
Not worth an “industrial strength” database engine
Small
Lower than average size
Medium
“Sweet spot” in the market – majority of actual implementations
Large
Above average size
Very Large
Problematic size
Extremely Large
State-of-the-art large – only one or two examples in existence – impractical for all but the world’s largest enterprises
Aggregate Database Sizes Over History
YearSize / 1955 / 1970 / 1985 / 2000 / 2015
Trivial / .2KB / .8KB / 4KB / .02MB / .1MB
Tiny / 1KB / .008MB / 60KB / .5MB / 4MB
V. Small / 5KB / .066MB / .001GB / 12MB / 160MB
Small / 25KB / .6MB / .013GB / .3GB / 7000MB
Medium / 130KB / 5MB / .2GB / 7GB / .3TB
Large / 600KB / 43MB / 3GB / 200GB / 12TB
V. Large / 3300KB / 400MB / 42GB / 5000GB / 530TB
Ex. Large / 17000KB / 3000MB / 620GB / 117000GB / 22000TB
Copyright © 1971-2004 Thomas P. SturmDatabase ConceptsPart 2, Page 1
[1]
Ref: Jan Harrington, Relational Database Management for Microcomputers
[2]
Ref: Jan Harrington, Relational Database Management for Microcomputers
[3]
Ref: C. J. Date, An Introduction to Database Systems