Part 1

Data Concepts

Why learn about Relational Databases?

Problem:

“Demand for new applications is outstripping the capability of data processing departments to implement programs.”

Solutions:

“Put end users into direct touch with the information stored in computers.”

“Increase the productivity of data processing professionals.”

How:

“Relational database management provides a practical foundation for both approaches.”[1]

Nolan's Stage Model

Stage / Applications / Personnel / Management
I / Clerical / Generalist / Lax
II / Expansion / Programmers / Lax Fails
III / On Hold / Management / Control
IV / Integrated / Analysts / Application
V / Database / DBA / Data
VI / Decision Support System (DSS) / Specialist / Resource

Stage “Nicknames”

I – Initiation

II – Contagion

III – Control

IV – Integration[2]

Data and Information

INFORMATION:

The meaning that a human assigns to data via the known conventions used in their representation.

DATA:

A formalized representation of facts, concepts, or instructions suitable for communication, interpretation, or processing by human or automatic means.

______

Ref.: Thomas P. Sturm, Data and File Structures

Information Storage and Retrieval

“The principal problem of human memory is not storage, but retrieval. The key to retrieval is organization, or in simpler terms, knowing where to find information that has been put into memory”

[3]

Database

WHAT IS A BASE?

the bottom of anything, considered as its support or foundation

the fundamental part of a thing

the chief ingredient of anything, viewed as its fundamental constituent

mathematically, the reference number used to construct a mathematical table

a point or line from which a start is made

base in its most general sense equals bottom, but, more specifically, implies a broad bottom by which something is held up or stabilized

______

Ref.: Thomas P Sturm, Data and File Structures

What is a Database?

Literally, a database is a base made up of data.

“A database is a collection of stored operational data used by the application systems of some particular enterprise.”

[4]

How Information is Stored

A previously assigned code

An implied context

A tacit understanding

A known algorithm

A known or deducible arrangement

EXAMPLE:

What can the set of bits 11010111 represent?

1.An ASCII character thus it has value W.

2.Digits in base 8 thus is represents 327 (base 8)

3.A positive decimal number coded in true binary thus it is equivalent to 215 (decimal)

4.A signed decimal number using one's complement binary thus it represents -40 (decimal)

5.A code for the employment status of 8 employees, so that all but the third and fifth employees are full-time, with the third and fifth being part-time

6.A pattern of dots to draw a picture. If the picture is configured as a 2 by 4 display it represents a picture of the character ]

[5]

Basic Operations on Data

Low Level

1.Create a data item

2.Assign a name to the data item

3.Assign a value to the data item

4.Delete or destroy the data item

5.Update the value of the data item

Intermediate Level

6.Select a data item based on its name, location, or value

7.Associate one data item with another (or in groups)

8.Concatenate one or more data items

High Level

9.Determine a predicate of a data item (test for a property)

10.Evaluate a function (in general involving diverse data items)

[6]

Attributes of Data Items

There are things about which data is collected entities

These entities can optionally have a name or names

There are certain things that it is desirable to describe about the entities. The various qualities of the entity that are to be described are referred to as attributes

For each of these attributes for each entity there is potentially a value

In physically storing some value there is a measure of size or extent of these values, whether fixed or variable

Regardless of how the data is conceived or stored, there is some structure in the data or stored values

Most important, the stored data items must have meaning

[7]

The Data Space

Entity, Attribute, Value

EXAMPLE:

For the entity “the car I drove on my first Sabbatical leave”

ATTRIBUTE / VALUE
Manufacturer / Ford
Model / Country Sedan
Body Type / Station Wagon
Model Year / 1973
Color / Blue
Owner / Thomas P. Sturm
Class / Passenger Car
License Number / NBGO
Licensing State / Minnesota

[8]

Descriptors / Identifiers

DESCRIPTOR:

A descriptor for an entity is an attribute/value pair.

IDENTIFIER:

An identifier is an attribute (or collection of attributes) whose value (or value combination) is different for each entity.

usually relegated to values necessarily different

where necessary, an identifier can be made up of the concatenation of two attributes (which should be thought of as yet another attribute)

RETRIEVAL

can be based on:

identifier (for an identifier, find some descriptors)

descriptor (for a descriptor, find some identifiers of entities possessing the descriptor)

location (for a particular location, retrieve the data that is stored there)

absolute location

relative location

Data Base Design

Impossible to model all of reality

Select an appropriate subset along all three axes

Select which interrelationships to preserve

Abstract entities and relationships into classes in a way suitable for

machine representation

human interpretation

Organize, code, and structure the stored data

Create convenient access paths

User
|
Model / suitable for human
| / interpretation
...
|
Model / suitable for machine
| / representation
Disk Pack

[9]

In a Large Database …

Anything that Can Occur … Will

Names:

- need not be alphabetic

701 Associates LLC

701 Building

701 Executive Suites

all in Minneapolis, MN phone book… sorted under SEVEN!

- names can be long

Welsh town:Llanfairpwllgwyngyllgogerychwyrndrobullllantysiliogogogoch

Numbers:

- can be larger than you might expect

Robert Earl Hughes weighed 1069 lbs.

Oldest age at death (ignoring the Old Testament) 144 years – survived a 100+ year jail sentence

- can have a “hidden structure”

Minnesota driver’s license begins with the Soundex code for the last name

Social security number begins with a code telling where it was issued

Duplicate Values:

- there are 28 U.S. cities named Holland

Structure vs. Content

Structure A

t / h / e / c / a / t / i / s
b / l / a / c / k

Structure B

0 / 1 / 0 / 2 / 0 / 4 / 0 / 3 / 0 / 0
t / h / e
c / a / t
b / l / a / c / k
i / s

[10]

Static vs. Dynamic Mismatch

STATIC MISMATCH

inherent structure of the DATA

inherent structure of the STORAGE MEDIA

DYNAMIC MISMATCH

DATA STORAGE structure

MAIN MEMORY structure

[11]

Decision Support System

Corporate and Local Databases

The Usage of Data

Data

Produces

Information

Is used to makeIs used to create

something of

DecisionsValue

Data takes

Time to collect
name
credit history
Space to store
social security number
emergency contact
Effort to maintain
birthdate
account balance

Relevance ratio must be computed

Entity Selection

Wanted:

A Fluorochemical Database containing:

Chemicals

Chemical properties

Chemical reactions

Products

Raw materials

New plant construction

Competitors

University research

Patents

Literature

Technologies used

Classification

Marketing

Engineering

Customers

“Rumors”

Sources of Information

Chemical Abstracts Service (CAS)

10 million structures

8.5 million documents

430,000 fluorine private registry files ($ 2 $ 3/compound)

MDL

MACCS compounds

REACCS reactions

Derwint

10's of thousands of documents

Fine Chemicals Directory (FCD)

60,000 substances

Copyright © 1971-2004 Thomas P. SturmData ConceptsPart 1, Page 1

[1]

Ref: E. F. Codd, CACM

[2]

Ref. Richard L. Nolan, CACM, Vol 16, No. 7 (July, 1973), pgs 399 ff.

[3]

Ref: Jerome Bruner, On Knowing

[4]

Ref: C. J. Date, An Introduction to Database Systems

[5]

Ref: Thomas P. Sturm, Data and File Structures

[6]

Ref: Thomas P. Sturm, Data and File Structures

[7]

Ref. Thomas P. Sturm, Data and File Structures

[8]

Ref. Thomas P. Sturm, Data and File Structures

[9]

Ref: Thomas P. Sturm, Data and File Structures

[10]

Ref. Thomas P. Sturm, Data and File Structures

[11]

Ref: Thomas P. Sturm, Data and File Structures