Part 1
Data Concepts
Why learn about Relational Databases?
Problem:
“Demand for new applications is outstripping the capability of data processing departments to implement programs.”
Solutions:
“Put end users into direct touch with the information stored in computers.”
“Increase the productivity of data processing professionals.”
How:
“Relational database management provides a practical foundation for both approaches.”[1]
Nolan's Stage Model
Stage / Applications / Personnel / ManagementI / Clerical / Generalist / Lax
II / Expansion / Programmers / Lax Fails
III / On Hold / Management / Control
IV / Integrated / Analysts / Application
V / Database / DBA / Data
VI / Decision Support System (DSS) / Specialist / Resource
Stage “Nicknames”
I – Initiation
II – Contagion
III – Control
IV – Integration[2]
Data and Information
INFORMATION:
The meaning that a human assigns to data via the known conventions used in their representation.
DATA:
A formalized representation of facts, concepts, or instructions suitable for communication, interpretation, or processing by human or automatic means.
______
Ref.: Thomas P. Sturm, Data and File Structures
Information Storage and Retrieval
“The principal problem of human memory is not storage, but retrieval. The key to retrieval is organization, or in simpler terms, knowing where to find information that has been put into memory”
[3]
Database
WHAT IS A BASE?
the bottom of anything, considered as its support or foundation
the fundamental part of a thing
the chief ingredient of anything, viewed as its fundamental constituent
mathematically, the reference number used to construct a mathematical table
a point or line from which a start is made
base in its most general sense equals bottom, but, more specifically, implies a broad bottom by which something is held up or stabilized
______
Ref.: Thomas P Sturm, Data and File Structures
What is a Database?
Literally, a database is a base made up of data.
“A database is a collection of stored operational data used by the application systems of some particular enterprise.”
[4]
How Information is Stored
A previously assigned code
An implied context
A tacit understanding
A known algorithm
A known or deducible arrangement
EXAMPLE:
What can the set of bits 11010111 represent?
1.An ASCII character thus it has value W.
2.Digits in base 8 thus is represents 327 (base 8)
3.A positive decimal number coded in true binary thus it is equivalent to 215 (decimal)
4.A signed decimal number using one's complement binary thus it represents -40 (decimal)
5.A code for the employment status of 8 employees, so that all but the third and fifth employees are full-time, with the third and fifth being part-time
6.A pattern of dots to draw a picture. If the picture is configured as a 2 by 4 display it represents a picture of the character ]
[5]
Basic Operations on Data
Low Level
1.Create a data item
2.Assign a name to the data item
3.Assign a value to the data item
4.Delete or destroy the data item
5.Update the value of the data item
Intermediate Level
6.Select a data item based on its name, location, or value
7.Associate one data item with another (or in groups)
8.Concatenate one or more data items
High Level
9.Determine a predicate of a data item (test for a property)
10.Evaluate a function (in general involving diverse data items)
[6]
Attributes of Data Items
There are things about which data is collected entities
These entities can optionally have a name or names
There are certain things that it is desirable to describe about the entities. The various qualities of the entity that are to be described are referred to as attributes
For each of these attributes for each entity there is potentially a value
In physically storing some value there is a measure of size or extent of these values, whether fixed or variable
Regardless of how the data is conceived or stored, there is some structure in the data or stored values
Most important, the stored data items must have meaning
[7]
The Data Space
Entity, Attribute, Value
EXAMPLE:
For the entity “the car I drove on my first Sabbatical leave”
ATTRIBUTE / VALUEManufacturer / Ford
Model / Country Sedan
Body Type / Station Wagon
Model Year / 1973
Color / Blue
Owner / Thomas P. Sturm
Class / Passenger Car
License Number / NBGO
Licensing State / Minnesota
[8]
Descriptors / Identifiers
DESCRIPTOR:
A descriptor for an entity is an attribute/value pair.
IDENTIFIER:
An identifier is an attribute (or collection of attributes) whose value (or value combination) is different for each entity.
usually relegated to values necessarily different
where necessary, an identifier can be made up of the concatenation of two attributes (which should be thought of as yet another attribute)
RETRIEVAL
can be based on:
identifier (for an identifier, find some descriptors)
descriptor (for a descriptor, find some identifiers of entities possessing the descriptor)
location (for a particular location, retrieve the data that is stored there)
absolute location
relative location
Data Base Design
Impossible to model all of reality
Select an appropriate subset along all three axes
Select which interrelationships to preserve
Abstract entities and relationships into classes in a way suitable for
machine representation
human interpretation
Organize, code, and structure the stored data
Create convenient access paths
User|
Model / suitable for human
| / interpretation
...
|
Model / suitable for machine
| / representation
Disk Pack
[9]
In a Large Database …
Anything that Can Occur … Will
Names:
- need not be alphabetic
701 Associates LLC
701 Building
701 Executive Suites
all in Minneapolis, MN phone book… sorted under SEVEN!
- names can be long
Welsh town:Llanfairpwllgwyngyllgogerychwyrndrobullllantysiliogogogoch
Numbers:
- can be larger than you might expect
Robert Earl Hughes weighed 1069 lbs.
Oldest age at death (ignoring the Old Testament) 144 years – survived a 100+ year jail sentence
- can have a “hidden structure”
Minnesota driver’s license begins with the Soundex code for the last name
Social security number begins with a code telling where it was issued
Duplicate Values:
- there are 28 U.S. cities named Holland
Structure vs. Content
Structure A
t / h / e / c / a / t / i / sb / l / a / c / k
Structure B
0 / 1 / 0 / 2 / 0 / 4 / 0 / 3 / 0 / 0t / h / e
c / a / t
b / l / a / c / k
i / s
[10]
Static vs. Dynamic Mismatch
STATIC MISMATCH
inherent structure of the DATA
inherent structure of the STORAGE MEDIA
DYNAMIC MISMATCH
DATA STORAGE structure
MAIN MEMORY structure
[11]
Decision Support System
Corporate and Local Databases
The Usage of Data
Data
Produces
Information
Is used to makeIs used to create
something of
DecisionsValue
Data takes
Time to collect
name
credit history
Space to store
social security number
emergency contact
Effort to maintain
birthdate
account balance
Relevance ratio must be computed
Entity Selection
Wanted:
A Fluorochemical Database containing:
Chemicals
Chemical properties
Chemical reactions
Products
Raw materials
New plant construction
Competitors
University research
Patents
Literature
Technologies used
Classification
Marketing
Engineering
Customers
“Rumors”
Sources of Information
Chemical Abstracts Service (CAS)
10 million structures
8.5 million documents
430,000 fluorine private registry files ($ 2 $ 3/compound)
MDL
MACCS compounds
REACCS reactions
Derwint
10's of thousands of documents
Fine Chemicals Directory (FCD)
60,000 substances
Copyright © 1971-2004 Thomas P. SturmData ConceptsPart 1, Page 1
[1]
Ref: E. F. Codd, CACM
[2]
Ref. Richard L. Nolan, CACM, Vol 16, No. 7 (July, 1973), pgs 399 ff.
[3]
Ref: Jerome Bruner, On Knowing
[4]
Ref: C. J. Date, An Introduction to Database Systems
[5]
Ref: Thomas P. Sturm, Data and File Structures
[6]
Ref: Thomas P. Sturm, Data and File Structures
[7]
Ref. Thomas P. Sturm, Data and File Structures
[8]
Ref. Thomas P. Sturm, Data and File Structures
[9]
Ref: Thomas P. Sturm, Data and File Structures
[10]
Ref. Thomas P. Sturm, Data and File Structures
[11]
Ref: Thomas P. Sturm, Data and File Structures