ER Diagrams: What Do I Want

In theory, there is no difference between practise and theory.

In practise, there is.

ER Diagrams: what do I want?

Tables: how do I get them from an ER diagram?

Normalization: why bother?

Indexes: what are they?

Database design steps:

Create ER model
Convert ER model to relational model
Normalize relational model
Define attribute data types, default values, integrity rules, …
Build database using MS Access
Populate with test data
ER Diagrams:

Entities: distinct ‘things’ in the enterprise
Ex: managers, employees, projects, …
generally a noun
Relationships: meaningful interactions between entities
Ex: managers manage employees
generally a verb
Attributes: properties of entities and relationships
ex: an employee has name, IRD number, address, …

An ER diagram contains all three (that is, don’t give me just an E diagram, or an R diagram)

An ACCESS database diagram is
not an ER diagram:

ER diagram also usually specifies cardinality: the number of relationships in which a given entity can appear

One relationship
Each employee has one manager
Variable number (N, M) of relationships
Each manager can have many employees
A maximum number of relationships
An employee is involved in at most 4 projects

…And you can specify other subtleties in relationships as well: optional, mutually exclusive, subtypes….

There are many different formats for ER diagrams. Your diagrams must contain a key/legend that lists the diagrammatic element types that you are using.
Each individual object is an entity, the collection of like entities is an entity set

ex: one person is a tuple in a table, the entire set is the PERSONS entity set
usually entity labels are plural (Employees, Projects, …)

ER diagrams are touted as a tool for communication between DB developers and clients
Maybe
ER diagrams are one form of readable documentation for the DB
So they should be readable: use readable names
Entity, relationship names should be the same as the tables
There should be a clear relationship between the ER diagram and the tables you construct

Convert ER diagram to relational model (DB tables):

Each entity: a table (relation), primary key

Each relationship either:

oa table, containing the primary keys of the relevant entities (foreign keys)

represent by making the primary key of one table the foreign key of another

Each attribute: a column in a table

Then normalize tables to make them well-structured

Then if necessary merge tables to remove redundancies, and re-normalize

Choosing data types for attributes (columns):

Get appropriate data types from:

Requirement’s DD
Existing forms, screens & reports
Talking to client

Select data types to:

Minimize storage space
Numbers take up less space than text: does a product code really have to start P1253, or can the code be 1253?
How long does a text string really have to be? (but make sure it’s long enough…)
Represent all possible values
Is SMALLINT big enough?
Remember that data tends to grow; plan for future needs
Support all data manipulations
Should invoice date be DATE or CHAR(6)? Do you need to compare/subtract dates? Transfer dates between systems?
In bigger systems, may need to look at more exotic data types (image, graphic, large text, …)

Keys, primary and foreign:

each table has a primary key: a set of attributes that serve as the unique identifier for the relation. Often, but not always, the primary key is a single attribute.
Salespeople: SpersonID, region
Sales: Customer_ID, Date, SpersonID

Foreign key: an attribute that appears as a nonkey attribute in one relation and a primary key attribute in another
SpersonID is a foreign key in Sales
Referential integrity: value of foreign key depends on value of same attribute in another table
SpersonID in Sales must be one of values in Salespeople

Other bits and bobs:

NULL nor not?

DB lets you enforce whether or not a value can be absent

Simple integrity constraints

Force range checking

Default values

Particularly handy with NOT NULL

Normalization

Formulas for ensuring that your attributes are divided into tables so as to reduce data redundancy, reduce potential for update errors
6 or so normal forms
most commercial DBs are in 3NF or higher (3NF a minimum)
for your DB deliverable, you need to explain what NF you have attained in your DB

Why not make all DBs in 6NF? Why are some DBs in 2NF?

The need for speed: DBA sometimes will deliberately increase redundancy and potential for inconsistencies to achieve better query processing performance

First attempts at achieving better performance:

Caching: 5% to 20% of the data (the more the better!)

80% of activity takes place on 20% of data—another example of the 80/20 rule

Indexes: primary keys, foreign keys, anything else that looks useful (remembering that indexes take up memory—the time/space tradeoff)

• granularity of locking

table, database locks more likely to cause contention than page, tuple locks

but: page, tuple locks require more overhead to process

type of security

no security checks, or mandatory control systems, execute more quickly

but: no security is dangerous

mandatory systems less flexible than discretionary control

•rip out high overhead options such as integrity constraints, views

Going to a lower NF is a last resort! don't do this unless you have to!

…and I don’t think that any of you will have to—you’ll need to REALLY convince me on this one

And be sure to clearly document why you have moved to a lower NF—as conditions change, the DBA may be able to increase the normality

Ex: the most common query type changes

Indexes: an aside

An index on a database sorts the tuples in the table by one (or more) of the columns:

•Can define indexes to speed up access

primary index: usually on the primary key (in Access, automatically done on the key).

Sorting is done physically, on the table actually stored in memory.

Sorting means that the index attribute(s) can be retrieved far more efficiently during queries.

Secondary index: placed by DBA on any attributes that the DBA thinks will be frequently accessed by queries.

Not a physical sorting: an index is kept of the tuple values.

•unique index  unique values only

•duplicates index  duplicate values ok