In theory, there is no difference between practise and theory.

In practise, there is.

  • ER Diagrams: what do I want?
  • Tables: how do I get them from an ER diagram?
  • Normalization: why bother?
  • Indexes: what are they?

Database design steps:

  1. Create ER model
  2. Convert ER model to relational model
  3. Normalize relational model
  4. Define attribute data types, default values, integrity rules, …
  5. Build database using MS Access
  6. Populate with test data
    ER Diagrams:
  • Entities: distinct ‘things’ in the enterprise
  • Ex: managers, employees, projects, …
  • generally a noun
  • Relationships: meaningful interactions between entities
  • Ex: managers manage employees
  • generally a verb
  • Attributes: properties of entities and relationships
  • ex: an employee has name, IRD number, address, …

An ER diagram contains all three (that is, don’t give me just an E diagram, or an R diagram)

An ACCESS database diagram is
not an ER diagram:

ER diagram also usually specifies cardinality: the number of relationships in which a given entity can appear

  • One relationship
  • Each employee has one manager
  • Variable number (N, M) of relationships
  • Each manager can have many employees
  • A maximum number of relationships
  • An employee is involved in at most 4 projects

…And you can specify other subtleties in relationships as well: optional, mutually exclusive, subtypes….

There are many different formats for ER diagrams. Your diagrams must contain a key/legend that lists the diagrammatic element types that you are using.
Each individual object is an entity, the collection of like entities is an entity set

  • ex: one person is a tuple in a table, the entire set is the PERSONS entity set
  • usually entity labels are plural (Employees, Projects, …)
  • ER diagrams are touted as a tool for communication between DB developers and clients
  • Maybe
  • ER diagrams are one form of readable documentation for the DB
  • So they should be readable: use readable names
  • Entity, relationship names should be the same as the tables
  • There should be a clear relationship between the ER diagram and the tables you construct

Convert ER diagram to relational model (DB tables):

Each entity: a table (relation), primary key

Each relationship either:

oa table, containing the primary keys of the relevant entities (foreign keys)

  • represent by making the primary key of one table the foreign key of another

Each attribute: a column in a table

Then normalize tables to make them well-structured

Then if necessary merge tables to remove redundancies, and re-normalize

Choosing data types for attributes (columns):

Get appropriate data types from:

  • Requirement’s DD
  • Existing forms, screens & reports
  • Talking to client

Select data types to:

  • Minimize storage space
  • Numbers take up less space than text: does a product code really have to start P1253, or can the code be 1253?
  • How long does a text string really have to be? (but make sure it’s long enough…)
  • Represent all possible values
  • Is SMALLINT big enough?
  • Remember that data tends to grow; plan for future needs
  • Support all data manipulations
  • Should invoice date be DATE or CHAR(6)? Do you need to compare/subtract dates? Transfer dates between systems?
  • In bigger systems, may need to look at more exotic data types (image, graphic, large text, …)

Keys, primary and foreign:

  • each table has a primary key: a set of attributes that serve as the unique identifier for the relation. Often, but not always, the primary key is a single attribute.
  • Salespeople: SpersonID, region
  • Sales: Customer_ID, Date, SpersonID
  • Foreign key: an attribute that appears as a nonkey attribute in one relation and a primary key attribute in another
  • SpersonID is a foreign key in Sales
  • Referential integrity: value of foreign key depends on value of same attribute in another table
  • SpersonID in Sales must be one of values in Salespeople

Other bits and bobs:

  • NULL nor not?

DB lets you enforce whether or not a value can be absent

  • Simple integrity constraints

Force range checking

  • Default values

Particularly handy with NOT NULL

Normalization

  • Formulas for ensuring that your attributes are divided into tables so as to reduce data redundancy, reduce potential for update errors
  • 6 or so normal forms
  • most commercial DBs are in 3NF or higher (3NF a minimum)
  • for your DB deliverable, you need to explain what NF you have attained in your DB

Why not make all DBs in 6NF? Why are some DBs in 2NF?

The need for speed: DBA sometimes will deliberately increase redundancy and potential for inconsistencies to achieve better query processing performance

First attempts at achieving better performance:

  • Caching: 5% to 20% of the data (the more the better!)

80% of activity takes place on 20% of data—another example of the 80/20 rule

  • Indexes: primary keys, foreign keys, anything else that looks useful (remembering that indexes take up memory—the time/space tradeoff)

• granularity of locking

table, database locks more likely to cause contention than page, tuple locks

but: page, tuple locks require more overhead to process

  • type of security

no security checks, or mandatory control systems, execute more quickly

but: no security is dangerous

mandatory systems less flexible than discretionary control

•rip out high overhead options such as integrity constraints, views

Going to a lower NF is a last resort! don't do this unless you have to!

…and I don’t think that any of you will have to—you’ll need to REALLY convince me on this one

And be sure to clearly document why you have moved to a lower NF—as conditions change, the DBA may be able to increase the normality

Ex: the most common query type changes

Indexes: an aside

An index on a database sorts the tuples in the table by one (or more) of the columns:

•Can define indexes to speed up access

  • primary index: usually on the primary key (in Access, automatically done on the key).

Sorting is done physically, on the table actually stored in memory.

Sorting means that the index attribute(s) can be retrieved far more efficiently during queries.

  • Secondary index: placed by DBA on any attributes that the DBA thinks will be frequently accessed by queries.

Not a physical sorting: an index is kept of the tuple values.

•unique index  unique values only

•duplicates index  duplicate values ok