Database Design Steps

Cho, Joonmyun 18-10-3

Database Design

Database Design Steps

Logical Data Modeling

Relational Database Design

Conceptual Model(ER/EER)-to-Relational Mapping

ER-to-Relational Mapping Algorithms

EER-to-Relational Mapping Algorithms

Normalization

이상현상

함수종속과기본정규형

다치종속과제4정규형

Database Design Steps

Logical Data Modeling

Build Skeletal User Views
Add Keys to User Views
Add Detail to User Views
Validate User Views through Normalization
Integrate User Views

Building and Tuning a Relational Database

Translate the Logical Data Structure
Translate the Logical Data Integrity

The two methodologies (logical data modeling and relational database design) in this book (Handbook of Relational Database Design) are consistent with the ideas in the ANSI/X3/SPARC Three-schema Architecture.

Logical Data Modeling

The most important constructs within a logical data model are entities and relationships

Set of criteria for an optimal Logical Data Model

Structural Validity

consistency with the way the business defines and organizes information

Simplicity

ease of understanding even by unskilled people (e.g. by users or by non-systems professionals)

Non-redundancy

inclusion of no extraneous information, in particular, representation of any one piece of information exactly once.

Steps in Logical Data Modeling

Build Skeletal User Views

Identify major entities (significant objects of interest)
Define relationship between entities

Add Keys to User Views

Determine primary and alternative keys (identifying properties of entities)
Determine foreign keys (identifying properties of relationships)
Determine key business rules (rules that govern the effects of insert, delete, and update operations on relationships)

Key business rules define conditions under which primary and foreign keys may be inserted, deleted, or updated.

You should establish one insert rule and one delete rule for each relationship. The insert rule determines valid conditions under which you may insert or update the foreign key in an entity occurrence.

Add Detail to User Views

Add remaining non-key attributes

Nonkey attributes are the descriptive detail that users naturally associate with the entities.

Validate User Views through Normalization

Validate normalization rules

Determine Additional Attribute Business Rules

Determine domains (constraints on valid values that attributes may assume)
Determine triggering operations (rules that govern the effects of insert, delete, and update operations on other entities or other attributes within the same entity)

Data Dictionary에 정리 해야 함. 즉, 개발 Spec. = ER Model + Data Dictionary

Domains verify whether values assigned to an attributes make “business sense”.

We use the term triggering operation to refer to the most generalized form of business rule, encompassing domains and key business rules as well as other types of attribute business rule.

Integrate User Views

Combine user views
Integrate with existing data models
Analyze for stability and growth

Relational Database Design

Relational database design is a process for transferring a logical data model into a relational database.

Logical Data Model include not only structural constructs (entities, relationships, attributes) but also integrity constructs (primary keys, foreign keys, domains, and other business rules)

Steps in Relational Database Design

Translate the Logical Data Structure

Identify tables
Identify columns
Adapt data structure to product environment

Translate the Logical Data Integrity

Design for business rules about entities
Design for business rules about relationships
Design for additional business rules about attributes.

Conceptual Model(ER/EER)-to-Relational Mapping

We show how a relational database schema can be derived from a conceptual schema developed using the ER Model

The COMPANY relational schema can be derived from the ER Schema of COMPANY database

EMPLOYEE

FNAME / MINIT / LNAME / SSN / BDATE / ADDRESS / SEX / SALARY / SUPERSSN / DNO

DEPARTMENT

DNAME / DNUMBER / MGRSSN / MGRSTARTDATE

DEPT_LOCATIONS

DNUMBER / DLOCATION

PROJECT

PNAME / PNUMBER / PLOCATION / DNUM

WORKS_ON

ESSN / PNO / HOURS

DEPENDENT

ESSN / DEPENDENT_NAME / SEX / BDATE / RELATIONSHIP

Many tools use ER diagrams or variations to develop the schema graphically, and then automatically convert it into a relational database schema in the DDL

ER-to-Relational Mapping Algorithms

STEP 1

For each regular entity type E in the ER schema, create a relation R that includes all the simple attributes of E.

Include only the simple component attributes of composite attribute.

Choose one of the key attributes of E as Primary Key for R.

STEP 2

For each weak entity type W in the ER schema with owner entity type E, create a relation R.

And include all simple attributes (or simple components of composite attributes) of W as attributes of R.

In addition, include as foreign key attributes of R the primary key attribute(s) of the relation(s) that correspond to the owner entity type(s); this takes care of the identifying relationship type of W.

The primary key of R is the combination of the primary key(s) of the owner(s) and the partial key of the weak entity type W, if any.

STEP 3

For each a binary 1:1 relationship type R in the ER schema, identify the relations S and T that correspond to the entity types participating in R.

Choose one of the relations -S, say- and include as foreign key in S the primary key of T.

Include all the simple attributes (or simple components of composite attributes) of the 1:1 relationship type R as attributes of S. It is better to choose an entity type with total participation in R in the role of S.

Notice that an alternative mapping of a 1:1 relationship type is possible by merging the two entity types and the relationship into a single relation; this is particularly appropriate when both participations are total and when the entity types do not participate in any other relationship types.

STEP 4

For each regular(nonweak) binary 1:N relationship type R, identify the relation S that represents the participating entity type at the N-side of the relationship type.

Include as foreign key in S the primary key of the relation T that represents the other entity type participating in R; this is because each entity instance on the N-side is related to at most one entity instance on the 1-side of the relationship type.

Include any simple attributes (or simple components of composite attributes) of the 1:N relationship type as attributes of S.

STEP 5

For each binary M:N relationship type R, create a new relation S to represent R.

Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types; their combination will form the primary key of S. Also include any simple attributes of the M:N relationship type (or simple components of composite attributes) as attributes of S.

Notice that we can always map 1:1 or 1:N relationships in a manner similar to M:N relationships. This alternative is particularly useful when few relationship instances exist, in order to avoid null values in foreign keys. In this case, the primary key of the “relationship” relation will be the foreign key of only one of the participating “entity” relations. For a 1:N relationship, this will be the entity relation on the N-side. For a 1:1 relationship, the entity relation with total participation (if any) is chosen.

STEP 6

For each multivalued attribute A, create a new relation R

that includes an attribute corresponding to A plus the primary key attribute K (as a foreign key in R) of the relation that represents the entity type or relationship type that has A as an attribute.

If the multivalued attribute is composite, we include its simple components.

STEP 7

For each n-ary relationship type R, n > 2, create a new relation S to represent R.

Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types. Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S.

The primary key of S is usually a combination of all the foreign keys that reference the relations representing the participating entity types.

However, if the participation constraint (min, max) of one of the entity types E participating in R has max = 1, then the primary key of S can be the single foreign key attribute that references the relation E corresponding to E; this is because, in this case, each entity e in E will participate in at most one relationship instance of R and hence can uniquely identify that relationship instance

Correspondence between ER and Relational Models

ER Model / Relational Model
entity type / “entity” relation
1:1 or 1:N relationship type / foreign key (or “relationship” relation)
M:N relationship type / “relationship” relation and two foreign keys
n-ary relationship type / “relationship” relation and n foreign keys
simple attribute / attribute
composite attribute / set of simple component attributes
multivalued attribute / relation and foreign key
value set / domain
key attribute / primary (or secondary) key

EER-to-Relational Mapping Algorithms

STEP 8

Convert each specialization with m subclasses {S1, S2, …, Sm} and (generalized) superclass C, where the attributes of C are {k, a1, …, am} and k is the (primary) key, into relation schemas using one of the four following options:

Option 8A

Create a relation L for C with attributes Attrs(L) = {k, a1, a2, …, an} and PK(L) = k. Create a relation Li rot each subclass Si, 1 =< i =< m, with the attributes Attrs(Li) = {k} U {attributes of Si} and PK(Li) = k.

Option 8B

Create a relation Li for each subclass Si, 1 =< i=< m, with the attributes Attrs(Li) = {attributes of Si} U {k, a1, …, an} and PK(Li) = k;

Option 8C

Create a single relation L with attributes Attrs(L) = {k, a1, …, an} U {attributes of S1} U {attributes of S2} U …. U {attributes of Sm} U {t} and PK(L) = k.

This option is for a specialization whose subclasses are disjoint, and t is a type attribute that indicates the subclass to which each tuple belongs, if any. This option has the potential for generating a large number of null values.

Option 8D

Create a single relation schema L with attributes Attrs(L) = {k, a1, …, an} U {attributes of S1} U … U {attributes of Sm} U {t1, t2, …, tm} and PK(L) = k.

This option is for a specialization whose subclass are overlapping (not disjoint), and each ti, 1 =< i =< m, is a boolean attribute indicating whether a tuple belong to subclass Si.

# In option 8A, an EQUIJOIN operation on the primary key between any Li and L produces all the specific and inherited attributes of the entities in Si.

Option 8A works for any constraints on the specialization: disjoint or overlapping, total or partial.

# When we have a multi-level specialization (or generalization) hierarchy or lattice, we do not have to follow the same mapping option for all the specializations. Instead, we can use one mapping option for part of the hierarchy or lattice and other options for other parts.

# Mapping of Shared Subclasses (in C++, multiple inheritanced subclasses) we can apply any of the options discussed above although usually Option 8A is used.

Mapping of Categories

A category is a subclass of the union of two or more superclasses that can have different keys because they can be of different entity types.

For mapping a category whose defining superclasses have different keys, it is customary to specify a new key attribute. called a surrogate key, when creating a relation to correspond to the category.

We also add the surrogate key attribute as foreign key to each relation corresponding to a superclass of the category, to specify the correspondences in values between the surrogate key and the key of each superclass.

Normalization

데이터베이스 설계는 현실세계를 정확하게 표현할 수 있는 데이터의 논리적 구조를 결정하는 것인데 이것이 곧 관계스킴 (relational scheme)의 설계 방법이다.

관계 데이터베이스의 설계문제는 바로 이 요소들로부터 릴레이션으로 변환할 때 어떠한 원리에 입각해서 해야 되는가를 규명하는데 있다. 즉, 효율적인 데이터처리, 데이터의 일관성유지가 목적이다.

We have assumed that attributes are grouped to form a relation schema by using the common sense of the database designer or by mapping a schema specified in the Entity-relationship model into a relational schema. However, we did not have any formal measure of why one grouping of attributes into a relation schema may be better than another. There was no measure of appropriateness of quality of the design other than the intuition of the designer.

There are two levels at which we can discuss the “goodness” of relation schemas. The first is the logical level, which refers to how the users interpret the relation schemas and the meaning of their attributes. The second is the manipulation (or storage) level, which refers to how the tuples in a base relation are stored and updated.

Let us contrast the idea of semantic modeling (and of the ER model in particular) with the normalization discipline.

The normalization discipline involves reducing large relations to smaller ones: it assumes that we have some small number of large relations as input, and it maps large relations into small ones.

The Two approaches (Top-down design and Normalization) complement each other:

Use the ER approach or some other top-down methodology to generate “large” relations representing regular entities, weak entities, etc. and then
Use the ideas of further normalization to break those “large” relations down into “small” ones.

이상현상

삭제이상

한 튜플을 삭제 함으로서 유지해야 될 정보까지도 삭제되는 연쇄삭제(triggered deletion) 현상

삽입이상

어떤 데이터를 삽입하려고 할 때 불필요하고 원하지 않는 데이터도 함께 삽입해야만 되는 현상

갱신이상

중복된 튜플들 중에서 일부 애트리뷰트 값을 갱신 시킴으로써 정보의 모순성(inconsistency)이 생기는 현상

애트리뷰트들 간의 종속성(dependency)를 분석해서 기본적으로 하나의 종속성은 하나의 릴레이션으로 표현되도록 분해하면 된다.

이 분해 과정을 정규화(normalization)라 한다.

함수종속과 기본 정규형

정규화 이론의 기본적인 아이디어는 서로 독립적인 관계는 별개의 릴레이션으로 표현한다는 것이다.

함수종속 (Functional Dependency)

어떤 릴레이션 R에서, 애트리뷰트 X의 값 각각에 대해 애트리뷰트 Y의 값이 오직 하나만 연관되어 있을 때Y는 X에 함수종속이라 하고R.X  R.Y로 표기한다.

여기서 X나 Y는 복합 애트리뷰트가 될 수도 있다.

함수종속 다이아그램 (Functional Dependency Diagram)

예) 수강.<학번, 과목번호수강.성적

수강.학번 수강.학년

기본 정규형

①제1정규형 (1NF: First Normal Form)

어떤 릴레이션 R의 모든 도메인이 원자값 (atomic value) 만으로 되어 있다면 제1정규형에 속한다.

②제2정규형 (2NF: Second Normal Form)

어떤 릴레이션 R이1NF이고 키에 속하지 않는 애트리뷰트 모두가 기본 키에 완전 함수종속이면 제2정규형에 속한다.

1NF 이면서2NF이 아닌 릴레이션은 언제나 적당한 프로젝션을 통하여 의미 상으로 동등한 두개의2NF의 릴레이션으로 분해할 수 있다.

관계R.A  R.B, R.B  R.C가 성립하면 논리적으로R.A  R.C가 성립한다. 이때 애트리뷰트 C는 A에 이행적 함수종속 (Transitive FD)이라고 한다.

③제3정규형 (3NF: Third Normal Form)

어떤 릴레이션 R가2NF이고 키에 속하지 않은 모든 애트리뷰트 들이 기본 키에 이행적 함수 종속이 아닐 때 제3정규형에 속한다.

제3정규형의 정의는 릴레이션이 복수의 후보 키를 가지고 있고, 후보 키들이 복합 애트리뷰트 들로 구성되고, 후부 키들이 설로 중첩되는 경우에는 적용할 수 없다.

④보이스/코드 정규형 (BCNF)

릴레이션 R의 모든 결정자 (determinant)가 후보 키 (candidate key)이면 릴레이션 R는 보이스/코드 정규형(BCNF)에 속한다.

다치종속과 제4정규형

제1정규형, 제2정규형, 제3정규형, 보이스/코드 정규형의 구분 기준 즉, 정규화 기준은 몇 개의 함수족속 관계를 하나의 릴레이션에 함께 표현하려 했기 때문에 문제가 발생하고 이의 해결을 위해 서로 독립적인 관계를 별개의 릴레이션으로 분해하는 것이었다.

그러나, 이상현상은 함수종속을 통해서만 일어나는 것은 아니고, 서로 무관한 것을 한 릴레이션에 표현하는 데서도 발생한다. 이러한 현상을 일으키는 새로운 유형의 종속관계를 다치종속(MVD: Multivalued Dependency)라 한다.

지금까지의 함수 종속은 이 다치종속의 특수 형태이다.

다치종속 (MVD: Multivalued Dependency)

A, B, C세 개의 애트리뷰트를 가진 릴레이션 R에서 애트리뷰트 상 (A, C) 값에 대응하는 B 값의 집합이 A값에만 종속되고 C값에는 독립이면 다치종속 R.A > R.B가 성립한다. 이때 A, B, C는 복합 애트리뷰트일 수도 있다.

또 릴레이션 R(A, B, C)에서 MVD R.A > R.B가 성립하면R.A > R.C도 동시에 성립한다는 것도 증명되었다.

릴레이션 R(A, B, C)에 MVD A > B|C가 존재하면 두 프로잭션R1(A, B)와 R2(A, C)로 무손실 분해될 수 있다.

제4정규형 (4NF: Fourth Normal Form)

릴레이션 R에 MVD A> B가 존재할 때 R의 모든 애트리뷰트 들도 또한 A에 함수종속 (즉, R의 모든 애트리뷰트 X에 대해 A  X이고 A가 후보키) 이면 릴레이션 R는 제4정규형에 속한다.

1/11