DataBase Management System

Introduction

Definition: A database management system is a collection of interrelated data and a set of programs to access those data. The primary goal of DBMS is to store and retrieve database information that is convenient and efficient.

File Processing System:

Typical file processing system stores permanent records in various files, and it requires application program to retrieve data from, and add records to, the appropriate file.

Traditional file processing system has several disadvantages:

Data redundancy and inconsistency: Since different programmers create files and application programs over a long period of time, the files may have been saved in various formats and same information might have been saved more than once. This redundancy leads to more storage and access cost. Also, data inconsistency may arise with the same data item having different values in files.

Difficulty in accessing data: As each application program is created by different programmers for retrieving information from system. A new requirement of information may need the system user to write a new program which is tedious and time consuming.

Data Isolation: Data are scattered across various files in different formats and hence writing a new program to retrieve data from these files is difficult.

Integrity Problems: The data values stored in system must satisfy certain types of consistency constraints. These constraints are implemented by adding appropriate code for to the application program. When new constraints are applied it is hard to implement them into application programs.

Atomicity problems: Atomicity is a desirable characteristic of any other transaction which is performed in system. It specifies that either the transaction is done completely or it is not taken at all, as incomplete transaction leads to inconsistent data in database.

Concurrent access anomalies:Many systems for the sake of performance allow multiple users to access the system simultaneously such that more than one user can access and manipulate a data item leading to inconsistent data. Such an erroneous situation is called as concurrent access anomalies.

Security problems: Not every user of the system should be able to access all the data in the system, which is hard to implement in traditional system.

Advantages of a DBMS :

Using a DBMS for managing data has several advantages :

Data Independence : Application programs should not be exposed to the details of data representation and storage. So, DBMS provides the abstract view of the data that hides such details.

Efficient data access : A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently.

Data Integrity and security : DBMS enforces various integrity constraints. For example before inserting salary information for an employee, the DBMS can check that the department budget is not exceeded.

Data Administration : When several users share the data, centralizing the administration of data can offer significant improvements . Experienced professionals such as DBA can understand different groups accessing various data.

Concurrent access and Crash recovery : DBMS schedules concurrent access to the data such that each user can feel that he is accessing data individually.

Reduced Application Development : The DBMS supports important functions that are common to many applications. Hence the time taken for new application development is reduced.

The various database models

Databases appeared in the late 1960s, at a time when the need for a flexible information management system had arisen. There are five models of DBMS, which are distinguished based on how they represent the data contained:

  • The hierarchical model: The data is sorted hierarchically, using a downward tree. This model uses pointers to navigate between stored data. It was the first DBMS model.
  • The network model: like the hierarchical model, this model uses pointers toward stored data. However, it does not necessarily use a downward tree structure.
  • The relational model (RDBMS, Relational database management system): The data is stored in two-dimensional tables (rows and columns). The data is manipulated based on the relational theory of mathematics.
  • The deductive model: Data is represented as a table, but is manipulated using predicate calculus.
  • The object model (ODBMS, object-oriented database management system): the data is stored in the form of objects, which are structures called classes that display the data within. The fields are instances of these classes

By the late 1990s, relational databases were the most commonly used (comprising about three-quarters of all databases).

DBMS Levels of Abstraction

The main objectives of DBMS are to store and retrieve information efficiently. all the users should be able to access same data. the designer use complex data structure to represent the data so that the data can be efficiently stored and retrieved. the developers hide the complexity from users through several levels of abstractions.

Data Independence :
Data independence means the internal structure of the database should be unaffected by changes to physical aspects of storage because of data independence, the database administrator can change the database storage structure without affecting the users view.
the different levels of data abstraction are:

Physical level

It is concerned with the physical storage of information on the storage device. It provides the internal view of the actual physical storage of data.

Logical Level

The logical level describes what data are stored in the database and what relationships exits among those data. Logical level describes the entire database in terms of small number of simple structures. The implementation of simple structure of the logical level may involve complex physical level structures.

View level

View level is the highest level of abstraction. it is the view that the individual user of the database has. there can be many view level abstractions of the same data. the different levels of data abstractions are:-

  1. Database Instances : Database change over time as information in inserted & deleted. The collection of information stored in the database at a particular moment is called an instance of database.
  2. Database Schema : The overall designs of database is called database schema. A schema is a collection of named objects and it provides a logical classification of objects in the database.

There are three kinds of schema defined at the three levels of abstraction :

Conceptual Schema : Conceptual schema defines the data in terms of the data model of the DBMS. It describes all the relations in database and the relationships among them.

Physical Schema : The physical scema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes.

External Schema : External schemas allow data access to be customized at the level of individual user or group of users. It consists of a collection of one or more views and relations from the conceptual schema. It is guided by end user requirements.

STRUCTURE OF DBMS

DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of DBMS perform these requested operations on the database and provide necessary data to the users. The various components of DBMS are shown below: -

Fig. 2.1 Structure Of DBMS

1. DDL Compiler - Data Description Language compiler processes schema definitions specified in the DDL. It includes metadata information such as the name of the files, data items, storage details of each file, mapping information and constraints etc.

2. DML Compiler and Query optimizer - The DML commands such as insert, update, delete, retrieve from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized in the best way to execute a query by the query optimizer and then send to the data manager.

3. Data Manager - The Data Manager is the central software component of the DBMS also knows as Database Control System.

The Main Functions Of Data Manager Are: –

• Convert operations in user's Queries coming from the application programs or combination of DML Compiler and Query optimizer which is known as Query Processor from user's logical view to physical file system.

• Controls DBMS information access that is stored on disk.

• It also controls handling buffers in main memory.

• It also enforces constraints to maintain consistency and integrity of the data.

• It also synchronizes the simultaneous operations performed by the concurrent users.

• It also controls the backup and recovery operations.

4. Data Dictionary - Data Dictionary is a repository of description of data in the database. It contains information about

• Data - names of the tables, names of attributes of each table, length of attributes, and number of rows in each table.

• Relationships between database transactions and data items referenced by them which is useful in determining which transactions are affected when certain data definitions are changed.

• Constraints on data i.e. range of values permitted.

• Detailed information on physical database design such as storage structure, access paths, files and record sizes.

• Access Authorization - is the Description of database users their responsibilities and their access rights.

• Usage statistics such as frequency of query and transactions.

Data dictionary is used to actually control the data integrity, database operation and accuracy. It may be used as a important part of the DBMS.

Importance of Data Dictionary -

Data Dictionary is necessary in the databases due to following reasons:

• It improves the control of DBA over the information system and user's understanding of use ofthe system.

• It helps in documentating the database design process by storing documentation of the result of every design phase and design decisions.

• It helps in searching the views on the database definitions of those views.

• It provides great assistance in producing a report of which data elements (i.e. data values) are usedin all the programs.

• It promotes data independence i.e. by addition or modifications of structures in the database application program are not effected.

5. Data Files - It contains the data portion of the database.

6. Compiled DML - The DML complier converts the high level Queries into low level file access commands known as compiled DML.

7. End Users - They are already discussed in previous section.

Entity-Relationship Model

The Entity-Relation (E-R) model is a detailed, logical representation of data and related business rules in the organization. The E-R model is represented by the E-R diagram (ERD) which is an important tool used by systems analysts to develop databases. The ERD is also used as acommunication tool between database designer and end-users. In this module, students learn how to construct E-R diagrams.

The Entity-Relation Diagram

An entity-relation Diagram (ERD) is a detailed, logical representation of the

  • entities,
  • attributes, and
  • relations

of the data of an organization.It also represents the business rules (objectives, policies and procedures) which govern how data are handled and stored. TheERD was introduced by Chen in 1976.The ERD is derived by systems analysts during the analysis phase of the business requirements.

The E-R model is expressed in terms of:

  • ENTITIES (represented by rectangle ),
  • ATTRIBUTES (ellipse) of the entities and relation, and
  • RELATIONS or ASSOCIATION (lines and diamond).

Notation of E-R Modeling

Entities and Attributes
An entity type is a person, place, object or event or concept about which the organization wishes to maintain data. Represented by a rectangle.

Examples of entities are: employee, customer, order, item, student, building, equipment, account, course.

Entity typevs. Entity instance (occurrences)
An entity type (shown on the ERD as rectangle) represents a collection of all entities that share common properties or characteristics. The entity type is described once in the database using meta data definition. For example, there is only one EMPLOYEE type defined in the schema.

An entity instance is a single occurrence of an entity type -- for example

Mary, John and Tom are 3 instances of the EMPLOYEE type.

Strong entity (shown as a single line rectangle) is an entity that exists independently of other entity type, versus
Weak entity (Shown as a double-lined rectangle) its existence depends on some other entity type.For example, consider a database for tracking employees. Dependent persons (say, Mary and Bill) exist in the employee database because their father, who is an employee, exists in the Employee table.

Attributes
Each entity type has a set of attributes associated with it. Attributes are represented graphically as ellipses.Entities and their attributes can also be presented in a non-graphical (more mathematical) form as:
Entity_name [ attribute1, attribute 2,... {attribute5}...... ]
Where { } brackets represent multiples of an attribute (something we usually want to avoid).Example: STUDENT [ssn, name, major, address, GPA, {course_id}]

The example depicts a STUDENT entity with several attributes:
[ssn, name, major, address, GPA, and course_id ].
Where: ssn = social security number, GPA = grade point average.

The data in the table below illustrates threeinstances of the STUDENT entity type.

SSN / Name / Major / Address / GPA / Course1 / Course2 / Course3 / Course . . n
111-22-1111 / Smith Bill / IST / 100 Main St, Denver. CO / 3.0 / IT 100 / IT420 / Math 100
222-11-2333 / Brown Ann / Bus / 1 South St. Phila, PA 19122 / 2.2 / Eng 100 / IT100
333-11-1134 / Thomas Joe / IST / 231 Walnut St. NY, NY 11111

Notice the list of attributes (the courses). We will learn later that a proper database design precludes multivalued attributes, and that this table has to be 'normalized' before we can create the schema for this table.
Classifications (of kinds of) Attributes

Simple (or atomic) attribute can't be broken down into smaller components.
Example, the SSN and Major in the STUDENT entity above are simpleattributes

Composite attributes, can be broken down into component parts.
Example, the address attribute is made up of: street, city, state and zip code.
Single-valued attribute (represented by a single line ellipse) is an attribute which has only onevalue in a table Example: SSN for students in a University -- this number can only appear once -- SSNs are unique for each student

Multivalued attribute has multiple values per instance, such as: all courses
taken by a student, shown above. It is represented by double line ellipse
Stored attribute - a regular attribute

Derived attribute which is calculated from a stored attribute, and thus should not beshown on the ERD.

In our example, the AGE attribute could becalculated from the DOB (Date-of-birth) attribute.

A Special Attribute -- Keys/ Indexes

An key is an attribute or combination of attributes that uniquely identifies an individual instance ofan entity type

A key attribute is underlined on the ERD, as the ssn column in the Employee ERDabove.

Selection of the keys is an important part of the database design

  • It affects integrity validation and performance
  • Declaring an attribute as a key, and declaring it 'duplicates not allowed' we can prevent users form entering erroneous duplicate data
  • The key can also maintain integrity by linking the key with a key in another table.

A key can be an atomic attribute (such as: ssn) or made from several
attributes (composite key). We use a composite key, if a single key can'tuniquely identify the record

Example 1: Airline flight numbers -- same flight goes several times a week if not everyday. Use flight number and date as the key.

Example 2: Magazine Subsriber ID - use part of last name, date of expiration, and first initials.For example FRIE092301FL

Here are important rules for selecting a key:

. The key should be an attribute that doesn't change

Example: ssn,employee_ID,SKU (stock-keeping-unit), license plate number.

2. The key can't be a null. It must have a valid value

Example: Actual graduation date for a student would be a bad choice

3. The key should guaranteed to always have valid value

Example: SSN - T numbers get changed which often causes problems in a DB

4. Avoid using keys that have intelligence or codes built in

Example: A building code (whichmight later be subject to change)

5. Consider replacing composite attributes (with long names) by a short, surrogate key

Naming and Defining Attributes

Naming --

  • Use a noun and be sure it is unique
  • Settle on a standard for the organization (client) you are serving - lots of common examples in the book
  • Be consistent across entity types

Defining Attributes --

  • What is it and why is it important
  • If name provides obvious description, the description might be omitted
  • Definition should parallel the name
  • Make sure the description clearly indicates exactly what is included
  • Include aliases or alternative names in the description
  • Where appropriate, indicate the source of the data -- where the information comes from
  • Indicate if value is optional or required
  • For multi-valued attributes -- maximum and minimum number of occurrences etc

Sample ERD for employees database:

This ERD illustrates the attribute types. Let's assume that we are designing a database for tracking employees and their dependents. The EMPLOYEE is a regular entity and the DEPENDENT entity is a weak entity because it depends on the existence of the EMPLOYEE. The address is a composite attribute and the skill and phone are multi-valued attributes. (An employee has several skills and several phones).
The ERD also show some typical errors; the EMPLOYEE REPORT is an inappropriate entity because it can be derived from attributes in other entities.Similarly, the AGE attribute is redundant because it can be calculated from the DOB attribute, and the COMPANY entity is also irrelevant to this case.