CSC 532

Enterprise database management

Pranav Raj Sharma

A database management system is the backbone of most companies, small and large, handling their sales, inventory, day-to-day transactions, order history, purchasing, financial data, and more. An enterprise DBMS is designed to manage large numbers of users and very large databases, and to integrate with multiple types of software applications. Enterprise DBMSs offer several important features, including multi-processor support, parallel processing, distributed transactions, concurrency control, data security, high scalability, and high-availability features such as clustering and replication. An enterprise DBMS can communicate directly with front-end applications such as SAP, Oracle E-Business Suite, JD Edwards, Dynamics AX, and Baan to run the daily operations of many small to large organizations. Some of the largest DBMSs include Oracle, Microsoft SQL Server, Informix, DB2, and Sybase; today, Oracle and Microsoft SQL Server are two of the largest DBMS vendors competing for the medium-to-large-organization market. The database is therefore one of the most crucial entities in software engineering.

Enterprise database management ensures database quality and performance for both internal and external users. Continuously growing data volumes, coupled with legal and regulatory requirements to retain data, impact application performance and strain organizations' resources.

In a typical database management system, the actual database is stored on disk. To ensure fault tolerance and higher performance, various RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive Disks) levels are used in the software engineering industry. Some of the basic RAID levels used with an enterprise DBMS are listed below.

  1. RAID 0: Offers block striping (data striping), meaning data is spread across blocks on multiple disks, which allows faster reads and writes through parallelism. However, there is no fault tolerance because there is no redundancy: if a disk containing the database fails, the data is lost permanently.
  2. RAID 1: Offers disk redundancy, or mirroring, but no data striping. Mirroring allows faster reads but slower writes, because every write must go to both disks. On the positive side, the redundant copy provides fault tolerance.
  3. RAID 0+1 or RAID 10: Combines striping with mirroring. It offers excellent performance due to parallel reads and writes, and better durability due to its mirroring feature.
  4. RAID 5: Like RAID 10, RAID 5 offers both fault tolerance and fast read access. Parity blocks distributed across all disks allow a failed disk to be rebuilt, at the cost of some write overhead for maintaining parity.

Most enterprise DBMS vendors recommend RAID 10, or at least RAID 5, for parallel transaction processing and fault tolerance, especially in production environments.
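To make the RAID 5 parity idea concrete, here is a minimal sketch (purely illustrative, not a real storage driver) of how a parity block computed as the XOR of the data blocks in a stripe lets any single lost block be rebuilt from the survivors:

```python
# Sketch of RAID 5 parity: the parity block is the XOR of the data
# blocks in a stripe, so any single lost block can be rebuilt by
# XOR-ing the surviving blocks with the parity block.

def parity(blocks):
    """XOR all blocks (equal-length byte strings) together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

stripe = [b"disk0dat", b"disk1dat", b"disk2dat"]  # data blocks in one stripe
p = parity(stripe)                                # stored on the parity disk

# Simulate losing disk 1: rebuild its block from the others plus parity.
survivors = [stripe[0], stripe[2], p]
rebuilt = parity(survivors)
assert rebuilt == stripe[1]
```

The same XOR trick underlies RAID 5's fault tolerance: reads stay fast because data is striped, while a write must also update the parity block, which is where the write overhead comes from.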

In a typical DBMS, main memory (the buffer) holds the data currently in use, the actual databases are stored on the disk subsystem, and archived data and backup files are stored on offsite tape. Most enterprise-level DBMSs manage both disk space and buffers. Data must be in RAM (the buffer) for the DBMS to operate on it: most transactions retrieve their data from the buffer pool, where data pages are cached and periodically flushed. The buffer pool acts as a mediator between the data on disk and the transaction manager. Because reading and writing data to and from disk is very expensive, data pages are kept in the buffer pool for faster access; therefore, the larger the server memory, the faster transactions will be. The amount of main memory needed is dictated by the size of the databases on the server, the transaction throughput, and the volume of data pages stored in the buffer pool (the buffer cache). If the buffer cache exceeds main memory, data pages are flushed from memory very quickly, leading to expensive queries and low page life expectancy. Page life expectancy (PLE) is the amount of time a data page stays in the buffer pool before it is flushed from memory, and a low PLE is a good indicator of memory pressure. For Microsoft SQL Server, if the average PLE is consistently below 300, Microsoft recommends either increasing server memory or improving query efficiency to alleviate the memory pressure.
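The caching-and-flushing behavior described above can be sketched as a tiny buffer pool with least-recently-used eviction. The names (`BufferPool`, `read_page`) are hypothetical, and a real DBMS uses far more sophisticated replacement policies, but the core idea is the same: serve pages from memory when cached, and flush the least recently used page when the pool is full.

```python
from collections import OrderedDict

# Minimal sketch of a buffer pool with LRU eviction: pages are served
# from memory on a cache hit, and the least recently used page is
# evicted when the pool is full and a new page must be read from disk.

class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity        # max pages held in memory
        self.disk = disk                # stand-in for the disk subsystem
        self.pages = OrderedDict()      # page_id -> page data, in LRU order

    def read_page(self, page_id):
        if page_id in self.pages:                 # cache hit: no disk I/O
            self.pages.move_to_end(page_id)       # refresh recency
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)        # evict least recently used
        self.pages[page_id] = self.disk[page_id]  # expensive disk read
        return self.pages[page_id]

disk = {1: "page one", 2: "page two", 3: "page three"}
pool = BufferPool(capacity=2, disk=disk)
pool.read_page(1)
pool.read_page(2)
pool.read_page(1)   # page 1 becomes most recently used
pool.read_page(3)   # pool is full: page 2 is evicted
assert list(pool.pages) == [1, 3]
```

When the working set of pages exceeds `capacity`, every read evicts another page, which is exactly the low-page-life-expectancy situation the text warns about.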

All enterprise DBMSs have a database object called an index: an underlying structure that determines how data is stored and retrieved, thereby expediting query processing. Indexes are one of the most powerful tools a database administrator has for troubleshooting slowness and tuning performance. A table whose rows are not sorted by any index is called a heap. The problem with a heap is that queries against it require a table scan. If the table is small, a table scan is not very costly; if the table is substantially large, however, table scans lead to high query wait times, blocking locks, and high disk I/O, all of which cause system slowness. An index not only sorts the table's data but also avoids table scans by searching and filtering the rows that match the user's query.
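The table-scan-versus-index distinction can be seen directly in SQLite (which ships with Python) by asking the optimizer for its plan. This is a sketch with a made-up table; the exact plan wording varies by SQLite version, but the shift from a scan to an index search is what matters:

```python
import sqlite3

# Sketch of how an index replaces a full table scan. EXPLAIN QUERY PLAN
# reports SCAN (read every row of the heap) versus SEARCH ... USING INDEX
# (seek directly to the matching rows).

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, f"cust{i % 100}") for i in range(1000)])

query = "SELECT * FROM orders WHERE customer = 'cust7'"

# Without an index, the optimizer must scan the whole table.
plan_before = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(plan_before)   # e.g. "SCAN orders"

# With an index on the filtered column, it becomes an index search.
con.execute("CREATE INDEX idx_customer ON orders (customer)")
plan_after = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(plan_after)    # e.g. "SEARCH orders USING INDEX idx_customer (customer=?)"
```

On a 1,000-row table the difference is invisible; on a table with millions of rows, avoiding the scan is the difference between milliseconds and minutes.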

Most commercial database management systems have two major types of index: clustered and non-clustered. A table can have only one clustered index, because in a clustered index the leaf nodes contain the actual data: the clustered index maps directly to the table's physical data pages. In a non-clustered index, the logical structure of the index is separate from the physical structure of the data in the table, so each table can have several non-clustered indexes but only a single clustered index. In an enterprise DBMS, most indexes ship with the databases created by the ERP application; over time, however, as more and more users connect and run different queries and reports, new indexes must be added to satisfy their queries.

Clustered indexes are typically created on one key column, or on a column used frequently, such as one referenced in a WHERE clause. Generally speaking, clustered indexes suit range queries, where a large amount of data is read; queries that use ORDER BY or GROUP BY can also benefit greatly from them. Moreover, a clustered index is effective when data is accessed sequentially or frequently, or when it needs to be sorted, because the data is sorted automatically when the clustered index is added to the table and does not have to be re-sorted. Queries that touch a small data set benefit from non-clustered indexes, which can be added to cover queries the clustered index does not. As a good practice, the clustered index should be created before any non-clustered indexes are added to a table, because a non-clustered index relies on the location of the index data within the clustered table (Ramakrishnan and Gehrke, Database Management Systems, 3rd ed.).
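The range-query advantage of a clustered (sorted) layout can be sketched with Python's `bisect` module: when rows are kept in key order, a range scan binary-searches to the first match and then reads sequentially, instead of examining every row.

```python
import bisect

# Sketch of why a clustered (sorted) layout helps range queries: with
# rows stored in key order, a range scan seeks to the first matching
# key in O(log n) and then reads the matches sequentially.

rows = sorted(range(0, 1_000_000, 3))   # rows stored in key order

def range_query(rows, low, high):
    """Return all keys in [low, high] from a sorted table."""
    start = bisect.bisect_left(rows, low)    # O(log n) seek to first match
    end = bisect.bisect_right(rows, high)    # O(log n) seek past last match
    return rows[start:end]                   # sequential read of matches

assert range_query(rows, 10, 20) == [12, 15, 18]
```

Against a heap, the same query would have to test all million rows; against the sorted layout it does two binary searches and reads only the three matching rows.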

In the database world, a transaction is an atomic unit of work; a transaction that requests read-only access is called a query. Multiple concurrent transactions can interfere with each other when they access the same database resource for reading or writing. Generally speaking, two or more concurrent transactions that only read from the same table do not conflict; however, when at least one transaction writes while others read or write the same resource, or when multiple transactions try to modify the same table, conflicts arise. To maintain isolation and eliminate transaction conflicts, every database management system provides lock escalation and concurrency control, which regulate transactions to reduce conflicts. Concurrent read transactions accessing the same table use a shared lock (S lock), while each writing transaction uses an exclusive lock (X lock) to maintain the integrity of the database. Concurrency control can be applied at various levels, including field level, record level, page level, extent level, file level, and database level. SQL Server additionally provides three row-versioning-based options to reduce transaction conflicts: (1) read committed isolation, (2) read committed snapshot isolation, and (3) snapshot isolation.
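The shared/exclusive locking rules above boil down to a small compatibility matrix, sketched here in Python (a simplified model; real lock managers also handle intent locks, upgrades, and queuing):

```python
# Sketch of lock compatibility: shared (S) locks held by concurrent
# readers are compatible with each other, while an exclusive (X) lock
# conflicts with every other lock, so writers are serialized.

COMPATIBLE = {
    ("S", "S"): True,    # two readers can share the resource
    ("S", "X"): False,   # a writer must wait for readers
    ("X", "S"): False,   # a reader must wait for the writer
    ("X", "X"): False,   # writers exclude each other
}

def can_grant(requested, held_locks):
    """A lock is granted only if it is compatible with every held lock."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)

assert can_grant("S", ["S", "S"])   # concurrent reads: no conflict
assert not can_grant("X", ["S"])    # write blocked by an existing read
assert not can_grant("S", ["X"])    # read blocked by an existing write
```

This is why read-only workloads scale well under locking while mixed read/write workloads contend, and it motivates the row-versioning isolation levels, which let readers see a snapshot instead of taking locks that block writers.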

Regardless of which vendor you use to manage your data, any enterprise database management system must guarantee the following four fundamental properties of transactions, known as the ACID properties, as defined below.

  1. Atomicity: A transaction is all or nothing. It either succeeds to completion and commits, or it rolls back if it is incomplete or unsuccessful. Atomicity protects the integrity of the database.
  2. Consistency: The database management system must take a transaction from one consistent state to another; a transaction should not leave the database in an intermediate state. Consistency and integrity are enforced by constraints such as primary key, not-null, check, and referential (foreign key) constraints.
  3. Isolation: The enterprise DBMS should give each user the illusion of being the sole user of the database. In an enterprise environment, hundreds of thousands of users may be connected at the same time, and they should not interfere with one another.
  4. Durability: Once a transaction is committed and its data is written to disk, the data must stay permanent unless someone deletes it on purpose.

Most enterprise database management vendors, including IBM (DB2, Informix), Oracle, and Microsoft (SQL Server), use the standard relational query language SQL. SQL is a structured query language originally developed at IBM in the 1970s; the SQL-99 standard adds procedural constructs (such as if-then-else) and object-oriented constructs such as inheritance and polymorphism.

A typical database management system has the following layers, described below.

  1. Query optimization & execution layer

Produces the most efficient query execution plan for a user's query.

  2. Database operator layer

Implements the physical data model operators, such as the relational operators select, project, and join.

  3. Buffer management

Manages how the database accesses the buffer: it partitions the main memory allocated to the DBMS into buffer page frames and brings data pages to and from disk as requested by the file manager.

  4. Disk space management

Manages space on disk, supporting the file concept, and access paths to the data in those files such as indexes, for the higher layers.
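The relational operators the operator layer implements can be sketched in a few lines of Python over rows represented as plain dicts (a teaching model, not how a real engine stores rows):

```python
# Sketch of the relational operators select (filter rows), project
# (keep only some columns), and a simple nested-loop equijoin.

def select(rows, predicate):
    """Keep only the rows satisfying the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Keep only the named columns of each row."""
    return [{c: r[c] for c in columns} for r in rows]

def join(left, right, key):
    """Nested-loop equijoin on a shared column name."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

employees = [{"id": 1, "name": "Ada", "dept": 10},
             {"id": 2, "name": "Bob", "dept": 20}]
depts = [{"dept": 10, "dept_name": "Engineering"},
         {"dept": 20, "dept_name": "Sales"}]

# Equivalent to: SELECT name FROM employees JOIN depts USING (dept)
#                WHERE dept_name = 'Sales'
result = project(select(join(employees, depts, "dept"),
                        lambda r: r["dept_name"] == "Sales"),
                 ["name"])
assert result == [{"name": "Bob"}]
```

The query optimization layer's job is essentially to decide how to compose and order operators like these (and to replace the naive nested-loop join with hash or merge variants) so the plan touches as little data as possible.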

In today's commercial world, a colorful front end by itself would not make much sense without a robust back-end database. Advances in database technology have made our lives easier.

References:

1. Ramakrishnan, Gehrke, "Database Management Systems," Third Edition, 2009.

2. Microsoft Corporation.

3. University of Pennsylvania.

4. Wikipedia.

5. Database Systems: The Complete Book, Stanford University.
