DB2 Basics

DB2 - Database 2 is an IBM product. It is MVS’s relational database management system. It exists as one or more subsystem within MVS.

DB2 Objects: Database, Table space, Table, Index Space, Index, Storage Group,

View, Synonym, Alias

STOGROUPS
  • For DB2 Datasets, we have 2 options for storage allocation –
  • Storage Group
  • User-Defined Vsam
  • Dataset allocation is performed by DB2 through IBM’s Data Facility Product (DFP)
  • Max no of volumes per Stogroup is 133 (Ideally 3 or 4) and all volumes must be of same type – 3380, 3390, etc.
  • CREATE STOGROUP TESTSG1

VOLUMES(‘VOL1’, ‘VOL2’,…) .

  • Default STOGROUP is SYSDEFLT. It is created when DB2 is installed. This should be avoided.
  • The underlying VSAM datasets are created and maintained by DB2 .
Using User_Defined VSAM datasets
  • 2 types of VSAM datasets can be used to represent DB2 Tablespaces and Indexspaces - ESDS AND LDS
  • These are not used as plain VSAM datasets but DB2 uses VSAM Media Manager to access them. Also DB2 does additional formatting of the datasets because of which they cant be treated like Standard Vsam.
  • DB2 can use LDS more efficiently as it has a 4k CI size and has no control information like ESDS.
  • DEFINE CLUSTER –

(NAME(vcat.dsndbc.ddddddd.ssssssss.I0001.Annn) –

LINEAR –

REUSE -

VOLUMES(vol1,vol2,…) -

CYLINDER(pri sec) –

SHAREOPTIONS(3 3)

) -

DATA -

(NAME(vcat.dsndbd.ddddddd.ssssssss.I0001.Annn))

vcat represents an ICF catalog

BUFFERPOOL

Data is first read from a table which is in a VSAM dataset and is moved into a bufferpool and from there it is sent to the requester.

There are totally 60 Bufferpool options – 50 4K Bufferpools (BP0 thru BP49) and 10 32K bufferpools – BP32K thru BP32K9.

Database -

  • The total collection of stored data is divied into a number of user databases and a few system databases.
  • It has a group of logically related Tablespaces and Indexspaces, which in turn contain tables and indexes respectively.
  • Database is the unit of START and STOP for the system administrator.
  • Default Database is DSNDB04 which is created during installation.
  • The parameters used for the creation are –
  •  STOGROUP - Default is SYSDEFLT

BUFFER POOL - Default is BP0

  • DBD is Database Descriptor – It is a control structure used by DB2 to manage the objects which are under the control of a given database.
  • Whenever any DB2 object in a database is created,altered or dropped, the DBD is modified .
  • The DBD contains a mapping of the tablespaces, tables and indexes defined in a database.
  • An X lock is acquired on a DBD during the execution of the DDL. So it is better to
  • Execute DDL when there is little or no activity.

CREATE DATABASE DATPRPDB

STOGROUP D3DPSMS

BUFFERPOOL BP3

INDEXBP BP4

CCSID EBCDIC;

Table Space –

  • Table space contains one or more tables. Index space contains exactly one index. A table and all of its indexes will be contained within a single database.
  • It is a dynamically extended collection of pages. Page is a block of physical storage and it is the unit of I/O. The pages are all of 4K size for index spaces and 32K or 4K for table spaces.
  • Table space is the unit of recovery and reorganization.

Types of table spaces –

Simple –

can contain one or more tables. Here within a single page, records from

more than one table can be interleaved.

Disadvantages – Affects Concurrent access, data availability, space

management

Partitioned –

can contain exactly one table which is so huge that it cant be maintained as a

single unit.

So it is partitioned based on the value ranges of one or more columns in a table. For this column or its combination, a clustering index must be defined. The key can be max 40 bytes . Columns in a partitioning index cannot be updated.

Each partition is independent of one another. Individual partitions can be associated with different storage groups.

To isolate specific data areas in dedicated datasets.

To improve data availability

To improve recoverability

To encourage parallelism (Query Parallelism is breaking the data access for a query into multiple I/O streams that is execute in parallel and which is useful in reducing the overall elapsed time)

For partitioned table space, individual partitions can be reorganized or

recovered

Segmented –

Can contain one or more tables. The table space is divided into segments. Each segment consists of n pages, where

n is a 4x be <= 64

and can contain records of only one table.

Advantages over Simple TS -

  • Concurrent access is possible as data in in different segments
  • Handles free space more efficiently.
  • Mass delete is more efficient as data is deleted using the space map and without reading it (unlike simple ts)
  • Space can be reclaimed from dropped tables immediately.

Parameters for Segmented TS are –

SEGSIZE – denotes the no of pages assigned to a segment

Parameters for Partitioned TS are –

NUMPARTS – no of partitions

LARGE is used to allocate as much as 1TB of data in a tablespace and

No of partitions can be max 254. If not, max storage amount is 64 GB and

max no of partitions is 64

LOCKSIZE – Denotes the type of locking for a particular tablespace.

Options are

TABLESPACE – Used when Shared access to multiple users is not required

or when the data is Read-Only

TABLE – Applicable only for Segmented TS and when Shared access is not required

PAGE - To facilitate shared access (i.e concurrency)

ROW - To facilitate shared access to data which could be on the same page.

Used only when Page Locks weren’t sufficient for concurrent access.

ANY – Means Let DB2 decide for each case, starting from Page Lock

LOCKMAX – Max no of Row or Page level Lock for any one user in a tablespace.

If this max is reached, the lock gets escalated to table or tablespace lock.

Values can be

0 -> lock escalation should never occur

SYSTEM -> defaults to the system-wide value specified in DSNZPARMS

Integer from 1 to 2,147,483,647

PRIQTY, SECQTY

PCTFREE – Denotes what % of each page should remain free for future inserts

FREEPAGE – Denotes the no of pages after which an empty page is available.

These 2 are useful in reducing the frequency of reorganization, reducing contention and increasing the efficiency of insertion

MAXROWS – Max no of rows on a tablespace page. Dflt is 255

CREATE TABLESPACE ACTSACCT IN STEVDB01

NUMPARTS 32

(PART 1 USING STOGROUP STEVESG

PRIQTY 252

SECQTY 252

ERASE NO

FREEPAGE 0

PCTFREE 0

TRACKMOD YES

COMPRESS YES

,PART 2 USING STOGROUP STEVESG

PRIQTY 252

SECQTY 252

ERASE NO

FREEPAGE 0

PCTFREE 0

TRACKMOD YES

COMPRESS YES

, ……………….

)

BUFFERPOOL BP3

LOCKSIZE PAGE

LOCKMAX 0

LOCKPART YES

CLOSE YES

CCSID EBCDIC;

CREATE TABLESPACE ACTSACCG IN TAC4DB01

USING STOGROUP D3DPSMS

PRIQTY 2880

SECQTY 7220

ERASE NO

FREEPAGE 0

PCTFREE 5

TRACKMOD NO

SEGSIZE 64

BUFFERPOOL BP3

LOCKSIZE PAGE

LOCKMAX 0

LOCKPART NO

CLOSE YES

COMPRESS YES

CCSID EBCDIC;

DB2 DATA TYPES:

Numeric -

Smallint - 2 byte (binary integer)

Integer - 4 byte (")

Decimal(p,q) - Packed Decimal; Total p digits and decimal q digits; (p+1) or (p+2)/2 bytes;p<54

Float(p) - If p<22, single precision, 4 bytes, else double precision, 8 bytes

Note: All have a DEFAULT value of 0

String -

Character - String of n bytes (fixed) where n < 255

Note: DEFAULT is blanks

Varchar - String of n bytes (variable) where exact length is n+2, 2 for the actual length

Max size < page size within the tablespac.

Note: DEFAULT is a null string

Date/Time -

Date - unsigned packed decimal in format yyyymmdd - 4 bytes

Time - """ hhmmss - 3 bytes

Timestamp -same - yyyymmddhhmmssnnnnnn - 10 bytes

Note: DEFAULT is CURRENT DATE/TIME/TIMESTAMP

Table –

  • Table that is physically stored within a table space contains one or more stored records.
  • There will be one stored record per row in the base table (the table that is visible to the user).
  • The stored record is a byte string containing
  • A prefix denoting the internal system identifier
  • For each field, a length prefix denoting the actual length, followed by the actual field value in a encoded form.
  • The stored records are internally addressed by RID – Record identifier – this consists of the page number within the table space and a byte offset from the foot of the page. This in turn contains the byte offset from the top of the page. This way when the records are rearranged within a page, the RID wont change. Only the local offset at the foot of the page will change.

Special Registers:

They are Zero Argument built-in Scalar functions, which return a scalar value.

1. USER - Returns the primary Authorisation ID.

2. CURRENT SQLID - Returns the Current Authorisation ID - This could be primary Id or secondary

ID assigned to a specific functional area.

3. CURRENT SERVER - Returns the ID of the current server (useful in Distributed Database mgmt)

4. CURRENT PACKAGESET - Returns ID of the collection that is currently in use.

5. CURRENT DATE/TIME/TIMESTAMP - Returns the Current Date/ Time /TimeStamp.

6. CURRENT TIMEZONE - Returns a time duration representing the displacement of the local

time zone from Greenwich Mean Time.

CREATE TABLE DEPARTMENT

(DEPTNOCHAR(3) NOT NULL

,DEPTNAME VARCHAR(29) NOT NULL

,MGRNOCHAR(6)

,ADMRDEPTCHAR(3)NOT NULL WITH DEFAULT '200'

,LOCATIONCHAR(16) NOT NULL WITH DEFAULT

,PRIMARY KEY(DEPTNO)

)

-----IN <DATABASE>.<TABLESPACE> (The default is used if not specified)

;

Note:

Table can also be created as

CREATE TABLE DUPDEPT1 LIKE DEPARTMENT;

CREATE TABLE DUPDEPT2 LIKE DEPARTMENT

(DEPT_PHONE CHAR(10));

CREATE TABLE EMPLOYEE

(EMPNO CHAR(6) NOT NULL

,FNAME VARCHAR(12) NOT NULL

,LNAME VARCHAR(15) NOT NULL

,WORKDEPT CHAR(3)

,PHONENO CHAR(7)

,HIREDATE DATE

,JOB CHAR(8)

,SEX CHAR(1)

,DOB DATE

,SALARY DECIMAL(9,2)

,BONUS DECIMAL(9,2)

,COMM DECIMAL(9,2)

,PRIMARY KEY(EMPNO))

WITH RESTRICT ON DROP --> this will avoid table from getting dropped.

;

ALTER TABLE:

ALTER TABLE EMPLOYEE

ADD TAX DECIMAL(9,2) ;

And this command is also used to add or drop constraint - primary/foreign keys/check

Types of Constraints:

(1)Check Constraint:

-->Used to enforce specific restriction on the values of a column.

This will be checked for every update/insert .

-->Can use condition operators,between,in,like,null and can have multiple condtions combined

using 'and' / 'or'

-->CONSTRAINT PHONE_CHK CHECK (PHONENO >= '0000000' AND PHONENO <= '9999999')

The 2nd operand can be another column. But they shd be of same data type as first.

-->Advantages:

(1)The basic business rules which need to be applied can be done at the database

level. And this can save additional programming in applications which modify that

data.

(2)It ensures consitency and data integrity -

as it avoids bypassing of rules by adhoc data modification

as the rules are applied everytime data is modified

-->Watch out the following- (these is not checked)

(1)If the conditions coded contradict each other,

like check(Phoneno > '1000000' and phoneno < '0999999')

(2)If the conditions coded contradict the defaults

(3)redundant conditions are allowed - this cud impact performance

(2)Referential Integrity:

-->Means of enuring data integrity between related tables using a parent-child relationship

The one with Primary key is parent. The one with foreign key is child table.

-->Alter Table EMPLOYEE

Foreign Key Fkey1 (workdept)

References Department ON DELETE CASCADE

Alter Table EMPLOYEE

Foreign Key Fkey2 (workdept)

References Department ON DELETE RESTRICT

Alter Table DEPARTMENT

Foreign Key Fkey3 (ADMRDEPT)

References Department ON DELETE CASCADE

--> A self referencing constraint shd always use 'ON DELETE CASCADE'

Alter Table DEPARTMENT

Foreign Key Fkey4 (MGRNO)

References Employee ON DELETE SET NULL

Note; The same constraint when enforced through the application, it consumes more

resources

(3) Primary Key is a unique identifier for each row in a table. It can be made of one or more columns.

These columns can never be NULL.

Note:A table can be defined without a primary key.

But it is available for use only if the uniqueness is enforced by defining a UNIQUE INDEX

on the same columns.

Other unique keys in the table are called alternate keys. To enforce them also, we need unique indexes. They also cant be NULL.

Defined as

UNIQUE(<COLUMN>) or <column> char(3) NOT NULL UNIQUE

Other Integrity Constraints:

(1) Checking of Data types

(2) Checking NOT NULL

(3) Based on unique index on columns not defined as primary key/ alternate key.

Index Space -

  • There is always a one-to-one correspondence between index and index space. Index space is automatically created when we create an index.
  • They can be partitioned if the index it contains is a clustering index for a partitioned table space. Others are simple.
  • It is the unit of recovery and reorganization. For partitioned Index space, individual partitions can be reorganized or recovered.

Index –

  • Index defines the logical ordering imposed on the stored data.
  • They are defined on one column or a combination of columns.
  • Useful for fast sequential access of the indexed data.
  • For each distinct value of the index, the pointers, i.e, the RIDs

Of all the stored records that have that value are stored.

  • Index Scan is used when an exhaustive search is to be done based on the index value. This is based on the sequence in which the indexed values are stored.
  • Table space scan is based on the physical sequence of the records.
  • Clustering Index – is one for which the records are physically maintained in the sequence defined by the index. The index controls the physical placement of the indexed recs.

CREATE UNIQUE INDEX SACM2.ACICACCT

ON SACM2.ACTBACCT

(ACCT_ID ASC )

CLUSTER

(PART 1 VALUES('4384340002425997')

USING STOGROUP DUDTSMS

PRIQTY 25200

SECQTY 7200

ERASE NO

FREEPAGE 31

PCTFREE 15

,PART 2 VALUES('4851270009632812')

USING STOGROUP DUDTSMS

PRIQTY 25200

SECQTY 7200

ERASE NO

FREEPAGE 31

PCTFREE 15

)

BUFFERPOOL BP4

CLOSE YES;

CREATE UNIQUE INDEX RACA.ACI1SPAA

ON RACA.ACTBSPAA

(ACCT_ID ASC )

USING STOGROUP D2DPSMS

PRIQTY 360000

SECQTY 131068

ERASE NO

FREEPAGE 0

PCTFREE 10

BUFFERPOOL BP4

CLOSE YES

PIECESIZE 2097152 K;

VIEWS

-> Views are virtual tables based on one or more base tables.

-> When data in the base table changes, these changes are also reflected in the view.

-> The view is executed only when it is accessed

->They must be defined for a specific use and for one of the following advantages they

provide –

  • To provide row and column level security

By limiting the select items to the columns to which the user should have access and/or by limiting the rows by using proper conditions in WHERE clause.

  • To ensure optimal access

By using join criteria and by using indexed columns in predicates

  • To ensure same calculation

By using data derivation formulas in the select list

  • To Mask complexity of a query from DB2 beginners
  • To Support Domains

Note:Domain refers to a valid range of values that a column can contain.

The Table CHECK Constraint is used to create the domains.

The WITH CHECK OPTION is used to ensure that the data modification using

Update/Insert conforms to the WHERE conditions specified in the view

definition.

This option is of 2 types-

WITH CASCADED CHECK OPTION – The check option is applied to the current view

and all the views it accesses regardless of whether it is specified or not.

WITH LOCAL CHECK OPTION – The check option is applied to the views where it is

Specified.

  • To rename the columns so that the user can understand them better.

-> In a View with

Joins

Functions

Distinct

Group By and Having

Cannot do Insert/ Update /Delete

In a View with

Derived Data

Constants

Without columns that don’t have default value

Cannot do Insert

CREATE VIEW HIGH_PAID_EMP

(EMP_NO, EMP_FIRST_NAME, EMP_LAST_NAME, DEPT, JOB, SALARY)

AS

SELECT EMPNO,FNAME,LNAME,WORKDEPT,JOB, SALARY

FROM EMP

WHERE SALARY > 75000

WITH CHECK OPTION

SYNONYMS

  • An alternative private name for a table or a view
  • It can be used only by its creator.
  • It cannot refer to a remote table.
  • When a table/view is dropped, all synonyms defined on it are also dropped

CREATE SYNONYM TESTSYS FOR <CREATOR>.<TABLENAM>

ALIASES

  • An alternative private name for a table or a view
  • It can be used by everyone.
  • It can refer to a remote table.
  • When a table/view is dropped, all aliases defined on it are also dropped
  • It provides a useful level of indirection between application programs and the data tables.

CREATE ALIAS TESTALS FOR <CREATOR>.<TABLENAM>

SQL – Structured Query Language

It is the standard query language used by many Relational database products like –

DB2, Oracle, Sybase,etc.

Features of SQL ( or sequel) –

  • It is a high level language as compared to procedural languages- because it provides a greater degree of abstraction . In procedural languages, we process record by record – we specify what data is to be accessed and how to access them.

Sql on the other hand requires that that programmer specify only what is needed, but cannot specify how to retrieve them. The optimal instructions for data navigation (called the access path) is determined by the database itself.

  • Sql is not only a query language – but it is also used to define data structures, insert, modify, delete data and control access to the data. The language is common to the different users – like DBAs, application programmers, etc.
  • All database operations using Sql are at Set-Level unlike record level processing using flat files. This includes querying a table and the result is in the form of a subset of the original table. Similarly, updates and deletes also can be done at set level.

Based on Functionality, Sql can be classified into 3 Categories -

2)Data Definition Language(DDL) – Creates and maintains physical data structures using CREATE, ALTER and DROP verbs.

3)Data Manipulation Language(DML) – Accesses and Modifies Data using INSERT, SELECT, UPDATE and DELETE.

4)Data Control Language(DCL) – Control Data Security using GRANT and REVOKE.

Based on the type of usage, Sql can be classified into 2 Categories -

1)Static Sql – It is embedded in an application program written in a high level

Language like Cobol.

2)Dynamic Sql – It could be embedded in an application program where it will be

Constructed at run-time. Or it could be executed online at a terminal.

Rules for Sql –

1)Every Query must access atleast one object and select atleast item.

2) The items selected can be one or more columns from a table, a literal, special

Register, an expression result, an embedded select that returns a single row.

2)The object accessed can be a table, a view, alias or full select.

Examples:

1)Getting the the maximum average salary and the dept that has that value.