DB2 Basics
DB2 - Database 2 is an IBM product. It is MVS’s relational database management system. It exists as one or more subsystem within MVS.
DB2 Objects: Database, Table space, Table, Index Space, Index, Storage Group,
View, Synonym, Alias
STOGROUPS
- For DB2 Datasets, we have 2 options for storage allocation –
- Storage Group
- User-Defined Vsam
- Dataset allocation is performed by DB2 through IBM’s Data Facility Product (DFP)
- Max no of volumes per Stogroup is 133 (Ideally 3 or 4) and all volumes must be of same type – 3380, 3390, etc.
- CREATE STOGROUP TESTSG1
VOLUMES(‘VOL1’, ‘VOL2’,…) .
- Default STOGROUP is SYSDEFLT. It is created when DB2 is installed. This should be avoided.
- The underlying VSAM datasets are created and maintained by DB2 .
Using User_Defined VSAM datasets
- 2 types of VSAM datasets can be used to represent DB2 Tablespaces and Indexspaces - ESDS AND LDS
- These are not used as plain VSAM datasets but DB2 uses VSAM Media Manager to access them. Also DB2 does additional formatting of the datasets because of which they cant be treated like Standard Vsam.
- DB2 can use LDS more efficiently as it has a 4k CI size and has no control information like ESDS.
- DEFINE CLUSTER –
(NAME(vcat.dsndbc.ddddddd.ssssssss.I0001.Annn) –
LINEAR –
REUSE -
VOLUMES(vol1,vol2,…) -
CYLINDER(pri sec) –
SHAREOPTIONS(3 3)
) -
DATA -
(NAME(vcat.dsndbd.ddddddd.ssssssss.I0001.Annn))
vcat represents an ICF catalog
BUFFERPOOL
Data is first read from a table which is in a VSAM dataset and is moved into a bufferpool and from there it is sent to the requester.
There are totally 60 Bufferpool options – 50 4K Bufferpools (BP0 thru BP49) and 10 32K bufferpools – BP32K thru BP32K9.
Database -
- The total collection of stored data is divied into a number of user databases and a few system databases.
- It has a group of logically related Tablespaces and Indexspaces, which in turn contain tables and indexes respectively.
- Database is the unit of START and STOP for the system administrator.
- Default Database is DSNDB04 which is created during installation.
- The parameters used for the creation are –
- STOGROUP - Default is SYSDEFLT
BUFFER POOL - Default is BP0
- DBD is Database Descriptor – It is a control structure used by DB2 to manage the objects which are under the control of a given database.
- Whenever any DB2 object in a database is created,altered or dropped, the DBD is modified .
- The DBD contains a mapping of the tablespaces, tables and indexes defined in a database.
- An X lock is acquired on a DBD during the execution of the DDL. So it is better to
- Execute DDL when there is little or no activity.
CREATE DATABASE DATPRPDB
STOGROUP D3DPSMS
BUFFERPOOL BP3
INDEXBP BP4
CCSID EBCDIC;
Table Space –
- Table space contains one or more tables. Index space contains exactly one index. A table and all of its indexes will be contained within a single database.
- It is a dynamically extended collection of pages. Page is a block of physical storage and it is the unit of I/O. The pages are all of 4K size for index spaces and 32K or 4K for table spaces.
- Table space is the unit of recovery and reorganization.
Types of table spaces –
Simple –
can contain one or more tables. Here within a single page, records from
more than one table can be interleaved.
Disadvantages – Affects Concurrent access, data availability, space
management
Partitioned –
can contain exactly one table which is so huge that it cant be maintained as a
single unit.
So it is partitioned based on the value ranges of one or more columns in a table. For this column or its combination, a clustering index must be defined. The key can be max 40 bytes . Columns in a partitioning index cannot be updated.
Each partition is independent of one another. Individual partitions can be associated with different storage groups.
To isolate specific data areas in dedicated datasets.
To improve data availability
To improve recoverability
To encourage parallelism (Query Parallelism is breaking the data access for a query into multiple I/O streams that is execute in parallel and which is useful in reducing the overall elapsed time)
For partitioned table space, individual partitions can be reorganized or
recovered
Segmented –
Can contain one or more tables. The table space is divided into segments. Each segment consists of n pages, where
n is a 4x be <= 64
and can contain records of only one table.
Advantages over Simple TS -
- Concurrent access is possible as data in in different segments
- Handles free space more efficiently.
- Mass delete is more efficient as data is deleted using the space map and without reading it (unlike simple ts)
- Space can be reclaimed from dropped tables immediately.
Parameters for Segmented TS are –
SEGSIZE – denotes the no of pages assigned to a segment
Parameters for Partitioned TS are –
NUMPARTS – no of partitions
LARGE is used to allocate as much as 1TB of data in a tablespace and
No of partitions can be max 254. If not, max storage amount is 64 GB and
max no of partitions is 64
LOCKSIZE – Denotes the type of locking for a particular tablespace.
Options are
TABLESPACE – Used when Shared access to multiple users is not required
or when the data is Read-Only
TABLE – Applicable only for Segmented TS and when Shared access is not required
PAGE - To facilitate shared access (i.e concurrency)
ROW - To facilitate shared access to data which could be on the same page.
Used only when Page Locks weren’t sufficient for concurrent access.
ANY – Means Let DB2 decide for each case, starting from Page Lock
LOCKMAX – Max no of Row or Page level Lock for any one user in a tablespace.
If this max is reached, the lock gets escalated to table or tablespace lock.
Values can be
0 -> lock escalation should never occur
SYSTEM -> defaults to the system-wide value specified in DSNZPARMS
Integer from 1 to 2,147,483,647
PRIQTY, SECQTY
PCTFREE – Denotes what % of each page should remain free for future inserts
FREEPAGE – Denotes the no of pages after which an empty page is available.
These 2 are useful in reducing the frequency of reorganization, reducing contention and increasing the efficiency of insertion
MAXROWS – Max no of rows on a tablespace page. Dflt is 255
CREATE TABLESPACE ACTSACCT IN STEVDB01
NUMPARTS 32
(PART 1 USING STOGROUP STEVESG
PRIQTY 252
SECQTY 252
ERASE NO
FREEPAGE 0
PCTFREE 0
TRACKMOD YES
COMPRESS YES
,PART 2 USING STOGROUP STEVESG
PRIQTY 252
SECQTY 252
ERASE NO
FREEPAGE 0
PCTFREE 0
TRACKMOD YES
COMPRESS YES
, ……………….
)
BUFFERPOOL BP3
LOCKSIZE PAGE
LOCKMAX 0
LOCKPART YES
CLOSE YES
CCSID EBCDIC;
CREATE TABLESPACE ACTSACCG IN TAC4DB01
USING STOGROUP D3DPSMS
PRIQTY 2880
SECQTY 7220
ERASE NO
FREEPAGE 0
PCTFREE 5
TRACKMOD NO
SEGSIZE 64
BUFFERPOOL BP3
LOCKSIZE PAGE
LOCKMAX 0
LOCKPART NO
CLOSE YES
COMPRESS YES
CCSID EBCDIC;
DB2 DATA TYPES:
Numeric -
Smallint - 2 byte (binary integer)
Integer - 4 byte (")
Decimal(p,q) - Packed Decimal; Total p digits and decimal q digits; (p+1) or (p+2)/2 bytes;p<54
Float(p) - If p<22, single precision, 4 bytes, else double precision, 8 bytes
Note: All have a DEFAULT value of 0
String -
Character - String of n bytes (fixed) where n < 255
Note: DEFAULT is blanks
Varchar - String of n bytes (variable) where exact length is n+2, 2 for the actual length
Max size < page size within the tablespac.
Note: DEFAULT is a null string
Date/Time -
Date - unsigned packed decimal in format yyyymmdd - 4 bytes
Time - """ hhmmss - 3 bytes
Timestamp -same - yyyymmddhhmmssnnnnnn - 10 bytes
Note: DEFAULT is CURRENT DATE/TIME/TIMESTAMP
Table –
- Table that is physically stored within a table space contains one or more stored records.
- There will be one stored record per row in the base table (the table that is visible to the user).
- The stored record is a byte string containing
- A prefix denoting the internal system identifier
- For each field, a length prefix denoting the actual length, followed by the actual field value in a encoded form.
- The stored records are internally addressed by RID – Record identifier – this consists of the page number within the table space and a byte offset from the foot of the page. This in turn contains the byte offset from the top of the page. This way when the records are rearranged within a page, the RID wont change. Only the local offset at the foot of the page will change.
Special Registers:
They are Zero Argument built-in Scalar functions, which return a scalar value.
1. USER - Returns the primary Authorisation ID.
2. CURRENT SQLID - Returns the Current Authorisation ID - This could be primary Id or secondary
ID assigned to a specific functional area.
3. CURRENT SERVER - Returns the ID of the current server (useful in Distributed Database mgmt)
4. CURRENT PACKAGESET - Returns ID of the collection that is currently in use.
5. CURRENT DATE/TIME/TIMESTAMP - Returns the Current Date/ Time /TimeStamp.
6. CURRENT TIMEZONE - Returns a time duration representing the displacement of the local
time zone from Greenwich Mean Time.
CREATE TABLE DEPARTMENT
(DEPTNOCHAR(3) NOT NULL
,DEPTNAME VARCHAR(29) NOT NULL
,MGRNOCHAR(6)
,ADMRDEPTCHAR(3)NOT NULL WITH DEFAULT '200'
,LOCATIONCHAR(16) NOT NULL WITH DEFAULT
,PRIMARY KEY(DEPTNO)
)
-----IN <DATABASE>.<TABLESPACE> (The default is used if not specified)
;
Note:
Table can also be created as
CREATE TABLE DUPDEPT1 LIKE DEPARTMENT;
CREATE TABLE DUPDEPT2 LIKE DEPARTMENT
(DEPT_PHONE CHAR(10));
CREATE TABLE EMPLOYEE
(EMPNO CHAR(6) NOT NULL
,FNAME VARCHAR(12) NOT NULL
,LNAME VARCHAR(15) NOT NULL
,WORKDEPT CHAR(3)
,PHONENO CHAR(7)
,HIREDATE DATE
,JOB CHAR(8)
,SEX CHAR(1)
,DOB DATE
,SALARY DECIMAL(9,2)
,BONUS DECIMAL(9,2)
,COMM DECIMAL(9,2)
,PRIMARY KEY(EMPNO))
WITH RESTRICT ON DROP --> this will avoid table from getting dropped.
;
ALTER TABLE:
ALTER TABLE EMPLOYEE
ADD TAX DECIMAL(9,2) ;
And this command is also used to add or drop constraint - primary/foreign keys/check
Types of Constraints:
(1)Check Constraint:
-->Used to enforce specific restriction on the values of a column.
This will be checked for every update/insert .
-->Can use condition operators,between,in,like,null and can have multiple condtions combined
using 'and' / 'or'
-->CONSTRAINT PHONE_CHK CHECK (PHONENO >= '0000000' AND PHONENO <= '9999999')
The 2nd operand can be another column. But they shd be of same data type as first.
-->Advantages:
(1)The basic business rules which need to be applied can be done at the database
level. And this can save additional programming in applications which modify that
data.
(2)It ensures consitency and data integrity -
as it avoids bypassing of rules by adhoc data modification
as the rules are applied everytime data is modified
-->Watch out the following- (these is not checked)
(1)If the conditions coded contradict each other,
like check(Phoneno > '1000000' and phoneno < '0999999')
(2)If the conditions coded contradict the defaults
(3)redundant conditions are allowed - this cud impact performance
(2)Referential Integrity:
-->Means of enuring data integrity between related tables using a parent-child relationship
The one with Primary key is parent. The one with foreign key is child table.
-->Alter Table EMPLOYEE
Foreign Key Fkey1 (workdept)
References Department ON DELETE CASCADE
Alter Table EMPLOYEE
Foreign Key Fkey2 (workdept)
References Department ON DELETE RESTRICT
Alter Table DEPARTMENT
Foreign Key Fkey3 (ADMRDEPT)
References Department ON DELETE CASCADE
--> A self referencing constraint shd always use 'ON DELETE CASCADE'
Alter Table DEPARTMENT
Foreign Key Fkey4 (MGRNO)
References Employee ON DELETE SET NULL
Note; The same constraint when enforced through the application, it consumes more
resources
(3) Primary Key is a unique identifier for each row in a table. It can be made of one or more columns.
These columns can never be NULL.
Note:A table can be defined without a primary key.
But it is available for use only if the uniqueness is enforced by defining a UNIQUE INDEX
on the same columns.
Other unique keys in the table are called alternate keys. To enforce them also, we need unique indexes. They also cant be NULL.
Defined as
UNIQUE(<COLUMN>) or <column> char(3) NOT NULL UNIQUE
Other Integrity Constraints:
(1) Checking of Data types
(2) Checking NOT NULL
(3) Based on unique index on columns not defined as primary key/ alternate key.
Index Space -
- There is always a one-to-one correspondence between index and index space. Index space is automatically created when we create an index.
- They can be partitioned if the index it contains is a clustering index for a partitioned table space. Others are simple.
- It is the unit of recovery and reorganization. For partitioned Index space, individual partitions can be reorganized or recovered.
Index –
- Index defines the logical ordering imposed on the stored data.
- They are defined on one column or a combination of columns.
- Useful for fast sequential access of the indexed data.
- For each distinct value of the index, the pointers, i.e, the RIDs
Of all the stored records that have that value are stored.
- Index Scan is used when an exhaustive search is to be done based on the index value. This is based on the sequence in which the indexed values are stored.
- Table space scan is based on the physical sequence of the records.
- Clustering Index – is one for which the records are physically maintained in the sequence defined by the index. The index controls the physical placement of the indexed recs.
CREATE UNIQUE INDEX SACM2.ACICACCT
ON SACM2.ACTBACCT
(ACCT_ID ASC )
CLUSTER
(PART 1 VALUES('4384340002425997')
USING STOGROUP DUDTSMS
PRIQTY 25200
SECQTY 7200
ERASE NO
FREEPAGE 31
PCTFREE 15
,PART 2 VALUES('4851270009632812')
USING STOGROUP DUDTSMS
PRIQTY 25200
SECQTY 7200
ERASE NO
FREEPAGE 31
PCTFREE 15
)
BUFFERPOOL BP4
CLOSE YES;
CREATE UNIQUE INDEX RACA.ACI1SPAA
ON RACA.ACTBSPAA
(ACCT_ID ASC )
USING STOGROUP D2DPSMS
PRIQTY 360000
SECQTY 131068
ERASE NO
FREEPAGE 0
PCTFREE 10
BUFFERPOOL BP4
CLOSE YES
PIECESIZE 2097152 K;
VIEWS
-> Views are virtual tables based on one or more base tables.
-> When data in the base table changes, these changes are also reflected in the view.
-> The view is executed only when it is accessed
->They must be defined for a specific use and for one of the following advantages they
provide –
- To provide row and column level security
By limiting the select items to the columns to which the user should have access and/or by limiting the rows by using proper conditions in WHERE clause.
- To ensure optimal access
By using join criteria and by using indexed columns in predicates
- To ensure same calculation
By using data derivation formulas in the select list
- To Mask complexity of a query from DB2 beginners
- To Support Domains
Note:Domain refers to a valid range of values that a column can contain.
The Table CHECK Constraint is used to create the domains.
The WITH CHECK OPTION is used to ensure that the data modification using
Update/Insert conforms to the WHERE conditions specified in the view
definition.
This option is of 2 types-
WITH CASCADED CHECK OPTION – The check option is applied to the current view
and all the views it accesses regardless of whether it is specified or not.
WITH LOCAL CHECK OPTION – The check option is applied to the views where it is
Specified.
- To rename the columns so that the user can understand them better.
-> In a View with
Joins
Functions
Distinct
Group By and Having
Cannot do Insert/ Update /Delete
In a View with
Derived Data
Constants
Without columns that don’t have default value
Cannot do Insert
CREATE VIEW HIGH_PAID_EMP
(EMP_NO, EMP_FIRST_NAME, EMP_LAST_NAME, DEPT, JOB, SALARY)
AS
SELECT EMPNO,FNAME,LNAME,WORKDEPT,JOB, SALARY
FROM EMP
WHERE SALARY > 75000
WITH CHECK OPTION
SYNONYMS
- An alternative private name for a table or a view
- It can be used only by its creator.
- It cannot refer to a remote table.
- When a table/view is dropped, all synonyms defined on it are also dropped
CREATE SYNONYM TESTSYS FOR <CREATOR>.<TABLENAM>
ALIASES
- An alternative private name for a table or a view
- It can be used by everyone.
- It can refer to a remote table.
- When a table/view is dropped, all aliases defined on it are also dropped
- It provides a useful level of indirection between application programs and the data tables.
CREATE ALIAS TESTALS FOR <CREATOR>.<TABLENAM>
SQL – Structured Query Language
It is the standard query language used by many Relational database products like –
DB2, Oracle, Sybase,etc.
Features of SQL ( or sequel) –
- It is a high level language as compared to procedural languages- because it provides a greater degree of abstraction . In procedural languages, we process record by record – we specify what data is to be accessed and how to access them.
Sql on the other hand requires that that programmer specify only what is needed, but cannot specify how to retrieve them. The optimal instructions for data navigation (called the access path) is determined by the database itself.
- Sql is not only a query language – but it is also used to define data structures, insert, modify, delete data and control access to the data. The language is common to the different users – like DBAs, application programmers, etc.
- All database operations using Sql are at Set-Level unlike record level processing using flat files. This includes querying a table and the result is in the form of a subset of the original table. Similarly, updates and deletes also can be done at set level.
Based on Functionality, Sql can be classified into 3 Categories -
2)Data Definition Language(DDL) – Creates and maintains physical data structures using CREATE, ALTER and DROP verbs.
3)Data Manipulation Language(DML) – Accesses and Modifies Data using INSERT, SELECT, UPDATE and DELETE.
4)Data Control Language(DCL) – Control Data Security using GRANT and REVOKE.
Based on the type of usage, Sql can be classified into 2 Categories -
1)Static Sql – It is embedded in an application program written in a high level
Language like Cobol.
2)Dynamic Sql – It could be embedded in an application program where it will be
Constructed at run-time. Or it could be executed online at a terminal.
Rules for Sql –
1)Every Query must access atleast one object and select atleast item.
2) The items selected can be one or more columns from a table, a literal, special
Register, an expression result, an embedded select that returns a single row.
2)The object accessed can be a table, a view, alias or full select.
Examples:
1)Getting the the maximum average salary and the dept that has that value.