This session provides examples of how to do DB2 performance forecasting. It stresses quick techniques for developing “ball park” estimates based on DB2 access paths. Formulas from IBM “red books,” from presentations by Santa Teresa’s DB2 performance specialist Akira Shibamiya, and output from IBM’s DB2 Estimator are included. With these basics, approximations can be made quickly for on-line transactions, batch and utilities. This paper provides a comprehensive bibliography and acts as the foundation for deeper analysis of more complex applications using statistical techniques and the DB2 Estimator®.

The techniques illustrated and opinions expressed are those of the author. The information contained in this document has not been submitted to any formal review and is distributed on an "as-is" basis without any warranty either expressed or implied. The use of this information or the implementation of any of these techniques or suggestions is a DB2 installation's responsibility and depends on that installation's ability to evaluate and integrate them into their operational environment. While the material may have been reviewed for accuracy in a specific situation, there is no guarantee that the same or similar results can or will be obtained elsewhere. Installations attempting to adapt this material to their own environments do so at their own risk.


Questions on DB2 performance typically occur when discussing new applications, at design sessions, when observing prototypes and after periods of heavy stress on a system. They are usually expressed with little quantitative foundation and are generally pessimistic. Examples--heard during a single month--include:

“We need to do 43 million batch INSERTs a day. Since DB2 is so slow, maybe we’d better use VSAM.”

“The new proposed application is estimated to perform 20 million SQL statements a day. None of our other applications comes anywhere near that. Is this feasible?”

“What kinds of SQL statements are you talking about?”

“Don’t know, but 20 million...!!”

“Nightly batch has to be done in six hours, but our forecasting tool says it will take over fifty! It sounds like we’re trying to do the impossible.”

Every designer working with DB2 benefits from learning how it works and which factors matter most. All that is needed is knowledge of how the application is supposed to work, an awareness of disk and processor speeds, and an understanding of DB2 access paths. From there, the mathematics involves nothing more than simple algebra and basic probability theory. Quick estimates can be done on a calculator, but it is far easier to use a spreadsheet or IBM’s DB2 Estimator.

This capability is the designer’s “ticket to the ball park”--a way to make “ball park” estimates of the time DB2 will use to perform their application.



IBM has produced a great deal of literature on DB2 performance forecasting in addition to the DB2 Estimator for Windows package. A comprehensive bibliography is listed at the end of this paper. IBM’s most recent red book for DB2 forecasting, “DB2 V2.3 Nondistributed Performance Topics”, includes numerous formulas. More recent material is included in the proceedings of the IBM DB2 Technical Conferences, the International DB2 Users Group (IDUG) and the SHARE user group.

This chart lists formulas from the IBM DB2 Technical Conference proceedings, converted to produce path-length estimates rather than the milliseconds the originals gave. This permits tailoring the output to a specific target machine by dividing the total application path length by the MIPS (millions of instructions per second) capacity of the complex.

The formulas are for single row functions performed on-line. To calculate a possible range, two sets of formulas are offered--one for small tables and the other for large. Small tables are those having fewer than 20 columns and so few rows that their indexes are only one or two levels deep. Large tables are those with 50 to 100 columns and enough rows that three- and four-level indexes are needed.

The only input parameter is the number of indexes affected by the statement (NIX). The output is the number of thousands of instructions (KIs) that DB2 needs to perform the statement. To forecast processor time, multiply the path length by the number of times the statement is executed to determine a total path length, then divide by the MIPS rate of the OS/390 complex used to execute the statement.
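As a minimal sketch of this recipe in code, assume the large-table INSERT coefficients quoted later in this paper (55 KIs base plus 22 KIs per index affected); the other single-row statement types take their own coefficients from the chart:

    # Sketch of the chart's estimating recipe. The 55 + 22*NIX coefficients
    # are the large-table INSERT figures quoted later in this paper; each of
    # the other single-row statements has its own coefficients on the chart.

    def insert_ki(nix):
        """Path length in thousands of instructions (KIs) for one INSERT."""
        return 55 + 22 * nix

    def cpu_seconds(executions, ki_per_execution, mips):
        """Total path length in instructions, divided by the MIPS rate."""
        return executions * ki_per_execution * 1000.0 / (mips * 1_000_000.0)

    print(insert_ki(1))   # 77 KIs: single unique clustering index
    print(insert_ki(4))   # 143 KIs: clustering plus three non-clustering indexes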

This chart was prepared using these equations to give quick estimates for two of the processes mentioned at the start of this paper. Large tables are assumed. The chart summarizes the processor cost of executing 20 million and 43 million iterations of each of the single row operations. In the column for 20 million iterations, a single unique clustering index is assumed. The table that is the target of the 43 million inserts has four indexes, matching the installation’s proposed database design. The insert and delete operations affect every index for each row. The pessimistic (and logically impossible) assumption is that all four indexes require change for each execution of the update statement.

The chart shows the processor time running multi-thread on IBM’s fastest multiprocessor (the IBM ES/9000 9021-9X2 10-way bipolar central electronic complex) and on IBM’s third generation CMOS 9672-RX3 10-way CEC. The most processor-intensive operation--43 million row UPDATEs--takes 6 to 8 hours, depending on the processor complex. Therefore, from a processor standpoint, 20 million SQL statements or 43 million inserts in one day are not impossible.

However, no allowance was made for the application program path length or the transaction manager. Typical on-line CICS and IMS transactions add less than 50% more instructions to accomplish the transaction management and application work. Therefore, if a more conservative estimate is needed, the timings listed in this chart can be increased by 50%. Even with this added factor, modern processing complexes have the “horsepower” to perform a very large number of SQL operations over a 24-hour period.
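For example, the 6 to 8 processor hours estimated for the 43 million row UPDATEs grow to roughly 9 to 12 hours with the 50% factor applied, still comfortably inside a 24-hour day.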


As a reference, this chart lists the MIPS rates of three classes of IBM central electronic complexes.

- Water-cooled ES/9000 Bipolar

- Air-cooled ES/9000 Bipolar

- Air-cooled CMOS announced Spring, 1998 (double the length of the lines to approximate the machines announced Spring, 1999)

These central electronic complexes vary from one to twelve tightly-coupled processors. In this estimation of on-line transactions scheduled by CICS or IMS, all ten processors in the 9021-9X2 and 9672-RX5 complexes can be used to support the parallel execution of transactions. If more cycles are needed, DB2 Version 4 and above can exploit a parallel sysplex configuration. But a closer look at the process reveals where a concentrated effort is needed. The installation needing to insert 43 million rows is planning for batch--not on-line. What effect does this have on the forecast?

A normal batch process runs single thread and does not benefit from multi-engine complexes, because only a single processor can be used at any time. For example, a single engine on the 9021-9X2 can supply only around 46 MIPS to a batch thread, while one on the 9672-RX5 supplies 49. Therefore, the 43 million inserts in batch should be analyzed more thoroughly.


A major factor in DB2 batch estimating is that almost all of the work for the application is single-threaded. For batch, only two kinds of subtasks can use multiple processors: the asynchronous pre-reading read engines and the post-updating write engines. The only way for batch to exploit all the engines in a tightly-coupled complex is to run multiple copies of the batch program in parallel (ignoring read-only query I/O parallelism, query CPU parallelism and the sysplex query parallelism new in Version 5).

Using the equations for on-line transactions, the insert of a row indexed by a clustering and three non-clustering indexes is forecast at 143,000 instructions (55 + (22 x 4) KIs). Repeating the insert 43 million times on IBM’s fastest uni-processor, the 9672-R15 at 65 MIPS, takes 26 hours in single-threaded batch. Obviously this batch work must be designed to run in parallel to solve the single-thread problem.
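The arithmetic behind that figure, as a self-contained check using the numbers in the text above:

    # 43 million inserts at 55 + 22*4 = 143 KIs each on a 65-MIPS uniprocessor.
    rows, ki_per_insert, mips = 43_000_000, 55 + 22 * 4, 65
    hours = rows * ki_per_insert * 1000 / (mips * 1_000_000) / 3600
    print(f"{hours:.1f} hours single-threaded")   # about 26.3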

Multiple copies of an inserting program running in parallel lead to contention on the table and indexes. Table and clustering index contention can be easily solved with a partitioned table and by sorting the input work into multiple “sub-batches.” Each sub-batch is limited to working within a controlled set of partition key ranges.
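A minimal sketch of the sub-batch routing idea, with hypothetical partition boundaries (the real ones come from the installation’s partitioning key design):

    from bisect import bisect_left

    # Hypothetical high keys for a four-partition table; the actual
    # boundaries come from the partitioned table's design.
    partition_high_keys = [250_000, 500_000, 750_000, 1_000_000]

    def sub_batch_for(key):
        """Route a clustering key to the sub-batch owning its partition."""
        return bisect_left(partition_high_keys, key)

    # Sort the input into clustering sequence, then split it by partition
    # so each parallel inserting job works a disjoint key range.
    input_keys = [912_345, 17, 499_999, 260_000]   # stand-in input
    sub_batches = [[] for _ in partition_high_keys]
    for key in sorted(input_keys):
        sub_batches[sub_batch_for(key)].append(key)

    print(sub_batches)   # [[17], [260000, 499999], [], [912345]]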

The non-clustering indexes are the critical issue. Two quick design sub-projects are: (1) to determine the benefit of each of the three non-clustering indexes to see if any can be omitted from the final design, and (2) to develop a unit-of-recovery strategy for the batch process that minimizes the duration of locks or latches held on the non-clustering indexes. DB2 Version 4 Type 2 indexes and partition independence are key software features supporting this type of work.


Inserts are generally straightforward. They always start with an “index probe” (B-tree search) through the clustering index to locate the ideal page for the new row. The ideal page contains: (1) the row with the next lower clustering key value, (2) the row with the next higher, and (3) enough free space to accommodate the new row. This probe requires four internal DB2 GETPAGE calls to process the clustering index’s root, non-leaf and leaf page plus the table’s data page. The key of the new row drives the probe with a “locate equal or higher” criterion. First-cut estimating assumes that the root and non-leaf pages are found in a DB2 buffer pool but the leaf page and data page must be read off disk. So the basic probe requires four GETPAGE calls and two random physical disk reads.

After the ideal page is located, the row is inserted. Three more index probes are needed to enter the non-clustering indexes and to insert the three alternate keys and RIDs (row identifiers). Each non-clustering index requires three GETPAGE calls and one physical disk read.

After completing the unit-of-work, all modified pages must be tagged to be written back to disk. This requires a SET WRITE PENDING call from DB2’s data manager to the buffer manager, which writes the pages back to disk using asynchronous write-engine subtasks. Altogether, each inserted row requires 13 GETPAGE calls, 5 synchronous random disk reads, 5 SET WRITE PENDING calls and 5 asynchronous skip-sequential disk writes.
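The same accounting in a short sketch; the parameter simply generalizes the walk-through above to other index designs:

    # Per-row I/O budget for this insert design, per the walk-through above.

    def insert_io_budget(non_clustering_indexes=3):
        getpages = 4 + 3 * non_clustering_indexes   # clustering probe + 3 per NC index
        sync_reads = 2 + non_clustering_indexes     # leaf and data page + 1 leaf per NC index
        pages_written = 2 + non_clustering_indexes  # data page, clustering leaf, NC leaves
        # One SET WRITE PENDING call and one asynchronous write per modified page.
        return getpages, sync_reads, pages_written

    print(insert_io_budget())   # (13, 5, 5)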


This chart lists disk access times for DB2 pages on a variety of disk models. Because the goal is quick estimating, use 20 milliseconds (0.020 seconds) per page read randomly and 2 ms per page read by sequential prefetch or written by an asynchronous write engine. Using 20 ms and running single thread, 43 million inserts requiring five random reads and five skip-sequential writes per row take almost 50 days.
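The arithmetic behind that figure, on the assumption that the five synchronous 20 ms reads gate the single batch thread while the 2 ms asynchronous writes overlap other work:

    # 43 million rows x 5 random reads x 20 ms each, single-threaded.
    rows, reads_per_row, seconds_per_read = 43_000_000, 5, 0.020
    days = rows * reads_per_row * seconds_per_read / 86_400
    print(f"{days:.0f} days")   # about 50

    # The three non-clustering index reads alone:
    print(f"{rows * 3 * seconds_per_read / 86_400:.0f} days")   # about 30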

Disk access is the major problem with inserts whether DB2 or VSAM, IMS, IDMS, ADABAS or “you-name-it” is used for the database. Obviously the non-clustering indexes are an expensive item in this design--they account for 30 of the 50 days.

Equally obvious is the need (1) to spread the data over a large number of disk storage units so that multiple access arms are available to share the work load, (2) to run multiple inserting job streams in parallel, (3) to have enough buffers to hold the entire working set of clustering and non-clustering index pages, (4) to sort the input work into clustering sequence, and (5) to distribute a large amount of free space in both the table space and indexes. Disk control units with data caching--and especially with the fast-write feature--are also good for this type of work.

This also illustrates two key estimating principles. First, processor utilization equations are best for forecasting the total processor capacity required for on-line transaction processing. Second, input/output estimations are much more critical for estimating response time and for batch and ad hoc work. Until more detail is needed, it is better to spend time analyzing only the input/output operations. This will be illustrated in upcoming examples.



When working with a forecasting package, knowledge of input/output is also critical in order to validate the answers the package produces. For example, original versions of IBM’s DB2 Estimator for Windows were weak in forecasting input/output for inserts. The IBM Education Center QUBE spreadsheet can be very misleading if the user cannot supply realistic counts of synchronous and asynchronous read/write operations. Critical factors contributing to this weakness include the difficulty of forecasting disk free-space management, index page splits, buffering, disk contention and bottlenecks in physical disk configurations. Also, a packaged estimator cannot afford to be optimistic; it is far safer for the estimator to be conservative and over-estimate.

However, forecasting input/output operations is relatively simple given (1) knowledge of DB2 access paths, (2) forecast sizes of user tables and indexes, (3) filtering of input rows by WHERE clause predicates and (4) anticipated sizes of result sets derived from executing SQL statements. The third and fourth requirements demand cooperation between the forecaster and business systems analysts.

This does not diminish the usefulness of estimating packages. The DB2 Estimator, for instance, is very quick and accurate when used to calculate disk requirements for table spaces and indexes and to forecast the performance of the basic DB2 utilities--load, image copy, RUNSTATS and reorganization.

This merely underlines the need for understanding the application process, the way DB2 access paths and buffering work and how disk input/output subsystems function.

This chart shows a summary of DB2 retrieval using three common access paths. The target table holds 20,000,000 rows; each row has 48 columns and 225 bytes, loaded 16 rows per 4K page. These estimates were produced very quickly using the DB2 Estimator. The processor modeled was the latest IBM CMOS 9672-YX6 with RAMAC 3 disk storage. Several DB2 characteristics are documented by this simple use of the DB2 Estimator. For example:

(1) While DB2 is sensitive to the number of columns retrieved, that is far from the most critical factor in writing SQL statements.

(2) An index is extremely important for simple retrieval of one or a few rows to service on-line transactions.

(3) A clustering index is not too expensive when retrieving large numbers of rows, as this access path avoids sorting the result set into clustering sequence.

(4) Table space scans are very uniform with respect to I/O time and show the benefit of sequential prefetch.

(5) Accessing large sets of rows via a non-clustering index is very expensive in both processor cycles and disk access time. The Estimator has to be forced to predict this processor time, as DB2 would never use that access path for a large result set.

(6) Single row retrieval is very I/O-bound, as opposed to retrieving the entire table using sequential prefetch, which is processor-intensive. Query CPU parallelism and sysplex query parallelism are promising new DB2 features to help with long-running processor-bound retrieval.

(7) Selecting a single row via the clustering or non-clustering index agrees with the formula on Page 2--less than one millisecond, or 22,000 instructions, on the CMOS 9672-YX6. Fetching all 20 million rows is much cheaper per row--more on the order of six thousand instructions each.


Underlying the DB2 Estimator is the DB2 processing cost model formulated by Akira Shibamiya of the Santa Teresa Performance Laboratory and first documented by IBM’s Dallas System Center [IDSC87]. The documentation has been updated in the red book on DB2 capacity planning [ITSC90], in the Version 2.3 red book mentioned earlier [ITSC92] and in presentations by Mr. Shibamiya at the IBM DB2 Technical Conferences [SHIB94], IDUG and SHARE.

The model “walks through” SQL statements, starting with the application program issuing a call and moving down through the three layers of logical DB2 processing. The first layer is the Relational Data System, or RDS. This is the overall coordinator of the processing and is set-oriented in keeping with the relational model. RDS performs major functions such as sorting for ORDER BY, GROUP BY and unions, and does the major work for join operations.

The second layer is the row-oriented data manager. It manages the data sets and the pages housing DB2 tables and indexes. The buffer manager is the final layer and interfaces a collection of buffer pools to disk storage devices through OS/390 VSAM access methods. The buffer manager provides the data manager with the pages needed.

Mr. Shibamiya tracked the cost of doing individual functions in these layers just as an industrial engineer breaks down a complex operation into a series of simpler steps.


Many simplifying assumptions are made to avoid getting bogged down in detail that adds little to the precision of the result. Others presume that the DB2 system has been running for some time before the process being estimated begins; buffer pool initialization and data set OPENs, for example, are therefore ignored.