
Area of Analysis: Helium Database

1.1 Timing Analysis from Execution Traces

1.1.1 Methodology

Performance analysis experiments on eleven important processes revealed that, for several of them, the time spent on database operations exceeded 50% of the total execution time. We selected the DI0001 (Daily Accrual Interest) process and the IP0001 (Incentive) process for closer examination, since they strongly exhibited this trend, and studied the database traces obtained by running SQL Profiler while the processes executed.

Since such traces often contain actions that occur before the relevant execution begins, we identified the earliest and latest SQL statements that were clearly the client's query or modification requests; their difference gave the DBTime figure. Within the trace, we collected all SELECT, INSERT, and UPDATE statements (the logic behind these processes did not require any DELETE statements) and summed their durations (reported in microseconds) to obtain the SQLTime figure, in seconds. The ratio SQLTime/DBTime, expressed as a percentage, then indicates how much of the elapsed DBTime was spent on actual database access.
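For illustration, the following query sketches how these figures could be derived from a Profiler trace saved to a file. The file path is hypothetical and the text filter is a simplification of the manual selection described above; SQL Server's sys.fn_trace_gettable function exposes the trace columns used here (Duration is reported in microseconds).

    -- Sketch only: aggregate SQL durations from a saved Profiler trace.
    -- The trace file path is a placeholder; the text filter approximates the
    -- manual selection of SELECT/INSERT/UPDATE statements described above.
    SELECT
        SUM(CAST(Duration AS bigint)) / 1000000.0                          AS SQLTime_s,   -- microseconds -> seconds
        DATEDIFF(SECOND, MIN(StartTime), MAX(EndTime))                     AS DBTime_s,    -- earliest to latest request
        100.0 * (SUM(CAST(Duration AS bigint)) / 1000000.0)
              / NULLIF(DATEDIFF(SECOND, MIN(StartTime), MAX(EndTime)), 0)  AS SQLTime_over_DBTime_pct
    FROM sys.fn_trace_gettable(N'C:\traces\DI0001_528.trc', DEFAULT)
    WHERE TextData LIKE N'%SELECT%'
       OR TextData LIKE N'%INSERT%'
       OR TextData LIKE N'%UPDATE%';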

1.1.2 Findings

The results are summarized in Table 5.4.1.

Process    Number of Borrower Records    DBTime (s)    SQLTime (s)    SQLTime/DBTime (%)
IP0001     528                           47            1.1            2.3
IP0001     1,926                         22            3.3            15
IP0001     3,405                         34            7              20.6
IP0001     9,941                         122           43.5           35.7
DI0001     528                           21            1.3            6.2
DI0001     1,926                         38            3.2            8.4
DI0001     3,405                         49            7.7            15.7
DI0001     9,941                         73            6              8.2

Table 5.4.1: Time consumed by SQL statements.

From Table 5.4.1, it is clear that the SQL operations do not consume more than 36% of the elapsed time (typically less than 20%). The rest is consumed by the overhead associated with the client’s access to the DBMS: the connection and disconnection.

Parenthetically, we note that the true figure is likely even lower than 36%, for two reasons. First, while SQLTime was computed by summing the durations of discrete queries, the total DBTime was obtained by subtracting the StartTime from the EndTime; any concurrency among the database actions would mean that the sum overstates the wall-clock time actually spent in SQL, so the real percentage could only be lower. Second, a close reading of the trace for the IP0001 process on the 9,941-record dataset suggests that this measurement is likely an outlier (e.g., row 24,837 records an SQL operation lasting more than one second, while identical operations immediately before and after it took 5.4 milliseconds).

In summary, the traces show that the time taken by the actual database operations is very small compared with the overhead of the client application's access to the DBMS.

1.1.3 Recommendation

This overhead could be reduced if more database operations were completed on each connection. The application should reduce the number of requests it makes to the DBMS and allow the DBMS to perform more computation per request. Stored procedures should be exploited, as described in the next subsection.

1.2 Design Analysis from Execution Traces

1.2.1 Methodology

The same traces also contained the SQL strings sent by the application. These and other attributes of each action item in the trace were examined to understand how the designers used the underlying database to fulfill the application's objectives.

1.2.2 Findings

We found that the SQL operations are repetitive: the same query is repeated for different values of a parameter. For example, the DI0001 process, given a workload of 528 borrower records, makes 1,675 separate requests to the DBMS to execute the query shown in Figure 5.4.1, each time with a different value of the parameter @p0 (shown in the last line of the query) corresponding to a different LoanID.

First, this repetitive behavior means that the application must connect to the DBMS repeatedly, each time requesting the execution of exactly the same query for a different parameter (e.g., a different loan). According to the trace, the time consumed by each such query (or update) is minuscule compared with the overhead of initiating and terminating these repeated connections. Second, the application does make these requests through sp_executesql, passing exactly the same query string and the same parameter-type string each time; this is good practice, since it lets the DBMS compile an execution plan once and reuse it, but it is not enough: the DBMS cannot optimize across a sequence of requests because it cannot anticipate future requests.

Figure 5.4.1: Query executed 1,675 times, with that many connections, when executing DI0001 with 528 borrowers.
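For orientation, each such request, as recorded in the trace, has roughly the following shape; the query body and the table and column names are placeholders, since the actual statement is the one reproduced in Figure 5.4.1.

    -- Shape of the repeated request (one call per LoanID); dbo.Loan and its
    -- columns are placeholders for the actual query of Figure 5.4.1.
    EXEC sp_executesql
         N'SELECT LoanID, Balance FROM dbo.Loan WHERE LoanID = @p0',  -- same query string every time
         N'@p0 int',                                                  -- same parameter-type string every time
         @p0 = 4711;                                                  -- only the parameter value changes
    -- ...repeated 1,675 times, each over its own connection.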

1.2.3 Recommendation

To address the first finding, the overhead should be reduced by completing more database operations on each connection. For the above example, this could be achieved if the application made a single request to a stored procedure that executed the repetitive queries itself. If all the LoanIDs were stored in a table, say NewTable, the 1,675 separate instances of the query shown in Figure 5.4.1 could be combined into a single query such as the one shown in Figure 5.4.2. Of course, the stored procedure would not consist of just this single statement: it would go on to use the result of this query, perhaps with the help of a cursor. This approach also applies to a number of similar processes in this application: they scan one or more tables looking for a condition and then perform a few updates.

Figure 5.4.2: This single query could potentially replace the 1,675 database connections needed for that many executions of the query shown in Figure 5.4.1 when executing DI0001 with 528 borrowers.
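As a rough sketch (not the actual procedure; apart from NewTable, the procedure, table, and column names are illustrative), the consolidation might look as follows:

    -- Hedged sketch of the recommended stored procedure: one call replaces
    -- the 1,675 per-loan round trips. dbo.Loan and its columns are assumed.
    CREATE PROCEDURE dbo.usp_ProcessLoans
    AS
    BEGIN
        SET NOCOUNT ON;

        -- The single set-based query of Figure 5.4.2: join the LoanIDs held in
        -- NewTable to the loan data instead of issuing one query per LoanID.
        SELECT l.LoanID, l.Balance, l.InterestRate
        INTO   #Work
        FROM   dbo.Loan     AS l
        JOIN   dbo.NewTable AS n ON n.LoanID = l.LoanID;

        -- The procedure would then apply the per-loan business logic, set-based
        -- where possible, or with a cursor over #Work where necessary.
        UPDATE l
        SET    l.AccruedInterest = l.AccruedInterest + w.Balance * w.InterestRate / 365.0
        FROM   dbo.Loan AS l
        JOIN   #Work    AS w ON w.LoanID = l.LoanID;
    END;
    GO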

The second finding would also be addressed by the above strategy: an optimization across a sequence of requests would be possible if one stored procedure could be used for an entire sequence of discrete operations.

Making use of one stored procedure for a large amount of repetitive database activity would not only save time that is wasted in connection overhead, but also enable hitherto unattainable optimization.

1.3 Design Analysis from Programming Constructs

1.3.1 Methodology

The schema and programming constructs of the Helium database were inspected; the entire schema was not analyzed in detail owing to a shortage of time and the lack of database design documents.

1.3.2 Findings

First, no database design documents were available.

Second, the design of the database seems to be good provided one views the underlying DBMS as a primitive resource: a file system that supports a few SQL access commands and nothing else.

Little use is made of semantic integrity constraints, apart from primary and foreign keys on tables. Similarly, the designers have chosen to avoid the use of database triggers.

1.3.3 Recommendation

First, database design documentation should be developed for maintenance and viability of the emerging system.

Second, viewing the DBMS as a primitive storage system was common in the 1980s, but it is no longer wise for large systems today. A DBMS is an efficient implementer and enforcer of business logic. One stored procedure can not only combine disparate attributes of different tables but also execute the business logic leading to the appropriate updates of those tables, allowing it, for example, to update the status of all the loans. When complex, such procedures can be broken up into smaller procedures, aiding readability and maintenance. Pushing much of this business logic, and the actions of the processes, down from the application into the DBMS would enhance performance by reducing the number of connections and disconnections, as pointed out earlier. At the same time, it would deliver disciplined concurrency and consistency. Furthermore, the business logic would be easy to find, comprehend, and maintain, being structurally, semantically, logically, and linguistically close to the data objects it accesses. Performance problems would be easier to observe, and known database-tuning techniques could be applied to improve performance.
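As an illustrative sketch only (the procedure, table, and column names are assumptions, not taken from the Helium schema), a higher-level procedure can compose smaller procedures that each encapsulate one business rule:

    -- A small procedure encapsulating one business rule pushed into the DBMS.
    CREATE PROCEDURE dbo.usp_UpdateLoanStatus
    AS
    BEGIN
        SET NOCOUNT ON;
        UPDATE dbo.Loan
        SET    Status = 'CLOSED'
        WHERE  Balance = 0 AND Status <> 'CLOSED';   -- e.g., mark fully repaid loans closed
    END;
    GO

    -- A higher-level procedure composes such helpers into a process, executed
    -- over a single connection as one consistent unit of work.
    CREATE PROCEDURE dbo.usp_DailyAccrualProcess
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRANSACTION;
            EXEC dbo.usp_UpdateLoanStatus;   -- smaller procedures keep the logic readable
            -- ...further accrual and update steps would follow here...
        COMMIT TRANSACTION;
    END;
    GO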

The use of semantic integrity constraints can detect many errors of omission or commission and avoid costly corrections of incorrect or inconsistent data. Judicious use of triggers can enforce a great deal of auditing activity, visibly and transparently, enabling continuous vigilance, leaving an audit trail for the detection of fraudulent activity, and enhancing security. Triggers can also make the DBMS send email automatically when appropriate conditions arise; this can support routine communication with customers as well as quickly alerting the appropriate personnel.
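By way of illustration (the table, column, audit-table, and mail-profile names are hypothetical), a semantic integrity constraint, an audit trigger, and a Database Mail alert might look as follows; sp_send_dbmail requires a configured Database Mail profile.

    -- A semantic integrity constraint: interest rates must lie in a sane range.
    ALTER TABLE dbo.Loan
        ADD CONSTRAINT CK_Loan_InterestRate CHECK (InterestRate >= 0 AND InterestRate <= 1);
    GO

    -- An audit trigger that leaves a trail of every status change and alerts
    -- personnel when a suspicious condition arises.
    CREATE TRIGGER dbo.trg_Loan_Audit
    ON dbo.Loan
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.LoanAudit (LoanID, OldStatus, NewStatus, ChangedAt)
        SELECT d.LoanID, d.Status, i.Status, GETDATE()
        FROM   deleted  AS d
        JOIN   inserted AS i ON i.LoanID = d.LoanID
        WHERE  d.Status <> i.Status;

        -- Database Mail alert; the profile and recipient are placeholders.
        IF EXISTS (SELECT 1 FROM inserted WHERE Status = 'DEFAULTED')
            EXEC msdb.dbo.sp_send_dbmail
                 @profile_name = N'HeliumAlerts',
                 @recipients   = N'risk-team@example.com',
                 @subject      = N'Loan flagged as defaulted';
    END;
    GO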

In summary, the role of the DBMS in the overall system architecture should be reconsidered; at a very minimum, the processes that involve data-intensive computations could benefit in performance.