ORACLE DBA: PHYSICIAN OR MAGICIAN?
Chris Lawson
Database Specialists, Inc.
Introduction: The DBA “Doctor” ?
I have some bad news for all Oracle DBAs: We have many sickly database “patients” that are not receiving effective treatment. In many cases, the prognosis is not good. Some of our databases are languishing in Intensive Care Units. Others are precariously surviving on “ventilators.” Some are unable to communicate, and have slipped into a comatose state. Sadly, I suspect that some sickly databases have even been secretly “euthanized” by their caregiver.
Here are some typical examples of sickly databases: Reports that runs 5 hours, a simple query that requires 10 minutes, login that takes 2 minutes, etc. Does this sound familiar?
These databases create lots of mad and frustrated users. The users will rightfully ask the DBA, “What are you doing to treat my sick patient? Please help him! Give him a shot or something!”
The prognosis is not all bad, however. There is hope for even the sickest of databases. Some databases are being treated by knowledgeable practitioners, and the database “diseases” are in full remission. The good news for all DBAs is this: Poor database performance responds well to systematic treatment. In other words, the DBA “doctor” has treatment available that can rejuvenate the “patient.” The prognosis is good for the fortunate patients who are under the care of these enlightened practitioners.
The question addressed by this paper is, “What kind of systematic treatment can revive our sick databases?”
General Approach
The successful DBA practitioner must develop a general course of treatment--a regimen, that can be used to treat a variety of ills. I have found the following steps to be helpful:
Step 1:Define Problem: What is the chief complaint?
Step 2:Investigate: Confirm/quantify the problem
Step 3:Isolate root cause: Simplify and hone in on the essence of the problem
Step 4:Devise solution: The creative step
Step 5:Implement/confirm solution: Make sure the solution addresses the problem
Some Pitfalls and Early Mistakes
The DBA will encounter various obstacles during the treatment process. For instance, users may tire of complaining. Rather than informing the DBA of performance issues, they decide to live with problems rather than report them. In this case, the DBA simply remains ignorant of the problem.
Sometimes the application designers will resist divulging poor performance. This occurs because investigation might reveal poor or embarrassing design techniques.
One very frequent obstacle is the bias in favor of a pre-conceived, “Solution looking for a problem.” At other times, “tricky” and “pet” designs may be entrenched in the application. Re-education may be needed.
Along with obstacles to jump, the DBA may fall into several traps. Each of these traps hinders identification and resolution of the true problem. Here are some pitfalls that I have seen:
- Assuming a solution. Rather than perform the labor to identify the true root cause, it is tempting to skip the analysis and jump right to the solution. This frequently manifests itself by the premature purchase of bigger/better hardware.
- Looking for exotic solutions. Another way to jump to the conclusion is to implement more complex, or “exotic” changes. For instance: assuming that raw disk partitions are the solution, or changing the value of the spin_count parameter.
- Looking for the magic parameter. It is tempting to believe that a simple change to an init.ora parameter will solve the performance problems. This is usually a vain hope.
- Ignoring application design. It is extremely common for a defective application/database design to be hindering proper performance.
- Blaming the user. This is an especially attractive option; it has the added benefit that the user is usually not present to defend himself.
Step 1: Define the Problem (Physician)
When any of us visit a doctor, the doctor will soon ask, “What seems to be the trouble?” In medical clinics, this is called the “chief complaint.” The DBA doctor must do the same thing. He must ask the users, “What is the complaint?” Then he must follow up to get the details. For instance, he may ask the question, “How long is the query delay?” or “When does the problem occur?”
Here are some pertinent questions to ask:
- What server is involved?
- Did the program ever work right?
- Which users are complaining?
- What does the application do?
- What database version is being used?
- Is the connection remote or local?
- Is there more than one database involved?
Sample “Problems”
Remember that Step 1 should define the problem, not propose a solution. Don’t allow yourself to assume a solution.
For instance, some good problem statements would be:
“The Accounting Quarterly Profit Report takes 5 hours to complete.”
“Typical user login takes one minute.”
These are both good descriptions of what the user sees. They don’t try to guess at a cause; rather, they restrict their scope to what, not why.
On the other hand, the following statements are either too vague, or pre-suppose a solution:
“The network is slow”
“Database is slow; I need another index”
“The database needs more memory”
“I need Parallel Query option”
“I need version 8.1.6 because it is faster”
To prepare a good problem statement, interview the users. Ask, “What is your main complaint?”
Step 2: Investigate! (Detective)
After defining the problem, the next step is investigation. Here, the DBA changes hats from physician to detective. The main objective in this step is to recreate and quantify the problem.
Some questions that should be asked are:
- What is the elapsed time of the query?
- How many disk/logical reads are performed?
- Is there a large network transfer?
- Does the application use database links? If so, what other database is involved?
- Are there any large batch jobs running? If so, when?
In some cases, it is best to watch the end-user as he executes the program. Get to know a little about how the application works. A side benefit of this is that the user will understand that you are serious about solving the problem. They will appreciate your interest.
In many cases, it will be necessary to capture the actual SQL statements that have been issued. Run SQL trace on a session to discover all database queries; alternatively, look at the v$sqlarea view to list all SQL in the shared pool.
Oracle Enterprise Manager (OEM) can be very helpful at this stage. For instance, the Top Sessions tool can be used to show resource consumption (e.g., disk reads) for any session.
It is probably a good idea to also check the alert log for the database being queried, just in case there are any space problems indicated.
Step 3: Isolate Root Cause (Pathologist)
Now that we have identified what the problem is, and have quantified the problem, we are ready to find the root cause. In this role, the DBA wears the hat of a pathologist. We try to find the disease that is the primary cause of the performance problem.
Note that there is no need to guess at the root cause. In fact, speculation not based on the facts should be discouraged. Results from the previous investigation (Steps 1 and 2) will help focus attention on the key problem areas.
Simplify
In order to identify the true root cause, it is important to simplify. Remove irrelevant distractions, such as formatting, irrelevant tables, etc. from the query. If many different fields are listed in the query, remove most. Reformulate complex equations as simple SQL, that still use the same problem tables, joins, etc. If views are involved, break down views to underlying tables.
If a multiple-transaction set is the problem, break the large transaction into smaller pieces. Then eliminate the portions that execute quickly.
It is often helpful to build new database objects similar to the problem query in order to test theories. This way, the problem can be further isolated without impacting the production system. For instance, if queries on a certain type of table are a problem, create a similar table (and associated indexes) in another schema and try querying. The query can be made more and more complex, until the problem area is isolated.
Be Suspicious
When trying to find a root cause, the DBA should ask the question, “What is the optimizer doing with the SQL?” That is: what is the execution plan, by which the database runs the query? A very simple way to find this is to use the Autotrace function in SQL*Plus. It is mandatory that the DBA become very familiar with how Oracle determines the executions plan for a SQL statement. Lack of knowledge will lead to guesswork, and poor performance.
When trying to isolate a complex performance problem, be suspicious--consider all the subsystems. Don’t be too quick to eliminate any subsystem from your list of suspects. For example, if the query is run across a network, measure how many bytes were transferred across the network for that session. (OEM Performance Manager easily lists these statistics.) Also, try running the query locally and check the difference.
It may be helpful to consider the various causes of performance degradation that I have encountered. As shown in the figure, application and database design are almost always involved in correcting performance problems. At the other end, rarely have I seen problems that are truly caused by inadequate hardware sizing.
Sample Root Causes
Although the DBA should never guess at a root cause, it may be helpful to review frequent causes of bad performance. In my experience, root causes frequently fall into the following categories
Application Design
Application logic is designed to handle small tables only
Application processes rows one by one, as opposed to an entire set
Use of NVL() or other function in WHERE clause precludes index usage
Table join order is incorrect
Application allows large transfer of data across network from server to client
“Null search” (nothing in WHERE clause) is easy to run by mistake
Database Design
OTRACE not turned off.
Old statistics “fool” optimizer
“Lumpy” data but no histograms
Indexing
Index with excessive columns
Stagnant indexes
Missing/wrong indexes
Init.ora parameters
Greatly undersized SGA--e.g., db_block_buffers set too low.
Unusual or exotic features used unnecessarily--e.g., MTS
Hardware
Server greatly undersized.
Few disk drives cause I/O conflicts
Some Resources for DBAs
There are many different tools that can help the DBA analyze performance problems. Thus, it may be tempting to believe that simply purchasing the latest analysis tool will lead to instant solution of performance problems. A good DBA, however, will realize this point:
There is no substitute for thorough understanding of Oracle operation.
In other words, tools assist a DBA, but they cannot make up for poor grasp of database concepts. A good tool can help with isolating performance bottlenecks by performing mundane tasks such as finding the “hit ratio” of the buffer cache, or listing how many rows are sorted in memory versus disk. Without proper understanding, however, the tool will simply allow an ignorant DBA to make mistakes more quickly than otherwise.
If, however, the DBA has spent the time necessary to truly understand the database, the tools can be very beneficial. Here are some suggested tools and methods:
Become adept with Oracle Enterprise Manager (or a similar tool) OEM Top Sessions shows resource hogs. It is very helpful in discovering which SQL is always “stuck.” Another tool, OEM Performance Manager, is good for quick overview of database: hit ratios, sorts, etc.
Become adept at reading and understanding Execution Plans This will take much time and effort, but it is worthwhile. Once again, there is no substitute for proper understanding of how the Optimizer intends to execute SQL.
Become adept at finding and tuning SQL “heavy hitters” By this, I mean finding the SQL statements that are causing the most work for the database. Many of the statements will be multiple table-joins. This means that the DBA needs to be skilled at table join techniques. Sometimes the data in a particular table will be “lumpy” (non- uniformly distributed). This will require that the DBA understand the value of histograms. It may be necessary to turn “trace” on for a suspect session using one of the provided PL/SQL packages.
Step 4: Devise Possible Solutions (Artist)
Once the root cause has been established, the possible solutions will be much clearer. This is true because all potential solutions must flow from the root cause already identified. Ask yourself the question, “How can the root cause be surmounted?” If the root cause has not been identified, then the DBA will be wasting everyone’s time. This is just another way of saying: Guesswork does not work very well in solving difficult problems.
There is one big difference in this stage, compared to the previous stages. In Step 1, the DBA was a physician,hearing the complaint. In Step 2, the DBA became a detective, tracking down facts, sifting through evidence, simplifying the problem, etc. In Step 3, the DBA was a pathologist, trying to find the fundamental cause of the complaint.
In Step 4, however, the DBA becomes an artist. Instead of analyzing, we now synthesize. Now is the time to be creative: he must try to imagine the perfect solution that addresses the root cause in the simplest fashion possible. For example, is the solution just a matter of adding the missing index, or is a minor change to the application required? Rack your brain trying to conceive of potential solutions.
Here are some good starting places:
Use Oracle Metalink It is free with metal support. It can be used to search for a particular ORA error number, or find other users who have had similar problems. The Technical Forums provide quick answers to the simpler problems.
Brainstorm with other DBAs No one has all the answers! On numerous occasions, I have consulted with other DBAs on what I believed was an extremely thorny problem. Often I am surprised when the other DBA sees an obvious fact that I have simply missed.
Regardless of the tools used, the knowledgeable DBA will be able to consider a much broader range of potential solutions. This is because a good DBA will understand the cause-and-effect relationship between database changes and performance. He will also see that many offered solutions are irrelevant, because they do not address the underlying problem.
Step 5: Implement & Confirm Solution (Magician)
In this step, the DBA will (hopefully) don yet another hat--this time, the hat of a magician. This should be a simple and fun step. All the hard work has already been done--the problem has been thoroughly analyzed, and a solution has been synthesized that truly addresses the root cause of the performance bottleneck.
With the solution in hand, it is time to test the modification, and confirm the performance change. For complicated modifications that could impact production, this is best performed on a test system. Whatever system is used to test the modification, be sure your setup matches the actual production environment. It may be necessary to create various indexes, analyze tables, import data, etc.
After the solution is proven, it is time to document the solution and inform the customer. I have found it helpful to list “before” and “after” statistics. Make the summary very clear, and avoid mumbo-jumbo that only DBAs can understand. The customer will appreciate this simple explanation, because he is probably under pressure to report results to upper management.
Here is a sample performance summary:
Before: Report requires 1 hour and requires 1 million disk reads
After: Report finishes in 3 seconds and requires 50 disk reads
Solution: Rebuild stagnant index
Summary: The DBA “Magician”
A good DBA will wear many hats in the course of solving performance problems. Beginning as a kindly doctor treating a sick patient, the DBA will in turn transform into a Detective, Pathologist, Artist, then finally Magician.
Database performance tuning can be the most rewarding aspect of DBA work Frequently, an experienced DBA will be seen as not only a physician, but a magician, who magically advances performance. The sick “patient” happily skips out of the infirmary, with a clean bill of health.
In conclusion, remember the steps that should be part of each performance investigation:
Step 1: Define the problem
Step 2: Investigate