Best Practice in Database Design for Performance
This White Paper is based on many years’ practical experience as a DBA.
Geoff Duniam, Senior Oracle DBA at PricewaterhouseCoopers, London, England.
This is aimed at people with some experience in Database Administration.
It is intended to provide a comprehensive review of problems and solutions.
It is written from an Oracle perspective but the contents are very generic and apply equally well to SQL Server.
1.3 Successful database applications
- An application built around (and dependent on) the database will succeed or fail based on how it uses the database.
- Applications come, applications go. The data, however, lives forever. In the long term, the goal is not about building applications, it is really about using the data underneath these applications.
- A development team needs at its heart a core of database-savvy developers who are responsible for ensuring the database logic is sound and the system is built to perform from day one. Tuning after the fact (tuning after development) typically means you did not give serious thought to these concerns during development.
1.4 Database design – Indexes, Referential Integrity and Statistics
1.4.1 Enforce your business model at the database level
Your business model should be implemented and enforced at the database level. The initial application analysis will identify the logical data objects and their relationships to each other, even before you choose your database platform. Not every application will require a data model in Third Normal Form; however, when designing your physical data model from your logical data model, the logical relationships should be enforced in the physical model using referential integrity constraints – primary keys and foreign keys.
Business rules = data rules. For example, “you can only be an employee if you are in a department” should be instantiated as a mandatory foreign key on an employee table to the primary key on a department table. Not only does this enforce the integrity of your data, it will have a knock-on effect on the performance of your queries by giving the database optimiser the right information to determine the most effective query paths.
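As a minimal sketch of “business rules = data rules”, the mandatory department rule above can be expressed as a foreign key that the database itself enforces. SQLite (via Python’s sqlite3) is used here only because it is self-contained; Oracle and SQL Server DDL differs in detail, and the table and column names are illustrative, not taken from any schema in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked, per connection

conn.execute("""
    CREATE TABLE dept (
        deptno INTEGER PRIMARY KEY,
        dname  TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE emp (
        empno  INTEGER PRIMARY KEY,
        ename  TEXT NOT NULL,
        deptno INTEGER NOT NULL REFERENCES dept(deptno)  -- mandatory foreign key
    )""")

conn.execute("INSERT INTO dept VALUES (10, 'ACCOUNTING')")
conn.execute("INSERT INTO emp VALUES (1, 'SMITH', 10)")   # valid: dept 10 exists

# An employee without a valid department is rejected by the database itself,
# regardless of what the application layer forgot to check.
try:
    conn.execute("INSERT INTO emp VALUES (2, 'JONES', 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

The constraint lives with the data, so every application that touches these tables inherits the rule.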
Application-specific logic, by contrast, need not be enforced at the database level.
1.4.2 Indexing and index types
Indexes enable faster searching and (depending on design) faster updates. Be aware that differing database structures will require differing index strategies, and these decisions should be made in the initial design phase.
You will also need to be aware of the differing types of index that are available to you, and when it would be appropriate (and just as importantly, when it would not be appropriate) to use them.
1.4.3 Statistics and their place in query optimisation
Indexes, primary and foreign keys, domain and unique constraints do more than enforce the relationship of data objects to each other. The database kernel needs information to optimise execution paths of your queries, procedures, functions and packages, as well as efficiently allocate database and system resources. Usually, this information is in the form of database statistics, held within the database’s data dictionary. Statistics may have to be manually gathered, or you may be able to generate them automatically. Your DBA will advise you here; however, it is important to understand that unless your database has up-to-date and complete statistics for each object in your database, the optimiser will be unable to calculate the most effective execution plan for your application queries.
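Every engine gathers statistics its own way – Oracle through DBMS_STATS, SQL Server through UPDATE STATISTICS. The sketch below uses SQLite’s ANALYZE, purely because it is self-contained and shows the same principle: the gathered statistics land in a data dictionary table the optimiser reads. Table and index names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, deptno INTEGER)")
conn.execute("CREATE INDEX emp_deptno_ix ON emp(deptno)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(i, i % 10) for i in range(1000)])

conn.execute("ANALYZE")  # gather row/index statistics into the data dictionary

# The statistics live in the dictionary table sqlite_stat1, much as
# Oracle's live in DBA_TAB_STATISTICS / DBA_IND_STATISTICS.
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
```

Without that ANALYZE, the optimiser has no idea whether `emp_deptno_ix` is selective enough to be worth using.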
1.4.4 Explain plans and benchmarking
Explain plans are reports on how your query is executed. They will contain information regarding what indexes were used, what tables were scanned, how many blocks and rows were accessed to return your result sets, and an indication of what effort your query cost the database in terms of CPU seconds, IO operations and waits. They are a very effective tool to tune your applications, so take the time to learn how to generate and read explain plans on your database platform.
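The mechanics of generating a plan report differ per platform – Oracle uses EXPLAIN PLAN FOR plus DBMS_XPLAN, SQL Server uses SET SHOWPLAN_ALL. The self-contained sketch below uses SQLite’s EXPLAIN QUERY PLAN to show the kind of information a plan report gives you; the table name is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(i, f"E{i}") for i in range(100)])

# The plan report shows whether the optimiser chose an index search
# or a full table scan for this predicate.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE empno = 42").fetchall()
plan_text = " ".join(row[-1] for row in plan)
# With a primary key on empno, the plan is an index SEARCH, not a SCAN.
```

Reading this before and after adding an index is the cheapest benchmarking you will ever do.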
1.5 Writing SQL Statements
At its most basic level, interaction with a database is easy. To retrieve information, READ or SELECT. To create, CREATE or INSERT. To change, UPDATE – and to remove, DELETE. Create, Read, Update and Delete. Point these statements at database tables or views, use generic code generators or application development tools, and the data manipulation procedures are done. If we’re not sure what tables we want to access, or what fields we want, until we get some input from the end user, it’s very simple to build the query into a string and then execute the string through a JDBC (or equivalent) call.
Unfortunately, that design approach is fundamentally flawed on a number of levels, the most important being security, scalability and performance.
For every database application the mechanics of executing a query need to be understood, even if you are using a generic code generator – because the SQL code generated by these tools will probably be unoptimised at best and dangerously inefficient at worst. You need to be able to SEE the generated code and then decide if it’s appropriate. Code generators should not be used as a screen to hide SQL ignorance behind. Bad SQL is just that, regardless of who or what generates it, and it’s up to you to ensure that every piece of code you write or are responsible for is fit for purpose – and that goes way beyond just the immediate CREATE/INSERT, READ/SELECT, UPDATE or DELETE operation you want to perform.
1.6 Bind Variables for efficiency and security
1.6.1 Query parsing, binding and execution - overview
After syntax checking, the query is parsed into memory and the database looks to see if the query already exists in memory. If it doesn’t, it has to be security checked and optimised, an execution plan for the query has to be generated in memory, then the query parameters are bound and the query is executed. This is a hard parse; it’s a lot of work, it consumes a lot of resources and, more importantly, it locks objects to prevent database corruption while this work is being done, preventing other sessions from accessing these objects. Database instance memory and resources are finite, and we do not want to find ourselves in a situation where we run out of memory, or locking resources, because of badly designed SQL. And be aware that it can take only one badly written SQL statement to cause a problem – regardless of the fact that YOUR queries are 100% optimised and well written.
As an application developer, you do not want to find yourself in a situation where every query you issue from your application has to be hard parsed. Parsing is an expensive operation for the database and is a performance and scalability killer. What you want to see is your application using the existing execution paths and query plans already in the database memory as much as possible, i.e. soft parsing. So you need to design your SQL code to be as universal as possible, and that means binding your variables to your SQL statements at execution time, regardless of where the query originates. The way this is done will vary from database to database, and application to application, but the one constant across the varying technology is that refusing to use bind variables will not only potentially open your application up to quite severe security risks, it will also be a prime cause of your application failing to scale or perform under load.
Application developers and Project Managers should make sure that the technology being used to interact with any RDBMS supports the ability to issue queries with bind variables. Unbound queries can destroy the ability of your application to scale and run efficiently. And even though this should be obvious, it bears repeating – string substitution in an SQL query is not binding variables.
1.6.2 Bind variable example
For example, to retrieve the details of employee # 123 from the employee table, you could
Select * from emp where empno = 123;
Alternatively, you could
Select * from emp where empno = :empno
With a variation of “using (123)” to bind the variable.
The first query will have to be hard parsed every time you issue it with a different employee number – and if this query is going to be executed many times with different parameters, you will flood your database’s memory area with multiple copies of the same statement. Once this occurs, every new query will have to be hard parsed, with the possible (depending on the database platform) unwelcome side effect of causing your database to start page swapping into virtual memory, with all the IO operations that entails.
However, the second query will only have to be checked, optimised and parsed once and once only.
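The “parse once, execute many” pattern looks like this in practice: one statement text with a bind placeholder, reused for every employee number. SQLite via Python’s sqlite3 is used as a self-contained stand-in (placeholder syntax varies: `?` here, `:empno` in Oracle, `@empno` in SQL Server); the data is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, 'SMITH'), (2, 'JONES'), (3, 'KING')])

# One shared statement text; only the bound value changes per execution,
# so the database can reuse the cached plan instead of hard parsing.
query = "SELECT ename FROM emp WHERE empno = ?"
names = [conn.execute(query, (n,)).fetchone()[0] for n in (1, 2, 3)]
```

Three executions, one statement text – the literal-per-call version would have produced three distinct statements to parse and cache.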
1.6.3 Bind Variables prevent SQL Injection
You have a procedure with two parameters you wish to call from your Java or .Net application.
You set up the statement like this –
Stmt = "begin p('" & input1 & "', '" & input2 & "' ); end;"
Your user types in
Input1 = x', 'y'); execute immediate 'drop table all_salaries'; end; -- <junk…>
Your application then runs stmt on your database, which is this –
Begin p('x', 'y'); execute immediate 'drop table all_salaries'; end; -- <junk…>
Now you have to explain to your line manager, director and reporting partner why the salary table has been dropped, and unless you have auditing functions turned on, this will be very hard to trace.
As an exercise, I suggest you Google “sql injection”. The importance of this cannot be overstated.
Programmatic constructs like this
Stmt = "begin p('" & input1 & "', '" & input2 & "' ); end;"
and their variations are not binding variables. Use your technology’s variation of the Java PreparedStatement.
PreparedStatement pstat =
    conn.prepareStatement("begin p(?, ?); end;");
pstat.setString(1, <your variable>);
pstat.setString(2, <your other variable>);
Bind variables are not just about performance; they are also about security. This is particularly important if you are using a code generator, or dealing with third party software.
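The same contrast can be run end to end in a few lines. This sketch uses SQLite via Python’s sqlite3 rather than the paper’s Oracle/PL-SQL setting, and the table name `all_salaries` is borrowed from the scenario above purely for illustration; the injected text here subverts a WHERE clause rather than dropping a table, but the mechanism is identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_salaries (empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO all_salaries VALUES (?, ?)",
                 [(1, 50000), (2, 90000)])

malicious = "0 OR 1=1"   # attacker-controlled "employee number"

# String substitution: the attacker's text becomes part of the SQL,
# the predicate is subverted, and every salary row leaks.
leaked = conn.execute(
    "SELECT * FROM all_salaries WHERE empno = " + malicious).fetchall()

# Bind variable: the same text is treated purely as a value, matches
# no empno, and returns nothing.
safe = conn.execute(
    "SELECT * FROM all_salaries WHERE empno = ?", (malicious,)).fetchall()
```

One line of difference in the code; the difference between an incident report and a non-event in production.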
1.7 Optimiser hints
Some database platforms support the use of “hints”. These are directives to the optimiser to change the execution plan of the query.
Unless you are intimately acquainted with the underlying architecture of your database platform’s optimiser, you have extensively benchmarked your changes, you have discrete, quantifiable and definite performance reasons to use hints, and you fully understand the internal mechanics of what the hint will do to your execution paths under load, leave them well alone. Under almost all circumstances, the database optimiser will have come up with the optimal execution plan for your given query.
This of course presumes that your database design and indexing strategy is valid. A hint is not a quick fix for poor database design.
1.8 Transactional design
Transactional design will have implications for how your application will scale and perform. Issues and design questions you will have to answer include:
- How do we prevent “lost updates”?
- How do we guarantee transactional integrity, and the underlying integrity of our data?
- How do we guarantee a consistent view of data to the end users of our applications?
- What locking model and isolation levels are available in the particular database we are using – and these differ between databases – and how will this affect our design? and
- How do we design appropriate methods and rules to cater for this?
1.9 Stored procedures and static SQL
The most efficient way is to package your transactions into stored procedures in SQL Server and packages in Oracle, and use static SQL as much as possible. Certainly within an Oracle instance, the first time a package is called in your session it is pinned into the shared pool, and the static SQL statements are parsed and the execution plans written, so when the packages are called again they just execute – no need for a hard or soft parse. This is one of the most pressing arguments for the usage of properly designed stored procedures and static SQL – “parse once, execute many”. A good philosophy to adopt when designing database applications and the associated SQL transactions is:
- Do it in one SQL statement if you possibly can; think in terms of set processing, not row-by-row - especially for bulk processes.
- If you can’t do it in one SQL statement, then a little database-native procedural code (i.e. PL/SQL in Oracle, or T-SQL in SQL Server) wrapped around well designed SQL statements will almost certainly solve most problems – but remember to think in terms of processing “sets” of data. Keep your procedural programs small and explicit, and designed to solve one specific problem.
- If, for whatever reason, native procedural code (and this includes native Java libraries in both Oracle and SQL Server) won’t give you what you need – and this should RARELY be necessary – then an external call to a Java library or C/C++/C#/Assembler dll could be justified.
- If that won’t do it, you probably shouldn’t be trying it in the first place, or your design is fundamentally flawed.
- AVOID dynamic SQL unless you have absolutely no other choice; dynamic SQL is very easy to code, but it can be complicated to code properly and if you don’t code it properly, you run the risk of flooding the instance memory area with queries that have to be hard parsed every time and that can bring a database to its knees.
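The “think in sets” point in the list above can be made concrete. This sketch (SQLite via Python’s sqlite3, illustrative table and data) contrasts one set-based UPDATE with the row-by-row loop it replaces; both produce the same result, but the single statement hands the optimiser the whole job at once and is parsed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER PRIMARY KEY, sal INTEGER, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(i, 1000, i % 2) for i in range(6)])

# Set-based: one statement, one parse, one pass over the data.
conn.execute("UPDATE emp SET sal = sal * 1.1 WHERE deptno = 0")

# The row-by-row equivalent to avoid for bulk work would be:
# for (empno,) in conn.execute("SELECT empno FROM emp WHERE deptno = 0"):
#     conn.execute("UPDATE emp SET sal = sal * 1.1 WHERE empno = ?", (empno,))

raised = conn.execute(
    "SELECT COUNT(*) FROM emp WHERE sal > 1000").fetchone()[0]
```

On six rows the difference is invisible; on six million rows the loop version issues six million separate statements.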
Further, using database packages and procedures, you can make use of the appropriate methods to lock your records for INSERTs and UPDATEs depending on your database, and elegantly handle any waits or exceptions you may encounter.
AUTOCOMMIT should always be turned off to preserve transactional integrity. Let’s say you have a process that updates records in five tables, and because of the design of the data, you need to perform this process in three separate SQL Statements. The first statement executes without error, as does the second. The third fails for some reason (maybe it violates a unique constraint) – so the entire transaction should be rolled back, but if the separate prior statements have been AUTOCOMMITed, you have a problem. As an application designer, it is up to you to decide the success or failure of a transaction and when to explicitly commit or rollback.
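The scenario described above can be sketched directly: several statements form one logical transaction, the last violates a unique constraint, and the whole unit is rolled back. SQLite via Python’s sqlite3 stands in for the paper’s platforms; `isolation_level = None` puts the connection in autocommit mode so the BEGIN/COMMIT/ROLLBACK boundaries are explicit and under our control. Table and data are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # manual transaction control, no implicit commits
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO t VALUES (1)")

conn.execute("BEGIN")
try:
    conn.execute("INSERT INTO t VALUES (2)")   # statement 1: succeeds
    conn.execute("INSERT INTO t VALUES (3)")   # statement 2: succeeds
    conn.execute("INSERT INTO t VALUES (1)")   # statement 3: unique violation
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")                   # undoes statements 1 and 2 as well

remaining = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
# Only the original row survives; no partial work was committed.
```

Had the first two statements been AUTOCOMMITed, the rollback would have had nothing left to undo and the data would be inconsistent.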
Pragmatically, you also need to have a very good idea of the data usage your application will need, so you can decide what approach to take – what’s not so important in a reporting database with few users becomes critical in a large OLTP system with a high number of concurrent users. But regardless of whether or not record locking will be an issue for your application, you must know how a particular database locks, what the implications are and how best to design your transactions. This is not really something that can be done generically (and it really shouldn’t be handled at the application level, either) – every database is different in the way it handles locks and latches, and no one approach will work effectively across all database platforms.
1.10 Database native SQL extensions
DB2, SQL Server and Oracle all have a rich set of SQL extensions that go far beyond ANSI standard SQL, and as a good developer you should know what they are, because if you do it will make your job much, much easier. You should also have a good working knowledge of how your queries will be parsed and the mechanisms your database uses to optimise them, so you can write and tune efficient SQL. Remember, you should be designing your applications for scalability and performance from the beginning of your project lifecycle.
1.11 Dynamic SQL
Properly designed dynamic SQL can be as efficient as static SQL, but it is very easy to write bad dynamic SQL. The various implementations of dynamic SQL vary widely across database platforms, so if you do have to use dynamic SQL, make sure you know what’s available, how and why it works as it does and what the implications are – because one method may be appropriate for one set of circumstances, and another method for another.
For example, in Oracle databases, version 8.1.6 and above, there are two methods for generating dynamic SQL - Native Dynamic SQL and the DBMS_SQL API package which was introduced in version 7.1. The differences between these two methods are quite pronounced, even though they both provide a similar service, so it would be important to choose the correct method depending on what type of transaction you wanted to perform.
1.12 Dynamic SQL guidelines with Java and .Net
If you absolutely MUST generate a dynamic SQL statement from within your Java classes or .Net applications, ALWAYS use a method that allows you to bind variables (e.g. for Java, use PreparedStatement, not Statement. PreparedStatement supports the use of binds; Statement does not. Refer back to section 1.6.2).
A better approach would be to make a call to a properly designed database package that will build the (optimised) dynamic statement for you, and return a resultset or value set that you can then manipulate as you see fit.
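When dynamic SQL really is unavoidable, the two rules above combine into one pattern: identifiers (which cannot be bound) are validated against a whitelist, and values always travel as bind variables. The sketch below shows the shape of such a helper in Python with SQLite; the function name, whitelist and schema are all hypothetical, invented for this illustration.

```python
import sqlite3

# Identifiers cannot be bound, so they are restricted to a known-good set.
ALLOWED_COLUMNS = {"ename", "deptno"}

def find_emps(conn, column, value):
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"illegal column: {column!r}")
    # The identifier is interpolated only after validation;
    # the value is always a bind variable, never concatenated.
    return conn.execute(
        f"SELECT empno FROM emp WHERE {column} = ?", (value,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 'SMITH', 10), (2, 'JONES', 20)])
```

The statement text varies only across the small, fixed set of whitelisted columns, so the plan cache holds at most one entry per column – dynamic in shape, bound in substance.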