SQL Server 2005 Row Versioning-based Transaction Isolation
SQL Server Technical Article
Writers: Kimberly L. Tripp, Neal Graves
Published: July 2006
Applies to: SQL Server 2005
Summary: Microsoft® SQL Server™ 2005 provides nonlocking, nonblocking read consistency to your users through snapshot isolation and read committed isolation using row versioning. Find out when you can use these features to improve performance and reduce latency in your applications. (59 printed pages)
Download the associated sample code, SQLServer2005RVTI.exe.
Copyright
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, places, or events is intended or should be inferred.
© 2006 Microsoft Corporation. All rights reserved.
Microsoft, Outlook, SQL Server, Visual C#, and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Contents
SQL Server 2005 Row Versioning-based Transaction Isolation
Copyright
Contents
Introduction
Data Access Patterns and Usage
Usage Scenarios
Application in Online Transaction Processing
Ad hoc Reporting Against Live Data
Ad hoc Reporting Against a Copy-Managed Database
Database Mirroring
Transactional Replication
Overnight Reporting Against Live Data
Transaction Processing System Workload
Data Warehousing System Workload
Migration to a Common Database Technology
SQL Server and Oracle Differences in Snapshot
SQL Server and Oracle Similarities in Snapshot
Understanding Concurrency Control
Understanding Isolation
Isolation Levels Offered in SQL Server 2005
Isolation Level and Best-Suited Application
Considerations for Row Versioning–based Isolations
Definitions, Terminology, and Syntax for Row Versioning–based Isolation
Read Committed Isolation Using Row Versioning (Statement-Level Read Consistency)
Snapshot Isolation (Transaction-Level Read Consistency)
Allowing Snapshot Isolation
Requesting Snapshot Transactions
Understanding the "Beginning" of a Transaction
Understanding Row Versioning
Row Versioning in Read Committed Using Row Versioning
Row Versioning in Snapshot Isolation
DDL Statements Within Snapshot Isolation
DDL Statements That Are Not Allowed Within Snapshot Isolation
Other DDL Statement Changes After Snapshot Isolation Started
Snapshot Transaction Failure Due to DDL Changes Outside of the Transaction
Development Best Practices
Read Committed Using Row Versioning
Snapshot Isolation
Minimizing Update Conflicts
Illustrating Concurrency Behavior
READ COMMITTED
READ UNCOMMITTED
READ COMMITTED Using Row Versioning
SNAPSHOT
Administrative Best Practices
Database-Level Settings
Upgrade Issues
Usage of Version Store in tempdb
Sizing tempdb
Monitoring Version Store Activity
Dynamic Management Views
Performance Monitor Counters
Conclusion
For more information
SQL Server Books Online
Knowledge Base Articles
Additional Reading
Newsgroups and Forums
Introduction
In many systems today, significant read activity is isolated from write activity in the form of a data warehouse or separate system. There are many advantages to this approach: read-intensive applications typically want more index structures, data redundancies, and even other views of data. Transaction processing systems want the best write throughput while incurring only minimal overhead. The access patterns of readers and writers typically differ: readers are more prone to larger analysis types of queries, and writers are more prone to singleton inserts, updates, and deletes. When these activities are separated, the administrator can focus on recovery strategies for a smaller, more manageable transaction processing system. OLTP databases tend to be much smaller than data-redundant decision-support or analysis-oriented databases. However, this distinction cannot always be clearly made. After data is copied, transformed, and archived to an analysis-oriented database, the data must be maintained and/or rebuilt periodically. Users definitely benefit from looking at a transactionally consistent version of the database; however, that version no longer has current data, may take many hours to build and index, and might not be what the user really wants. In this situation, snapshot isolation and read committed isolation using row versioning may be useful.
The primary focus of this paper is to discuss when using these isolation levels is appropriate, what the possible trade-offs are, and what the best practices are for using them.
Note: Before you read this document, we recommend that you read the topic Concurrency Effects in SQL Server Books Online.
Data Access Patterns and Usage
Production databases are quickly growing in size, and data-retention periods are increasing with changing business and regulatory requirements. Additionally, with drive capacity doubling every 12 to 18 months and storage costs falling, the amount of data users want to keep "online" is increasing. One solution is to separate analysis from transaction processing, and although that may have many benefits for complex detailed analysis and business intelligence probing, it does not always work with regard to disk space and manageability. With demand for more data to be online, more active queries executing, and a need for more current, real-time analysis, contention for data exists.
In Microsoft® SQL Server™ 2000, contention can be minimized under read committed transaction isolation because an active SELECT statement releases each read lock after the resource is read. The default environment follows the standard SQL-92 definition in that only committed data is read and uncommitted changes are not visible. However, although only committed data can be read, the standard does not guarantee read consistency even within the life of a statement (for example, a subset of the rows that are returned by a SELECT statement may include changes of a transaction that started after the SELECT but committed before the SELECT could end). The resource lock (a shared lock) is released immediately after the row is processed, and that data row can be modified immediately, even while the read is still processing other rows. The modified row might reappear in the same SELECT statement if a concurrent transaction moves the row into the set of rows that have not yet been read (for example, through an index key change).
Note: If data movement is not likely (that is, splitting is reduced through thorough and appropriate index creation and maintenance), the chance of rereading a row within a single statement is so low that this anomaly becomes hard to produce.
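The rereading anomaly described above can be illustrated with a toy model. The following Python sketch is not SQL Server code: it simulates an index-ordered scan that keeps no lock on rows it has already read, together with a concurrent committed update that changes one row's index key partway through the scan. The function name, parameters, and sample rows are all illustrative assumptions.

```python
import bisect

def scan_with_concurrent_update(index, update_after, old_key, new_key):
    """Simulate an index scan that holds no lock on rows it has already read.

    index           -- list of (key, row_id) pairs, standing in for an index
    update_after    -- number of rows read before the concurrent update commits
    old_key/new_key -- the committed index key change applied to one row
    Returns the row_ids the scan returned, in order.
    """
    index = sorted(index)  # work on a sorted copy; the caller's list is untouched
    seen = []
    last_entry = None  # scan position: the last index entry that was read
    while True:
        # Continue with the first entry strictly after the scan position.
        pos = 0 if last_entry is None else bisect.bisect_right(index, last_entry)
        if pos >= len(index):
            break
        last_entry = index[pos]
        seen.append(last_entry[1])
        if len(seen) == update_after:
            # A concurrent transaction commits an index key change here.
            for i, (key, row_id) in enumerate(index):
                if key == old_key:
                    index[i] = (new_key, row_id)
            index.sort()
    return seen

rows = [(10, "A"), (20, "B"), (30, "C"), (40, "D")]
# Row "A" is read first; its key then changes from 10 to 35, ahead of the scan.
result = scan_with_concurrent_update(rows, update_after=2, old_key=10, new_key=35)
print(result)
```

In this run, row A is returned twice: once at its original key and again after the key change moves it ahead of the scan position. Every value read was committed, yet the statement as a whole is not consistent.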
In many situations, this is the correct and best-performing choice. Only committed changes are visible, and they become visible quickly with minimal resource locking. For example, if you are looking for the current total of sales from a continuously processing system, only an estimate is possible, because the value becomes "stale" moments after it is accessed as transactions continue to be processed. In fact, an even less restrictive transaction isolation level, called read uncommitted, is often specified with a lock hint by using either the WITH (NOLOCK) or WITH (READUNCOMMITTED) hint (these are synonymous). This environment allows uncommitted data to be read; however, when the count of sales and/or total sales is only an estimate, seeing data that is "in progress" may be acceptable. When this is not acceptable, the programmer must change the isolation level to ensure consistency through read repeatability of the data.
So where do you draw the line? Can statement-level or transaction-level read-consistent data be returned while a system is actively processing? Can you write a long-running query in a production environment, ask for consistency, and not block writers? You cannot address these questions in SQL Server 2000. However, in SQL Server 2005 you can, through two optional database-level settings. For statement-level consistency, enabling READ_COMMITTED_SNAPSHOT automatically changes the behavior of read committed isolation. This new behavior offers nonlocking, nonblocking, statement-level read consistency. In this paper, the traditional read committed isolation is referred to as read committed, and the optional new behavior as read committed using row versioning. For transaction-level consistency, the new snapshot isolation level has been added. Changing to this isolation level makes transaction-level consistency a controllable setting. When these new options are not set, SQL Server 2005 default database behavior works as in earlier versions. This default behavior will continue to be the preferred behavior in many systems in which transaction processing throughput and performance are the highest goals.
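The difference between statement-level and transaction-level read consistency can be sketched with a small model. The Python below is a conceptual illustration only, not SQL Server behavior or syntax: the `VersionedValue` class, the `run_reader` function, and the commit sequence numbers are assumptions made for the example, and the toy takes the transaction-level snapshot at the moment the reader function starts.

```python
class VersionedValue:
    """A single value whose committed versions are stamped with a sequence number."""

    def __init__(self, initial):
        self.commit_seq = 0
        self.versions = [(0, initial)]  # (commit sequence number, value)

    def commit_update(self, value):
        """A writer commits a new version; readers are never blocked."""
        self.commit_seq += 1
        self.versions.append((self.commit_seq, value))

    def read_as_of(self, seq):
        """Return the newest value committed at or before sequence number seq."""
        visible = [v for v in self.versions if v[0] <= seq]
        return max(visible, key=lambda v: v[0])[1]


def run_reader(store, level, writes_between_statements):
    """Issue one read per statement, with a committed write after each read.

    level "statement"   -- a fresh snapshot per statement (the idea behind
                           read committed using row versioning)
    level "transaction" -- one snapshot for the whole transaction (the idea
                           behind snapshot isolation)
    """
    txn_snapshot = store.commit_seq  # taken once for the whole transaction
    results = []
    for new_value in writes_between_statements:
        snapshot = store.commit_seq if level == "statement" else txn_snapshot
        results.append(store.read_as_of(snapshot))
        store.commit_update(new_value)  # a concurrent writer commits
    return results


statement_reads = run_reader(VersionedValue(100), "statement", [110, 120, 130])
transaction_reads = run_reader(VersionedValue(100), "transaction", [110, 120, 130])
print(statement_reads)
print(transaction_reads)
```

With statement-level consistency, each statement sees the changes committed before it began, so the reads are 100, 110, and 120. With transaction-level consistency, all three reads return 100, the value that was committed when the transaction took its snapshot.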
If you do want a form of nonlocking snapshot (either statement level or transaction level), row versioning will be used to track row modifications. To enable this, data writers pay the cost when an update or delete operation executes, even if there is no reader at the time. The version store in the tempdb database retains version records until all active transactions that require those row versions commit (assuming that the UPDATE or DELETE statement has already committed). More accurately, the version store must retain a specific version record until every transaction that is running under row versioning-based isolation, and that started before the commit of the transaction that made the change, has committed or rolled back. Although the cost of taking a version is minimal, the decision to implement this should not be made without careful consideration and many best practices in place.
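The retention rule in the preceding paragraph can be expressed as a small predicate. The following Python is a hedged sketch of the rule as stated, not a description of how SQL Server implements version store cleanup; the function name, the record layout, and the use of integer sequence numbers to order transaction starts and commits are all assumptions.

```python
def removable_versions(version_records, active_snapshot_starts):
    """Return the version records that no active versioning reader still needs.

    version_records        -- list of (row_id, change_commit_seq) pairs, where
                              change_commit_seq is the commit sequence number of
                              the transaction whose change produced the record
    active_snapshot_starts -- start sequence numbers of transactions still
                              running under row versioning-based isolation

    A record must be kept while any active reader started before the change
    committed, because that reader may still need the old row image.
    """
    oldest_reader = min(active_snapshot_starts, default=None)
    removable = []
    for row_id, change_commit_seq in version_records:
        if oldest_reader is None or oldest_reader >= change_commit_seq:
            removable.append((row_id, change_commit_seq))
    return removable


records = [("car-1", 5), ("car-2", 8), ("car-3", 12)]
# Two readers are active: one started at sequence 7, the other at 10.
while_both_active = removable_versions(records, [7, 10])
# The older reader commits; only the reader that started at 10 remains.
after_first_commit = removable_versions(records, [10])
print(while_both_active)
print(after_first_commit)
```

While the reader that started at sequence 7 is active, only the record for the change committed at sequence 5 can go. Once that reader finishes, the record for sequence 8 also becomes removable, which is why a single long-running reader can cause the version store in tempdb to grow.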
Usage Scenarios
This section explores how the SQL Server 2005 snapshot isolation and read committed isolation using row versioning can help deliver improved performance, reduced latency, and greater developer and database administrator productivity in your organization. The following common business scenarios are discussed:
· Application in online transaction processing
· Ad hoc reporting against live data
· Ad hoc reporting against a copy-managed database
· Overnight reporting against live data
· Migration to a common database technology
Application in Online Transaction Processing
At first glance, the primary use of row versioning-based isolation might seem to be in read-intensive workloads, such as data warehousing and operational reporting systems. In these systems, complex, long-running queries (especially aggregations) against large tables that require a transactionally consistent view of the database can take table-level read locks that effectively lock out transactions that need to update the data. However, this is not the only application of row versioning-based isolations. The optional new behavior of the read committed isolation level, which works with a snapshot of the data at the statement level, can significantly improve the throughput of mixed-workload systems and still offer transactionally consistent data for large joins and aggregations. Because the data snapshot guarantees the consistency of the read, long-running conflicts cannot occur. Additionally, in this environment, application changes are not required. The change is made at the database options level.
When pessimistic locking (the way most database vendors traditionally implement the full ANSI standard for levels of transaction isolation) is used, applications typically exhibit blocking. Simultaneous data-access requests from readers and writers within transactions request conflicting locks. This is entirely normal and is not a significant performance bottleneck, provided the blocking is short lived. However, reader/writer contention significantly reduces performance in systems under stress, because any increase in the time it takes to process a transaction can have a disproportionate effect on blocking. Such increases include delays caused by over-utilized system resources, such as disk I/O, RAM, or CPU, and delays caused by poorly written transactions, such as those that hold locks across user interaction. The longer a transaction takes to execute, the longer locks are held and the greater the likelihood of blocking.
An example of this might be a car rental company that uses both an internal and a Web-based reservation application to book cars on behalf of its customers. Systems such as these have transactions that contend for the same data (that is, cars). The system offers short-running queries that allow the customer service representative to check the availability of cars in certain locations before booking them for the customer. This is an area where programming techniques, such as disconnected datasets, are often used to provide optimistic concurrency control, specifically:
1. The application queries for all available cars of a certain class, in a specific date range, at a rental location. This query is probably a join of at least a few tables, such as Car, Class, Reservation, and Location. Additionally, this query runs under the read committed isolation level to ensure that only committed data is returned to the user.
2. The recordset or dataset that is obtained by the query is disconnected from the database so as to remove any locks held on the data while the data is displayed in the caller's application. This is often called batch optimistic because it emulates the optimistic forms of database concurrency control. It is optimistic in that although the data is active, the likelihood of conflict should be low. The use of row-level timestamps enables the programmer to identify data changes and manage conflicting updates with appropriate messages to the user interface.
3. The caller selects a specific car, and the dataset is edited to reflect the reservation.
4. The application then reconnects and tries to synchronize the change to the database, using the row-level SQL Server timestamp column to ensure that the data has not been changed by other callers while the data was disconnected.
5. The application then reports back to the caller, either to confirm success (the reservation was taken) or to indicate a conflict (the car was taken by another caller) and offer the chance to try to book another car.
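The version check at the heart of this batch optimistic pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's sample code: it uses Python's built-in sqlite3 module in place of SQL Server, a hypothetical single-table `Car` schema, and an integer `row_version` column that the application increments itself, standing in for a SQL Server timestamp column. The `reserve_car` helper is likewise hypothetical.

```python
import sqlite3

def reserve_car(conn, car_id, customer, version_read):
    """Try to book a car; succeed only if the row is unchanged since it was read.

    version_read is the row_version value fetched with the disconnected data.
    Returns True on success, False if another caller changed the row first.
    """
    cur = conn.execute(
        "UPDATE Car SET reserved_by = ?, row_version = row_version + 1 "
        "WHERE car_id = ? AND row_version = ?",
        (customer, car_id, version_read),
    )
    conn.commit()
    # No row matches if the version moved on, so zero rows are updated.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Car (car_id INTEGER PRIMARY KEY, reserved_by TEXT, "
    "row_version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO Car (car_id) VALUES (1)")
conn.commit()

# Query and disconnect: two callers read the same available car and its version.
version_a = conn.execute(
    "SELECT row_version FROM Car WHERE car_id = 1"
).fetchone()[0]
version_b = version_a

# Reconnect and synchronize: only the first write finds the version unchanged.
first = reserve_car(conn, 1, "caller A", version_a)
second = reserve_car(conn, 1, "caller B", version_b)
booked_by = conn.execute(
    "SELECT reserved_by FROM Car WHERE car_id = 1"
).fetchone()[0]
print(first, second, booked_by)
```

The first caller's update succeeds and the second caller's update matches no row, so the application can report the conflict and offer another car. The check detects conflicting writes only; it does nothing about lock contention while the initial availability query is running.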
Be aware that the above technique is not truly optimistic. In this design pattern, a significant amount of contention can take place while the query in step 1 is running to find candidate cars. With SQL Server 2005 read committed isolation using row versioning, these requests are given a nonlocking, nonblocking, transactionally consistent version of the data while the query runs. With this isolation, the locking and blocking load on the server can be reduced, and the live data is not blocked for other customers who want to reserve cars. Although this isolation can improve the end-to-end performance of the transactions that book cars by eliminating lock waits, it does not necessarily improve the chances that a car that is viewed by the long-running query will still be available. However, this is an acceptable trade-off. The reservations occur faster and are not blocked by simultaneous requests for car rental data. This leads to increased transaction throughput, especially under peak workloads, such as those caused by holiday bookings and business travel peak times.