Analysis Services 2005 Performance Guide 61
Microsoft SQL Server 2005 Analysis Services Performance Guide
SQL Server Technical Article
Author: Elizabeth Vitt
Subject Matter Experts:
T.K. Anand
Sasha (Alexander) Berger
Marius Dumitru
Eric Jacobsen
Edward Melomed
Akshai Mirchandani
Mosha Pasumansky
Cristian Petculescu
Carl Rabeler
Wayne Robertson
Richard Tkachuk
Dave Wickert
Len Wyatt
Published: February 2007
Applies To: SQL Server 2005, Service Pack 2
Summary: This white paper describes how application developers can apply performance-tuning techniques to their Microsoft SQLServer2005 Analysis Services Online Analytical Processing (OLAP) solutions.
Copyright
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Unless otherwise noted, the companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted in examples herein are fictitious. No association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.
Ó 2007 Microsoft Corporation. All rights reserved.
Microsoft, Windows, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
All other trademarks are property of their respective owners.
Analysis Services 2005 Performance Guide 61
Table of Contents
Introduction 6
Enhancing Query Performance 8
Understanding the querying architecture 8
Session management 9
MDX query execution 10
Data retrieval: dimensions 12
Data retrieval: measure group data 15
Optimizing the dimension design 18
Identifying attribute relationships 18
Using hierarchies effectively 22
Maximizing the value of aggregations 24
How aggregations help 24
How the Storage Engine uses aggregations 25
Why not create every possible aggregation? 27
How to interpret aggregations 29
Which aggregations are built 30
How to impact aggregation design 31
Suggesting aggregation candidates 32
Specifying statistics about cube data 36
Adopting an aggregation design strategy 39
Using partitions to enhance query performance 40
How partitions are used in querying 41
Designing partitions 41
Aggregation considerations for multiple partitions 43
Writing efficient MDX 44
Specifying the calculation space 44
Removing empty tuples 47
Summarizing data with MDX 55
Taking advantage of the Query Execution Engine cache 58
Applying calculation best practices 60
Tuning Processing Performance 61
Understanding the processing architecture 61
Processing job overview 61
Dimension processing jobs 62
Dimension-processing commands 64
Partition-processing jobs 65
Partition-processing commands 65
Executing processing jobs 66
Refreshing dimensions efficiently 67
Optimizing the source query 67
Reducing attribute overhead 68
Optimizing dimension inserts, updates, and deletes 70
Refreshing partitions efficiently 71
Optimizing the source query 71
Using partitions to enhance processing performance 72
Optimizing data inserts, updates, and deletes 72
Evaluating rigid vs. flexible aggregations 73
Optimizing Special Design Scenarios 76
Special aggregate functions 76
Optimizing distinct count 76
Optimizing semiadditive measures 78
Parent-child hierarchies 79
Complex dimension relationships 79
Many-to-many relationships 80
Reference relationships 82
Near real-time data refreshes 86
Tuning Server Resources 91
Understanding how Analysis Services uses memory 92
Memory management 92
Shrinkable vs. non-shrinkable memory 94
Memory demands during querying 95
Memory demands during processing 96
Optimizing memory usage 97
Increasing available memory 97
Monitoring memory management 97
Minimizing metadata overhead 98
Monitoring the timeout of idle sessions 99
Tuning memory for partition processing 100
Warming the data cache 101
Understanding how Analysis Services uses CPU resources 103
Job architecture 103
Thread pools 103
Processor demands during querying 104
Processor demands during processing 104
Optimizing CPU usage 105
Maximize parallelism during querying 105
Maximize parallelism during processing 107
Use sufficient memory 109
Use a load-balancing cluster 109
Understanding how Analysis Services uses disk resources 110
Disk resource demands during processing 110
Disk resource demands during querying 110
Optimizing disk usage 111
Using sufficient memory 111
Optimizing file locations 111
Disabling unnecessary logging 111
Conclusion 112
Appendix A – For More Information 113
Appendix B - Partition Storage Modes 113
Multidimensional OLAP (MOLAP) 113
Hybrid OLAP (HOLAP) 114
Relational OLAP (ROLAP) 115
Appendix C – Aggregation Utility 116
Benefits of the Aggregation Utility 116
How the Aggregation Utility organizes partitions 117
How the Aggregation Utility works 118
Analysis Services 2005 Performance Guide 61
Introduction
Fast query response times and timely data refresh are two well-established performance requirements of Online Analytical Processing (OLAP) systems. To provide fast analysis, OLAP systems traditionally use hierarchies to efficiently organize and summarize data. While these hierarchies provide structure and efficiency to analysis, they tend to restrict the analytic freedom of end users who want to freely analyze and organize data on the fly.
To support a broad range of structured and flexible analysis options, Microsoft® SQLServer™ Analysis Services (SSAS) 2005 combines the benefits of traditional hierarchical analysis with the flexibility of a new generation of attribute hierarchies. Attribute hierarchies allow users to freely organize data at query time, rather than being limited to the predefined navigation paths of the OLAP architect. To support this flexibility, the Analysis Services OLAP architecture is specifically designed to accommodate both attribute and hierarchical analysis while maintaining the fast query performance of conventional OLAP databases.
Realizing the performance benefits of this combined analysis paradigm requires understanding how the OLAP architecture supports both attribute hierarchies and traditional hierarchies, how you can effectively use the architecture to satisfy your analysis requirements, and how you can maximize the architecture’s utilization of system resources.
NoteTo apply the performance tuning techniques discussed in this white paper, you must have SQL Server2005 Service Pack2 installed.
To satisfy the performance needs of various OLAP designs and server environments, this white paper provides extensive guidance on how you can take advantage of the wide range of opportunities to optimize Analysis Services performance. Since Analysis Services performance tuning is a fairly broad subject, this white paper organizes performance tuning techniques into the following four segments.
Enhancing Query Performance - Query performance directly impacts the quality of the end user experience. As such, it is the primary benchmark used to evaluate the success of an OLAP implementation. Analysis Services provides a variety of mechanisms to accelerate query performance, including aggregations, caching, and indexed data retrieval. In addition, you can improve query performance by optimizing the design of your dimension attributes, cubes, and MDX queries.
Tuning Processing Performance - Processing is the operation that refreshes data in an Analysis Services database. The faster the processing performance, the sooner users can access refreshed data. Analysis Services provides a variety of mechanisms that you can use to influence processing performance, including efficient dimension design, effective aggregations, partitions, and an economical processing strategy (for example, incremental vs. full refresh vs. proactive caching).
Optimizing Special Design Scenarios – Complex design scenarios require a distinct set of performance tuning techniques to ensure that they are applied successfully, especially if you combine a complex design with large data volumes. Examples of complex design components include special aggregate functions, parent-child hierarchies, complex dimension relationships, and “near real-time” data refreshes.
Tuning Server Resources – Analysis Services operates within the constraints of available server resources. Understanding how Analysis Services uses memory, CPU, and disk resources can help you make effective server management decisions that optimize querying and processing performance.
Three appendices provide links to additional resources, information on various partition storage modes, andguidance on using the Aggregation Utility that is a part of SQLServer2005 Service Pack2 samples.
Analysis Services 2005 Performance Guide 61
Enhancing Query Performance
Querying is the operation where Analysis Services provides data to client applications according to the calculation and data requirements of a MultiDimensional eXpressions (MDX) query. Since query performance directly impacts the user experience, this section describes the most significant opportunities to improve query performance. Following is an overview of the query performance topics that are addressed in this section:
Understanding the querying architecture - The Analysis Services querying architecture supports three major operations: session management, MDX query execution, and data retrieval. Optimizing query performance involves understanding how these three operations work together to satisfy query requests.
Optimizing the dimension design - A well-tuned dimension design is perhaps one of the most critical success factors of a high-performing Analysis Services solution. Creating attribute relationships and exposing attributes in hierarchies are design choices that influence effective aggregation design, optimized MDX calculation resolution, and efficient dimension data storage and retrieval from disk.
Maximizing the value of aggregations - Aggregations improve query performance by providing precalculated summaries of data. To maximize the value of aggregations, ensure that you have an effective aggregation design that satisfies the needs of your specific workload.
Using partitions to enhance query performance - Partitions provide a mechanism to separate measure group data into physical units that improve query performance, improve processing performance, and facilitate data management. Partitions are naturally queried in parallel; however, there are some design choices and server property optimizations that you can specify to optimize partition operations for your server configuration.
Writing efficient MDX - This section describes techniques for writing efficient MDX statements such as: 1)writing statements that address a narrowly defined calculation space, 2)designing calculations for the greatest re-usage across multiple users, and 3)writing calculations in a straight-forward manner to help the Query Execution Engine select the most efficient execution path.
Understanding the querying architecture
To make the querying experience as fast as possible for end users, the Analysis Services querying architecture provides several components that work together to efficiently retrieve and evaluate data. Figure1 identifies the three major operations that occur during querying: session management, MDX query execution, and data retrieval as well as the server components that participate in each operation.
Figure 1Analysis Services querying architecture
Session management
Client applications communicate with Analysis Services using XML for Analysis (XMLA) over TCP IP or HTTP. Analysis Services provides an XMLA listener component that handles all XMLA communications between Analysis Services and its clients. The Analysis Services Session Manager controls how clients connect to an Analysis Services instance. Users authenticated by Microsoft® Windows and who have rights to Analysis Services can connect to Analysis Services. After a user connects to Analysis Services, the Security Manager determines user permissions based on the combination of Analysis Services roles that apply to the user. Depending on the client application architecture and the security privileges of the connection, the client creates a session when the application starts, and then reuses the session for all of the user’s requests. The session provides the context under which client queries are executed by the Query Execution Engine. A session exists until it is either closed by the client application, or until the server needs to expire it. For more information regarding the longevity of sessions, see Monitoring the timeout of idle sessions in this white paper.
MDX query execution
The primary operation of the Query Execution Engine is to execute MDX queries. This section provides an overview of how the Query Execution Engine executes queries. To learn more details about optimizing MDX, see Writing efficient MDX later in this white paper.
While the actual query execution process is performed in several stages, from a performance perspective, the Query Execution engine must consider two basic requirements: retrieving data and producing the result set.
- Retrieving data—To retrieve the data requested by a query, the Query Execution Engine decomposes each MDX query into data requests. To communicate with the Storage Engine, the Query Execution Engine must translate the data requests into subcube requests that the Storage Engine can understand. A subcube represents a logical unit of querying, caching, and data retrieval. An MDX query may be resolved into one or more subcube requests depending on query granularity and calculation complexity. Note that the word subcube is a generic term. For example, the subcubes that the Query Execution Engine creates during query evaluation are not to be confused with the subcubes that you can create using the MDX CREATE SUBCUBE statement.
- Producing the result set—To manipulate the data retrieved from the Storage Engine, the Query Execution Engine uses two kinds of execution plans to calculate results: it can bulk calculate an entire subcube, or it can calculate individual cells. In general, the subcube evaluation path is more efficient; however, the Query Execution Engine ultimately selects execution plans based on the complexities of each MDX query. Note that a given query can have multiple execution plans for different parts of the query and/or different calculations involved in the same query. Moreover, different parts of a query may choose either one of these two types of execution plans independently, so there is not a single global decision for the entire query. For example, if a query requests resellers whose year-over-year profitability is greater than 10%, the Query Execution Engine may use one execution plan to calculate each reseller’s year-over-year profitability and another execution plan to only return those resellers whose profitability is greater than 10%.
When you execute an MDX calculation, the Query Execution Engine must often execute the calculation across more cells than you may realize. Consider the example where you have an MDX query that must return the calculated year-to-date sales across the top five regions. While it may seem like you are only returning five cell values, Analysis Services must execute the calculation across additional cells in order to determine the top five regions and also to return their year to date sales. A general MDX optimization technique is to write MDX queries in a way that minimizes the amount of data that the Query Execution Engine must evaluate. To learn more about this MDX optimization technique, see Specifying the calculation space later in this white paper.