Analysis Services 2005 Performance Guide 61

Microsoft SQL Server 2005 Analysis Services Performance Guide

SQL Server Technical Article

Author: Elizabeth Vitt

Subject Matter Experts:

T.K. Anand

Sasha (Alexander) Berger

Marius Dumitru

Eric Jacobsen

Edward Melomed

Akshai Mirchandani

Mosha Pasumansky

Cristian Petculescu

Carl Rabeler

Wayne Robertson

Richard Tkachuk

Dave Wickert

Len Wyatt

Published: February 2007

Applies To: SQL Server 2005, Service Pack 2

Summary: This white paper describes how application developers can apply performance-tuning techniques to their Microsoft SQLServer2005 Analysis Services Online Analytical Processing (OLAP) solutions.

Copyright

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted in examples herein are fictitious. No association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

Ó 2007 Microsoft Corporation. All rights reserved.

Microsoft, Windows, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

All other trademarks are property of their respective owners.

Analysis Services 2005 Performance Guide 61

Table of Contents

Introduction 6

Enhancing Query Performance 8

Understanding the querying architecture 8

Session management 9

MDX query execution 10

Data retrieval: dimensions 12

Data retrieval: measure group data 15

Optimizing the dimension design 18

Identifying attribute relationships 18

Using hierarchies effectively 22

Maximizing the value of aggregations 24

How aggregations help 24

How the Storage Engine uses aggregations 25

Why not create every possible aggregation? 27

How to interpret aggregations 29

Which aggregations are built 30

How to impact aggregation design 31

Suggesting aggregation candidates 32

Specifying statistics about cube data 36

Adopting an aggregation design strategy 39

Using partitions to enhance query performance 40

How partitions are used in querying 41

Designing partitions 41

Aggregation considerations for multiple partitions 43

Writing efficient MDX 44

Specifying the calculation space 44

Removing empty tuples 47

Summarizing data with MDX 55

Taking advantage of the Query Execution Engine cache 58

Applying calculation best practices 60

Tuning Processing Performance 61

Understanding the processing architecture 61

Processing job overview 61

Dimension processing jobs 62

Dimension-processing commands 64

Partition-processing jobs 65

Partition-processing commands 65

Executing processing jobs 66

Refreshing dimensions efficiently 67

Optimizing the source query 67

Reducing attribute overhead 68

Optimizing dimension inserts, updates, and deletes 70

Refreshing partitions efficiently 71

Optimizing the source query 71

Using partitions to enhance processing performance 72

Optimizing data inserts, updates, and deletes 72

Evaluating rigid vs. flexible aggregations 73

Optimizing Special Design Scenarios 76

Special aggregate functions 76

Optimizing distinct count 76

Optimizing semiadditive measures 78

Parent-child hierarchies 79

Complex dimension relationships 79

Many-to-many relationships 80

Reference relationships 82

Near real-time data refreshes 86

Tuning Server Resources 91

Understanding how Analysis Services uses memory 92

Memory management 92

Shrinkable vs. non-shrinkable memory 94

Memory demands during querying 95

Memory demands during processing 96

Optimizing memory usage 97

Increasing available memory 97

Monitoring memory management 97

Minimizing metadata overhead 98

Monitoring the timeout of idle sessions 99

Tuning memory for partition processing 100

Warming the data cache 101

Understanding how Analysis Services uses CPU resources 103

Job architecture 103

Thread pools 103

Processor demands during querying 104

Processor demands during processing 104

Optimizing CPU usage 105

Maximize parallelism during querying 105

Maximize parallelism during processing 107

Use sufficient memory 109

Use a load-balancing cluster 109

Understanding how Analysis Services uses disk resources 110

Disk resource demands during processing 110

Disk resource demands during querying 110

Optimizing disk usage 111

Using sufficient memory 111

Optimizing file locations 111

Disabling unnecessary logging 111

Conclusion 112

Appendix A – For More Information 113

Appendix B - Partition Storage Modes 113

Multidimensional OLAP (MOLAP) 113

Hybrid OLAP (HOLAP) 114

Relational OLAP (ROLAP) 115

Appendix C – Aggregation Utility 116

Benefits of the Aggregation Utility 116

How the Aggregation Utility organizes partitions 117

How the Aggregation Utility works 118

Analysis Services 2005 Performance Guide 61

Introduction

Fast query response times and timely data refresh are two well-established performance requirements of Online Analytical Processing (OLAP) systems. To provide fast analysis, OLAP systems traditionally use hierarchies to efficiently organize and summarize data. While these hierarchies provide structure and efficiency to analysis, they tend to restrict the analytic freedom of end users who want to freely analyze and organize data on the fly.

To support a broad range of structured and flexible analysis options, Microsoft® SQLServer™ Analysis Services (SSAS) 2005 combines the benefits of traditional hierarchical analysis with the flexibility of a new generation of attribute hierarchies. Attribute hierarchies allow users to freely organize data at query time, rather than being limited to the predefined navigation paths of the OLAP architect. To support this flexibility, the Analysis Services OLAP architecture is specifically designed to accommodate both attribute and hierarchical analysis while maintaining the fast query performance of conventional OLAP databases.

Realizing the performance benefits of this combined analysis paradigm requires understanding how the OLAP architecture supports both attribute hierarchies and traditional hierarchies, how you can effectively use the architecture to satisfy your analysis requirements, and how you can maximize the architecture’s utilization of system resources.

NoteTo apply the performance tuning techniques discussed in this white paper, you must have SQL Server2005 Service Pack2 installed.

To satisfy the performance needs of various OLAP designs and server environments, this white paper provides extensive guidance on how you can take advantage of the wide range of opportunities to optimize Analysis Services performance. Since Analysis Services performance tuning is a fairly broad subject, this white paper organizes performance tuning techniques into the following four segments.

Enhancing Query Performance - Query performance directly impacts the quality of the end user experience. As such, it is the primary benchmark used to evaluate the success of an OLAP implementation. Analysis Services provides a variety of mechanisms to accelerate query performance, including aggregations, caching, and indexed data retrieval. In addition, you can improve query performance by optimizing the design of your dimension attributes, cubes, and MDX queries.

Tuning Processing Performance - Processing is the operation that refreshes data in an Analysis Services database. The faster the processing performance, the sooner users can access refreshed data. Analysis Services provides a variety of mechanisms that you can use to influence processing performance, including efficient dimension design, effective aggregations, partitions, and an economical processing strategy (for example, incremental vs. full refresh vs. proactive caching).

Optimizing Special Design Scenarios – Complex design scenarios require a distinct set of performance tuning techniques to ensure that they are applied successfully, especially if you combine a complex design with large data volumes. Examples of complex design components include special aggregate functions, parent-child hierarchies, complex dimension relationships, and “near real-time” data refreshes.

Tuning Server Resources – Analysis Services operates within the constraints of available server resources. Understanding how Analysis Services uses memory, CPU, and disk resources can help you make effective server management decisions that optimize querying and processing performance.

Three appendices provide links to additional resources, information on various partition storage modes, andguidance on using the Aggregation Utility that is a part of SQLServer2005 Service Pack2 samples.

Analysis Services 2005 Performance Guide 61

Enhancing Query Performance

Querying is the operation where Analysis Services provides data to client applications according to the calculation and data requirements of a MultiDimensional eXpressions (MDX) query. Since query performance directly impacts the user experience, this section describes the most significant opportunities to improve query performance. Following is an overview of the query performance topics that are addressed in this section:

Understanding the querying architecture - The Analysis Services querying architecture supports three major operations: session management, MDX query execution, and data retrieval. Optimizing query performance involves understanding how these three operations work together to satisfy query requests.

Optimizing the dimension design - A well-tuned dimension design is perhaps one of the most critical success factors of a high-performing Analysis Services solution. Creating attribute relationships and exposing attributes in hierarchies are design choices that influence effective aggregation design, optimized MDX calculation resolution, and efficient dimension data storage and retrieval from disk.

Maximizing the value of aggregations - Aggregations improve query performance by providing precalculated summaries of data. To maximize the value of aggregations, ensure that you have an effective aggregation design that satisfies the needs of your specific workload.

Using partitions to enhance query performance - Partitions provide a mechanism to separate measure group data into physical units that improve query performance, improve processing performance, and facilitate data management. Partitions are naturally queried in parallel; however, there are some design choices and server property optimizations that you can specify to optimize partition operations for your server configuration.

Writing efficient MDX - This section describes techniques for writing efficient MDX statements such as: 1)writing statements that address a narrowly defined calculation space, 2)designing calculations for the greatest re-usage across multiple users, and 3)writing calculations in a straight-forward manner to help the Query Execution Engine select the most efficient execution path.

Understanding the querying architecture

To make the querying experience as fast as possible for end users, the Analysis Services querying architecture provides several components that work together to efficiently retrieve and evaluate data. Figure1 identifies the three major operations that occur during querying: session management, MDX query execution, and data retrieval as well as the server components that participate in each operation.

Figure 1Analysis Services querying architecture

Session management

Client applications communicate with Analysis Services using XML for Analysis (XMLA) over TCP IP or HTTP. Analysis Services provides an XMLA listener component that handles all XMLA communications between Analysis Services and its clients. The Analysis Services Session Manager controls how clients connect to an Analysis Services instance. Users authenticated by Microsoft® Windows and who have rights to Analysis Services can connect to Analysis Services. After a user connects to Analysis Services, the Security Manager determines user permissions based on the combination of Analysis Services roles that apply to the user. Depending on the client application architecture and the security privileges of the connection, the client creates a session when the application starts, and then reuses the session for all of the user’s requests. The session provides the context under which client queries are executed by the Query Execution Engine. A session exists until it is either closed by the client application, or until the server needs to expire it. For more information regarding the longevity of sessions, see Monitoring the timeout of idle sessions in this white paper.

MDX query execution

The primary operation of the Query Execution Engine is to execute MDX queries. This section provides an overview of how the Query Execution Engine executes queries. To learn more details about optimizing MDX, see Writing efficient MDX later in this white paper.

While the actual query execution process is performed in several stages, from a performance perspective, the Query Execution engine must consider two basic requirements: retrieving data and producing the result set.

  1. Retrieving data—To retrieve the data requested by a query, the Query Execution Engine decomposes each MDX query into data requests. To communicate with the Storage Engine, the Query Execution Engine must translate the data requests into subcube requests that the Storage Engine can understand. A subcube represents a logical unit of querying, caching, and data retrieval. An MDX query may be resolved into one or more subcube requests depending on query granularity and calculation complexity. Note that the word subcube is a generic term. For example, the subcubes that the Query Execution Engine creates during query evaluation are not to be confused with the subcubes that you can create using the MDX CREATE SUBCUBE statement.
  2. Producing the result set—To manipulate the data retrieved from the Storage Engine, the Query Execution Engine uses two kinds of execution plans to calculate results: it can bulk calculate an entire subcube, or it can calculate individual cells. In general, the subcube evaluation path is more efficient; however, the Query Execution Engine ultimately selects execution plans based on the complexities of each MDX query. Note that a given query can have multiple execution plans for different parts of the query and/or different calculations involved in the same query. Moreover, different parts of a query may choose either one of these two types of execution plans independently, so there is not a single global decision for the entire query. For example, if a query requests resellers whose year-over-year profitability is greater than 10%, the Query Execution Engine may use one execution plan to calculate each reseller’s year-over-year profitability and another execution plan to only return those resellers whose profitability is greater than 10%.

When you execute an MDX calculation, the Query Execution Engine must often execute the calculation across more cells than you may realize. Consider the example where you have an MDX query that must return the calculated year-to-date sales across the top five regions. While it may seem like you are only returning five cell values, Analysis Services must execute the calculation across additional cells in order to determine the top five regions and also to return their year to date sales. A general MDX optimization technique is to write MDX queries in a way that minimizes the amount of data that the Query Execution Engine must evaluate. To learn more about this MDX optimization technique, see Specifying the calculation space later in this white paper.