Performance and Sizing

Sizing and Performance

Performance

ASAP for BW Accelerator

SAP Business Information Warehouse

Performance issues and tuning of a BW system

Version 2.0

June 2000

Table of Contents

ASAP for BW Accelerator 1

1 Introduction 1

1.1 Software Version Supported 1

2 Performance 1

2.1 Influencing factors 1

2.1.1 Golden rules 1

2.1.2 InfoCube Design 2

2.1.3 Loading of Data 6

2.1.4 Querying 13

2.1.5 Size of Queries 13

2.1.6 Batch printing 13

2.2 System settings 14

2.2.1 Database Storage Parameters 14

2.2.2 DB Optimizer statistics 14

2.3 Monitoring 14

2.3.1 BW Statistics 14

3 Measurements 15

4 Detailed Information 15

Ó20001999 SAP AG 2

Sizing and Performance

1 Introduction

This document describes performance issues in a BW system. Monitoring tools will be presented that help you improve performance.

1.1 Software Version Supported

This document was written specifically for BW version 2.0B, but it should apply to all versions of BW.

2 Performance

Database (DB) functionality, Business Information Warehouse (BW) coding, and the system’s implementation influence the performance of a BW System.

Improvements in database platforms, tools and basis technology will constantly be incorporated into BW coding to achieve a better performance. Code will also be optimized based on experiences being made with customer installations.

This paper will focus on those issues, which need to be dealt with during your BW implementation or later in production for achieving better performance.

2.1 Influencing factors

2.1.1 Golden rules

The most crucial factors which influence the performance of data loading and querying are listed below. Paying attention to these golden rules will help you to avoid unnecessary performance problems. Of course there are further factors which influence the performance, they are described in the other chapters.

· Data loading

· Aggregates when loading deltas (2.1.3.4.2)

· Buffering of number ranges (2.1.3.4.1, 2.1.3.5.1)

· InfoCube design (2.1.2)

· Load master data before transaction data (2.1.3.3)

· Parallel upload (2.1.3.5.1)

· Package size (2.1.3.1)

· Secondary indexes for fact table dropped? (2.1.3.5.1)

· Use of Persisten Staging Area (PSA) (2.1.3.3)

· Querying

· Aggregates (see methodology paper “Aggregates”)

· Avoid huge query results (2.1.5, 2.1.4)

· DB - Statistics (2.2.2)

· Hierarchies (2.1.2.7)

· InfoCube design (2.1.2)

· Navigational Attributes (2.1.2.6)

· Secondary indexes existing and analyzed? (2.1.3.5.2)

2.1.2 InfoCube Design

Before starting to create InfoCubes in SAP BW it is crucial to seriously consider the data model. Data modeling is often a controversial topic and many approaches exist. In this document you’ll only find a short discussion of this issue since the design of an InfoCube has a significant influence on the performance. Data Modeling is treated in detail in a separate Methodology paper entitled “Data Modeling with BW”.

When designing InfoCubes you should consider

· business processes & data

· users’ reporting requirements

· decision processes

· level of detail required

In this section we summarize the most important data modeling issues with respect to performance. Although we are primarily addressing query performance issues, some issues related to data uploads will be discussed as well.

2.1.2.1 Fact table

The fact table consists of dimension table keys and key figures.

Characteristics indirectly define the fact table key and are values, which describe the key figures more exactly. Examples of characteristics might be customer, order, material, or year. In BW characteristics are grouped into dimensions. The fact table’s key consists of all the pointers to the associated dimensions around the fact table. For each combination of characteristics uploaded into BW the corresponding key figures are to be found in the fact table.

Key figures usually are additive numeric values (for example, amount, quantity and numeric values) but they can also be values such as average, unit price, auxiliary date fields, non-cumulative values, and non additive calculations (for example, price).

2.1.2.2 Fact table granularity

Volume is always a concern for fact tables. The level of detail has a large impact on querying efficiencies and overall storage requirements.

The grain of the fact table is directly impacted by the dimension table design because the most atomic characteristic in each dimension determines the grain of the fact table.

Let’s say, for example, that The performance of outlets and articles needs to be analyzed. Descriptive attributes are: outlet, receipts, articles, customers, time. Limit analysis to articles and time, and further assume 1.000 articles are grouped by 10 article groups. The article group performance is tracked on a weekly basis.

· Granularity: article group, week, and 300 sales days a year (45 weeks)
10 x 45 = 450 records in the fact table per year due to only these two attributes if all articles are sold within a week.

· Granularity: article, week, 300 sales days a year (45 weeks)
1,000 x 45 = 45,000 records in the fact table per year due to only these two attributes if all articles are sold within a week.

· Granularity: article, day, 300 sales days a year
1.000 x 300 = 300,000 records in the fact table per year due to only these two attributes if all articles are sold within a day.

· Granularity: article, hour, 300 sales days a year, 12 sales hours a day
500 x 300 x 12 = 1,800,000 records in the fact table per year due to only these two attributes if on average 500 articles are sold within an hour.

2.1.2.3 Fact table considerations

Large fact tables impact reporting and analysis. Therefore you should always consider whether the use of aggregates of the fact table are feasible methods for improving performance and reducing hardware needs.

In addition, consider partitioning of the fact table. Many database platforms support table partitioning. Partitioning can only be setup on the E table (storing the compressed requests, see 2.1.3.7) of an InfoCube in the BW system before any data has been loaded into the InfoCube. Currently partitioning by calendar month or fiscal period is possible. Another concept of partitioning on a logical level is available in BW: MultiCubes. Setting up a cube as MultiCube enables you to read from and load into smaller cubes in parallel, thus improving performance. More information on MultiCubes is available in the methodology paper “ Multi-Dimensional Modeling with BW”.

Furthermore keep the number of key figures to a minimum. Avoid storing values that can be calculated. For example, instead ofstoring the average price,store quantity and revenue. The average price can be calculated in the query (revenue/quantity).

2.1.2.4 Dimension tables

Each InfoCube may have up to 16 dimensions. There are 3 default dimensions: time, unit, and package. This leaves a maximum of 13 user defined dimensions. Possible dimensions could be: customer, order, date, and material. A maximum of 248 characteristics can be defined in each dimension.

A dimension should be defined in such a way that each row in a dimension table has several corresponding rows in the fact table.

The fact table and dimension tables are arranged according to the star schema. This means for each query first the dimensions will be browsed and then with the gathered key values all records in the fact table will be selected which have the same values in the fact table key.

In general, dimensions should be modeled in such a way that the number of instances are kept small. This means that the dimension tables should be small. This is important for the star join mechanism to work properly. Some rules of thumb are:

(a) The ratios

size of dimension table / size of fact table

should be less than 15%. This limit is heuristic and has no deep scientific background.

(b) No combination of characteristics that are put into the same dimension should have an n:m relationship. For example, it is usually not a good idea to put customers and products into the same dimension: customers buy many different products and each product is bought by many customers. Thus there is an n:m relationship between these two entities and, as a consequence, many combination of customers and products as entries in the corresponding dimension table. Therefore n:m relationships are likely to violate rule (a).

· example: 10,000 customers and 10,000 products
for two dimensions: 10,000 + 10,000 = 20,000 records
for one dimension: 10,000 x 10,000 = 100,000,000 records

Obviously, if either n or m is small (i.e. 2, 3 or 4) then this should not necessarily be considered as a violation of rule (b).

(c) It is better to have many dimensions with few characteristics rather than a few dimensions, each with many characteristics.

(d) If you have a characteristic which has a different instance for almost every fact table record (a line item characteristic) you can set up a dimension as a line item dimension and include only this characteristic in the dimension (before any data is loaded into the InfoCube). For this dimension no separate dimension table is created but the characteristic is included in the fact table itself, thus improving performance for both loading and querying.

2.1.2.5 Master data tables

Master data is a common description for values that are InfoCube-independent, i.e. they can be used with several InfoCubes.

Master data may have a descriptive text and can be used with hierarchies.

Usually a master data table exists for each characteristic in a dimension table. Besides the key the master data tables can contain additional navigational attributes which behave like characteristics. For example, customer may contain customer number, customer group, customer region, customer name, and customer address.

2.1.2.6 Characteristics vs. Navigational Attributes

Using navigational attributes always incurs a performance penalty in comparison to the situation in which the same InfoObject is used as a characteristic. Therefore, you should carefully consider whether an InfoObject is used as a navigational attribute or as a characteristic.

2.1.2.7 Hierarchies

In BW, the are essentially three possibilities for modeling hierarchies:

· as a hierarchy of characteristics within a dimension

· as a hierarchy of attributes attached to a characteristic

· as an external hierarchy

Let us take a quick look at the pros and cons of those different modeling techniques.

2.1.2.7.1 Hierarchies within a Dimension

A typical example for a hierarchy fitting into this context is a time hierarchy with levels such as millennium – century – decade – year – month – day – hour. Another typical example is a geographic hierarchy with levels such as continent – country – state – region – city.

Hierarchies that can be modeled within a dimension have certain properties:

· The number of levels is fixed; each level is represented by an InfoObject.

Example: A geographic dimension with InfoObjects 0COUNTRY (country), 0REGION (region) and 0CITY (city).

· Either the hierarchy does not change or its changes do not apply to the past (for example, facts that are already loaded into an InfoCube).

For example, the geographic hierarchy above changed during German unification. A city like "Dresden" suddenly belonged to an other country. However, this change should not usually affect data/facts that refer to the time before German unification as at that time the previous geographical dependencies applied.

The performance aspects of this technique are:

· Queries to InfoCubes that use these kinds of hierarchies are generally faster than the same queries to InfoCubes that model the same scenario with one of the two other hierarchy modeling techniques

· However, BW does not explicitly know about hierarchical dependencies. Therefore aggregates that summarize data over regions are not used for queries that summarize over countries if the country is not included in that aggregate as well. Therefore you should always (manually) include the hierarchical levels to such an aggregate that lie above the level over which data is summarized.

Example 1: If an aggregate summarizes data over 0REGION then include 0COUNTRY in that aggregate, too.

Example 2: If an aggregate summarizes data over months then include years, decades.

2.1.2.7.2 Hierarchies as Navigational Attributes of a Characteristic

This case is very similar to the one discussed in the previous section. The difference is the increased flexibility (for example, realignment facilities) that comes with navigational attributes. The hierarchy should still have a fixed number of levels. However, changes to that hierarchy (i.e. changes to attribute values) can be easily applied to facts that are already loaded into a cube. This is the essential difference with section 2.1.2.7.1.

A typical example is the hierarchy of sales office – sales group – sales person. This hierarchy has a fixed number of levels but is frequently reorganized.

From a performance perspective the same arguments hold as in section 2.1.2.6. In general, this is the least attractive hierarchy modeling technique as it performs worse than 2.1.2.7.1 and frequently not better than the one in the following section. It is more flexible than 2.1.2.7.1 but less flexible than 2.1.2.7.3.

2.1.2.7.3 External Hierarchies

An ideal external hierarchy

· frequently changes and/or

· has no fixed number of levels (sometimes referred to as a unbalanced hierarchy).

A typical example is a cost center hierarchy in which several (sub-)cost centers belong to one cost center which itself belongs to another cost center and so on. Such a hierarchy which has no fixed number of levels as cost centers usually corresponds to departments or groups within a company which might be reorganized into new subgroups. Thus, new levels might be introduced, old ones disappear, the hierarchy might be deeper at one end (due to a deeper hierarchical organization) and shallow at the other end.

Another major advantage of external hierarchies vs. their alternatives is that an InfoObject can have several such hierarchies and all these can be used within the same InfoCube. The same effect could only be achieved through unpleasant work-arounds when using the alternative approaches.

The performance issues connected to this type of hierarchy are the following:

· External hierarchies usually perform worse than those modeled within dimensions.

· They usually perform at least as well as the hierarchies based on navigational attributes.

· Problems can arise for big external hierarchies containing many thousands of nodes and leaves. In that case it might be better to consider one of the other two alternatives.