Dw: Star Versus Star Transformation D-615

DW: STAR VERSUS STAR TRANSFORMATION D-615

Performance issues with Star Schemas

A frequent problem with queries against a Star Schema is that the selection criteria is spread among the Dimension tables—that is, the query ‘filtering’ is not very restrictive for any single Dimension table. This means that even if the query as a whole only returns a few rows, the intermediate processing will necessarily degrade the query.

Example:

Find all transactions listed in the SALES fact table (100 m rows)

Where

Quarter = ‘Q1-2000’

Manuf = ‘Coleman

Product = ‘Camping Gear’

Store = ‘Boston’

‘Old’ Way to Join:

Execution Plan: Total estimated time: 11 minutes.

Join SALES - STOREreturns 500,000 rows10 mins

Join above result to MANUFreturns 50,000 rows1 min

Join above result to PRODUCTreturns 1,000 rows30 sec

Join above result to QUARTERreturns 250 rows02 sec

Clearly, no matter which table we join first, we must “wade through” many rows because each restricting condition is considered separately, rather than jointly. That is, the earliest joins must “pay the price.”

STAR Join

With the Star Join, prior to joining to the Fact table, the optimizer jointly considers the dimension constraints. The optimizer builds a list of all possible combinations (Cartesian product) of the Dimension rows that meet the selection criteria. Then, this set of rows is used to access the Fact table, via a (B*tree) composite index:

Execution Plan: Total estimated time: 5 seconds.

Step 1: Find Cartesian product of Dimension rows meeting search criteria:

Quarter = ‘Q1-2000’ 90 rows

Manuf = ‘Coleman x 1 row

Product = ‘Sleeping Bag’ x 1 row

Store = ‘Boston’ x 1 row = 90 rows total

Step 2: Using this set of 90 rows, access Sales table via index on columns {Quarter,

Manuf, Product, Store}

Problems with STAR Join Method

Although the Star Join can provide dramatic performance improvement for some queries, there are significant drawbacks and limitations:

1)This method requires huge composite indexes. In fact, Multiple huge indexes will likely be required, so that the “leading columns” of at least one index will always be available to meet the applicable selection criteria.

2)For some queries, the Cartesian product can be huge, requiring millions of index lookups against the Fact table, even if the ultimate result set is small. A huge Cartesian product is especially likely if there are a large number of Dimension tables.

For example, the following query would produce over a millions rows to check, with

big performance degradation:

Quarter = ‘Q1-2000’ 90 rows

Manuf like ‘C%’ x 10 rows

Product like ‘Sleeping’ x 10 rows

Store_Area = ‘East’ x 20 rows = 1,800,000 rows total

Thus, in this scenario, the Star Join would require performing 1.8 million index lookups—certainly a poor optimization choice.

STAR Transformation

Oracle’s newest optimization method also employs preprocessing, but with a different (and tricky) twist. Instead of building a Cartesian Product of all possible combinations, the Star Transformation method uses the inherent speed of combining bitmaps. A key to this method is the RI relationship in star schemas—that is, entries in the Fact table are children of the Dimension tables. This method also requires that the Fact table have a bitmap index for each foreign key in the Fact pointing to a Dimension table.

First, for eachDimension table, rows meeting the query’s criteria are determined. Only the Primary Keys for the compliant rows are retrieved at this point. These PK values are used to get the bitmaps on the Fact table that match these PK values. The bitmaps are “OR’ed” together. This yields a bitmap of all rows in the Fact table that meet the query’s restriction for a single Dimension table. The process is repeated for all Dimensions, yielding one summary bitmap per Dimension.

The bitmaps for all the Dimension tables are combined (“AND’ed”). This yields a “final” bitmap of the Fact table rows that meet ALL restrictions on the Dimension tables.

Execution Plan:

Step 1: Scan each dimension table to fetch primary key(s) that match the search criteria.

QUARTER90 Pk Values{‘1/1/2000’, ‘1/2/2000’, 1/3/2000’, etc.}

MANUF10 Pk Values{‘Coleman’, ‘Clark’, ‘Clancy’, etc.}

PRODUCT10 Pk Values{‘Sleeping Bag’, ‘Sleepware’, ‘Sleeping Tent’, etc.}

STORE: 20 Pk Values{‘Boston’, ‘New York’, etc.}

Step 2: Access matching bitmap index on Fact table. Combine bitmaps (‘OR’) for all keys returned from Step 1:

QUARTERMerge 90 bitmaps, yielding 1 bitmap

MANUFMerge 10 bitmaps, yielding 1 bitmap

PRODUCTMerge 10 bitmaps, yielding 1 bitmap

STORE: Merge 20 bitmaps, yielding 1 bitmap

Step 3: Combine (‘AND’) bitmaps from Step 2, yielding the “final” bitmap that encompasses all selection criteria related to the Dimension tables.

Step 4: Fetch Fact table rows via Rowids, using the ‘final’ bitmap.

Step 5: Return to the Dimension tables to fetch requested fields not already included in Step 1 (i.e., fields other than the Primary key).

Reference: Jonathan Lewis, Practical Oracle 8i.