04-6 Slowly Changing Dimensions Reading

Another brief explanation of SCD. (VERY IMPORTANT to understand)

In the day-to-day OLTP system changes are often handled differently than in a Data warehousing system.

If the selling price of a product changes from 5.00 each in 2008 to 6.00 each in 2009, the selling price in the product table is updated to reflect the change.

Suppose that during the month of September in 2008 there were 3 orders of product 3295 sold.

Order 13987 sold 25 of product 3295 at $5.00 each for a total of $125.00

Order 13992 sold 30 of product 3295 at $5.00 each for a total of $150.00

Order 14523 sold 45 of product 3295 at $5.00 each for a total of $225.00

The entry in the fact table would be as follows

Time / Product-id / Units sold / $ Sales
Sep 2008 / 3295 / 100 / 500.00

In September 2009 there were 8 orders that happen to total 100 units, but sold for $6.00 each. The fact table entry would look like this.

Sep 2009 / 3295 / 100 / 600.00

If I were to compare the two years, even though the price changed in 2009 it did NOT affect the results stored for 2008. So if I do analysis in 2007, 2008 or 2009, the values for 2008 will remain the same. This is what should happen. NOTE: Very Important.

If I looked at the dimension table called product, there would not be an attribute called selling price.

Things like price would be involved in determining a measure. Also price changes frequently.

Let us look at the product dimension in the DW in 2008.

3294 / heavy tent / S
3295 / swim suit / S
3297 / sun glasses / S

In 2009 if there was no change the product dimension would look the same.

Suppose the product in 2008 was seasonal (S) and in 2009 it was sold all year round (Y). In the OLTP system the attribute would change from S to Y. The FACT table will look the same

Time / Product-id / Units sold / $ Sales
Sep 2008 / 3295 / 100 / 500.00
Sep 2009 / 3295 / 100 / 600.00

If in 2008 the President of the company asked for sales of seasonal items for the last 2 years (2007 and 2008), the SQL statement would include -- WHERE product type = ‘S’. The $500 sales from product 3295 would be included in the 2008 seasonal sales totals.

Suppose we do the same in the product dimension table as we did in the product table in the OLTP and change the status from S to Y at the end of 2008. The dimension table in 2009 would look like this.

3294 / heavy tent / S
3295 / swim suit / Y
3297 / sun glasses / S

If in 2009 the President again asks for seasonal sales for years 2007, 2008 and 2009, the sales figure for 2008in seasonal would be $500 less, because the product type now reads Y and not S. The WHERE clause just collects the “S” dataand shows it.

But wait a minute; you can’t make decisions on how your business is to operate if the numbers you get keep changing.

The problem of slowly changing attributes is an important problem to be resolved. Management decisions are made on the data produced. Success of the company depends upon good decisions made on good data.

Here is one way to resolve this problem.

By using a surrogate key, we can solve this inconsistency of results over time.

The product table in 2008 would look like this

173 / 3294 / heavy tent / S
174 / 3295 / swim suit / S
175 / 3297 / sun glasses / S

In 2009 the product table will have an extra row (see 299).

173 / 3294 / heavy tent / S
174 / 3295 / swim suit / S
175 / 3297 / sun glasses / S
etc …
etc …
299 / 3295 / swim suit / Y

The fact table now looks like this

Time / Product-surrogate-id / Units sold / $ Sales
Sep 2008 / 174 / 100 / 500.00
Sep 2009 / 299 / 100 / 600.00

Now if one totals the sales by years where product type is S, the correct values over time will occur. That means this method (of the 3 methods) for solving the “slowly changing” attribute, preserves history by splitting it. This method is very common. (See notes – on how to solve it and problems that still exist)

Extra Note: Product ID’s don’t change, product names get corrected they don’t change; otherwise they would be a different product. Store addresses essentially don’t change. It is rare for a store to move and usually it is the same store just moved and has no effect on analysis. In a customer dimension table for people the address might change, but it probably doesn’t affect anything that the company wants to analyze or the effect over 1000’s of customers is insignificant.

The kinds of things you are looking for are product types like seasonal, but you would not expect frozen bags of peas to change from product type called frozen vegetable to product type called dairy. Look for student type such as full time and part-time or marital status.

04-6-Reading-Slowly-changing-Good Short Explanation by rt -- 10/28/20181 of 3