MBA 8473 DATA MINING – IN CLASS WORKSHEET

PROBLEM DOMAIN – MARKET BASKET ANALYSIS

GOAL:

  1. To understand data processing for data mining. The example that we will use involves converting Point of Sales data into a useful “count” form so that it can be used for further data mining.
  1. To learn to locate patterns that can be expressed as rules – by simple inspection or manual computation.
  1. To understand the support level for a rule by the data set.
  1. To compute the significance of a rule.
  1. To learn to do above 1,2,3,4 for a larger number of data points where we need the help of DM tools.

DO:

  1. Use the Tables below. Fill in the co-incidence matrix in Table 2 by counting data from Table 1. You are looking for how many times one item (e.g. orange juice) occurred with another item (e.g. soda).
  1. To locate patterns, look for relative frequency of co-incidence of items. Mostly extreme frequencies (high or low) are the places to start with, and see what can interpreted from them. Start writing them down as a rule. (While doing this ignore the diagonal frequencies).

An example pattern can be:

Orange juice and soda are more likely to be purchased together than any other two items.

Expressed as a rule:

If a customer purchases soda, then the customer also purchases orange juice.

  1. How well is this rule supported by the data? (i.e. in how many transactions from the total number of transactions this happens). Generally expressed a percentage.
  1. How about the confidence in the rule? Confidence is defined as the ratio of the number of the transactions supporting the rule to the number of transactions where the conditional part (i.e. the IF part) of the rule holds. What’s the confidence in the following rules:

CONFIDENCE

If soda, then orange juice.

If orange juice, then soda.

Additional discussion:

The above rules are, in general, of the form IF A THEN B.

One can also look for, IF A THEN NOT B type rules.

How do we find them? Start by looking at zero frequencies in the co-incidence matrix. Did you find any?

As a rule: If a customer purchases milk, then the customer does not purchase soda.

Then there are rules that have the form:

IF A and B, then C

IF A and C, Then B

If B and C, Then A.

For example, IF diapers AND Thursday, then beer

Table 1: Grocery Point-of-Sale Transactions

Customer / Items
1 / Orange juice, soda
2 / Milk, orange juice, window cleaner
3 / Milk, detergent
4 / Orange juice, detergent, soda
5 / Window cleaner, soda

Table 2: Co-Occurrence of Products

Orange Juice / Window Cleaner / Milk / Soda / Detergent
Orange Juice
Window Cleaner
Milk
Soda

Detergent