Q1: What is the business scenario being used here?

The business scenario used in this model is post-paid phone subscribers, specifically whether they stay with or leave their provider.

Q2: What is the target variable, its type, and permissible values?

The target variable is “Churn,” meaning consumers who leave the phone service provider either voluntarily (the consumer decides to leave) or involuntarily (the provider drops the consumer for failing to pay the phone bill). An active/nonactive target could also be used, which would present a whole new set of variables for data analysis. The types are the credit classes A, B, C, D, or Missing, based on the strength of the subscriber’s credit score. I am not sure what the phrase “permissible values” means; it is not mentioned in our textbook, and I could not find a definition in my online research. I assume it refers to values that are allowed because, although they are not variables, they are significant in explaining the results of the target variable. Deposit and nondeposit for excellent credit holders may be permissible values.

Q3: What is the percentage distribution of the target variable values in the training data set as a whole?

The percentage distribution in the training data set is

I: 33%.

A: 33%.

V: 33%.
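A percentage distribution like the one above can be computed directly from a column of target labels. The sketch below is only an illustration: the function name `percentage_distribution` and the toy labels are hypothetical and are not taken from the textbook data.

```python
from collections import Counter


def percentage_distribution(labels):
    """Return {class: percent of rows} for a sequence of target labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Hypothetical toy labels (NOT the textbook data): two rows per class.
toy_labels = ["I", "A", "V", "A", "V", "I"]
for cls, pct in sorted(percentage_distribution(toy_labels).items()):
    print(f"{cls}: {pct:.0f}%")
```

With equal counts per class, as in a balanced model set, each class comes out at about one third.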

Q4: Which node is the root node, and what are the percentage distributions of the target variable?

The root node is the very top box, labeled (target: futureChurnType). Its percentage distributions are

I: 33%.

A: 33%.

V: 33%.

Q5: There are three boxes with a letter I, A, or V connected to the box via an arrow. What is distinctive about each of these three boxes, and what does the letter with the arrow pointing to the box signify?

The boxes with a darker shade represent distributions that differ strongly from the training data set in the top (first) node. The letter by each arrow identifies the majority class within the leaf, so if I is at the arrow, then I is the majority class, and the dark shade indicates that it is so by a large margin. All three boxes are darker in shade, meaning their distributions differ significantly from the root node’s training data set and are worth a closer look.

Q6: What is the name of each of the three boxes referred to in the previous question?

Leaves

Q7: What is the rule set for the three boxes labeled with a letter and an arrow referred to in the previous question? Make sure your answer is specific for each box.

I: TENURE < 264.5 or Missing

A: ALREADYOFF < 0.5

V: GOINGOFF >= 0.5
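The three conditions listed above can be written as simple checks on a subscriber record. This is a rough sketch, assuming a record is a dictionary keyed by the field names TENURE, ALREADYOFF, and GOINGOFF, with `None` standing in for Missing; the function name `matching_leaf_rules` is hypothetical. It encodes only the conditions listed in this answer, not the full path of splits from the root to each leaf, so a single record may satisfy more than one of them.

```python
def matching_leaf_rules(record):
    """Return the leaf labels whose listed condition the record satisfies.

    Only the three conditions given in the answer are encoded. A complete
    rule for a leaf would combine every split on the path from the root.
    """
    matches = []

    tenure = record.get("TENURE")  # None stands in for Missing
    if tenure is None or tenure < 264.5:
        matches.append("I")

    alreadyoff = record.get("ALREADYOFF")
    if alreadyoff is not None and alreadyoff < 0.5:
        matches.append("A")

    goingoff = record.get("GOINGOFF")
    if goingoff is not None and goingoff >= 0.5:
        matches.append("V")

    return matches


# Hypothetical record, for illustration only.
print(matching_leaf_rules({"TENURE": 100, "ALREADYOFF": 1, "GOINGOFF": 0}))
```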

Q8: On page 243, the author states: "After six months, 89.3% of subscribers are still active, 4.39% have left involuntarily, and 6.32% have left voluntarily." What were the corresponding distributions for the training set? Why do you think they were different? What are the implications for this difference?

Corresponding distributions for the training set are

Still Active (A): 23%.

Involuntarily (I): 60%.

Voluntarily (V): 17%.

They were different because the distribution of values in the original data is different from the distribution in the model set used to build the tree. If you apply probabilities to the decision tree rules (which were built from the top node’s training data set, which is preclassified data), you get results closer to the truth: instead of 60% leaving involuntarily you get 14%, against an actual 4.39%. That is still off by roughly 10 percentage points, but not by 55. This is a far more accurate way to apply the model. The implication of the difference is that, without this adjustment, the forecasts or predictive analytics will be inaccurate and the real results will be far different. That would erode the trust and confidence of the people who have to make vital business decisions based on the decision tree results.
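One common way to move proportions from a balanced model set back toward the real population is to reweight each class by the ratio of its population share to its model-set share and then renormalize. The sketch below shows that calculation using the figures quoted in these answers. It rests on two assumptions: that the 23/60/17 figures are proportions observed within a model set balanced at one third per class, and that this standard prior correction is a fair stand-in for whatever adjustment the textbook applies. The function name `adjust_for_oversampling` is hypothetical, and the result need not match the 14% figure quoted above.

```python
def adjust_for_oversampling(observed, model_priors, population_priors):
    """Reweight class proportions from an oversampled model set.

    Each observed proportion is scaled by population share / model-set share,
    then the results are renormalized so they sum to 1.
    """
    weighted = {
        cls: observed[cls] * population_priors[cls] / model_priors[cls]
        for cls in observed
    }
    total = sum(weighted.values())
    return {cls: w / total for cls, w in weighted.items()}


# Figures quoted in the answers; treating them this way is an assumption.
observed = {"A": 0.23, "I": 0.60, "V": 0.17}
model_priors = {"A": 1 / 3, "I": 1 / 3, "V": 1 / 3}
population_priors = {"A": 0.893, "I": 0.0439, "V": 0.0632}

adjusted = adjust_for_oversampling(observed, model_priors, population_priors)
for cls in ("A", "I", "V"):
    print(f"{cls}: {adjusted[cls]:.1%}")
```

After reweighting, the involuntary share falls well below 60%, which is the direction of correction the answer describes.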

Q9: This organization is in a mature market, which means there are relatively few entities that do not already have a vendor supplying this product. The book says that in this type of market, organizations are concerned about churn. Why is that?

Organizations in this market understand that a consumer ultimately follows one of two paths: they are either happy and stay with their provider, or they are unhappy and look to one of their provider’s competitors for a new contract and future service (voluntary churn). You can’t create new users, so the only way to increase your revenue is to entice consumers away from other providers, and those providers are trying to do the same to you.

Q10: What did the organization learn, in your opinion, about churn that it can use from this activity?

The organization learned that contracts are very effective at retaining consumers, who stay to avoid the fee for cancelling before the maturity date and the negative effect that walking away would have on their credit score. From the provider’s perspective, people with good credit should be required to sign contracts. This would be an effective tactic because people with good credit scores do not want their score hurt by walking away from a contract before it is completed. Credit Class C is not required to pay a deposit, yet compared to the other credit classes its members are the most likely to walk away from a contract, so the clear recommendation is to require a deposit from people in Credit Class C.