OR474 – Fall 2002 – D. Ruppert

Exam 2

Solutions to In-Class Part

  1. A regression tree is used when the response is interval (continuous) and a classification tree is used when the response is categorical; see the first sketch after this list.
  2. We would simply grow a right-sized tree if we knew how to do this, but unfortunately that seems impossible. The problems are: first, we don't know at the start what a right-sized tree will be; second, as the tree is being grown we can never know whether continuing to grow it will later produce a large improvement in the error rate; and, finally, the best tree may not be one of the trees in the sequence grown by node splitting. Pruning can lead to new, and perhaps better, trees.
  3. The two main advantages of decision trees compared to linear or logistic regression are that trees can model nonlinearities and interactions. The main disadvantage of decision trees is that they are often somewhat less accurate than regression models. Note: Some students answered that trees are easy to understand and interpret. While this is true, the same can be said of regression, so it is not an advantage of trees relative to regression.
  4. The weakest link is the link whose pruning gives the smallest per-node increase in RSS. Weakest-link pruning starts with the final tree produced by node splitting. The weakest link is found and the branch below that link is pruned. Then the weakest link of the new tree is found and the branch below it is pruned, and so on until only the root is left. Each of the trees produced during this process is a minimum cost-complexity tree. The minimum cost-complexity trees can then be compared, say by RSS on validation data, and the best of them is used as the final tree; see the second sketch after this list.
  5. NAB = needs, attitudes, behaviors. Rosetta uses survey data to learn about the needs and attitudes of customers. This is combined with data about their behaviors. All three types of variables are used to segment the customer base. Each segment then receives a marketing campaign tailored to its needs, attitudes, and behaviors, which are relatively homogeneous within a segment.
  6. P(Y = 1 and X = 2) = 0.0125 and P(X = 2) = 0.4083, so P(Y = 1 | X = 2) = P(Y = 1 and X = 2) / P(X = 2) = 0.0125 / 0.4083 = 0.0306.
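The following is a minimal sketch of the distinction in question 1. The course used other software; scikit-learn and the simulated data here are assumptions for illustration only.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Interval (continuous) response -> regression tree.
Xr, yr = make_regression(n_samples=200, n_features=4, random_state=1)
reg_tree = DecisionTreeRegressor(max_depth=3).fit(Xr, yr)

# Categorical response -> classification tree.
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=1)
cls_tree = DecisionTreeClassifier(max_depth=3).fit(Xc, yc)

print(reg_tree.predict(Xr[:3]))   # continuous predictions
print(cls_tree.predict(Xc[:3]))   # class labels
```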
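This second sketch illustrates the minimum cost-complexity (weakest-link) pruning procedure of question 4, using scikit-learn's ccp_alpha interface. The data set and the held-out validation split are illustrative assumptions, not the exam data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=2)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=2)

# Grow a large tree, then obtain the sequence of weakest-link alphas; each
# alpha corresponds to one minimum cost-complexity subtree of the full tree.
full_tree = DecisionTreeRegressor(random_state=2).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Prune to each subtree in the sequence and compute its RSS on validation data.
val_rss = []
for alpha in path.ccp_alphas:
    subtree = DecisionTreeRegressor(random_state=2, ccp_alpha=alpha)
    subtree.fit(X_train, y_train)
    val_rss.append(((subtree.predict(X_val) - y_val) ** 2).sum())

# The subtree with the smallest validation RSS is used as the final tree.
best_alpha = path.ccp_alphas[int(np.argmin(val_rss))]
print("best alpha:", best_alpha)
```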

Solution to take-home

Students had no trouble fitting logistic regression and tree models, so I will not go over this part of the exam. It is good practice to view the distributions of the predictor and target variables and to consider transformations if they exhibit skewness. Most students did this and found no skewness. A few students did not look at the distributions and lost a small amount of credit. Everyone noticed that the tree model had a much better misclassification rate than logistic regression. This suggests that there are nonlinearities and/or interactions. In such a situation, it is wise to look for nonlinearities or interactions involving the variables known to be significant in the regression or involved in splits in the tree. Interactions or quadratic terms in these variables can be added to the logistic regression model. In this example, if an interaction between x4 and x8 is added to the regression model, then regression slightly outperforms the tree model, a reversal of what happens without the interaction term; a sketch of such a fit follows.
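Below is a minimal sketch of adding an x4-by-x8 interaction to a logistic regression. The exam's data set is not reproduced here, so a simulated stand-in with columns y, x4, and x8 is assumed; statsmodels' formula interface is one convenient way to include the cross-product term.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the exam data (assumption): the response depends on
# x4, x8, and their interaction.
rng = np.random.default_rng(3)
df = pd.DataFrame({"x4": rng.normal(size=500), "x8": rng.normal(size=500)})
logit_p = 0.5 * df["x4"] - 0.5 * df["x8"] + 2.0 * df["x4"] * df["x8"]
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Base model and model with the interaction; in formula notation, x4:x8 adds
# only the product term, while x4*x8 would expand to x4 + x8 + x4:x8.
base = smf.logit("y ~ x4 + x8", data=df).fit(disp=0)
inter = smf.logit("y ~ x4 + x8 + x4:x8", data=df).fit(disp=0)

# Compare misclassification rates at the usual 0.5 cutoff.
for name, model in [("base", base), ("interaction", inter)]:
    miss = ((model.predict(df) > 0.5) != df["y"]).mean()
    print(name, "misclassification rate:", round(miss, 3))
```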