Aggregating Support Vector Models for Prediction of Cross-Selling Problem

Si Jie Phua

School of Computer Engineering, Nanyang Technological University, Singapore

1 Introduction

This paper demonstrates a methodology for solving a cross-selling business problem [1]. The dataset is obtained from the PAKDD Competition 2007. We use an ensemble classifier with the Support Vector Machine (SVM) [2, 3, 4] as the base classifier to obtain faster training and accurate results.

This paper is organized as follows: Section 2 lists the modifications made to the dataset before training and prediction. Section 3 briefly explains the support vector machine (SVM) and the ensemble classifier. Section 4 gives the parameters used to train the model. Section 5 concludes with the results and a discussion of the business insight.

2 Data Preparation

The following attributes are modified before training and prediction so that the data fit the data mining technique used (a code sketch follows the list):

  • ANNUAL_INCOME_RANGE: the ranges “0K -< 30K”, “30K -< 90K”, “90K -< 150K”, “150K -< 240K”, “240K -< 360K”, and “360K+” are mapped to 1, 2, 3, 4, 5, and 6 respectively. This mapping preserves the ordinal property of the attribute.
  • DISP_INCOME_CODE: the codes A-E are mapped to 1-5 respectively, again preserving the ordinal property of the attribute.
  • All bureau-related attributes except “B_DEF_PAID_IND” and “B_DEF_UNPD_IND”: the special values 98 and 99 are changed to -1 and -2 respectively, so that the frequencies of customer contact with the bureau are better represented.
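
The recoding above can be expressed as simple lookup tables. The following is a minimal sketch in Python using pandas; the variable df and the list bureau_cols are assumptions standing in for the raw competition data and the bureau-related column names, which are not reproduced here.

    import pandas as pd

    # Ordinal recoding of the income range attribute (Section 2).
    INCOME_MAP = {"0K -< 30K": 1, "30K -< 90K": 2, "90K -< 150K": 3,
                  "150K -< 240K": 4, "240K -< 360K": 5, "360K+": 6}
    # A-E -> 1-5 for the disposable-income code.
    DISP_INCOME_MAP = {code: i + 1 for i, code in enumerate("ABCDE")}

    def prepare(df: pd.DataFrame, bureau_cols: list) -> pd.DataFrame:
        df = df.copy()
        df["ANNUAL_INCOME_RANGE"] = df["ANNUAL_INCOME_RANGE"].map(INCOME_MAP)
        df["DISP_INCOME_CODE"] = df["DISP_INCOME_CODE"].map(DISP_INCOME_MAP)
        # Recode the special bureau values 98 and 99, except for the two
        # indicator attributes, so they no longer distort the frequency scale.
        for col in bureau_cols:
            if col not in ("B_DEF_PAID_IND", "B_DEF_UNPD_IND"):
                df[col] = df[col].replace({98: -1, 99: -2})
        return df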

3 Modeling Technique

The training dataset is large and the class distribution is imbalanced. Thus, we propose to train several Support Vector Machine (SVM) models, each on a portion of the training dataset, and then classify new instances by combining the trained models with a voting scheme.

SVM is built on the concept of decision planes that define decision boundaries. A decision plane separates sets of objects with different class memberships. SVM constructs a hyperplane as the decision plane, one that separates the positive and negative samples with the maximum margin. Among the available training techniques for SVMs, we use the sequential minimal optimization (SMO) [5] algorithm as implemented in YALE [6], since it can handle both numerical and nominal attributes. SVM usually generalizes well thanks to its margin-maximization strategy; however, its training time usually grows rapidly with the size of the dataset.
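
YALE's SMO operator is configured through the tool itself and is not reproduced here. As a rough stand-in, the sketch below fits a linear SVM with scikit-learn's SVC, whose LIBSVM backend also uses an SMO-type solver; the toy data and the kernel choice are assumptions for illustration only.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 5))        # toy feature matrix (assumption)
    y_train = (X_train[:, 0] > 0).astype(int)  # toy binary labels (assumption)

    # A linear SVM with cost parameter C = 1, as used in Section 4;
    # LIBSVM's solver is SMO-based, standing in for YALE's SMO operator.
    model = SVC(kernel="linear", C=1.0)
    model.fit(X_train, y_train)
    print(model.predict(X_train[:5]))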

Thus, we partition the training dataset and train several SVM models so that training is faster. We construct 9 training datasets from the original training dataset and train 9 SVM models. Each of these datasets contains all the positive instances and an equal number of negative instances. In this way, we reduce the effect of the imbalanced class distribution.
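
The sampling scheme can be sketched as follows, assuming X and y are NumPy arrays with labels 1 (positive) and 0 (negative). The paper does not say how the negatives are selected; random sampling without replacement is one plausible reading, and the helper name balanced_subsets is hypothetical.

    import numpy as np

    def balanced_subsets(X, y, n_models=9, seed=0):
        """Build n_models training sets, each with all positive instances
        and an equal number of negatives sampled without replacement."""
        rng = np.random.default_rng(seed)
        pos_idx = np.flatnonzero(y == 1)
        neg_idx = np.flatnonzero(y == 0)  # assumed far larger than pos_idx
        subsets = []
        for _ in range(n_models):
            neg_sample = rng.choice(neg_idx, size=len(pos_idx), replace=False)
            idx = np.concatenate([pos_idx, neg_sample])
            subsets.append((X[idx], y[idx]))
        return subsets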

For prediction, we combine the outputs of the classifiers with a voting scheme. A combined classifier of this kind is commonly known as an ensemble classifier [7]. An ensemble generally has lower variance, and therefore better generalization ability, than its individual classifiers.
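
Counting positive votes across the nine models yields an integer score from 0 to 9, which matches the score column of Table 1 below. A minimal sketch, assuming the base models follow scikit-learn's predict convention with the positive class labeled 1:

    import numpy as np

    def vote_score(models, X):
        """Score = number of base SVMs that predict the positive class."""
        votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
        return votes.sum(axis=0)                          # integers 0..n_models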

With the combination of sampling, ensemble classification, and SVM, we aim to provide good predictions for the cross-selling problem.

4 Summary of Scoring Model Results

The only important parameter is the cost parameter C of the SVM, which we set to 1 for faster training.
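
Putting Sections 3 and 4 together, training the ensemble reduces to fitting one SVM with C = 1 per balanced subset. A minimal sketch, reusing the hypothetical balanced_subsets helper from Section 3; the linear kernel is again an assumption:

    from sklearn.svm import SVC

    def train_ensemble(subsets):
        """Fit one SVM per balanced training subset (cost parameter C = 1)."""
        models = []
        for X_sub, y_sub in subsets:
            clf = SVC(kernel="linear", C=1.0)  # C = 1 for faster training
            models.append(clf.fit(X_sub, y_sub))
        return models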

5 Discussion

Applying the trained ensemble to the test dataset gives the results listed in Table 1:

Score   Number of Instances   Percentage
  0            704               8.8%
  1           1811              22.7%
  2           2733              34.2%
  3            910              11.4%
  4            543               6.8%
  5            375               4.7%
  6            305               3.8%
  7            379               4.7%
  8            168               2.1%
  9             72               0.9%

Table 1: Prediction of test dataset

The higher the score, the more likely the customer is to sign up for a new home loan with the company within 12 months of opening the credit card account. The company should therefore focus on the customers with high scores in order to maximize its profit.

References:

[1] PAKDD Competition 2007.

[2] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, Vol. 2, No. 2, 1998, pp. 121-167.

[3] H. Byun and S.-W. Lee, “A survey on pattern recognition applications of support vector machines,” International Journal of Pattern Recognition and Artificial Intelligence, Vol. 17, No. 3, 2003, pp. 459-486.

[4] M. A. Hearst, “Support vector machines,” IEEE Intelligent Systems, July/August 1998, pp. 18-21.

[5] J. Platt, “Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines,” Microsoft Research Technical Report MSR-TR-98-14, 1998.

[6] YALE (Yet Another Learning Environment).

[7] Classifier Ensembles.