Selection:
The goal of this case is to classify customer into several groups according to their monthly usage. In order to achieve this goal, we would use customer information, marketing data as well as phone call details information.
Data selected
Field Name / Table Name / DescriptionCustomerID / CDemographics / The ID of the customer
Sex / CDemographics / The sex of a customer
Date of birth / CDemographics / The date of birth of a customer
Education Level / CDemographics / The education level of a customer
PhoneNo / CDemographics / Phone number of the customer
Joining Date / CDemographics / Date of joining ABC telecom
Join From / CDemographics / Which telecom company does the customer previously joined
CustomerID / MarketingData / The ID of the customer in the MarketingData
Special Offer / MarketingData / Special Offer offered to the customer.
PhoneNo / PhoneCallDetails / The origin phone number which makes the call
StartTime / PhoneCallDetails / Time of making the call
EndTime / PhoneCallDetails / Time of ending the call
Preprocessing
Missing value
Field Name / Table Name / No. of missingrecord / Total no. of record in table / Percentage
Date of birth / CDemographics / 4 / 256 / 1.56%
Joining Date / CDemographics / 7 / 256 / 2.73%
There are total 10 records which missing “DateOfBirth” or “JoiningDate”. Since there is only a small portion (3.91%), we would ignore those records with missing field “DateOfBirth” or “JoiningDate”.
Inconsistent records
JoinForm in CDemographics table
There are totally 10 possible value of field “JoinForm” in CDemographics table. Some of the value is very close to each other. It is believed that these should be caused by data entry error. The field “JoinForm” would be corrected as shown in the following table.
Connection Table
Original Value / New ValueAnemal / Animal
Animal / Animal
Apple / Apple
FolishTnoe / FolishTone
FoolishTone / FoolishTone
Monday / Monday
OldWorld / OldWorld
OldWrold / OldWorld
Thre2One / Three2One
Three2One / Three2One
EducationLevel in CDemographics table
This field consists an unknown value “Sec” and there are total 23 records have this value. It is believed the value of this 23 records should be S (Secondary). The mapping between the original values and corrected values are shown in the following table.
Original Value / New ValueA / A
D / D
P / P
S / S
Sec / S
T / T
Ambiguous meaning
EducationLevel in CDemographics table
The group “D” (Degree) and “A” (Degree or above) are overlapped. The group “D” (Degree) would be merged into group “A” (Degree or above).
Transformation – Initial Stage
Summering Data
Age:
The age is derived from the “Dateofbirth” field in the CDemographics table.
The age would be calculated by the following formula:
Age = (timestampdiff(MONTH, dateofbirth, now()) / 12)
The unit of this field is year. Please note that the above formula is a portion of the resulting SQL statement.
No. of joined month:
The field no. of joined month is derived from “joiningDate” field in the CDemographics table.
The age would be calculated by the following formula:
No. of joined month = timestampdiff(MONTH, JoiningDate, now())
The unit of this field is month. Please note that the above formula is a portion of the resulting SQL statement.
Total air time of a customer
The field TotalAirTime is derived from summing up all the difference of starttime and endtime in the phonecalldetails. The total air time is calculated by a view created in the database. The name of the view is called “PhoneCallDetails2”. The unit of this field is minute.
Average air time per month of a customer
This field is derived from the total air time of a customer and the no. of joined month of a customer. The average air time per month is calculated by using the following formula:
Average air time per month = Total Air Time / No. of joined month.
The unit of this field is minute. Please note that the above formula is a portion of the resulting SQL statement.
Special Offer
There are three fields about Special Offer would be created. Their name is A_SPECIAL_OFFER, B_SPECIAL_OFFER, C_SPECIAL_OFFER. A view “MarketingData2” is created in order to retrieved these three fields. Basically, if a customer has a record in marketingData table and the value of “SpecialOffer” is “A”, then the derived field A_SPECIAL_OFFER would be ‘Y’. Similar logic is applied to fields B_SPECIAL_OFFER and C_SPECIAL_OFFER.
Binning
Age
Initially we would divide this field into 3 bins by using fixed width binning method.
Bin / From / ToBin 1 / 20 / 31.2
Bin 2 / 31.2 / 42.4
Bin 3 / 42.4 / 53.6
Bin 4 / 53.6 / 64.8
Bin 5 / 64.8 / 76
No. of joined month:
Initial we would divide this field into 6 bins by using fixed width binning method.
Bin / From / ToBin 1 / 28 / 37.8333
Bin 2 / 37.8333 / 47.6666
Bin 3 / 47.6666 / 57.5
Bin 4 / 57.5 / 67.3333
Bin 5 / 67.3333 / 77.1666
Bin 6 / 77.1666 / 87
Average air time per month per customer
Initial we would divide this field into 3 bins by using fixed width binning method.
Bin / From / ToBin 1 / 19.3867 / 63.1582
Bin 2 / 63.1582 / 106.9297
Bin 3 / 106.9297 / 150.7011
Data mining – Initial Stage
We would like to use C5 algorithm to build a decision tree for classifying customer average monthly air time into several groups. The target of the C5 algorithm is set to air time bin.
Parameters
Parameter Name / DescriptionSex / Sex of the customer
A_SpecialOffer / Indicator for Customer having Special Offer A.
If customer has special offer a, the value of this field is Y, else N
B_SpecialOffer / Indicator for Customer having Special Offer B.
If customer has special offer b, the value of this field is Y, else N
C_SpecialOffer / Indicator for Customer having Special Offer C.
If customer has special offer c, the value of this field is Y, else N
JoinFrom / Which telecom company does the customer using before joining ABC telecom.
CorrectedEducation / The education level of customer.
It consists of only four possible values.
P – Primary School
S – Secondary School
T – Tertiary Education
A – Degree or above
Air_Time_Per_Month_BIN / The binned value of the customer average monthly air time.
Age_Bin / The binned value of the customer age
Joined_Bin / The binned value of the no. of month does the customer joined ABC telecom.
Setting of Clementine
SQL statement provided to Clementine:
SELECT
Sex,
(timestampdiff(MONTH, dateofbirth, now()) / 12) as age,
EducationLevel,
(TotalAirTime / timestampdiff(MONTH, JoiningDate, now())) as air_time_per_month,
timestampdiff(MONTH, JoiningDate, now()) as joined,
JoinFrom,
A_SPECIAL_OFFER,
B_SPECIAL_OFFER,
C_SPECIAL_OFFER
FROM cdemographics, MarketingData2, phonecalldetails2
WHERE cdemographics.customerid = marketingdata2.customerid
AND cdemographics.customerid = phonecalldetails2.customerid
Decision tree generated of 1st run:
We use the original data set for generating the decision tree to check the correctness of the resulting tree.
Test result of 1st run:
No. of Record / PercentageCorrect / 200 / 81.3%
Incorrect / 46 / 18.7%
Evaluation – 1st Stage
The generated decision tree looks good. We tried to modify the transformation process to see whether the correctness of the discussion tree increase.
Transformation – 2nd stage
In this stage, all fields transformation would remain unchanged except the binning of average air time per month of customer will be modified.
Now, the average air time per month of customer would be divided into 4 bins as shown in the following table.
Bin / From / ToBin 1 / 19.3867 / 52.2153
Bin 2 / 52.2153 / 85.0439
Bin 3 / 85.0439 / 117.8725
Bin 4 / 117.8725 / 150.7011
Data mining – 2nd Stage
The setting of Clementine and the SQL statement provided to Clementine remains unchanged.
Decision tree generated of 2nd run:
Test result of 2nd run:
Correct / 194 / 78.84%
Incorrect / 52 / 21.14%
Evaluation – 2nd Stage
The correctness of the generated decision tree decrease, we keep the original binning method of average monthly air time of customer and try to modify the transformation process in order to generate a better decision tree.
Transformation – 3rd Stage
In this stage, the number of bin of number of month of a customer joined ABC telecom decrease to 4. The following table shown how average monthly air time of customer binned. For the number of bin of average monthly air time of a customer decrease to 3 that is same as the value of 1st stage as the correctness of the generated decision is not as high as 1st stage.
Binning table of average monthly air time of a customer
Bin / From / ToBin 1 / 19.3867 / 63.1582
Bin 2 / 63.1582 / 106.9297
Bin 3 / 106.9297 / 150.7011
Binning table of number of month joined ABC telecom
Bin / From / ToBin 1 / 28 / 42.75
Bin 2 / 42.75 / 57.5
Bin 3 / 57.5 / 72.25
Bin 4 / 72.25 / 87
Data mining – 3rd Stage
The setting of Clementine and the SQL statement provided to Clementine remains unchanged.
Result of the generated tree
Test result of 3rd stage:
No. of Record / PercentageCorrect / 196 / 79.67%
Incorrect / 50 / 20.33%
Evaluation – 3rd Stage
After comparing the data mining result of these 3 stages, the result of stage one have highest correctness and that should be the result we looking for. The other reason of selecting this decision tree as the result is that this generated result tree covers all 3 possible class labels while the other 2 tree did not. Furthermore, the decision tree generated in the 1st stage is much more logical and easily explained.
Final Decision Tree:
Interpretation
First of all, a mapping from the binned continuous to a logical name would be taken place.
Mapping of number of month joined ABC telecom
Bin / From / To / Logical NameBin 1 / 28 / 37.8333 / Joined 2 to 3 years
Bin 2 / 37.8333 / 47.6666 / Joined 3 to 4 years
Bin 3 / 47.6666 / 57.5 / Joined 4 to 5 years
Bin 4 / 57.5 / 67.3333 / Joined 5 to 6 years
Bin 5 / 67.3333 / 77.1666 / Joined 6 to 7 years
Bin 6 / 77.1666 / 87 / Joined 7 to 8 years
Mapping of age of customer
Bin / From / To / Logic NameBin 1 / 20 / 31.2 / Very Young Customer
Bin 2 / 31.2 / 42.4 / Young customer
Bin 3 / 42.4 / 53.6 / Middle Aged Customer
Bin 4 / 53.6 / 64.8 / Old Customer
Bin 5 / 64.8 / 76 / Very Old Customer
Mapping of age of average monthly air time of a customer
Bin / From / To / Logical NameBin 1 / 19.3867 / 63.1582 / Low Usage
Bin 2 / 63.1582 / 106.9297 / Normal Usage
Bin 3 / 106.9297 / 150.7011 / High Usage
Rules generated after apply logical name of binned value:
Rule 1 / If it is a customer joined for 2 to 3 years and the customer has discount purchase of phone, then his usage would be highRule 2 / If it is a customer joined for 2 to 3 years and the customer does not have discount purchase of phone, then his usage would be normal
Rule 3 / If it is a customer joined for 3 to 4 years then his usage would be normal
Rule 4 / If it is a customer joined for 4 to 5 years and the customer has free MTR, Tunnel and license fee, then his usage would be normal
Rule 5 / If it is a customer joined for 4 to 5 years and the customer has free MTR, Tunnel and license fee, then his usage would be low
Rule 6 / If it is a customer joined for 5 to 6 years then his usage would be low
Rule 7 / If it is a customer joined for 6 to 7 years then his usage would be low
Rule 8 / If it is a customer joined for 7 to 8 years then his usage would be low
From the above rules, a trend could be discovered. That is the longer a customer joined ABC telecom, the lower the air time usage the customer has. Moreover, by comparing rule 1 to 2 and rule 4 to 5, we could see that discount purchase of phone and free MTR, Tunnel and license fee are very successfully. These two special offers could successfully increase the monthly usage of a customer. On the other hand, offering free minutes to customers is not very success. We can not see any customer behavior would be changed because of offered extra free minutes.
The following suggestions are made based on above finding in order to increase monthly usage of customer:
1. Offer discount on phone purchase to those customers who joined ABC telecom for 2 to 3 years.
2. Offer free MTR, tunnel and License fee to those customers who joined ABC telecom for 4 to 5 years.
3. We don’t have to offer special offer to those customers joined ABC telecom for more than 5 years.
4. Whenever a customer requests for special offer, ABC telecom should offer “discount on phone purchase” and “MTR, Tunnel and License fee” instead of offering “Extra free minutes”.