A Transaction Pattern Analysis System Based on Neural Network

Tzu-Chuen Lu* and Kun-Yi Wu

Department of Information Management,

Chaoyang University of Technology, Taichung 41349, Taiwan, R.O.C.


Correspondence address:

Tzu-Chuen Lu

Department of Information Management,
Chaoyang University of Technology,
168, Jifong East Road, Wufong Township,
Taichung County 41349, Taiwan (R.O.C.)
FAX: +886-4-23742337

A Transaction Pattern Analysis System Based on Neural Network

Abstract

Customer segmentation is a key element of target marketing and market segmentation. Although many segmentation methods are available today, most of them emphasize numerical calculation rather than commercial goals. In this study, we propose an improved segmentation method, Transaction Pattern based Customer Segmentation with Neural Network (TPCSNN), which is based on customers' historical transaction patterns. First, it filters the transaction database for records with typical patterns. Next, it iteratively reduces the inter-cluster correlation coefficient and increases the inner-cluster density to segment the customers. Then, it uses a neural network to learn the patterns of consumption behavior, and the trained network can be used to segment new customers. In this way, customer segmentation can be carried out quickly and at low cost. Furthermore, the segmentation results are analyzed and explained in this study.

Keywords: market segmentation, clustering, customer segmentation, association rule mining, neural network

  1. Introduction

Consumer markets change rapidly and follow no settled logic. They contain all kinds of demands, and customers' requirements will never be satisfied by merely one or two products. However, an excessive product line can become a burden or a risk to a company's operation [9, 13, 17, 19]. Therefore, to satisfy diverse customer requirements within the company's capacity, the consumer market must be split into several segments, each with an appropriate marketing strategy [3, 4, 5, 6, 14, 15].

The spirit of strategic marketing proposed by Kotler is STP: Segmentation, Targeting, and Positioning [14, 15]. Based on customer diversity, the complicated real-world market can be separated into several small markets with similar properties, among which a company can find its target markets and positions. This strategy remains a golden rule in today's business. In recent years, Kotler proposed a new brand marketing model, Create Communicate Deliver Value Target Profit (CCDVTP). It creates new communication channels to deliver brand value, conducts marketing toward specific targets, and finally achieves profit. To formulate a marketing strategy, market segmentation is the first step [14, 15].

Many market segmentation methods are available today, but most of them are built on existing clustering methods such as K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [1, 2, 7, 8, 9, 10, 11, 16, 18]. The user has to choose an appropriate clustering method according to the problem to be solved or the characteristics of the database, and then find suitable parameters for the clustering. Obtaining an optimized result therefore requires the user to be very familiar with the problem or the data, which is practically impossible for an ordinary company. In this study, we propose an easy, understandable method for ordinary users that performs segmentation quickly and correctly, so that business operations can be supported by theory. Moreover, most existing methods are not designed for business purposes, so the data must be adjusted during application, and such adjustments may pull the results away from the target problem.

In 2004, ChangChien and Kuo proposed a customer segmentation method called Transaction Pattern based Customer Segmentation (TPCS) [3]. Their method covers both marketing and business purposes: it uses customers' historical transaction data to group customers with similar transaction patterns, while taking marketing and business goals into consideration. However, TPCS cannot analyze or explain the segmentation results. In this study, we improve the TPCS method by adding a transaction-data extraction mechanism and an analysis of the segmentation results. Furthermore, neural network technology is adopted so that users can segment new customers quickly.

The rest of this paper is organized as follows. Section 2 reviews the TPCS method and other relevant technologies. Section 3 introduces the proposed segmentation method. Section 4 evaluates the improvements with simulated and actual data and validates them on a real system. Section 5 concludes the paper.

  2. Related Works

2.1 Transaction Pattern based Customer Segmentation (TPCS)

TPCS looks for patterns of consumer behavior in the transaction data. The segmentation can be adjusted by customer weighting to become customer oriented, which prevents a single customer's transaction data from falling into different segments. A segmentation correlation matrix (SCM) is then produced from the consumer behavior patterns and their correlation coefficients to show the degree of intersection among the segments, and the segmentation is further adjusted to achieve the lowest inter-segment correlation. After that, a density indicator is added to measure inner-cluster correlation. By reducing inter-segment correlation and enhancing inner-cluster density, customer segmentation is achieved through a combination of merging and separation strategies. However, this method has two limitations:

  1. During transaction rule mining, attention is paid only to the items purchased, not to the purchase amount.
  2. A transaction is treated as a collection of purchased items. The purchasing order is not considered, and transactions containing a single item are also included.

We now explain the steps of TPCS segmentation using 7 transaction records from 4 customers over 4 different items. Table 1(a) shows the original transaction data. The Item column lists the commodities purchased in each transaction, where A, B, C, and D are item codes and TID is the transaction number. First of all, this table is converted into Table 1(b), where 1 stands for "purchased" and 0 stands for "not purchased". A minimum support is applied at this point to remove rare items, and a conversion sketch follows the table.

Table 1. The transaction records and the converted data

(a) Original transaction data

TID / CustomerID / Item
1 / Alpha / A, B, D
2 / Beta / A, B
3 / Charlie / C, D
4 / Delta / B, C, D
5 / Alpha / A, B, C
6 / Beta / A, B, D
7 / Delta / C, D

(b) Converted data

Item
TID / A / B / C / D
1 / 1 / 1 / 0 / 1
2 / 1 / 1 / 0 / 0
3 / 0 / 0 / 1 / 1
4 / 0 / 1 / 1 / 1
5 / 1 / 1 / 1 / 0
6 / 1 / 1 / 0 / 1
7 / 0 / 0 / 1 / 1
Count / 4 / 5 / 4 / 5
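
To make the conversion concrete, the following sketch turns the records of Table 1(a) into the binary matrix of Table 1(b) and computes the per-item counts. The variable and function names are illustrative only, not part of the original system.

```python
# Illustrative conversion of Table 1(a) into the binary matrix of Table 1(b).
ITEMS = ["A", "B", "C", "D"]

# (TID, customer, purchased items), as in Table 1(a)
transactions = [
    (1, "Alpha",   {"A", "B", "D"}),
    (2, "Beta",    {"A", "B"}),
    (3, "Charlie", {"C", "D"}),
    (4, "Delta",   {"B", "C", "D"}),
    (5, "Alpha",   {"A", "B", "C"}),
    (6, "Beta",    {"A", "B", "D"}),
    (7, "Delta",   {"C", "D"}),
]

def to_binary_matrix(records, items=ITEMS):
    """Return one 0/1 row per transaction, ordered by `items`."""
    return [[1 if item in bought else 0 for item in items]
            for _, _, bought in records]

matrix = to_binary_matrix(transactions)
counts = [sum(col) for col in zip(*matrix)]
print(counts)   # [4, 5, 4, 5], the Count row of Table 1(b)
```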

Assume that the customers can be separated into cluster X and cluster Y, as shown in Table 2. We can then calculate the cluster correlation matrix of these two segments from the expected value and the occurrence probability of each rule, using the permutations and combinations of the four items, as shown in Table 3. If we further assume that the profit of each product is 1, then $P_X(r)$ in the table is the occurrence probability of rule $r$ in cluster X and $E_X(r)$ is its expected value. For example, rule 12 is 1100: the occurrence probability of transactions purchasing both A and B in cluster X is 1/7, and the expected value is 2/7. A short check of the correlation coefficient follows Table 3.

Table 2. Initial segmentation results

(a) Cluster X

Item
TID / A / B / C / D
1 / 1 / 1 / 0 / 1
2 / 1 / 1 / 0 / 0
3 / 1 / 1 / 1 / 0
4 / 1 / 1 / 0 / 1

(b) Cluster Y

Item
TID / A / B / C / D
1 / 0 / 0 / 1 / 1
2 / 0 / 1 / 1 / 1
3 / 0 / 0 / 1 / 1

Table 3. Customer Correlation Matrix

Cluster X / Cluster Y
Rule / Pattern / $P_X(r)$ / $E_X(r)$ / $P_Y(r)$ / $E_Y(r)$
1 / 0001 / 0 / 0 / 0 / 0
2 / 0010 / 0 / 0 / 0 / 0
3 / 0011 / 0 / 0 / 2/7 / 4/7
4 / 0100 / 0 / 0 / 0 / 0
5 / 0101 / 0 / 0 / 0 / 0
6 / 0110 / 0 / 0 / 0 / 0
7 / 0111 / 0 / 0 / 1/7 / 3/7
8 / 1000 / 0 / 0 / 0 / 0
9 / 1001 / 0 / 0 / 0 / 0
10 / 1010 / 0 / 0 / 0 / 0
11 / 1011 / 0 / 0 / 0 / 0
12 / 1100 / 1/7 / 2/7 / 0 / 0
13 / 1101 / 2/7 / 6/7 / 0 / 0
14 / 1110 / 1/7 / 3/7 / 0 / 0
15 / 1111 / 0 / 0 / 0 / 0
Correlation Coefficient (CC) / -0.1721
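
As a quick check on the figure above, the correlation coefficient in Table 3 is the Pearson correlation between the two expected-value columns. The sketch below recomputes it; the helper name `pearson` is ours.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Expected values E_X(r) and E_Y(r) for the 15 rules of Table 3
E_X = [0, 0, 0,   0, 0, 0, 0,   0, 0, 0, 0, 2/7, 6/7, 3/7, 0]
E_Y = [0, 0, 4/7, 0, 0, 0, 3/7, 0, 0, 0, 0, 0,   0,   0,   0]

print(round(pearson(E_X, E_Y), 4))   # -0.1721, as in Table 3
```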

Next, the inner-cluster density is calculated: the number of transaction records in a cluster is divided by the sum of distances between each transaction record and the cluster centroid, and the result serves as a density indicator of the cluster's compactness. The distances are Euclidean distances. The centroid of every cluster can be calculated from Table 2; Table 4 shows the results.

Table 4. Centroid list

(a) Cluster X

ITEM / Count / Mean
A / 4 / 1
B / 4 / 1
C / 1 / 1/4
D / 2 / 1/2

(b) Cluster Y

ITEM / Count / Mean
A / 0 / 0
B / 1 / 1/3
C / 3 / 1
D / 3 / 1

From Table 4, the density of cluster X is 2.2857 and the density of cluster Y is 4.4998; the effect of the two clusters is 42.4953. The algorithm pairs every two clusters and looks for the combination with the worst effect to adjust. If all possible combinations have been tested and none of them is better than the original clustering, the segmentation is finished.
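
The sketch below reproduces the reported densities. Note that the figures 2.2857 and 4.4998 quoted above come out when the cluster size is divided by the sum of squared Euclidean distances to the centroid; treating the squaring as part of the formula is our reading of the worked example, not a statement from the original TPCS description.

```python
# Inner-cluster density: cluster size divided by the summed (squared)
# Euclidean distances between each transaction and the cluster centroid.
# The squaring is our assumption; it reproduces the values quoted in the text.

def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def density(rows):
    c = centroid(rows)
    sq_dist = sum(sum((x - ci) ** 2 for x, ci in zip(row, c)) for row in rows)
    return len(rows) / sq_dist

cluster_x = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 1]]  # Table 2(a)
cluster_y = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]                # Table 2(b)

print(round(density(cluster_x), 4))   # 2.2857
print(round(density(cluster_y), 4))   # 4.5 (reported as 4.4998 after rounding)
```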

2.2 Back-Propagation Network (BPN)

Although segmentation can be performed effectively by TPCS, the method cannot explain its results. Therefore, a BPN is adopted to learn and analyze the segmentation [12, 13]. BPN is a supervised learning network that is widely used in many fields [20] and performs well in diagnosis, prediction, classification, and optimization. A BPN has three kinds of layers: an input layer, hidden layers, and an output layer, each containing several processing units. The input layer receives the data and the output layer produces the results; one or more hidden layers between them represent the interactions among the input processing units. Today, BPN is the most widely adopted network, with many successful applications, high learning precision, and fast recall.
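
As an illustration only, the sketch below trains a small three-layer network with scikit-learn's MLPClassifier on toy transaction patterns. The hidden-layer size, learning parameters, and data are placeholders, not the configuration used in this study.

```python
# Minimal three-layer BPN sketch: input layer, one hidden layer, output layer.
from sklearn.neural_network import MLPClassifier

# Toy training set: binary transaction patterns and their cluster labels.
X_train = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 1, 0],
           [0, 0, 1, 1], [0, 1, 1, 1]]
y_train = ["X", "X", "X", "Y", "Y"]

bpn = MLPClassifier(hidden_layer_sizes=(8,),   # one hidden layer of 8 units
                    activation="logistic",     # sigmoid units, as in classic BPN
                    solver="sgd",
                    learning_rate_init=0.1,
                    max_iter=2000,
                    random_state=0)
bpn.fit(X_train, y_train)

# Recall phase: assign a new customer's pattern to a cluster.
print(bpn.predict([[1, 1, 1, 1]]))
```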

  3. The Proposed Method

In this study, we propose an improved method named Transaction Pattern based Customer Segmentation with Neural Network (TPCSNN). It is expected to achieve the following goals.

  1. Minimized inter-cluster correlation. The correlation is calculated from the cluster correlation matrix (CCM), and segmentation is achieved by reducing the inter-cluster correlation.
  2. Maximized inner-cluster correlation. The Euclidean distance is used to generate a density indicator in order to improve incompact structures within a cluster.
  3. A fast-responding system. Ordinary algorithms consume a great deal of time in the segmentation operation; in this study, neural network technology is adopted to find the best cluster for a new customer quickly.

The major steps are listed below; a structural sketch of the whole loop follows the list.

  1. Extract customer data.
  2. Initial segmentation: transaction records with identical patterns are classified into one cluster.
  3. Adjust the segmentation to be customer oriented: the initial segmentation in Step 2 is based on transaction patterns, so transaction records of the same customer may be grouped into different clusters. The clusters are converted to be customer oriented so that every customer belongs to a single cluster.
  4. Transaction rule mining on all clusters: transaction patterns are mined from each cluster; these rules represent the consumer properties of the cluster.
  5. Inter-cluster correlation calculation: the cluster correlation matrix is generated from the correlation coefficients among the clusters.
  6. Inner-cluster density calculation: the number of transaction records in the cluster is divided by the sum of distances between the centroid and all transaction data; the result evaluates the compactness of the cluster.
  7. Clustering evaluation: clusters are paired freely to evaluate the clustering, and the worst cluster pair is picked as the improvement candidate.
  8. Improvement: the best improvement is sought through three segmentation adjustment strategies. If none of the strategies works, go back to Step 7 and take the second-worst pair; repeat until all combinations have been tested. If an improvement is found, go back to Step 3 and repeat the calculation.
  9. Result analysis: an interface is provided for the user to analyze inter-cluster and inner-cluster transaction situations.
  10. Building the BPN model: the segmentation results generated by the above steps are used to train a BPN capable of rapid segmentation.
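
The loop formed by Steps 3 to 8 can be summarized by the structural sketch below. Every function is a stub standing in for the routine described in the corresponding subsection of Section 3; none of this is the original implementation.

```python
# Structural sketch of the TPCSNN segmentation loop (Steps 3-8); stubs only.

def make_customer_oriented(clusters):     # Step 3 (Section 3.3)
    return clusters                       # stub

def mine_rules(clusters):                 # Step 4 (Section 3.4)
    return {}                             # stub

def rank_cluster_pairs(clusters, rules):  # Steps 5-7: CC, density, pairing
    return []                             # stub: cluster pairs, worst first

def try_adjustment(clusters, pair):       # Step 8: three adjustment strategies
    return None                           # stub: improved clustering, or None

def segment(initial_clusters):            # Step 2 result goes in
    clusters = initial_clusters
    while True:
        clusters = make_customer_oriented(clusters)
        rules = mine_rules(clusters)
        for pair in rank_cluster_pairs(clusters, rules):
            improved = try_adjustment(clusters, pair)
            if improved is not None:      # a strategy worked:
                clusters = improved       # go back to Step 3
                break
        else:                             # no pair can be improved:
            return clusters               # done; Steps 9-10 follow
```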

3.1 Extraction of customers’ transaction data

Take small and medium-sized businesses as an example: a great many products are sold and thousands of customers take part in the transactions, so we do not include all items in the analysis. We focus on the important products with the highest sales, and the segmentation likewise targets the customers who may purchase these products. In this way, the company can conduct further classification and marketing for the customers who may purchase popular items.

When an Enterprise Resource Planning (ERP) system is introduced, product numbers are often created for certain services or expenses merely for convenience, such as installation and delivery charges, gifts, and rent. These "products" are not purchased by the customers on their own initiative; they are attached by the manufacturers and should be removed when filtering the data. In addition, product numbers carry different leading codes indicating their types, and this filtering condition can be offered to the users. Figure 1 shows the conditions for data extraction, and a filtering sketch follows the figure.

Figure 1. Screenshot of ERP data filtering
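
As a rough illustration of this extraction step, the sketch below drops product numbers whose leading code marks a service or expense and keeps only the top-selling items. The leading codes, record layout, and the cut-off of 50 items are hypothetical examples; in practice they are chosen by the user through the interface in Figure 1.

```python
from collections import Counter

# Hypothetical leading codes for service/expense product numbers
# (e.g. installation, delivery, gifts, rent) that should be filtered out.
EXCLUDED_PREFIXES = ("S", "E", "G")

def extract(line_items, top_n=50):
    """line_items: (TID, customer, product_no) triples from the ERP system.
    Keep only real products, then restrict to the top_n best sellers."""
    kept = [(tid, cust, prod) for tid, cust, prod in line_items
            if not prod.startswith(EXCLUDED_PREFIXES)]
    sales = Counter(prod for _, _, prod in kept)
    top_items = {prod for prod, _ in sales.most_common(top_n)}
    return [row for row in kept if row[2] in top_items]
```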

3.2 Initial segmentation

In this stage, the customers' transaction records are converted into the format required by the system, and transaction data with identical patterns are merged into the same cluster. At this point we do not care who owns each transaction. Table 1 demonstrates the transformation.
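
A minimal grouping sketch, using the converted records of Table 1(b); the names are illustrative.

```python
from collections import defaultdict

# (TID, customer, binary pattern over items A, B, C, D), as in Table 1(b)
rows = [
    (1, "Alpha",   (1, 1, 0, 1)),
    (2, "Beta",    (1, 1, 0, 0)),
    (3, "Charlie", (0, 0, 1, 1)),
    (4, "Delta",   (0, 1, 1, 1)),
    (5, "Alpha",   (1, 1, 1, 0)),
    (6, "Beta",    (1, 1, 0, 1)),
    (7, "Delta",   (0, 0, 1, 1)),
]

clusters = defaultdict(list)
for tid, customer, pattern in rows:
    clusters[pattern].append((tid, customer))   # identical patterns share a cluster

# Pattern (1, 1, 0, 1) now holds TIDs 1 and 6, so Alpha and Beta share a
# cluster at this stage even though they are different customers.
print(dict(clusters))
```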

3.3 Adjust the segmentation to be customer oriented

In the previous stage, the same customer may be placed into different clusters, which conflicts with our purpose. To ensure that all of the records of the same customer fall into the same cluster, we determine the customer's master cluster by the transaction ratio. If two or more clusters share the highest percentage, the benefits are considered; currently, the first such cluster is taken as the master cluster. Table 5 demonstrates the choice: the probabilities of classifying the transaction records of customer Alpha into clusters 1, 2, 3, and 4 are 50%, 25%, 12.5%, and 12.5%, respectively, so cluster 1 becomes the master cluster for customer Alpha. A selection sketch follows the table.

Table 5. Choosing the master cluster

Customer / Cluster / Transaction ID / Probability / Master cluster
Alpha / 1 / 3, 4, 5, 6 / 50% / 1
Alpha / 2 / 2, 3 / 25% /
Alpha / 3 / 1 / 12.5% /
Alpha / 4 / 5 / 12.5% /
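
A minimal sketch of this choice, using customer Alpha from Table 5; the tie-breaking by benefit is omitted and the helper name is ours.

```python
from collections import Counter

def master_cluster(cluster_of_each_transaction):
    """Pick the cluster holding the largest share of the customer's transactions.
    Ties fall back to the cluster encountered first."""
    counts = Counter(cluster_of_each_transaction)
    cluster, hits = counts.most_common(1)[0]
    return cluster, hits / len(cluster_of_each_transaction)

# Customer Alpha from Table 5: 8 transactions spread over clusters 1-4.
alpha = [1, 1, 1, 1, 2, 2, 3, 4]
print(master_cluster(alpha))   # (1, 0.5): cluster 1 is Alpha's master cluster
```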

3.4 Transaction rule mining on the clusters

Suppose all of the transaction data are represented as $T=\{t_1, t_2, \dots, t_N\}$, and the products purchased in each record are represented as $t_j=(b_{j,1}, b_{j,2}, \dots, b_{j,n})$, where $b_{j,i}\in\{0,1\}$, $j=1,\dots,N$, and $n$ is the number of items. If $b_{j,i}=1$, product $i$ is purchased in transaction $t_j$; if $b_{j,i}=0$, it is not. For example, suppose there are four available products A, B, C, and D; the transaction expression 1101 then indicates that A, B, and D are purchased in that transaction.

After all of the transaction data have been converted, the vertical sum of each product is calculated, as shown in Table 1(b), and the minimum support is applied. If the minimum support is 0.6 and there are 7 transactions, a single product must occur more than 4.2 times to be included in the calculation; as a result, products A and C in Table 1(b) are excluded. After removing the products with low occurrence, the transaction patterns are mined from the remaining products.

In this study, the users are asked to choose the products with the top sales for analysis during the customer data extraction stage. Since this selection is based on subjective judgment, a minimum support is not needed.
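
For completeness, the sketch below applies the minimum-support filter of the example above to the converted data of Table 1(b); the threshold of 0.6 follows the text.

```python
MIN_SUPPORT = 0.6
ITEMS = ["A", "B", "C", "D"]

# Converted transactions of Table 1(b)
matrix = [
    [1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1],
    [1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1],
]

threshold = MIN_SUPPORT * len(matrix)          # 0.6 * 7 = 4.2 occurrences
counts = [sum(col) for col in zip(*matrix)]    # [4, 5, 4, 5]
kept = [item for item, c in zip(ITEMS, counts) if c > threshold]
print(kept)                                    # ['B', 'D']: A and C are dropped
```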

3.5 Inter-cluster correlation

The correlation coefficient is a statistical measure of the linear relationship between two variables and of how tightly they are related: the closer its absolute value is to 1, the more closely two clusters are correlated. To achieve a low inter-cluster correlation, the lower the correlation coefficient, the better. We use the expected profit of each transaction pattern to calculate the correlation coefficient between two clusters and present it in the cluster correlation matrix. The correlation coefficient of clusters X and Y is denoted by $CC_{X,Y}$ and is calculated as

$$CC_{X,Y}=\frac{\sum_{r=1}^{2^{n}-1}\left(E_X(r)-\bar{E}_X\right)\left(E_Y(r)-\bar{E}_Y\right)}{\sqrt{\sum_{r=1}^{2^{n}-1}\left(E_X(r)-\bar{E}_X\right)^{2}}\sqrt{\sum_{r=1}^{2^{n}-1}\left(E_Y(r)-\bar{E}_Y\right)^{2}}}.\qquad(1)$$

In Equation (1), $n$ is the number of products to be analyzed, $E_X(r)$ and $E_Y(r)$ are the expected profits of transaction pattern $r$ in clusters X and Y, and $\bar{E}_X$ and $\bar{E}_Y$ are the corresponding average expected profits over all transaction patterns.

In order to calculate the expected profit of a transaction pattern, we also define the occurrence probability of pattern $r$ in cluster X as

$$P_X(r)=\frac{N_X(r)}{N}.\qquad(2)$$

Here, $N_X(r)$ is the number of transactions in cluster X that satisfy pattern $r$, $N$ is the total number of records in the transaction database, and $P_X(r)$ is the probability of pattern $r$ in cluster X. As shown in Table 3, transaction pattern 1101 occurs twice in cluster X and the total number of transactions is 7, so the occurrence probability of this pattern in cluster X is 2/7 = 0.2857. Then, we utilize

$$E_X(r)=\sum_{i=1}^{n} v_i\,b_i(r)\,P_X(r)\qquad(3)$$

to calculate the expected profit of this transaction pattern. Here, the emphasis is on the purchase pattern, so the individual profits can be ignored and we assume the profit of every product is 1. In the equation, $v_i$ is the profit of item $i$ and $b_i(r)$ indicates with 0 or 1 whether item $i$ is purchased in pattern $r$ of cluster X.

For the transaction pattern 1101 listed in Table 3, the expected profit is 0.8571, since $(1\times1\times0.2857)+(1\times1\times0.2857)+(1\times0\times0.2857)+(1\times1\times0.2857)=0.8571$. The average expected profit of a cluster is the sum of all expected profits divided by the number of patterns; for cluster X in Table 3, $\bar{E}_X=(2/7+6/7+3/7)/15=0.1048$.
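
The sketch below recomputes Equations (2) and (3) for cluster X of Table 3, with every product profit set to 1; the helper name `pattern_stats` is ours.

```python
from itertools import product

def pattern_stats(cluster, n_total, profits):
    """Return {pattern: (probability, expected profit)} for one cluster."""
    stats = {}
    for pattern in product((0, 1), repeat=len(profits)):
        hits = sum(1 for row in cluster if tuple(row) == pattern)
        p = hits / n_total                                     # Equation (2)
        e = sum(v * b * p for v, b in zip(profits, pattern))   # Equation (3)
        stats[pattern] = (p, e)
    return stats

cluster_x = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 1]]
stats = pattern_stats(cluster_x, n_total=7, profits=[1, 1, 1, 1])

p, e = stats[(1, 1, 0, 1)]
print(round(p, 4), round(e, 4))              # 0.2857 0.8571

# Average over the 15 patterns of Table 3 (pattern 0000 contributes nothing).
avg = sum(e for _, e in stats.values()) / 15
print(round(avg, 4))                         # 0.1048
```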

3.6 Calculation of inner cluster density

In most cases, the number of products with historical transactions is far smaller than the total number of the company's products; for example, we may analyze 50 of a company's products while fewer than 10 of them appear in any single transaction record. Therefore, we define the density

$$D_X=\frac{|X|}{\sum_{j=1}^{|X|} dist(t_j, c_X)}\qquad(4)$$

as the number of items per unit distance. The formula uses the Euclidean distance in Equation (5) to compute the total distance between each transaction and the centroid $c_X$ of cluster X; the smaller the distance, the more similar the cluster members are.

$$dist(t_j, c_X)=\sqrt{\sum_{i=1}^{n}\left(b_{j,i}-c_{X,i}\right)^{2}},\qquad(5)$$

where $b_{j,i}$ represents product $i$ in each transaction $t_j$,