Performance Analysis of Back Propagation Neural Network for Internet Traffic Classification

Kuldeep Singh#1, Sunil Agrawal#2

#University Institute of Engineering & Technology, Panjab University, Chandigarh (India)-160014

Abstract-With the rapid increase in internet usage over the last few years, the area of internet traffic classification has advanced to a large extent, driven by a dramatic increase in the number and variety of applications running over the internet. These applications include WWW, e-mail, P2P, multimedia, FTP applications, games etc. The diminished effectiveness of traditional port number based and payload based direct packet inspection techniques motivates us to classify internet traffic into various application categories using Machine Learning (ML) techniques. Neural networks are one of the important ML techniques. In this paper, Back Propagation Neural Networks (BPNN), a type of multilayer feed forward neural network, are employed for internet traffic classification. The performance of BPNN is analysed in terms of accuracy, recall, number of hidden layer neurons and training time of the network, using full feature data sets and reduced feature data sets. This experimental analysis shows that BPNN is an efficient technique for internet traffic classification for reduced feature data sets also.

Keywords- Internet traffic classification, Back propagation neural network, Machine Learning, Accuracy, Recall, Training Time, Features.

INTRODUCTION

The demand for internet traffic classification, which optimizes network performance by solving difficult network management problems for Internet Service Providers (ISPs) and provides quality-of-service (QoS) guarantees, has increased substantially in recent years, in part due to the phenomenal growth of bandwidth-hungry applications [1], [2]. The variety of applications running over the internet, such as WWW, e-mail, P2P, multimedia, FTP applications, interactive services and games, has led to a rapid increase in internet traffic. Classification of this traffic is necessary in order to solve ISPs' network management and monitoring problems, such as available bandwidth planning and provisioning, measurement of QoS, identification of a customer's use of a particular application for billing, and detection of indicators of denial of service attacks and of any other severe problem degrading the performance of the network. Nowadays, it is also being utilized by various governmental intelligence agencies from a security point of view.

Internet traffic classification can be either offline or online. In online classification, analysis is performed while the data packets flowing through the network are being captured; in offline classification, data traces are first captured and stored and then analysed later [2]. Traditionally, internet traffic classification techniques have been based upon direct inspection of the packets flowing through the network [1]. These techniques are payload based and port number based packet inspection. In the payload based technique, the payloads of a few TCP/IP packets are analysed in order to find the type of application. This is often not possible today because of the cryptographic techniques used to encrypt the data in the packet payload and because of government privacy policies which do not allow any unaffiliated third party to inspect each packet's payload. In the port number based technique, the well-known port numbers carried in the packet headers, which are reserved by IANA (Internet Assigned Numbers Authority) for particular applications, are used to identify the application, e.g. port number 80 is reserved for web based applications. Unfortunately, this method has also become ineffective due to the use of dynamic port numbers instead of well-known port numbers by various applications.
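The port number based technique described above can be sketched in a few lines. This is only an illustration: the table below is a small subset of well-known ports, the class labels are borrowed from the application names used later in this paper, and the function name classify_by_port is hypothetical.

```python
# Small illustrative subset of well-known server ports (not the full IANA registry).
# The class labels reuse the application names of this paper's data set.
WELL_KNOWN_PORTS = {
    20: "FTP-DATA",     # FTP data channel
    21: "FTP-CONTROL",  # FTP control channel
    25: "MAIL",         # SMTP
    80: "WWW",          # HTTP
    443: "WWW",         # HTTPS
}


def classify_by_port(server_port):
    """Return the application class for a well-known port, else 'UNKNOWN'."""
    return WELL_KNOWN_PORTS.get(server_port, "UNKNOWN")


web_flow = classify_by_port(80)         # matched by the lookup table
dynamic_flow = classify_by_port(51413)  # a dynamically chosen port is not matched
```

A flow that uses a dynamically chosen port falls through to "UNKNOWN", which is the weakness of this technique noted above.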

The diminished effectiveness of traditional port number based and payload based direct packet inspection techniques motivates us to classify internet traffic into various application categories using Machine Learning (ML) techniques, which are based upon supervised and unsupervised learning [1]. Neural networks, which are massively parallel distributed networks consisting of a number of information processing units (neurons) inspired by the way the human brain works, also come under the category of ML techniques.

This paper is based upon the use of the Back Propagation Neural Network (BPNN) for internet traffic classification [10], [11]. BPNN is a type of supervised multilayer feed forward neural network in which the error signal between the actual output and the desired output flows backward through the network in order to update the weights during the training process. In this paper, the performance of BPNN is analysed for two different training and testing data sets whose input samples contain different numbers of features. The performance of the network is evaluated on the basis of accuracy, recall, number of hidden layer neurons and training time [7], [8], [9]. This paper shows that the back propagation neural network gives good classification accuracy even when the number of features of the input samples in the training and testing data sets is reduced considerably, and that this reduction also leads to reduced complexity and training time of BPNN.

The rest of the paper is organized as follows: Section II gives introductory information about BPNN for readers who are new to this field. Section III reveals certain information about data sets of internet traffic. Implementation and result analysis is given in Section IV. Some conclusions are given in Section V.

BACK PROPAGATION NEURAL NETWORK

Back Propagation Neural Network (BPNN), also known as Multilayer Perceptron (MLP), is a Multilayer Neural Network which is based upon back propagation algorithm for training. This neural network is based upon extended gradient-descent based Delta learning rule, commonly known as Back Propagation rule. In this network, error signal between desired output and actual output is being propagated in backward direction from output to hidden layer and then to input layer in order to train the network [10], [11].

Consider the network shown in fig. 1. It consists of input layer having i neurons, hidden layer having j neurons and output layer having k neurons.

Fig. 1. Structure of Back Propagation Neural Network

In the Back Propagation Neural Network, the training process is done in two phases, which are explained as follows:

In the first phase, the training data is fed into the input layer and propagated through the hidden layer to the output layer. This process is called the forward pass. In this stage, each node in the hidden layer and the output layer computes the weighted sum of its inputs and generates an output value from the resulting sum.

In the second phase, the actual output values are compared with the target output values. The error between these outputs is calculated and propagated back from the output layer to the hidden layer in order to update the weights of each node. This is called the backward pass or learning. The network iterates over many cycles until the error is acceptable. After the training phase is done, the trained network is ready to use for any new input data. During the testing phase, there is no learning or modification of the weight matrices. The testing input is fed into the input layer, and the feed forward network generates results based on the knowledge acquired during training [7], [8], [9].
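The two phases can be illustrated with a minimal sketch in plain Python. It is not the MATLAB program used in this work. The sigmoid activation, the bias weights, the per-sample weight updates, the learning rate, the toy OR data set and all function names are assumptions made only for this illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_output):
    """Forward pass: return hidden layer outputs h and output layer outputs y."""
    # Each weight row ends with a bias weight, so a constant 1.0 input is appended.
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_output]
    return h, y


def sum_squared_error(samples, w_hidden, w_output):
    """Half the sum of squared output errors over all samples."""
    total = 0.0
    for x, d in samples:
        _, y = forward(x, w_hidden, w_output)
        total += 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))
    return total


def train(samples, n_hidden, eta=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network with per-sample back propagation."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                for _ in range(n_out)]
    for _ in range(epochs):
        for x, d in samples:
            h, y = forward(x, w_hidden, w_output)              # phase 1
            # Phase 2: error terms at the output layer (sigmoid derivative y(1-y)).
            delta_out = [(dk - yk) * yk * (1.0 - yk) for dk, yk in zip(d, y)]
            # Error propagated back to the hidden layer, using the old output weights.
            delta_hid = [hj * (1.0 - hj)
                         * sum(delta_out[k] * w_output[k][j] for k in range(n_out))
                         for j, hj in enumerate(h)]
            for k in range(n_out):
                for j, v in enumerate(h + [1.0]):
                    w_output[k][j] += eta * delta_out[k] * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hidden[j][i] += eta * delta_hid[j] * v
    return w_hidden, w_output


# Toy data set (logical OR), used only to exercise the code.
samples = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
           ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
initial_error = sum_squared_error(samples, *train(samples, 3, epochs=0))
final_error = sum_squared_error(samples, *train(samples, 3, epochs=2000))
```

Because the same seed is used in both calls, initial_error and final_error refer to the same initial weights before and after training, so comparing them shows the effect of the backward pass.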

According to the gradient-descent based Delta learning rule, the weight change should be in the negative direction of the error gradient in order to minimize the total sum squared error E, which is given as

$$E = \frac{1}{2} \sum_{k} (d_k - y_k)^2 \tag{1}$$

where $d_k$ is the desired output and $y_k$ is the actual output of the k-th output neuron of this BP neural network. The change in weights for the output to hidden layer interconnections can then be expressed as

$$\Delta W_{kj} = -\eta \frac{\partial E}{\partial W_{kj}} \tag{2}$$

where $\eta$ is the learning rate of this neural network. Thus, according to the back propagation algorithm, the weight update for the output layer to hidden layer interconnections can be expressed as

$$W_{kj}^{t+1} = W_{kj}^{t} + \Delta W_{kj}^{t} \tag{3}$$

$$W_{kj}^{t+1} = W_{kj}^{t} - \eta \frac{\partial E}{\partial W_{kj}} \tag{4}$$

Similarly, the weights of the hidden to input layer interconnections are updated, and the neural network is trained.
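For readers who want the intermediate steps, the gradient in the update rule can be expanded with the chain rule. The expansion below is a sketch under assumed notation that is not taken from this paper: E is half the sum of squared output errors, $x_i$ are the inputs, $h_j$ the hidden layer outputs, $y_k$ the actual outputs, $f$ a differentiable activation function (for the sigmoid, $f' = f(1-f)$) and $W_{ji}$ the input to hidden layer weights.

```latex
% Assumed notation (not from the paper): x_i inputs, h_j hidden outputs,
% y_k actual outputs, f the activation function, W_{ji} input-to-hidden weights.
\begin{align*}
E &= \tfrac{1}{2}\sum_{k}(d_k - y_k)^2, \qquad
  y_k = f(net_k), \qquad net_k = \sum_{j} W_{kj}\, h_j \\
\frac{\partial E}{\partial W_{kj}} &= -(d_k - y_k)\, f'(net_k)\, h_j
  = -\delta_k\, h_j, \qquad \delta_k = (d_k - y_k)\, f'(net_k) \\
\Delta W_{kj} &= -\eta\, \frac{\partial E}{\partial W_{kj}} = \eta\, \delta_k\, h_j \\
h_j &= f(net_j), \qquad net_j = \sum_{i} W_{ji}\, x_i, \qquad
  \delta_j = f'(net_j) \sum_{k} \delta_k\, W_{kj} \\
\Delta W_{ji} &= -\eta\, \frac{\partial E}{\partial W_{ji}} = \eta\, \delta_j\, x_i
\end{align*}
```

The hidden layer term $\delta_j$ is built from the output layer terms $\delta_k$, which is the sense in which the error is propagated backward.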

INTERNET TRAFFIC DATA SET

In this research work, we have used a data set of 10,193 samples for two different cases of input features [5], [6]. Each data sample belongs to a particular internet application. The applications in the present data set are of twelve types: WWW, Mail, FTP-CONTROL, FTP-PASV, Attack, P2P, Database, FTP-Data, Multimedia, Services, Interactive and Games.

For the first case, we have trained BPNN using a data set of 10,193 samples, where each input sample consists of 248 features to characterize each particular application. These features mainly include inter packet arrival times (max., min., median, mean, variance, first quartile, third quartile etc.), total number of packets (server to client and client to server), total number of bytes on the wire and in IP packets (max., min., median, mean, variance, first quartile, third quartile etc.), control signals, bandwidth, duration, FFTs of various features, server port, client port and many other features. After the network has been trained, 2548 input data samples out of the data set of 10,193 samples are used for testing and to obtain the classified outputs.

For the second case, we have again trained BPNN using a data set of 10,193 samples, where each input sample consists of only 48 features to characterize each particular application. These features mainly include inter packet arrival times (min., max., mean and variance), total number of packet bytes on the wire and in IP packets (max., min., mean, variance), bandwidth features, duration etc. After the network has been trained using this reduced feature data set, 2548 data samples of the same data set are used for testing and to obtain the classified outputs.
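The shapes of the two cases can be made concrete with a short sketch. The paper does not list which 48 of the 248 features are kept or how the 2548 testing samples are chosen, so the random draws below are placeholders only, and the names kept_columns, test_rows and reduce_features are hypothetical.

```python
import random

N_SAMPLES, N_FULL, N_REDUCED, N_TEST = 10193, 248, 48, 2548

rng = random.Random(0)

# Placeholder choice of 48 feature columns (the real selection keeps
# meaningful features such as inter packet arrival time statistics).
kept_columns = sorted(rng.sample(range(N_FULL), N_REDUCED))

# Placeholder choice of the 2548 testing samples, drawn from the same data set.
test_rows = sorted(rng.sample(range(N_SAMPLES), N_TEST))


def reduce_features(sample, columns):
    """Project one full feature sample onto the kept feature columns."""
    return [sample[c] for c in columns]


full_sample = [float(i) for i in range(N_FULL)]  # dummy 248-feature sample
reduced_sample = reduce_features(full_sample, kept_columns)
```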

IMPLEMENTATION AND ANALYSIS

Methodology

In this research work, we have used MATLAB R2009b to develop the BPNN program. First, the BPNN model was trained with training pairs consisting of 10,193 training inputs, each having 248 features, and the corresponding training targets, i.e. the full feature training data set [5], [6]. After that, the full feature testing data set of 2548 input samples was provided to this BPNN model and an output file consisting of the various application classes was obtained. Then the reduced feature training data set of 10,193 samples, each having only 48 features, was used to train the BPNN again, and the reduced feature testing data set of 2548 samples was used to obtain the classified outputs.

Classification accuracy and recall are employed in this research work in order to evaluate the performance of BPNN against the number of hidden layer neurons and the training time for the full feature data set and the reduced feature data set [1], [3], [4]. Accuracy can be deduced from the confusion matrix shown in fig. 2.

 / True / False
Positive / TP / FP
Negative / FN / TN

Fig. 2. Confusion Matrix

In this confusion matrix following terms are used:

True Positive (TP): Percentage of samples of class Z correctly classified as belonging to class Z.
True Negative (TN): Percentage of samples of other classes correctly classified as not belonging to class Z.
False Positive (FP): Percentage of samples of other classes incorrectly classified as belonging to class Z (equivalent to 100%-TN).
False Negative (FN): Percentage of samples of class Z incorrectly classified as not belonging to class Z (equivalent to 100%-TP).

All these matrix terms are considered to range from 0 to 100%. In general, Accuracy can be defined as percentage of correctly classified samples over all classified samples. It is given as follows:

$$\text{Accuracy (ACC.)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

Recall can be defined as percentage of samples of class Z correctly classified as belonging to class Z. It is given as follows:

Recall (R) = TP (6)
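As an illustration of these two measures, the sketch below computes overall accuracy and per-class recall from a multi-class confusion matrix built from sample counts. The three class names are taken from the data set, but the labels in the example are invented, and the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """matrix[a][p] = number of samples of class a classified as class p."""
    index = {c: n for n, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Percentage of correctly classified samples over all classified samples."""
    correct = sum(matrix[n][n] for n in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def recall_per_class(matrix, classes):
    """Percentage of samples of each class correctly classified as that class."""
    return {c: 100.0 * matrix[n][n] / sum(matrix[n])
            for n, c in enumerate(classes) if sum(matrix[n]) > 0}


# Invented example labels, for illustration only.
classes = ["WWW", "MAIL", "P2P"]
actual = ["WWW", "WWW", "WWW", "WWW", "MAIL", "MAIL", "P2P", "P2P"]
predicted = ["WWW", "WWW", "WWW", "MAIL", "MAIL", "MAIL", "WWW", "P2P"]

matrix = confusion_matrix(actual, predicted, classes)
accuracy = overall_accuracy(matrix)
recall = recall_per_class(matrix, classes)
```

In this example six of the eight samples lie on the diagonal of the matrix, and the recall of each class is the diagonal entry divided by the sum of its row.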

Training Time (Ttrn): It is the total time taken for training of the BP Neural Network. Training time depends upon the number of training data samples and the number of hidden layer neurons. In this paper, it is measured in minutes.

Analysis

Table I shows the classification accuracy and training time of the BPNN model as the number of hidden layer neurons increases, for the full feature and reduced feature data sets. It is clear from this table and from fig. 3 and fig. 4 that training time increases steadily with the number of hidden layer neurons. Accuracy generally improves as neurons are added up to an optimum (400 neurons for the full feature data set, 500 for the reduced feature data set) and declines beyond it; for the full feature data set the decline is steep, down to 44.91 % at 1000 neurons.

TABLE I

ACCURACY VS. NO. OF HIDDEN LAYER NEURONS AND TRAINING TIME

No. of Hidden Layer Neurons / BPNN with full feature data set: Accuracy (%) / BPNN with full feature data set: Training Time (Minutes) / BPNN with reduced feature data set: Accuracy (%) / BPNN with reduced feature data set: Training Time (Minutes)
50 / 78.54 / 5 / 62.45 / 3
100 / 80.35 / 7 / 65.30 / 4
200 / 80.16 / 11 / 68.74 / 6
300 / 77.78 / 14 / 70.98 / 9
400 / 82.71 / 19 / 68.64 / 11
500 / 81.26 / 22 / 72.54 / 13
600 / 75.12 / 26 / 64.55 / 16
700 / 74.89 / 29 / 66.80 / 18
800 / 64.56 / 33 / 69.87 / 24
900 / 55.19 / 36 / 66.75 / 28
1000 / 44.91 / 47 / 69.83 / 33

In case of full feature (248 features) training and testing data set, BPNN gives optimum accuracy value of 82.71 % at 400 hidden layer neurons and training time of 19 minutes. But in case of reduced feature (48 features only) training and testing data set, BPNN gives optimum accuracy value of 72.54 % at 500 hidden layer neurons and training time of 13 minutes.
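The optimum values quoted above can be checked directly against Table I. The sketch below simply re-enters the table and picks the row with the highest accuracy for each data set; the variable names are chosen for illustration.

```python
# Rows of Table I: (hidden neurons, full feature accuracy %, full feature
# training time min, reduced feature accuracy %, reduced feature training time min)
TABLE_I = [
    (50, 78.54, 5, 62.45, 3),
    (100, 80.35, 7, 65.30, 4),
    (200, 80.16, 11, 68.74, 6),
    (300, 77.78, 14, 70.98, 9),
    (400, 82.71, 19, 68.64, 11),
    (500, 81.26, 22, 72.54, 13),
    (600, 75.12, 26, 64.55, 16),
    (700, 74.89, 29, 66.80, 18),
    (800, 64.56, 33, 69.87, 24),
    (900, 55.19, 36, 66.75, 28),
    (1000, 44.91, 47, 69.83, 33),
]

best_full = max(TABLE_I, key=lambda row: row[1])     # highest full feature accuracy
best_reduced = max(TABLE_I, key=lambda row: row[3])  # highest reduced feature accuracy

accuracy_gap = round(best_full[1] - best_reduced[3], 2)  # percentage points
time_saved = best_full[2] - best_reduced[4]              # minutes
```

At the respective optima, the reduced feature data set gives up about ten percentage points of accuracy and saves six minutes of training time.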

Fig. 3. Accuracy Vs. No. of hidden Layer Neurons for BPNN with full feature and reduced feature data sets

Fig. 4. Training Time Vs. No. of hidden Layer Neurons for BPNN with full feature and reduced feature data sets

Table II shows the recall values of the various internet applications for BPNN with full feature and reduced feature training and testing data sets. It is also clear from fig. 5 that BPNN with reduced feature data sets gives recall values close to, or better than, those of BPNN with full feature data sets for several applications (Mail, P2P, Attack, Services and FTP-PASV), while recall drops most for WWW, Database, FTP-Data and Interactive.

TABLE II

RECALL OF BPNN FOR FULL FEATURE AND REDUCED FEATURE DATA SETS

Various Internet Applications / BPNN with full feature data sets: Recall (%) / BPNN with reduced feature data sets: Recall (%)
WWW / 85.91 / 61.65
MAIL / 60.98 / 68.64
FTP-CONTROL / 89.59 / 81
FTP-PASV / 83.41 / 78.26
ATTACK / 79.0 / 75.5
P2P / 46.28 / 56.65
DATABASE / 95.05 / 55.98
FTP-DATA / 96.61 / 77.76
MULTIMEDIA / 82.57 / 70.18
SERVICES / 91.48 / 91.48
INTERACTIVE / 100 / 80.85

Fig. 5. Recall of various applications for BPNN with full feature and reduced feature data sets.
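The per-application comparison in Table II can be summarized numerically. The sketch below re-enters the table and computes the change in recall for each application; the 5 percentage point threshold used to call two values "close" is an arbitrary choice made for this illustration.

```python
# Table II: application -> (recall % with full features, recall % with reduced features)
TABLE_II = {
    "WWW": (85.91, 61.65),
    "MAIL": (60.98, 68.64),
    "FTP-CONTROL": (89.59, 81.0),
    "FTP-PASV": (83.41, 78.26),
    "ATTACK": (79.0, 75.5),
    "P2P": (46.28, 56.65),
    "DATABASE": (95.05, 55.98),
    "FTP-DATA": (96.61, 77.76),
    "MULTIMEDIA": (82.57, 70.18),
    "SERVICES": (91.48, 91.48),
    "INTERACTIVE": (100.0, 80.85),
}

# Change in recall (reduced minus full), in percentage points.
change = {app: reduced - full for app, (full, reduced) in TABLE_II.items()}

improved = [app for app, diff in change.items() if diff > 0]
close = [app for app, diff in change.items() if abs(diff) <= 5.0]  # arbitrary threshold
largest_drop = min(change, key=change.get)
```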

From this analysis, reducing the number of features from 248 to 48 lowers the best accuracy of the BPNN classifier from 82.71 % to 72.54 %, while the training time at the best-performing configuration falls from 19 to 13 minutes. At every hidden layer size in Table I, the training time with the reduced feature data set is lower than with the full feature data set. Thus, with a reduction in the number of features of the training and testing data sets, the complexity and training time of the neural network are reduced considerably. Therefore, the number of features in the training and testing data sets used to characterize various internet applications need not be very high. These data sets should include only the important and meaningful features required to describe any particular internet application.
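One rough way to see the reduction in complexity is to count the trainable weights of the network for the two input sizes. The sketch below assumes a single hidden layer, one bias weight per neuron and one output neuron per application class (twelve in total); none of these details is stated in this paper, and the function name count_weights is hypothetical.

```python
def count_weights(n_input, n_hidden, n_output, bias=True):
    """Number of trainable weights in an input-hidden-output network."""
    extra = 1 if bias else 0  # one bias weight per neuron, if enabled
    return (n_input + extra) * n_hidden + (n_hidden + extra) * n_output


N_CLASSES = 12  # assumption: one output neuron per application class

full_weights = count_weights(248, 400, N_CLASSES)    # full feature inputs
reduced_weights = count_weights(48, 400, N_CLASSES)  # reduced feature inputs
```

Under these assumptions, and for the same 400 hidden layer neurons, the reduced feature network has 24,412 weights against 104,412 for the full feature network, which is consistent with the shorter training times in Table I.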

CONCLUSION

In this paper, we have first designed a BP Neural Network. This neural network is then trained using the given full feature and reduced feature data sets of training data samples. For both cases, the performance of BPNN is evaluated against the number of hidden layer neurons and the training time of BPNN. The results show that, for better overall performance of BPNN, the number of features in the training and testing inputs should not be very high. The inputs should include the important and meaningful features that represent the various internet applications, because with a reduction in the number of features the accuracy of the BPNN classifier is still maintained at a reasonable value. Thus this research work shows that BPNN is an effective machine learning technique for internet traffic classification even with a reduction in the number of features of the training and testing data samples.

REFERENCES

[1] Thuy T.T. Nguyen and Grenville Armitage, “A Survey of Techniques for Internet Traffic Classification using Machine Learning,” IEEE Communications Surveys & Tutorials, Vol. 10, No. 4, pp. 56-76, Fourth Quarter 2008.

[2] Arthur Callado, Carlos Kamienski, Géza Szabó, Balázs Péter Gerő, Judith Kelner, Stênio Fernandes, and Djamel Sadok, “A Survey on Internet Traffic Identification,” IEEE Communications Surveys & Tutorials, Vol. 11, No. 3, pp. 37-52, Third Quarter 2009.

[3] Runyuan Sun, Bo Yang, Lizhi Peng, Zhenxiang Chen, Lei Zhang, and Shan Jing, “Traffic Classification Using Probabilistic Neural Network,” in Sixth International Conference on Natural Computation (ICNC 2010), 2010, pp. 1914-1919.

[4] Luca Salgarelli, Francesco Gringoli, and Thomas Karagiannis, “Comparing Traffic Classifiers,” ACM SIGCOMM Computer Communication Review, Vol. 37, No. 3, pp. 65-68, July 2007.

[5] Andrew W. Moore, Denis Zuev, and Michael L. Crogan, “Discriminators for use in flow-based classification,” Queen Mary University of London, Department of Computer Science, RR-05-13, August 2005.

[6] Wei Li, Marco Canini, Andrew W. Moore, and Raffaele Bolla, “Efficient application identification and the temporal and spatial stability of classification schema,” Computer Networks (Elsevier), Vol. 53, pp. 790–809, 23 April 2009.

[7] Y.L. Chong and K. Sundaraj, “A Study of Back Propagation and Radial Basis Neural Networks on ECG signal classification,” 6th International Symposium on Mechatronics and its Applications (ISMA09), Sharjah, UAE, March 24-26, 2009.

[8] Mutasem Khalil Alsmadi, Khairuddin Bin Omar, Shahrul Azman Noah, and Ibrahim Almarashdah, “Performance Comparison of Multi-layer Perceptron (Back Propagation, Delta Rule and Perceptron) algorithms in Neural Networks,” 2009 IEEE International Advance Computing Conference (IACC 2009), Patiala, India, 6-7 March 2009, pp. 296-299.

[9] P. Jeatrakul and K.W. Wong, “Comparing the Performance of Different Neural Networks for Binary Classification Problems,” Eighth International Symposium on Natural Language Processing, 2009, pp. 111-115.

[10] Satish Kumar, Neural Networks: A Classroom Approach, 6th edition, Tata McGraw-Hill Publishing Company Limited, New Delhi, 2008.

[11] Simon Haykin, Neural Networks: A Comprehensive Foundation, 2nd edition, Pearson Prentice Hall, New Delhi, 2005.