Performance Prediction of Software Defined Network Using an Artificial Neural Network
Abstract
An Artificial Neural Network (ANN) is proposed for predicting the performance of a Software Defined Network (SDN) from its effective traffic parameters. The parameters used in this study are the round-trip time, the throughput and the flow table rules for each switch. The POX controller and OpenFlow switches, which characterize the behaviour of the SDN, have been modelled and simulated using the Mininet and Matlab platforms. An ANN is able to capture the input-output relationship of nonlinear and complex processes. The network has been implemented using different topologies, with one and two hidden layers and different numbers of neurons. Generalization of the prediction model has been tested with new data unseen in the training stage. The simulated results show reasonably good performance of the network.
Keywords: Neural Network; SDN; Traffic Prediction.
Introduction
The meteoric rise in the use of information and communication technology (ICT) over the past three decades has had a profound impact on consumers’ lives. Today, ICT is tasked with supporting new services and applications under exacting requirements, including availability, service quality, dependability, resilience and protection. This has led to increasingly complex networks and software systems for supporting heterogeneous applications, new technologies and multi-vendor equipment, thus making the management of network infrastructures a major challenge [1]. Software Defined Networking (SDN) and the OpenFlow architecture have emerged as capable of delivering the programmability necessary to configure and manage the network dynamically. By separating the control plane from the data plane and moving the former into a conceptually centralized controller, SDN provides network operators with a robust capacity to carry out a wide range of network practices (e.g. routing, security and fault tolerance) as well as the capability of deploying novel network technologies speedily. The most widely employed SDN deployment today depends upon a logically centralized controller, which has an overall view of the network. At present, OpenFlow is the most popular SDN protocol/standard, with a set of design specifications. However, a single controller in an SDN using the OpenFlow protocol usually has a restricted resource capacity and is hence unable to deal with the large volumes of traffic originating from the infrastructure switches. In this case, the traffic load can rise significantly and hence impact negatively on application and service performance [2]. Recent proposals to address these limitations have involved using a prediction system in the controller that works cooperatively so as to manage the network traffic configuration more effectively [3], [4].
Owing to these limitations, and because traffic volumes at different locations vary significantly over time, an ANN is proposed in this paper to be implemented in the controller so as to respond to network “congestion” and predict the traffic pattern after a set period of time. Neural networks are nonlinear systems with the capacity to learn the behaviour of the input-output data sets gathered from real systems. Systems with neural network building blocks are robust in that small errors do not interfere with the correct functioning of the system, and this property makes them appropriate for predicting the performance of the network [5].
The aim of this paper is to train an ANN to predict the performance of an SDN controller by employing the most effective factors and to test the model with new situations unseen during the training stage. Two types of traffic parameters are used in training. The first is round-trip time (RTT), also called round-trip delay, which is the time for a packet to travel from a particular source to a destination and back. The second is throughput, which pertains to measuring the maximum flow setup rate that the controller can maintain. The dataset, comprising the flow table, RTT and throughput, was collected every five minutes for various traffic congestion patterns [6].
SDN and OpenFlow Protocol
The OpenFlow architecture has the following components: the OpenFlow controller, the OpenFlow device (switch) and the OpenFlow protocol. In the OpenFlow approach, a centralized controller configures all the devices; the devices are kept simple to improve forwarding performance, while network control is undertaken by the controller [7]. The OpenFlow controller is the centralized controller of an OpenFlow network, which sets up all the OpenFlow devices, maintains the topology data and monitors the status of the whole network. An OpenFlow device refers to any device capable of OpenFlow in a network, such as a switch, router or access point. Each maintains a flow table that shows the processing applied to any packet of a particular flow. The OpenFlow protocol operates as an interface between the controller and the switches, through which the flow tables in the SDN are established. Moreover, the protocol needs to employ a secure channel based on Transport Layer Security (TLS) [7], [8].
The controller keeps the flow table up to date by adding and removing flow entries using the OpenFlow protocol, over a secure channel protected by Transport Layer Security (TLS) between it and the OpenFlow switch. The flow table is a database containing the flow entries that are used to command the switch to apply actions to a certain flow, some of which are: forward, drop and encapsulate. Every OpenFlow device has a flow table with flow entries, and each flow entry comprises three elements: rule, action and statistics. The rule field is used to define the match condition for a specific flow, whilst the action field refers to the action that is to be applied to this flow and the statistics field is employed to count rule occurrences for the purpose of management [8]. When a packet arrives at the OpenFlow switch, it is compared with the flow entries in the flow table and the action is triggered if the flow rule is matched, with the statistics field being subsequently updated. If the packet does not match any of the entries in the flow table, it is directed to the controller over the secure channel in order to request an action. The packets are matched against all the flow entries according to a prioritization scheme such that an entry exhibiting an exact match (no wildcards) is given the highest priority. Alternatively, the flow table can have a priority field associated with each entry, where a higher number indicates that the rule should take priority [9].
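The matching procedure described above can be illustrated with a minimal Python sketch. It is not the OpenFlow implementation: the class names, the dictionary-based rule format and the action strings are assumptions made for illustration only. A field missing from a rule is treated as a wildcard, the explicit priority-field variant is used, and a table miss is signalled by returning None, at which point a real switch would forward the packet to the controller.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlowEntry:
    """One flow entry: rule, action and statistics (names are illustrative)."""
    rule: dict             # match fields; a field left out acts as a wildcard
    action: str            # e.g. "forward:2" or "drop" (hypothetical encoding)
    priority: int = 0      # a higher number takes priority
    packet_count: int = 0  # statistics field: how often the rule matched


@dataclass
class FlowTable:
    entries: list = field(default_factory=list)

    def lookup(self, packet: dict) -> Optional[str]:
        """Return the action of the best matching entry, or None on a table miss."""
        matches = [e for e in self.entries
                   if all(packet.get(k) == v for k, v in e.rule.items())]
        if not matches:
            return None  # table miss: the packet would be sent to the controller
        best = max(matches, key=lambda e: e.priority)
        best.packet_count += 1  # update the statistics field
        return best.action
```

In this sketch, an entry that matches on both source and destination can be given a higher priority than one matching on destination only, which mirrors the preference for more exact matches.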
Artificial Neural Networks
Artificial neural networks (ANNs) represent one of the most exciting developments in artificial intelligence in recent years and have been used to model and predict dynamic systems. They have demonstrated excellent performance with regard to learning the input-output relationship for nonlinear as well as complex systems. This relationship can be found rapidly and efficiently by reducing the error between the network output(s) and the actual output(s). Once the network has been trained, the output can be predicted in just a few seconds. Models based on ANNs have been used to solve a host of engineering problems in different fields, including adaptive control, pattern recognition, robotics, image processing, medical diagnosis, fault detection, process monitoring, renewable and sustainable energy, laser applications and nonlinear systems identification [10-14].
An ANN comprises several layers of interconnected nodes (neurons): a) an input layer, b) one or more hidden layers and c) an output layer. Its most popular architecture is feed-forward, whereby the information travels through the network in a forward direction from the input layer towards the output layer. The most important task for the neural network designer is selecting the right network topology for solving a particular problem. The topology refers to the number of nodes (neurons) and layers found in the hidden zone. There are two basic methods for making the selection, with the first involving the use of evolutionary algorithms (EAs), such as the genetic algorithm (GA) [15] or particle swarm optimisation (PSO) [16]. The second is an exhaustive search, in which every candidate number of neurons in the first and second hidden layers is tried. For this paper, the second method was used to build the optimal network topology.
Methodology
In this section, first, the network simulation experiments using a Software Defined Network with a POX controller in a simulated environment on the Mininet platform are described. Second, all datasets collected are pre-processed and prepared for implementation in an ANN on the Matlab platform. The basic steps for developing the model are as follows:
- Simulate the SDN by Mininet
- Database collection
- Analyzing and pre-processing of the data
- Training of the ANN
- Testing of the trained ANN
- Post-processing of the output data
- Testing the ANN with unseen data, for a generality prediction check
- Use of the trained ANN for simulation and prediction
a. SDN Simulation
The simulation scenario consisted of five OpenFlow switches (S1, S2, S3, S4, S5), with each being connected to two hosts (h1, h2), and all the switches and hosts were connected to a reactive POX controller. The SCP program was used to send files of different sizes from one host to another throughout the run time. The data representing TCP throughput and RTT were collected and analyzed from the switch components in the POX controller during this time. The results were gathered from all the switches rather than individually. The complete run time was 180 minutes divided into three equal parts. The file size transferred was 100 megabytes in the first part, 50 megabytes in the second and 25 megabytes in the third. In addition, the program provided the performance metrics and the average values for RTT and throughput, which were collected and registered every 5 minutes so as to study the performance of different traffic loads in the network. All the information was stored in a database for use in the learning system in the next stage. Moreover, the flow table for each switch was saved in the same database at the same time as the RTT and throughput were collected.
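As a worked illustration of the collection schedule, the short Python sketch below enumerates the sampling instants implied by the description. It assumes, purely for illustration, that one record is stored at the end of each 5-minute interval; the constant and function names are not taken from the experiment.

```python
RUN_MINUTES = 180                   # complete run time
PHASE_FILE_SIZES_MB = [100, 50, 25]  # file size per equal part of the run
SAMPLE_INTERVAL_MIN = 5             # RTT and throughput registered every 5 minutes


def sampling_schedule():
    """Return (minute, file_size_mb) for every sample in the run.

    Assumption: a sample is taken at the end of each 5-minute interval.
    """
    phase_len = RUN_MINUTES // len(PHASE_FILE_SIZES_MB)  # 60 minutes per part
    schedule = []
    for t in range(SAMPLE_INTERVAL_MIN, RUN_MINUTES + 1, SAMPLE_INTERVAL_MIN):
        phase = (t - 1) // phase_len  # minute 60 still belongs to the first part
        schedule.append((t, PHASE_FILE_SIZES_MB[phase]))
    return schedule


schedule = sampling_schedule()
print(len(schedule))  # 36 samples in total, 12 per file size
```

Under this assumption the run yields 36 samples, 12 for each file size.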
b. Artificial Neural Network Simulation
For this work, the dataset of the inputs and outputs was divided randomly into three subsets:
- Training set (80%)
- Testing set (10%)
- Validation set (10%)
The first subset was used for computing the gradient and updating the network weights and biases. The error on the validation subset was monitored during training. The validation error usually decreases during the initial phase of training, as does the training set error. However, when the network begins to overfit the data, the validation error typically starts to rise. In the current case, the network parameters were saved at the minimum of the validation set error [17].
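The random partition and the choice of the stopping point can be sketched in Python as follows. This is only an illustration of the idea, not the Matlab routine used in this work; the function names and the fixed seed are assumptions.

```python
import random


def split_indices(n_samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly split sample indices into training, testing and validation subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded only to make the sketch repeatable
    n_train = round(fractions[0] * n_samples)
    n_test = round(fractions[1] * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains
    return train, test, validation


def best_epoch(validation_errors):
    """Return the epoch with the lowest validation error.

    The network parameters saved at this epoch are the ones that would be kept.
    """
    return min(range(len(validation_errors)), key=validation_errors.__getitem__)


# Validation error falls, then rises once the network starts to overfit.
print(best_epoch([0.90, 0.50, 0.30, 0.35, 0.60]))  # 2
```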
As mentioned earlier, the optimal topology giving the best performance was selected by conducting an exhaustive search for the number of neurons in both the first and second hidden layers, as depicted in Figure (1). The ANN was trained in a nested loop and the recorded performance index, the Mean Square Error (MSE), was saved in the external matrix containing all the information gathered during the loop. Owing to the random initializing of the parameters (weights and biases), the network training was run ten times. During each run the numbers of neurons between 1 and 20 were taken into account for both layers in the hidden zone.
Figure 1. Exhaustive search of optimal NN topology [18]
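The nested-loop search can be sketched in Python as below. The callable train_and_score is hypothetical: it stands for training one network with a fresh random initialization and returning its MSE. Keeping the lowest MSE over the repeated runs is an assumption of this sketch, since averaging over runs would be an equally plausible choice.

```python
import itertools


def exhaustive_search(train_and_score, max_neurons=20, runs=10):
    """Try every (n1, n2) hidden-layer size pair and record the best MSE per pair.

    train_and_score(n1, n2, run) is a hypothetical callable that trains one
    network with new random weights and biases and returns its MSE.
    """
    results = {}  # plays the role of the external matrix of recorded MSE values
    sizes = range(1, max_neurons + 1)
    for n1, n2 in itertools.product(sizes, repeat=2):
        # Assumption: keep the lowest MSE seen over the repeated runs.
        results[(n1, n2)] = min(train_and_score(n1, n2, run) for run in range(runs))
    best_topology = min(results, key=results.get)
    return best_topology, results
```

With 20 candidate sizes per layer and ten runs each, the two-layer search trains 4000 networks, which is why the approach is practical only for small networks and datasets.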
The transfer function used in the hidden zone is log sigmoid, given by equation (1):
$Y_i = \dfrac{1}{1 + e^{-X_i}}$ … (1)
where Xi is the input of the neuron in the hidden layer and Yi is the output of the neuron. When calculating Xi, the input values must be normalized to the range [-1, 1], corresponding to the minimum and maximum of the actual values. Subsequently, testing the ANN required a new independent set (the test set) to validate the generalization capability of the prediction model.
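A minimal Python sketch of the pre-processing and of the transfer function is given below. It assumes a linear min-max mapping to [-1, 1] and the standard form of the log-sigmoid, 1/(1 + e^(-x)); the function names are illustrative. The inverse mapping corresponds to the post-processing step that returns network outputs to physical units.

```python
import math


def normalize(value, v_min, v_max):
    """Map value linearly from [v_min, v_max] to [-1, 1]."""
    return 2.0 * (value - v_min) / (v_max - v_min) - 1.0


def denormalize(scaled, v_min, v_max):
    """Inverse mapping, used when post-processing the network output."""
    return (scaled + 1.0) * (v_max - v_min) / 2.0 + v_min


def logsig(x):
    """Log-sigmoid transfer function; its output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


print(normalize(10.0, 5.0, 15.0))  # 0.0, the midpoint of the range
print(logsig(0.0))                 # 0.5
```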
Results and Analysis
After collecting and preparing the dataset, it was randomly divided into 70%, 15% and 15% for training, validation and testing, respectively. A feedforward multilayer network was implemented to estimate the performance of the SDN. For each network architecture, training was run ten times with different random initial weights and biases using the Levenberg–Marquardt algorithm (LMA). Following the exhaustive search over the different architectures, the best network with one hidden layer has 17 neurons and achieves an MSE of 2.488×10⁻⁸, while the best network with two hidden layers (10 neurons in the first and 10 in the second) gives nearly the same MSE, 2.466×10⁻⁸. Figures 2 and 3 show the performance of the network as a mean square error (MSE) versus the network architecture for single and double hidden layers, respectively.
Figure 2. Performance of a one hidden layer ANN
Figure 3. Performance of a double hidden layer ANN
The predicted and actual SDN performance are compared in Figures 4 and 5 for the training and testing sets, respectively, while the linear regression of the data is shown in Figure 6.
Figure 4. Comparison of predicted with actual SDN performance for the training sets
Figure 5. Comparison of predicted with actual SDN performance for the testing sets
Figure 6. Linear regression of the targets relative to the outputs for the SDN performance: (a) training set, (b) testing set
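For reference, the quantities behind this kind of comparison can be computed as in the Python sketch below, which assumes an ordinary least-squares fit of outputs against targets and the usual definitions of the MSE and the correlation coefficient R. The function names and the sample values are illustrative and are not data from this study.

```python
def mse(targets, outputs):
    """Mean square error between targets and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def regression_fit(targets, outputs):
    """Least-squares line outputs = slope * targets + intercept, plus R."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_oo = sum((o - mean_o) ** 2 for o in outputs)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    slope = s_to / s_tt
    intercept = mean_o - slope * mean_t
    r = s_to / (s_tt * s_oo) ** 0.5  # correlation coefficient
    return slope, intercept, r


# Made-up values: a perfect predictor gives slope 1, intercept 0 and R = 1.
slope, intercept, r = regression_fit([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```

A slope close to 1, an intercept close to 0 and R close to 1 indicate that the outputs track the targets closely.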
Conclusions
In this paper, an ANN has been proposed to find the correlation between the most effective input factors and the performance output, with the former being the round-trip time, throughput and the flow table rules for each switch, as well as the POX controller and OpenFlow switches. The ANN can be used very efficiently as a predictor, especially in an SDN controller under different traffic loads, when the nature of the network is complex and highly nonlinear. Various topologies of the ANN were tested by applying one and two hidden layers with different numbers of neurons. Moreover, LMA was used as the learning algorithm in the feed-forward ANN structure. The results have shown that the network with one hidden layer gives a very acceptable MSE of 2.488×10⁻⁸, close to the 2.466×10⁻⁸ obtained with two hidden layers. The proposed ANN could be used efficiently to improve the performance of an SDN by selecting the best input parameters, which is to be the subject of future work.
Acknowledgment
The corresponding author is grateful to the Iraqi Ministry of Higher Education and Scientific Research for supporting the current research.
References
[1] D. Tuncer, M. Charalambides, S. Clayman, and G. Pavlou, “Adaptive Resource Management and Control in Software Defined Networks,” IEEE Trans. Netw. Serv. Manag., vol. 12, no. 1, pp. 18–33, 2015.
[2] A. Tootoonchian, S. Gorbunov, Y. Ganjali, M. Casado, and R. Sherwood, “On controller performance in software-defined networks,” Proc. 2nd USENIX Conf. Hot Top. Manag. Internet, Cloud, Enterp. Networks Serv. (Hot-ICE’12), pp. 10–10, 2012.
[3] A. Tootoonchian and Y. Ganjali, “HyperFlow: a distributed control plane for OpenFlow,” Proc. 2010 Internet Netw., pp. 3–3, 2010.
[4] F. Bari, A. R. Roy, S. R. Chowdhury, Q. Zhang, M. F. Zhani, R. Ahmed, and R. Boutaba, “Dynamic Controller Provisioning in Software Defined Networks,” pp. 18–25, 2013.
[5] W. Junsong, W. Jiukun, Z. Maohua, and W. Junjie, “Prediction of internet traffic based on Elman neural network,” 2009 Chinese Control Decis. Conf., no. 2, pp. 1–4, 2010.
[6] T. Luo, H.-P. Tan, P. C. Quan, Y. W. Law, and J. Jin, “Enhancing responsiveness and scalability for OpenFlow networks via control-message quenching,” 2012 Int. Conf. ICT Converg., pp. 348–353, 2012.
[7] F. Hu, Q. Hao, and K. Bao, “A Survey on Software-Defined Network and OpenFlow: From Concept to Implementation,” IEEE Commun. Surv. Tutorials, vol. 16, no. 4, pp. 2181–2206, 2014.
[8] A. Lara, A. Kolasani, and B. Ramamurthy, “Network Innovation using OpenFlow: A Survey,” IEEE Commun. Surv. Tutorials, vol. 16, no. 1, pp. 493–512, 2014.
[9] M. Fernandez, “Evaluating OpenFlow Controller Paradigms,” in ICN 2013, The Twelfth International Conference on , 2013, no. c, pp. 151–157.
[10] I. M. Mujtaba, N. Aziz, and M. A. Hussain, “Neural network based modelling and control of batch reactor,” no. 152, 2006.
[11] H. Jackson and D. Ph, “Application of Neural Networks to Chemical Process Control,” vol. 37, pp. 387–390, 1999.
[12] J. W. E. Catto, D. A. Linkens, M. F. Abbod, M. Chen, M. Meuth, and F. C. Handy, “The application of artificial intelligence in predicting outcome of bladder cancer: A comparison of neuro-fuzzy modelling and artificial neural networks,” Eur. Urol. Suppl., vol. 2, no. 1, p. 66, 2003.
[13] A. Khosravi, S. Nahavandi, D. Creighton, and A. F. Atiya, “Comprehensive Review of Neural Network-Based Prediction Intervals and New Advances,” IEEE Trans. Neural Networks, vol. 22, no. 9, pp. 1341–1356, 2011.