RADIAL BASIS NETWORK: AN IMPLEMENTATION OF ADAPTIVE CENTERS

Nivas Durairaj

Final Project for ECE539

Table of Contents


List of Figures
Introduction
Background
Methodology & Development of Program
Adaptation Formulas
Testing & Comparison of Results
Sinusoid Function Testing
Piecewise-Linear Function
Polynomial Function
Conclusion of Results
APPENDIX
Manual For RBN_adaptive.m
Manual For rbn_fixed_selfgen.m
Derivation of Partial Derivatives (Adaptive RBF Network)
Linear Weights Partial Derivative Term
Positions of Centers Partial Derivative Term (hidden layer)
Spreads of Centers Partial Derivative Term (hidden layer)
Excel Spreadsheet Data for Sinusoidal, Polynomial, & Piecewise Linear Functions
References

List of Figures


Figure 1: An RBF network with one output
Figure 2: An RBF network with multiple outputs
Figure 3: Training Set Plot from Trainset1.txt
Figure 4: Output with 3 Radial Basis Function Inputs
Figure 5: Output with 2 Radial Basis Functions
Figure 6: RBF Network Output (Sinusoid Function) with 7 Radial Basis Functions
Figure 7: Sinusoid Function Cost Function Output
Figure 8: Adaptive RBF Network with 10 Radial Basis Functions
Figure 9: Adaptive RBF Network with 6 Radial Basis Functions
Figure 10: Piecewise-Linear Cost Function Output
Figure 11: Adaptive Center RBF Network for Polynomial Function (6 Radial Basis Functions)
Figure 12: Polynomial Cost Function Output

Introduction

Which neural network model offers the same benefits as a feedforward neural network? The Radial Basis Function (RBF) network. Like feedforward networks such as the multilayer perceptron trained with backpropagation, the radial basis function network aids us in function approximation, classification, and modeling of dynamic systems. RBF networks have been used to produce results in stock market prediction and speech recognition.

I chose to implement my Intro to Artificial Neural Networks project on RBFs (Radial Basis Functions) because they are still an active research area and there is a lot to be learned from them. Radial basis functions were first introduced as a solution to multivariate interpolation problems, and they are now one of the main fields of research in numerical analysis. Since I was well acquainted with simple feedforward networks, I decided to implement an adaptive center RBF network. In addition, I have some interest in economics, and the thought of producing an algorithm that could help predict the stock market was very appealing to me.

Background

In its most basic form, an RBF network consists of three layers with entirely different roles. The input layer is made up of nodes that connect the network to its environment. The second layer is the hidden layer of neurons. At the input of each neuron, the distance between the neuron's center and the input vector is calculated. The output of the neuron is formed by applying the radial basis function (a Gaussian bell function) to this distance.

Figure 1: An RBF network with one output

Figure 2: An RBF network with multiple outputs

The last layer is the output layer. It is linear and supplies the response of the network to the activation pattern. The rationale for a nonlinear transformation followed by a linear transformation is justified in a paper by Cover [1]: a pattern-classification problem is more likely to be linearly separable in a high-dimensional space, which is the reason for making the dimension of the hidden space in an RBF network high. It is also important to note that the higher the dimension of the hidden space, the more accurate the smoothing of the input-output mapping will be.
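Putting the two stages together, the overall network mapping can be written compactly as shown below, where $w_i$ are the output-layer weights, $t_i$ the hidden-unit centers, and $G$ the radial basis (Green's) function defined in the Methodology section:

$$F(x) = \sum_{i=1}^{c} w_i \, G\big(\|x - t_i\|_{C_i}\big)$$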

Radial basis function networks employ different learning strategies in the way they approach a problem. Their linear weights tend to evolve on a different time scale than the nonlinear activation functions, so it is best to optimize the two layers on different time scales. The learning strategies differ mostly in how the centers of the radial basis functions of the network are specified. My project is based on the particular learning strategy known as supervised selection of centers. Such an RBF network is founded on interpolation theory.

The easiest approach is to assume fixed radial basis functions when defining the activation functions of the hidden units. However, with additional computation, one can create an RBF network whose function centers undergo a supervised learning process.

Methodology & Development of Program

In developing such a system, the first step is to define a cost function, shown below. The cost function is minimized using a gradient-descent procedure that represents a generalization of the least-mean-squares (LMS) algorithm. The LMS algorithm is widely used to determine the transfer function of an unknown system: using the inputs and outputs of that system, it adapts the parameters in a process that minimizes the mean squared error.
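Every update in this procedure shares the same steepest-descent form. Writing $\theta$ for any free parameter of the network and $\eta$ for its learning rate,

$$\theta(n+1) = \theta(n) - \eta\,\frac{\partial \mathcal{E}(n)}{\partial \theta(n)}$$

The cost function $\mathcal{E}$ and the specific gradient terms are given next.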

Cost function

$$\mathcal{E} = \frac{1}{2}\sum_{j=1}^{N} e_j^2$$

N is the size of the training sample, $e_j$ is the error signal, and $\|\cdot\|$ below denotes the Euclidean distance or norm.

The error signal $e_j$ involves a Green's function:

$$e_j = d_j - \sum_{i=1}^{c} w_i\, G\big(\|x_j - t_i\|_{C_i}\big)$$

Green's functions play an important role in the solution of linear ordinary and partial differential equations, and they are also a key component in the development of integral equation methods.

Green's function

$$G\big(\|x_j - t_i\|_{C_i}\big) = \exp\big(-\|x_j - t_i\|_{C_i}^2\big), \qquad \|x_j - t_i\|_{C_i}^2 = (x_j - t_i)^T C_i^T C_i\,(x_j - t_i)$$

We can substitute $\frac{1}{2}\Sigma_i^{-1} = C_i^T C_i$, where $\Sigma_i^{-1}$ is the inverse covariance matrix, $x_j$ is training set sample $j$, and $t_i$ is the $i$th cluster center.

Finally, here is the Green's function I used to produce the RBF network:

$$G\big(\|x_j - t_i\|_{C_i}\big) = \exp\Big(-\tfrac{1}{2}\,(x_j - t_i)^T\, \Sigma_i^{-1}\, (x_j - t_i)\Big)$$

As you can see, it represents a multivariate Gaussian distribution with mean vector $t_i$ and covariance matrix $\Sigma_i$. The vectors and matrix span the space $R^m$, where m is the feature dimension of t and x. Thus, the Green's function evaluates to a single number.

Ex. (1×m vector) × (m×m matrix) × (m×1 vector) gives a 1×1 number.
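As a quick check of the dimensions, here is a minimal Matlab sketch with hypothetical values (xj, ti, and covinv_i are made up for illustration) showing the expression collapsing to a scalar:

%Hypothetical values for illustration only
xj = [1.0 2.0];     %1xm training sample
ti = [0.5 1.5];     %1xm cluster center
covinv_i = eye(2);  %mxm inverse covariance matrix
g = exp(-0.5*(xj-ti)*covinv_i*(xj-ti)')  %1x1 number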

As seen above, we need to find the parameters $w_i$, $t_i$, and $\Sigma_i^{-1}$ that minimize the cost function. The adaptation formulas for the linear weights, positions of centers, and spreads of centers of the RBF network are given below. I was able to get this information from Haykin on page 303; the derivations of the partial derivatives are given in the appendix. [1]

Adaptation Formulas

1.  Linear weights (output layer)

$$\frac{\partial \mathcal{E}(n)}{\partial w_i(n)} = \sum_{j=1}^{N} e_j(n)\, G\big(\|x_j - t_i(n)\|_{C_i}\big)$$

$$w_i(n+1) = w_i(n) - \eta_1\,\frac{\partial \mathcal{E}(n)}{\partial w_i(n)}$$

where i = 1, 2, ..., c

2.  Positions of centers (hidden layer)

$$\frac{\partial \mathcal{E}(n)}{\partial t_i(n)} = 2\,w_i(n) \sum_{j=1}^{N} e_j(n)\, G\big(\|x_j - t_i(n)\|_{C_i}\big)\, \Sigma_i^{-1}\big(x_j - t_i(n)\big)$$

$$t_i(n+1) = t_i(n) - \eta_2\,\frac{\partial \mathcal{E}(n)}{\partial t_i(n)}$$

The update results in a 1×m vector, where m is the feature dimension of t and x.

Ex. [1×1 scalar × m×m matrix ($\Sigma_i^{-1}$) × m×1 vector ($x_j - t_i$)] gives an m×1 vector, stored as its 1×m transpose.

where i = 1, 2, ..., c

3.  Spreads of centers (hidden layer)

$$\frac{\partial \mathcal{E}(n)}{\partial \Sigma_i^{-1}(n)} = -w_i(n) \sum_{j=1}^{N} e_j(n)\, G\big(\|x_j - t_i(n)\|_{C_i}\big)\, Q_{ji}(n), \qquad Q_{ji}(n) = \big(x_j - t_i(n)\big)\big(x_j - t_i(n)\big)^T$$

$$\Sigma_i^{-1}(n+1) = \Sigma_i^{-1}(n) - \eta_3\,\frac{\partial \mathcal{E}(n)}{\partial \Sigma_i^{-1}(n)}$$

The update results in an m×m matrix, where m is the feature dimension of t and x.

Ex. [1×1 scalar × m×m matrix $Q_{ji}$], where $Q_{ji}$ is equivalent to multiplying an m×1 vector by its 1×m transpose to create an m×m matrix.

where i = 1, 2, ..., c

Note: c is the number of radial basis functions used.

To calculate the linear weights, I first had to calculate the Green's function, which outputs a single number. I then found the new $w_i$ by applying the update to the old $w_i$.

%Calculation of linear weights
weightdiff=0;
for j=1:n
    %Green's function output for sample j and center i (a single number)
    g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
    weightdiff = weightdiff + e(j)*g;
end
w(i)=w(i) - (eta1*weightdiff); %single number

The positions of centers were computed in a similar way. However, $t_i$ is a vector that spans $R^m$, where m is the feature dimension.

%Calculation of positions of centers (hidden layer)
postdiff=0;
for j=1:n
    g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
    %accumulate e_j * G * covinv * (x_j - t_i)', an mx1 vector
    postdiff = postdiff + (e(j)*g*covinv(:,:,i)*(x(j,:)-t(i,:))');
end
t(i,:)=t(i,:)-(eta2*2*w(i)*postdiff)'; %1xm vector

The spreads of centers were output in matrix form, as expected, since the inverse covariance being updated is an m×m matrix.

%Calculation of spreads of centers (hidden layer)
spreaddiff=0;
for j=1:n
    g=exp(-0.5*((x(j,:)-t(i,:)))*covinv(:,:,i)*((x(j,:)-t(i,:))'));
    %accumulate e_j * G * (x_j - t_i)'(x_j - t_i), an mxm outer product
    spreaddiff=spreaddiff + (e(j)*g*(x(j,:)-t(i,:))'*(x(j,:)-t(i,:)));
end
covinv(:,:,i)=covinv(:,:,i) - (eta3*-1*w(i)*spreaddiff); %mxm matrix

With regard to the power of Matlab, I probably should have coded the above using matrix and vector operations, since a for loop in Matlab carries a lot of overhead. However, since I am more used to C, I implemented it as I would in C to avoid confusion in my calculations. I therefore believe this program can be further optimized to make full use of Matlab.
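For example, the three updates for center i could be vectorized roughly as follows. This is only a sketch of the idea, not code from RBN_adaptive.m; it assumes the same variable names as above, with x an n×m matrix and e an n×1 column vector:

%Vectorized sketch of the updates for center i (illustrative only)
diff = x - repmat(t(i,:),n,1);                       %x(j,:)-t(i,:) for all j (nxm)
gvec = exp(-0.5*sum((diff*covinv(:,:,i)).*diff,2));  %Green's function for all j (nx1)
w(i) = w(i) - eta1*(e'*gvec);                        %linear weight update
t(i,:) = t(i,:) - (eta2*2*w(i)*(covinv(:,:,i)*(diff'*(e.*gvec))))'; %position update
covinv(:,:,i) = covinv(:,:,i) + eta3*w(i)*(diff'*(repmat(e.*gvec,1,m).*diff)); %spread update (-eta3*-1 simplifies to +eta3)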

According to Haykin, there are a few points that need to be understood when dealing with an adaptive center RBF network.

·  The cost function $\mathcal{E}$ is convex with respect to the linear weights $w_i$, but it is nonconvex with respect to $t_i$ and $\Sigma_i^{-1}$. This can cause a problem when determining $t_i$ and $\Sigma_i^{-1}$, since the parameters could get stuck at a local minimum. I tried to get around this problem by using the Matlab command pinv. Although it takes longer to compute than the usual inv command, it uses the Moore-Penrose pseudo-inverse algorithm and avoids division by a singular matrix.

·  The parameters $w_i$, $t_i$, and $\Sigma_i^{-1}$ are usually assigned different learning rate parameters η1, η2, and η3. In my program, these parameters are input at the beginning; they should be values in the range 0 < η < 1.

·  This procedure uses the gradient-based steepest descent algorithm; unlike a feedforward network trained with back-propagation, it does not use error back-propagation.

To prevent infinite values, it is sometimes better to begin the search from a structured initial condition that limits the parameter space to a known area. Before running the RBF network, it may be useful to run the data through a standard pattern classifier. This reduces the chance of converging on a local minimum.

The algorithm begins with the parameters w, t, and $\Sigma^{-1}$, whose initializations are given below. It was very important that I set these variables to values that would allow the network to run with minimal error. At first, I initialized w to w=0.005*randn(c,1). Unfortunately, this was not a good method of initializing w, because my RBF network produced results that were flagrantly incorrect, and I could not find eta parameters that corrected it. Since I was trying to produce an RBF network comparable to a fixed-center RBF network, I decided to set my initial weights to w=pinv(G)*d. This improved my results immensely, because my weights were limited to a known area. The centers t were initialized using the k-means algorithm. $\Sigma^{-1}$ was initialized to c copies of the m×m identity matrix (an m-by-m-by-c array), where m is the number of features and c is the number of cluster centers. I thought this was a good starting point, since it reduced the chance of getting stuck in a local minimum at initialization itself.

%Initialization of initial linear weights
G=gauss(x,t,covinv);
w=pinv(G)*d;

%Initialization of t
t=cinit(x,2,c); %spread initial cluster centers over entire range
t=kmeansf(x,t,.0001,50);

%Initial covariance matrix: identity matrix
cov = eye(m);

%need to take inverse of covariance matrix; makes calculations easier
for i=1:c
    covinv(:,:,i)=pinv(cov);
end
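To show how these pieces fit together, here is a hypothetical sketch of the outer training loop. It is not the exact structure of RBN_adaptive.m, and the epoch count nepochs is an assumed name; the error signal is recomputed each pass using the gauss helper from the initialization above:

%Hypothetical outer loop combining the update steps above (a sketch only)
for epoch=1:nepochs
    G = gauss(x,t,covinv);  %n-by-c matrix of Green's function outputs
    e = d - G*w;            %error signal for all training samples
    for i=1:c
        %linear weight, center position, and spread updates for center i,
        %as given in the three code segments earlier
    end
end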

Testing & Comparison of Results

To test my adaptive center RBF network, I first took some data files from homework 3 of ECE539. The training set (train.txt) consisted of 10 samples of x and d with a feature dimension of 1; the testing set (test.txt) consisted of 20 samples. The training set and the output of my RBF network are plotted below:

Figure 3: Training Set Plot from Trainset1.txt


Figure 4: Output with 3 Radial Basis Function Inputs


In this case, eta1=eta2=eta3=0.5. This helped prove to me that my adaptive center RBF network was working correctly. I ran the same data on a fixed center RBF network and received a similar-looking output. Since I could not see any perceptible differences just by examining the graph, I computed the cost function output for the fixed center RBF network as well. It turned out that the cost function outputs from the two networks were not very different.

Network (3 input radial basis functions)    Cost
Adaptive Center RBF Network                 1.1439e-5
Fixed Center RBF Network                    1.1648e-5

Next, I decided to input only 2 radial basis functions.

Figure 5: Output with 2 Radial Basis Functions

Again, I found only a slight difference between the two RBF networks.

Network (2 input radial basis functions)    Cost
Adaptive Center RBF Network                 0.404
Fixed Center RBF Network                    0.404

To see if I could reduce the cost of the adaptive center RBF network, I tried modifying the eta parameters away from 0.5. My conclusion was that modifying the eta parameters can reduce the costs, but not significantly below those of a fixed center RBF network.

Eta1    Eta2    Eta3    Cost
0.3     0.3     0.3     0.403
0.2     0.5     0.9     0.403
0.8     0.2     0.3     0.404

Using Dr. Hu's function generator, I generated several functions to test on my RBF networks; I wanted to see whether a certain type of RBF network would perform better in certain situations. The generator output training and testing data for three functions: sinusoid, piecewise-linear, and polynomial. I used these three functions to compare the results of the two RBF networks.

Sinusoid Function Testing

Figure 6: RBF Network Output (Sinusoid Function) with 7 Radial Basis Functions

Figure 7: Sinusoid Function Cost Function Output

Testing the radial basis function networks against the sinusoid data, the results seemed to show that with fewer radial basis functions, the adaptive center RBF network performs slightly better. Beyond that, the fixed-center RBF network achieves results that are similar, if not better. As a side note, we can probably disregard the cost output for two radial basis functions, since two are too few to correctly match the sinusoid function. The data for the above chart is given in the appendix.

Piecewise-Linear Function


Figure 8: Adaptive RBF Network with 10 Radial Basis Functions

Figure 9: Adaptive RBF Network with 6 Radial Basis Functions

Figure 10: Piecewise-Linear Cost Function Output

For this function, the adaptive center RBF network performed better until the number of radial basis functions reached 6; after that, the fixed-center RBF network began to achieve better results. I stopped compiling the cost outputs at 10 radial basis functions, as the differences were on the order of 10^-7. Nevertheless, at 9 radial basis functions, both the adaptive center and fixed center network models provided similar approximations of the piecewise-linear function. At 10 radial basis functions, the adaptive center RBF network provided the best model, with a cost function output of 3.7823x10^-7. Data for the chart is given in the appendix.