CHAPTER TWO
NEURAL NETWORKS
2.1 Overview
This chapter presents an explanation of Artificial Neural Networks (ANNs): their importance in technological development, how they work, and how the human brain inspired them. It covers the architecture and components of ANNs, the process of training and testing them, and their training algorithms and learning rules. The backpropagation learning algorithm, which is used to train the dental identification system, is also described.
2.2 Artificial Neural Networks (ANNs)
An ANN is a mathematical model designed to simulate the nervous system of the human brain. It is a machine designed to model the way in which the brain performs a particular task or function of interest. The network is usually implemented using electronic components or simulated in software on a digital computer. Each cell of this mathematical model is simple, but when the cells are connected together they are able to produce complex results [12].
As knowledge of the human brain developed, researchers tried to build a mathematical model that simulates the brain, with the aim of solving problems of artificial intelligence without needing to create a real biological system. This was the beginning of work on artificial neural networks.
In 1943, McCulloch and Pitts created a computational model for neural networks based on mathematics and algorithms. They called this model threshold logic. The model paved the way for neural network research to split into two distinct approaches. One approach focused on biological processes in the brain and the other focused on the application of neural networks to artificial intelligence [13].
In the late 1940s, psychologist Donald Hebb created a hypothesis of learning based on the mechanism of neural plasticity that is now known as Hebbian learning.
Hebbian learning is considered to be a 'typical' unsupervised learning rule, and its later variants were early models for long-term potentiation. These ideas started being applied to computational models in 1948 with Turing's B-type machines [14].
In 1954, Farley and Clark first used computational machines, then called calculators, to simulate a Hebbian network at the Massachusetts Institute of Technology (MIT). Other neural network computational machines were created by Rochester, Holland, Haibt, and Duda in 1956 [15] [16].
In 1962, Rosenblatt created the perceptron, an algorithm for pattern recognition based on a two-layer learning computer network using simple addition and subtraction. With mathematical notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or circuit, a circuit whose mathematical computation could not be processed until after the backpropagation algorithm was created by Werbos in 1975 [15] [17].
Neural network research stagnated after the publication of machine learning research by Minsky and Papert in 1969. They discovered two key issues with the computational machines that processed neural networks. The first issue was that single-layer neural networks were incapable of processing the exclusive-or circuit.
The second significant issue was that computers were not sophisticated enough to effectively handle the long run time required by large neural networks. Neural network research slowed until computers achieved greater processing power. Also key in later advances was the backpropagation algorithm, which effectively solved the exclusive-or problem [18].
The cognitron, designed by Kunihiko Fukushima in 1975, was an early multilayered neural network with a training algorithm. The actual structure of the network and the methods used to set the interconnection weights change from one neural strategy to another, each with its advantages and disadvantages. Networks can propagate information in one direction only, or they can bounce back and forth until self-activation at a node occurs and the network settles on a final state. The ability for bidirectional flow of inputs between neurons/nodes was introduced with the Hopfield network in 1982, and specialization of these node layers for specific purposes was introduced through the first hybrid network [19].
Parallel distributed processing became popular in the mid-1980s under the name connectionism. The text by Rumelhart and McClelland in 1986 provided a full exposition on the use of connectionism in computers to simulate neural processes [20].
The rediscovery of the backpropagation algorithm was probably the main reason behind the repopularisation of neural networks after the publication of "Learning Internal Representations by Error Propagation" in 1986 (though backpropagation itself dates from 1969). Training was done by a form of stochastic gradient descent. Applying the chain rule of differentiation to derive the appropriate parameter updates results in an algorithm that seems to 'backpropagate errors', hence the name. However, it is essentially a form of gradient descent. Determining the optimal parameters in a model of this type is not trivial, and local numerical optimization methods such as gradient descent can be sensitive to initialization because of the presence of local minima of the training criterion. In recent times, networks with the same architecture as the backpropagation network are referred to as multilayer perceptrons (MLP). This name does not impose any limitations on the type of algorithm used for learning [20].
The backpropagation network generated much enthusiasm at the time, and there was much controversy about whether such learning could be implemented in the brain or not. This was partially because a mechanism for reverse signaling was not obvious at the time, but most importantly because there was no plausible source for the 'teaching' or 'target' signal. However, since 2006, several unsupervised learning procedures have been proposed for neural networks with one or more layers, using "deep learning" algorithms. These algorithms can be used to learn intermediate representations, with or without a target signal, that capture the salient features of the distribution of sensory signals arriving at each layer of the neural network [21].
2.3 Human Brain
The sonar of a bat is an active echo-location system. In addition to providing information about how far away a target is, a bat's sonar conveys information about the relative speed of the target, the size of the target, the size of different features of the target, and the azimuth and elevation of the target. The complex neural computations needed to extract all this information from the target echoes occur within a brain the size of a berry. The bat's sonar system achieves a success rate and accuracy greater than those of the radar and sonar systems built by engineers.
The human nervous system can be viewed as a three-stage system. The first stage is the brain, the center of the nervous system, which continuously receives information, responds to it, and makes appropriate decisions. The second stage is the set of pathways that carry data, as electrical pulses, from sensors in all parts of the body to the brain; these pathways feed information forward through the system and, in the reverse direction, carry the response of the system to a particular input. The third stage consists of the sensors (receptors) distributed all over the body, which convert external stimuli into electrical impulses that convey information to the brain, and the effectors, which convert electrical impulses generated by the neural network into discernible responses as the output of the system, as shown in figure 2.1 [22].
Figure 2.1 Block Diagram Representing Nervous System [22].
The human brain is made up of a wide network of computing elements called neurons, coupled with sensory receptors and effectors. The average human brain contains about 100 billion cells of various types. A neuron is a special cell that conducts an electrical signal. There are about 10 billion neurons in the human brain. The remaining 90 billion cells are called glial or glue cells, and these serve as support cells for the neurons. Neurons interact through contact links called synapses. Each synapse spans a gap about a millionth of an inch wide. On average, each neuron receives signals via thousands of synapses [19] [22].
In fact, neurons are five to six orders of magnitude slower than silicon logic gates. A modern computer easily outperforms a human in programmed, repetitive computations, but real-world tasks such as understanding, perception, and speech, which human beings carry out almost effortlessly, are still beyond the reach of serial digital computers even after allowing for an increase in speed. These tasks are difficult for serial digital computers because the computational load requires speeds not realizable with existing technology [22].
Many facts are still unknown about how the brain works and how it trains itself to process information. The learning process occurs by changing the effectiveness of the synapses between nerve cells, and by adding or deleting connections between neurons.
2.4 Artificial Neural Network Components
An Artificial Neural Network (ANN) is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects [23]:
- Knowledge is acquired by the network from its environment through a learning process.
- Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
2.5 Models of Neurons
A neuron is an information-processing unit that is fundamental to the operation of a neural network. The first attempts to simulate the neural networks of the nervous system identified the main features of neurons and their interconnections, and then programmed a computer to simulate these features. However, our knowledge of the nervous system is incomplete and our technological capabilities are limited. Figure 2.2 shows the mathematical model used to simulate neurons [24] [25].
Figure 2.2 Neurons Mathematical Model [22].
There are three basic elements of the neuron model, as shown in figure 2.3 [24]:
1. A set of synapses or connecting links, each of which is characterized by a weight or strength of its own.
2. An adder for summing the input signals, weighted by the respective synapses of the neuron; the operations described here constitute a linear combiner.
3. An activation function for limiting the amplitude of the output of a neuron. The activation function is also referred to in the literature as a squashing function, in that it squashes (limits) the permissible amplitude range of the output signal to some finite value. Typically, the normalized amplitude range of the output of a neuron is written as the closed unit interval [0, 1] or alternatively [-1, 1].
Figure 2.3 Neurons Network Elements [22].
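The three elements listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the model used in this thesis: the function names `neuron_output` and `logistic`, the example numbers, and the inclusion of a bias term are assumptions made for the sketch.

```python
import math


def logistic(v):
    """Squashing function: maps any real v into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(inputs, weights, bias, activation):
    """Output of a single neuron (hypothetical helper for illustration).

    inputs  -- input signals x_j
    weights -- synaptic weights w_j, one per input
    bias    -- constant added to the weighted sum (an assumption of this sketch)
    """
    # Elements 1 and 2: each input is weighted by its synapse and the adder
    # sums the results (linear combiner).
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Element 3: the activation function limits the output amplitude.
    return activation(v)


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]   # made-up input signals
    w = [0.4, 0.3, 0.1]    # made-up synaptic weights
    print(neuron_output(x, w, bias=0.0, activation=logistic))
```

Passing the activation as a parameter keeps the linear combiner separate from the squashing step, which mirrors the separation between elements 2 and 3 in the list.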
2.5.1 Types of Activation Function
The activation function defines the output of a neuron in terms of the activity level at its input. There are three basic types of activation function [22] [24]:
- Threshold function: in engineering literature, this form of threshold function is commonly referred to as a Heaviside function. Correspondingly, the output of neuron k employing such a threshold function is expressed as
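$$y_k = \begin{cases} 1, & v_k \ge 0 \\ 0, & v_k < 0 \end{cases} \tag{2.1}$$
where $y_k$ is the output of neuron k and $v_k$ is its induced local field.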
Such a neuron is referred to as the McCulloch-Pitts model, in recognition of the pioneering work done by McCulloch and Pitts (1943). In this model, the output of a neuron takes the value 1 if the induced local field of that neuron is nonnegative and 0 otherwise, as shown in figure 2.4.
Figure 2.4 Threshold Activation Function [22].
- Piecewise linear function: the amplification factor inside the linear region is assumed to be unity. This type of activation function can be viewed as an approximation to a nonlinear amplifier. The following two situations may be viewed as special forms of the piecewise linear function [22]:
- A linear combiner arises if the linear region of operation is maintained without running into saturation.
- The piecewise linear function reduces to a threshold function if the amplification factor of the linear region is made infinitely large.
The output of a neuron that uses the piecewise linear function is expressed as
(2.2)
Figure 2.5 shows the piecewise linear activation function.
Figure 2.5 Piecewise Linear Activation Function [2].
- Sigmoid function: the sigmoid function is the most common form of activation function used in the construction of artificial neural networks. It is defined as a strictly increasing function that exhibits a balance between linear and nonlinear behavior [24].
The sigmoid function has a slope parameter (a). By varying the parameter a, we obtain sigmoid functions of different slopes. As the slope parameter approaches infinity, the sigmoid function becomes a threshold function. Whereas a threshold function takes only the value 0 or 1, the sigmoid function takes a continuous range of values from 0 to 1. The equation below shows the sigmoid activation function and how the output value is calculated.
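$$\varphi(v) = \frac{1}{1 + \exp(-a v)} \tag{2.3}$$
where $v$ is the induced local field of the neuron and $a$ is the slope parameter.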
The sigmoid activation function ranges from 0 to +1. In some cases, however, the range used for the activation function is -1 to +1, in which case the function is antisymmetric with respect to the origin; that is, the activation function is an odd function of the induced local field. Figure 2.6 shows the sigmoid activation function.
Figure 2.6 Sigmoid Activation Function [2].
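The three activation functions described in this section can be compared with a short Python sketch. Several details are assumptions made only for illustration: the breakpoints of the piecewise linear function (saturation below -0.5 and above +0.5, chosen so that the unit-slope region is continuous), the use of the hyperbolic tangent as one possible antisymmetric sigmoid, and all function names.

```python
import math


def threshold(v):
    """Heaviside threshold: 1 if the induced local field is nonnegative, else 0."""
    return 1.0 if v >= 0 else 0.0


def piecewise_linear(v):
    """Unit slope in the linear region, saturating at 0 and 1.

    The breakpoints at -0.5 and +0.5 are an assumption of this sketch.
    """
    return min(1.0, max(0.0, v + 0.5))


def sigmoid(v, a=1.0):
    """Logistic sigmoid with slope parameter a, output in the range 0 to 1."""
    z = a * v
    # Two algebraically equal forms are used so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def tanh_sigmoid(v):
    """An antisymmetric (odd) sigmoid with range -1 to +1 (assumed choice)."""
    return math.tanh(v)


if __name__ == "__main__":
    for v in (-1.0, -0.1, 0.0, 0.1, 1.0):
        # With a large slope parameter the sigmoid approaches the threshold.
        print(v, threshold(v), piecewise_linear(v), sigmoid(v), sigmoid(v, a=1000.0))
```

Away from v = 0, the last column is numerically indistinguishable from the threshold column, which illustrates the limiting behavior of the slope parameter described above.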
2.6 Neural Network Architectures
First, we must define the perceptron and what it represents. The perceptron can be seen as the simplest kind of artificial neural network; it basically consists of a single neuron with adjustable synaptic weights, as shown in figure 2.7 [19] [24].
Figure 2.7 The Perceptron [26].
We can classify the neural network architectures into three classes:
- Single layer perceptron feedforward networks (SLP).
- Multi layer perceptron feedforward networks (MLP).
- Recurrent Networks.
2.6.1 Single Layer Perceptron (SLP)
In an Artificial Neural Network (ANN), the neurons are organized in successive layers. In the simplest form of layered network, an input layer of source nodes is followed directly by an output layer of neurons, also called computation nodes. Signals in this network are fed forward only, from the input layer to the output layer, and never backward. Figure 2.8 shows the single layer perceptron neural network [27].
Figure 2.8 Single Layer Perceptron SLP [28].
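A single-layer feedforward pass can be sketched as follows. The sketch assumes threshold output neurons and uses hand-chosen weights that make the two output neurons compute logical AND and OR; the names `slp_forward`, `WEIGHTS`, and `BIASES` and all numeric values are illustrative assumptions, not taken from the thesis.

```python
def slp_forward(inputs, weights, biases):
    """Feed the source-node signals forward to one layer of threshold neurons.

    weights[k][j] is the synaptic weight from source node j to output neuron k.
    """
    outputs = []
    for w_k, b_k in zip(weights, biases):
        # Induced local field of output neuron k.
        v_k = sum(w * x for w, x in zip(w_k, inputs)) + b_k
        # Threshold activation: 1 if v_k is nonnegative, else 0.
        outputs.append(1 if v_k >= 0 else 0)
    return outputs


# Hand-chosen (hypothetical) parameters: neuron 0 acts as AND, neuron 1 as OR.
WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
BIASES = [-1.5, -0.5]

if __name__ == "__main__":
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, slp_forward(x, WEIGHTS, BIASES))
```

There is no path from the outputs back to the inputs, so the computation is a single forward sweep.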
2.6.2 Multi Layer Perceptron (MLP)
The second class of feedforward neural network distinguishes itself by the presence of one or more hidden layers, whose computation nodes are called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external inputs and the network outputs in some useful manner. By adding one or more hidden layers, the neural network gains greater capacity for learning, pattern recognition, and many other tasks.
This capacity depends on the number of hidden nodes, which can be increased or decreased by adding or removing hidden neurons or hidden layers, as shown in figure 2.9 [22] [27].
Figure 2.9 Multi Layer Perceptron MLP with Two Hidden Layers [22].
The source nodes in the input layer of the network supply the elements of the input signal to the computation neurons in the next layer, which is the first hidden layer. The output of the second layer (the first hidden layer) is the input of the next layer, and so on for the rest of the network layers until the output layer, which is the last computation layer in the neural network. Every neuron in the network has both inputs and outputs. The outputs of the neurons in the last (final) layer constitute the output of the network in response to the input supplied by the source nodes in the input (first) layer.
The neural network is said to be fully connected in the sense that every node in each layer of the network is connected to every node in the adjacent forward layer. If some of the communication links (synaptic connections) are missing from the network, the network is said to be partially connected [22].
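The layer-by-layer propagation described above can be sketched for a small fully connected network. As an illustration, the hand-chosen weights below make a network with one hidden layer of two threshold neurons compute the exclusive-or function, which a single-layer network cannot represent. The weights, the threshold activation, and the names `layer_forward`, `mlp_forward`, `step`, and `XOR_LAYERS` are assumptions of this sketch; a trained network would obtain its weights from a learning algorithm instead.

```python
def step(v):
    """Threshold activation: 1 if v is nonnegative, else 0."""
    return 1 if v >= 0 else 0


def layer_forward(inputs, weights, biases, activation):
    """One fully connected layer: every neuron sees every input of the layer."""
    return [
        activation(sum(w * x for w, x in zip(w_k, inputs)) + b_k)
        for w_k, b_k in zip(weights, biases)
    ]


def mlp_forward(inputs, layers, activation):
    """Propagate the signal through the layers; each output feeds the next layer."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases, activation)
    return signal


# Hypothetical hand-chosen parameters, one (weights, biases) pair per layer.
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [-0.5, -1.5]),  # hidden layer: OR neuron, AND neuron
    ([[1.0, -1.0]], [-0.5]),                   # output neuron: OR and not AND
]

if __name__ == "__main__":
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, mlp_forward(x, XOR_LAYERS, step))
```

A partially connected network could be represented in the same structure by setting the weights of the missing links to zero.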
2.6.3 Recurrent Networks
A recurrent neural network distinguishes itself from a feedforward neural network in that it has at least one feedback loop. That is, the outputs of the neurons are fed back as inputs to the other neurons in the same network, but not to the same neuron itself.
The presence of feedback loops in an artificial neural network has a strong effect on the learning capability of the network and on its performance. Figure 2.10 shows a recurrent neural network.
Figure 2.10 Recurrent Networks [22].
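The feedback structure described above, in which each neuron receives the outputs of the other neurons but not its own, can be sketched as a discrete-time update. The unit time delay between steps, the identity activation used in the demonstration, the two-neuron weight matrix, and the names `recurrent_step` and `run` are all assumptions made for illustration.

```python
def recurrent_step(state, weights, activation):
    """Compute the next output of every neuron from the current outputs.

    weights[k][j] is the weight of the feedback link from neuron j to neuron k.
    Terms with j == k are skipped, so there is no self-feedback.
    """
    n = len(state)
    return [
        activation(sum(weights[k][j] * state[j] for j in range(n) if j != k))
        for k in range(n)
    ]


def run(state, weights, activation, steps):
    """Iterate the network and return the list of successive states."""
    history = [state]
    for _ in range(steps):
        state = recurrent_step(state, weights, activation)
        history.append(state)
    return history


if __name__ == "__main__":
    # Two neurons, each fed only by the other one (hypothetical weights).
    W = [[0, 1], [1, 0]]
    for s in run([1, 0], W, lambda v: v, steps=4):
        print(s)
```

In this small example the activity passes back and forth between the two neurons, behavior that a purely feedforward network cannot produce because its output depends only on the current input.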
2.7 Artificial Neural Network Learning
The significance of an artificial neural network lies in its ability to learn from its environment. The learning process in artificial neural networks, also referred to as the training process, depends mainly on pattern recognition: like a human, an artificial neural network learns through examples. The network learns about its environment by adjusting its synaptic weights and bias levels. There are two types of learning in artificial neural networks. The first is supervised learning, in which the learning process takes place under the supervision of a teacher, in this case the programmer. The teacher specifies the desired output, or target, for each input. The progress of learning depends on the response of the network to training, that is, on how close the actual output produced during training is to the desired output (target). The second type is unsupervised learning, in which the neural network does not need a teacher to determine the output of the network [29] [30].
A large number of training algorithms are used to train artificial neural networks. The best known and most widely used is the backpropagation training algorithm, which was adopted to train the artificial neural network system in this thesis.
2.7.1 Learning Paradigms
To make the learning process easier to understand, consider a neural network consisting of one input neuron in the input layer, one computation neuron in the output layer, and one or more hidden layers. When the input node feeds the network with an input signal, computations are carried out in the hidden layers until the signal arrives at the output layer. This yields an actual output, which represents the output of the neural network. The actual output is then compared with the desired output, or target. This learning method is known as error-correction learning.
The objective of this method is to apply a sequence of adjustments to the synaptic weights of the network so that the actual output signal of the neural network comes closer to the desired output, or target. There are two types of learning paradigms [30]:
2.7.1.1 Supervised Learning
Supervised learning is also referred to as learning with a teacher. In this paradigm, the teacher must have knowledge of the environment surrounding the neural network, and represents this knowledge as a sequence of input-output examples. Suppose that the teacher wants to teach the neural network a particular task: the teacher must provide the network with the desired output for each input used in the training process. As already explained, training is done by modifying the values of the synaptic weights in the neural network so as to reduce the difference between the actual output and the desired output (target). Thus, step by step, the neural network comes to emulate the teacher, and knowledge is transferred from the teacher to the neural network. When the neural network reaches this stage, it dispenses with the teacher and deals with the environment completely by itself [30] [31].
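Learning with a teacher by error correction can be sketched for the simplest possible case: a single linear neuron without hidden layers. The sketch uses the delta rule (each weight is changed in proportion to the error times its input), which is one common form of error-correction learning and is swapped in here for illustration; it is not the backpropagation algorithm adopted in this thesis. The teacher's rule d = 2*x1 - x2, the learning rate, the number of epochs, and all names are hypothetical.

```python
def train_error_correction(examples, n_inputs, learning_rate=0.1, epochs=200):
    """Adjust the weights of one linear neuron so its output approaches the target.

    examples -- list of (inputs, target) pairs supplied by the teacher
    """
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in examples:
            # Actual output of the neuron for this input.
            actual = sum(w * x for w, x in zip(weights, inputs))
            # Error signal: desired output minus actual output.
            error = target - actual
            # Delta rule: move each weight in proportion to error * input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
    return weights


# Input-output examples from a hypothetical teacher that knows d = 2*x1 - x2.
EXAMPLES = [
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], -1.0),
    ([1.0, 1.0], 1.0),
]

if __name__ == "__main__":
    learned = train_error_correction(EXAMPLES, n_inputs=2)
    print(learned)
```

After training, the learned weights are close to 2 and -1, so the neuron reproduces the teacher's responses on new inputs without further supervision.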