Fundamentals of Neurocomputing

• systems constructed to exploit some of the organizational principles believed to be used by the brain.

• theoretical themes

1. network structure

2. learning algorithms

3. knowledge representation

The Generic Connectionist Model

• neurons receive inputs from other neurons via synapses, which can be excitatory or inhibitory.

• excitatory input: the receiving neuron is more likely to fire action potentials.

• inhibitory input: the receiving neuron is less likely to fire.

• outputs are sent to other neurons by axons.

• a neuron contains a continuous internal potential called the membrane potential, and when this exceeds a threshold, the neuron can propagate an all-or-nothing action potential down its axon.

• many neurons can be active simultaneously; the set of simultaneous element activities is represented by a state vector.

• artificial neural networks have many computing elements connected together; the connections are often arranged in a connection matrix.

• overall system behaviour is determined by the structure and strengths of the connections, and the strengths may be changed by various learning algorithms.

• the learning phase: connection strengths in the network are modified.

• the retrieval phase: some initial information (an initial state vector or activity pattern) is put into the system, passes through the connections and gives rise to an output pattern.

• network structure

— elements are arranged in groups or layers.

— a single layer of neurons that connects to itself is referred to as an autoassociative system.

— multi-layer systems contain input and output neurons and neurons which are neither, called hidden units.

• brain-like general rules for representations:

1. similar inputs usually give rise to similar representations.

2. things to be separated should be given widely different representations.

3. if something is important, lots of elements should be used to represent it.

4. do as much lower-level preprocessing as possible, so that the learning and adaptive parts of the network need to do as little work as possible.

— build invariances into the hardware and do not require the system to learn them.

Foundations of Artificial Neural Networks

• Three Elements

1. an organized topology of interconnected processing elements.

2. a method of encoding information.

3. a method of recalling information.

• Two Key Concepts

1. techniques for analyzing neural network dynamics.

2. a general taxonomy of all neural network paradigms.

Processing Elements

• components where most, if not all, of the computing is done.

• input signals

— from environment or other PEs.

— form an input vector $A = (a_1, \ldots, a_i, \ldots, a_n)$, where $a_i$ is the activity level of the $i$th PE or input.

• weights

— an adjustable value is associated with each connected pair of PEs.

— the weights connected to the $j$th PE form a vector $W_j = (w_{1j}, \ldots, w_{ij}, \ldots, w_{nj})$, where $w_{ij}$ represents the connection strength from the PE $a_i$ to the PE $b_j$.

• internal threshold value

— the internal threshold $\theta_j$ is modulated by the weight $w_{0j}$ that is associated with the inputs.

— must be exceeded for there to be any PE activation.

• output value

$$b_j = f(A \cdot W_j - w_{0j}\,\theta_j)$$

or

$$b_j = f\Big(\sum_{i=1}^{n} a_i w_{ij} - w_{0j}\,\theta_j\Big)$$
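The output equation of a single PE can be sketched in Python as follows. This is a minimal illustration, assuming plain Python lists for $A$ and $W_j$; the function name `pe_output`, the example numbers, and the choice of a binary step for $f$ are all hypothetical.

```python
from typing import Callable, Sequence


def pe_output(
    a: Sequence[float],
    w_j: Sequence[float],
    w0j: float,
    theta_j: float,
    f: Callable[[float], float],
) -> float:
    """Output of the jth PE: b_j = f(sum_i a_i * w_ij - w_0j * theta_j)."""
    if len(a) != len(w_j):
        raise ValueError("input vector A and weight vector W_j must have the same length")
    net = sum(a_i * w_ij for a_i, w_ij in zip(a, w_j))  # the dot product A . W_j
    return f(net - w0j * theta_j)  # subtract the weighted internal threshold


# Hypothetical threshold function for the example: a binary step.
def binary_step(x: float) -> float:
    return 1.0 if x > 0 else 0.0


# Two inputs; the net input 0.5 exceeds the weighted threshold 0.3, so the PE fires.
b_j = pe_output([1.0, 0.5], [0.4, 0.2], w0j=1.0, theta_j=0.3, f=binary_step)
```

If the weighted sum does not exceed $w_{0j}\theta_j$, the argument of the step is not positive and the PE stays inactive.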

Threshold Functions

• map a PE's infinite input domain to a prespecified output range.

• there are four common functions:

1. linear function

2. nonlinear ramp function

3. step threshold function

4. sigmoid threshold function

Linear Function

• $f(x) = \alpha x$, where $\alpha$ is a real-valued constant that regulates the magnification of the PE activity $x$.

Nonlinear Ramp Function

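• the linear function bounded to the range $[-\gamma, +\gamma]$:

$$f(x) = \begin{cases} +\gamma & \text{if } x \ge \gamma \\ x & \text{if } |x| < \gamma \\ -\gamma & \text{if } x \le -\gamma \end{cases}$$

where $\gamma$ ($-\gamma$) is the PE's maximum (minimum) output value or saturation level.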

• this is a piece-wise linear function which is often used to represent a simplified nonlinear function.
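A minimal sketch of the linear and ramp functions, assuming a symmetric saturation level $\gamma > 0$ as in the definition; the names `linear` and `ramp` are illustrative and not taken from any library.

```python
def linear(x: float, alpha: float = 1.0) -> float:
    """Linear function f(x) = alpha * x."""
    return alpha * x


def ramp(x: float, gamma: float = 1.0) -> float:
    """Nonlinear ramp: the identity clipped to the range [-gamma, +gamma]."""
    if gamma <= 0:
        raise ValueError("saturation level gamma must be positive")
    if x >= gamma:
        return gamma   # upper saturation
    if x <= -gamma:
        return -gamma  # lower saturation
    return x           # linear region |x| < gamma
```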

Step Threshold Function

• the step function responds only to the sign of the input:

$$f(x) = \begin{cases} +\gamma & \text{if } x > 0 \\ -\delta & \text{otherwise} \end{cases}$$

where $\gamma$ and $\delta$ are positive scalars.

• often this is a binary function with 1 and 0 outputs.
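The step function and its common binary variant can be sketched as follows. The binary version is written as a separate function because the definition above requires $\delta > 0$; the function names are hypothetical.

```python
def step(x: float, gamma: float = 1.0, delta: float = 1.0) -> float:
    """Step threshold: +gamma if x > 0, otherwise -delta (gamma, delta > 0)."""
    if gamma <= 0 or delta <= 0:
        raise ValueError("gamma and delta must be positive scalars")
    return gamma if x > 0 else -delta


def binary_step(x: float) -> float:
    """Common binary variant with outputs 1 and 0."""
    return 1.0 if x > 0 else 0.0
```

Note that an input of exactly 0 falls in the "otherwise" branch, as in the definition.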

Sigmoid Threshold Function

• this S-shaped function is a bounded, monotonic, non-decreasing function that provides a graded, nonlinear response.

• examples

1. the logistic function

$$S(x) = (1 + e^{-x})^{-1}$$

with saturation levels of 0 and 1.

2. the hyperbolic tangent

$$S(x) = \tanh(x)$$

with saturation levels -1 and 1.

3. the augmented ratio of squares

$$f(x) = \begin{cases} \dfrac{x^2}{1 + x^2} & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

with saturation levels 0 and 1.
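The three sigmoid examples can be sketched with Python's standard `math` module. The branch inside `logistic` is a design choice, not part of the definition: it evaluates the same formula in a form that keeps `math.exp` from overflowing for large $|x|$. The function names are illustrative.

```python
import math


def logistic(x: float) -> float:
    """Logistic function S(x) = 1 / (1 + e^{-x}), saturating at 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as e^{x} / (1 + e^{x}) so the exponent stays <= 0.
    z = math.exp(x)
    return z / (1.0 + z)


def hyperbolic_tangent(x: float) -> float:
    """S(x) = tanh(x), saturating at -1 and 1."""
    return math.tanh(x)


def augmented_ratio_of_squares(x: float) -> float:
    """f(x) = x^2 / (1 + x^2) for x > 0, otherwise 0; saturates at 0 and 1."""
    if x <= 0:
        return 0.0
    x2 = x * x
    return x2 / (1.0 + x2)
```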

Topology Characteristics

1. Connection Types

(a) excitatory

• these connections increase a PE's activation.

• represented by a positive signal.

(b) inhibitory

• they decrease a PE's activation.

• represented by a negative signal.

2. Interconnection Schemes

(a) intra-field or lateral connections

• connections between PEs in the same layer.

(b) inter-field

• connections between PEs in different layers.

(c) recurrent

• connections that loop and connect back to the same PE.

• inter-field signals propagate in one of two ways:

(a) feedforward signals only allow information to flow amongst PEs in one direction.

(b) feedback signals allow information to flow amongst PEs in either direction.

3. Field or Layer Configurations

• layer configurations combine PEs, information flow and connection schemes into an architecture.

• types:

(a) lateral feedback

(b) field feedforward

• input layer - layer that receives input signals from the environment.

• output layer — layer that emits signals to the environment.

• hidden layers - any layers that lie between the input and output layers.
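A field-feedforward architecture with input, hidden and output layers can be sketched as repeated application of the PE output equation, one field at a time. This sketch assumes $\tanh$ as every PE's threshold function and folds each $w_{0j}\theta_j$ into a single threshold number per PE; the network shape and weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A field of PEs: weights[i][j] connects input i to PE j; thresholds[j] stands for w_0j * theta_j.
Field = Tuple[List[List[float]], List[float]]


def field_forward(a: Sequence[float], field: Field) -> List[float]:
    """Compute b_j = tanh(sum_i a_i * w_ij - threshold_j) for every PE j in one field."""
    weights, thresholds = field
    if len(weights) != len(a):
        raise ValueError("one weight row is needed per input")
    return [
        math.tanh(sum(a_i * row[j] for a_i, row in zip(a, weights)) - thresholds[j])
        for j in range(len(thresholds))
    ]


def feedforward(a: Sequence[float], fields: Sequence[Field]) -> List[float]:
    """Pass a signal from the input layer through each hidden field to the output field."""
    signal = list(a)
    for field in fields:  # information flows in one direction only
        signal = field_forward(signal, field)
    return signal


# Hypothetical 2-3-1 network: input layer (2), one hidden layer (3), output layer (1).
hidden: Field = ([[0.5, -0.5, 1.0], [0.25, 0.75, -1.0]], [0.0, 0.1, 0.0])
output: Field = ([[1.0], [-1.0], [0.5]], [0.2])
b = feedforward([1.0, 0.0], [hidden, output])
```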

Memory

• pattern types

1. spatial - a single static image.

2. spatiotemporal - a sequence of spatial patterns.

• types of spatial pattern matching memories

1. random access memory - maps addresses to data.

2. content-addressable memory — maps data to addresses.

3. associative memory - maps data to data.

• artificial neural networks can provide:

1. CAM - stores data at stable states in some memory matrix W.

2. AM - provides output responses from input stimuli.

• mechanisms for mapping

1. autoassociative

— the memory $W$ stores the vectors (patterns) $A_1, \ldots, A_m$.

2. heteroassociative

— $W$ stores pattern pairs $(A_1, B_1), \ldots, (A_m, B_m)$.

Recall

• a heteroassociative recall mechanism is a function $g(\cdot)$ that takes $W$ (memory) and $A_k$ (stimulus) as input and returns $B_k$ (response) as output:

$$B_k = g(A_k, W)$$

• two primary recall mechanisms:

1. nearest-neighbour recall

2. interpolative recall

1. Nearest-Neighbour Recall

• finds the stored input that most closely matches the stimulus and responds with the corresponding output:

$$B_k = g(A', W) \quad \text{where} \quad \mathrm{dist}(A', A_k) = \min_{q = 1, \ldots, m} \mathrm{dist}(A', A_q)$$

where $\mathrm{dist}(\cdot)$ is usually the Hamming or Euclidean distance function.
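Nearest-neighbour recall can be sketched with the Hamming distance on binary patterns. For clarity this sketch assumes the memory is an explicit list of stored $(A_k, B_k)$ pairs rather than a weight matrix $W$; the names and patterns are hypothetical.

```python
from typing import Sequence, Tuple

Pair = Tuple[Sequence[int], Sequence[int]]  # a stored (A_k, B_k) pattern pair


def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary patterns differ."""
    if len(x) != len(y):
        raise ValueError("patterns must have the same length")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def nearest_neighbour_recall(stimulus: Sequence[int], pairs: Sequence[Pair]) -> Sequence[int]:
    """Return B_k for the stored A_k with the smallest Hamming distance to the stimulus A'."""
    if not pairs:
        raise ValueError("no patterns are stored")
    # min() keeps the first of several equally close patterns, so ties go to the earliest pair.
    _, b_k = min(pairs, key=lambda pair: hamming(stimulus, pair[0]))
    return b_k


memory = [((1, 0, 1, 0), (1, 0)), ((0, 1, 0, 1), (0, 1))]
response = nearest_neighbour_recall((1, 0, 1, 1), memory)
```

The stimulus `(1, 0, 1, 1)` differs from the first stored input in one position and from the second in three, so the first response is returned.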

2. Interpolative Recall

• accepts a stimulus and interpolates (possibly nonlinearly) from the entire set of stored inputs to produce the corresponding output.

• for linear interpolation:

$$B' = g(A', W) \quad \text{where} \quad A_p \le A' \le A_q \ \text{and} \ B_p \le B' \le B_q$$

for some pattern pairs $(A_p, B_p)$ and $(A_q, B_q)$.
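Linear interpolative recall can be sketched for the simplest case of scalar patterns. This assumes each stored input $A$ and response $B$ is a single number and interpolates only between the two stored inputs that bracket the stimulus; it does not cover the general, possibly nonlinear, case. The names and data are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def interpolative_recall(stimulus: float, pairs: Sequence[Tuple[float, float]]) -> float:
    """Piecewise-linear recall: B' lies between B_p and B_q when A_p <= A' <= A_q."""
    if not pairs:
        raise ValueError("no patterns are stored")
    pts = sorted(pairs)                 # order the stored pairs by their input A
    xs = [a for a, _ in pts]
    if not xs[0] <= stimulus <= xs[-1]:
        raise ValueError("stimulus lies outside the range of stored inputs")
    q = bisect_left(xs, stimulus)       # index of the first stored input >= stimulus
    if xs[q] == stimulus:
        return pts[q][1]                # exact match: return the stored response
    (a_p, b_p), (a_q, b_q) = pts[q - 1], pts[q]
    t = (stimulus - a_p) / (a_q - a_p)  # fractional position of A' between A_p and A_q
    return b_p + t * (b_q - b_p)


memory = [(0.0, 0.0), (1.0, 10.0), (2.0, 14.0)]
```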

Learning

The ANN Perspective

• learning is defined to be any change in the memory W.

$$\text{Learning} \equiv \frac{dW}{dt} \neq 0$$

• two categories

1. supervised learning

2. unsupervised learning

Supervised Learning

• a process that incorporates an external teacher and/or global information.

• techniques:

— deciding when to turn off learning

— deciding how long and how often to present the training

— supplying performance error information

• algorithms: error-correction learning, reinforcement learning, stochastic learning, hardwired systems.

Unsupervised Learning (Self-Organization)

• a process that incorporates no external teacher and relies only upon local information and internal control.

• self-organizes presented data and discovers its emergent collective properties.

Error-Correction Learning (supervised)

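• adjusts the connection weights between PEs in proportion to the difference between the desired and computed values of each PE in the output layer:

$$\Delta w_{ij} = \alpha\, a_i (c_j - b_j)$$

where

— $\Delta w_{ij}$ is the change in the memory connection strength from $a_i$ to $b_j$.

— $c_j$ and $b_j$ are the desired and computed values of the $j$th output PE.

— $\alpha$ is the learning rate, typically $0 < \alpha \ll 1$.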
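The error-correction rule can be sketched for a single layer of linear PEs, where the computed output is $b_j = \sum_i a_i w_{ij}$. The linear PEs, the training pattern and the number of presentations are assumptions made only for this illustration.

```python
from typing import List, Sequence


def linear_outputs(w: List[List[float]], a: Sequence[float]) -> List[float]:
    """Computed outputs b_j = sum_i a_i * w_ij of a single layer of linear PEs."""
    return [sum(a_i * w[i][j] for i, a_i in enumerate(a)) for j in range(len(w[0]))]


def error_correction_update(
    w: List[List[float]], a: Sequence[float], c: Sequence[float], b: Sequence[float], alpha: float
) -> None:
    """Apply delta w_ij = alpha * a_i * (c_j - b_j) to every weight, in place."""
    for i, a_i in enumerate(a):
        for j, (c_j, b_j) in enumerate(zip(c, b)):
            w[i][j] += alpha * a_i * (c_j - b_j)


# Hypothetical training run: teach one linear output PE to answer 1.0 for input (1.0, 0.5).
w = [[0.0], [0.0]]
a, c = [1.0, 0.5], [1.0]
for _ in range(200):
    b = linear_outputs(w, a)                         # computed values
    error_correction_update(w, a, c, b, alpha=0.1)   # move toward the desired values
```

With this input, each presentation shrinks the output error by the factor $1 - \alpha \sum_i a_i^2 = 0.875$, so after 200 presentations the computed output is essentially the desired 1.0.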

Reinforcement Learning (supervised)

• weights are reinforced for properly performed actions and punished for poorly performed ones.

• requires only one value to describe the output layer's performance - a scalar error value.

• $\Delta w_{ij} = \alpha\,(r - \theta_j)\, e_{ij}$, where

— $r$ is the scalar success/failure value.

— $\theta_j$ is the reinforcement threshold value for the $j$th output PE.

— $e_{ij}$ is the canonical eligibility of the weight from the $i$th PE to the $j$th PE.

— $\alpha$ is the learning rate constant, $0 < \alpha < 1$.
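The reinforcement update itself is a one-line rule once the eligibilities are known. The notes do not say how $e_{ij}$ is computed, so this sketch assumes the caller supplies it; the function name and the numbers are hypothetical.

```python
from typing import List, Sequence


def reinforcement_update(
    w: List[List[float]],
    e: Sequence[Sequence[float]],
    r: float,
    theta: Sequence[float],
    alpha: float,
) -> None:
    """Apply delta w_ij = alpha * (r - theta_j) * e_ij to every weight, in place.

    The eligibilities e_ij are supplied by the caller; computing them is out of scope here.
    """
    if not 0 < alpha < 1:
        raise ValueError("learning rate must satisfy 0 < alpha < 1")
    for i, row in enumerate(w):
        for j in range(len(row)):
            row[j] += alpha * (r - theta[j]) * e[i][j]


# A success signal (r = 1.0) above both thresholds strengthens both weights.
w = [[0.0, 0.0]]
reinforcement_update(w, e=[[1.0, 2.0]], r=1.0, theta=[0.5, 0.0], alpha=0.1)
```

When $r$ falls below $\theta_j$, the sign of the update flips, which is how poorly performed actions are punished.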

Stochastic Learning (supervised)

• uses random processes, probability and an energy relationship to adjust the memory connection weights.

• makes a random weight change, determines the resultant energy after the change, and decides whether to keep the weight change:

1. if the energy is lower, accept the change.

2. if the energy is not lower, accept the change according to a pre-chosen probability distribution.

3. otherwise, reject the change.

• this allows escape from local energy minima; simulated annealing is an example.
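Stochastic learning in the style of simulated annealing can be sketched as below. It assumes the Metropolis rule, accepting an energy increase $\Delta E$ with probability $e^{-\Delta E/T}$, as the "pre-chosen probability distribution", plus Gaussian weight perturbations and a geometric cooling schedule; the toy energy function and all parameter values are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Keep a change that lowers the energy; otherwise keep it with probability exp(-dE/T)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(
    w: List[float],
    energy: Callable[[List[float]], float],
    steps: int = 2000,
    t0: float = 1.0,
    cooling: float = 0.995,
    step_size: float = 0.1,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Stochastic weight search; returns the lowest-energy weights visited and their energy."""
    rng = random.Random(seed)
    w = list(w)
    e = energy(w)
    best_w, best_e = list(w), e
    t = t0
    for _ in range(steps):
        candidate = list(w)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0.0, step_size)  # random weight change
        e_new = energy(candidate)                  # resultant energy
        if accept(e_new - e, t, rng):
            w, e = candidate, e_new
            if e < best_e:
                best_w, best_e = list(w), e
        t *= cooling  # geometric cooling schedule
    return best_w, best_e


# Hypothetical toy energy with its minimum at w = (1, -2).
def toy_energy(w: List[float]) -> float:
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2


weights, final_energy = anneal([0.0, 0.0], toy_energy)
```

At high temperature, uphill moves are often accepted, which is what lets the search leave a local minimum; as $T$ falls the rule approaches pure "accept only if lower".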

Donald Hebb on Learning

• in his book The Organization of Behavior (1949), Hebb describes the adjustment of a connection weight according to the correlation of the values of the two PEs it connects:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.

Donald Hebb on Organization

• Cell Assemblies

— the cooperative nature of synaptic modifications is such as to induce the formation of subsets made up of cells which are mutually activated. A single cell may belong to several assemblies, and several assemblies may be active simultaneously. The information would then be contained in these patterns of collective excitation, and would therefore possess a distributed nature.

Hebbian Learning or Correlation Learning

• the adjustment of a connection weight according to the correlation of the values of the two PEs it connects.

• simple Hebbian correlation

— the weight value $w_{ij}$ is the correlation (multiplication) of the PE $a_i$ with the PE $a_j$, using the discrete-time equation

$$\Delta w_{ij} = a_i a_j$$

where $\Delta w_{ij}$ represents the discrete-time change to $w_{ij}$.