ECE 539

Introduction to Artificial Neural Networks and Fuzzy Systems

Multi-Target Classification in Wireless Sensor Networks - A Maximum Likelihood Approach

Table of Contents

Chapter 1  Introduction
Chapter 2  Problem Description and Signal Model
  2.1  Signal Model
  2.2  Problem Formulation
Chapter 3  Classification Strategies
  3.1  Optimal Centralized Classifier
    3.1.1  Performance of Optimal Centralized Classifier
  3.2  Local Classifiers
    3.2.1  Optimal Local Classifier
    3.2.2  Sub-optimal Local Classifiers
      3.2.2.1  Mixture Gaussian Classifier
      3.2.2.2  Single Gaussian Classifier
  3.3  Fusion of Local Decisions at the Manager Node
    3.3.1  Decision Fusion with Ideal Communication Links
    3.3.2  Decision Fusion with Noisy Communication Links
Chapter 4  Simulations and Discussion
  4.1  Real Data for Testing
  4.2  Simulation Details
  4.3  Numerical Results and Observations
Chapter 5  Conclusions
References

Chapter 1

Introduction

A wireless sensor network can be envisioned as hundreds or thousands of tiny sensors deployed in a region of interest to perform a specific task. Such networks have recently attracted a great deal of research attention due to their wide range of potential applications, such as battlefield surveillance, disaster relief, and environmental monitoring. Although each sensor may have limited sensing and processing capabilities, once the nodes are equipped with a wireless communication component, they can coordinate among themselves to perform sensing tasks that no single sensor node could accomplish alone.

Sensor nodes are expected to run on batteries or to scavenge energy from the environment, so energy efficiency is the primary design objective of sensor networks. Multi-hop relaying is generally essential for transmitting information to a destination node. The appeal of a sensor network is that a complex task can be accomplished through proper coordination of extremely inexpensive nodes.

Distributed decision making is an important application of sensor networks; for example, the detection and classification of objects in a sensor field. Due to a variety of factors, such as measurement noise and statistical variability in target signals, collaborative processing of multiple node measurements is necessary for reliable decision-making. In a practical implementation of a sensor network, a manager node typically coordinates collaborative processing of sensor measurements collected in a specific region. Given the limited communication capability of sensor nodes, the key goal in developing collaborative signal processing (CSP) algorithms is to transmit the least amount of data from the sensing nodes to the manager nodes.

In this project, classification of multiple targets in a sensor field using CSP algorithms is considered. This extends the earlier work in [1], which discussed single-target classification algorithms. Multi-target classification is particularly desirable in battlefield surveillance, where tracking and classifying enemy vehicles is extremely important. The algorithms presented are based on the principle of maximum likelihood (ML) detection. Two broad types of classifiers are considered, namely the centralized classifier and the distributed classifier, both of which are explained later.

The optimal centralized classifier and the optimal distributed classifier have exponential complexity in the number of targets, which makes them impractical when the number of targets is large. Therefore, two sub-optimal classifiers that exhibit linear complexity in the number of targets are also introduced. The main focus of this project is to compare the performance of the sub-optimal classifiers against that of the optimal classifiers.

This report is organized as follows. Chapter 2 formulates the problem of multi-target classification and explains the signal model employed. Chapter 3 discusses the various approaches to the problem along with their mathematical analysis. Chapter 4 describes the simulations performed and discusses the results. Chapter 5 provides concluding remarks and possible future work.

Chapter 2

Problem Description and Signal Model

Consider a network query regarding the classification of multiple targets present in a region of interest. We assume that the maximum number of distinct targets (M) is known a priori. However, the actual number of distinct targets present in a given event is unknown. Thus, the multi-target classification problem corresponds to an N-ary hypothesis testing problem, with N = 2^M hypotheses corresponding to the various possibilities for the presence or absence of each target. For example, when M = 2, there are four possible hypotheses:

H0: No target present

H1: Target 1 alone is present

H2: Target 2 alone is present

H3: Both target 1 and target 2 are present

The objective is to classify an event as one of these hypotheses.

2.1 Signal Model

The algorithms proposed in [2] are based on modeling each target as a point source whose temporal signal characteristics follow a zero-mean Gaussian process. Each target generates a Gaussian space-time signal field whose statistical characteristics have a profound impact on classifier performance. In particular, the region of interest containing the targets can be divided into spatial coherence regions (SCR's) over which the spatial signal field remains strongly correlated. The size of the SCR's is inversely proportional to the target signal bandwidth. A very important property of the SCR's is that the spatial signals in distinct SCR's are approximately uncorrelated (independent in the Gaussian case). Thus, the number of SCR's in the query region determines the number of independent spatial measurements that can be collected at any given time.

There are two main sources of error in distributed decision making: sensor measurement noise and the inherent statistical variability of the target signal. Since all nodes within an SCR sense a highly correlated target signal, the node measurements in each SCR can be aggregated to improve the effective measurement SNR. The independent node measurements from distinct SCR's can be combined to reduce the impact of the inherent variability of the target signal. Furthermore, since the node measurements in distinct SCR's are approximately independent, local hard decisions can first be formed in each SCR and then these lower-dimensional decisions can be communicated to the manager node to make the final decision.

2.2 Problem Formulation

As mentioned earlier, M distinct targets give rise to N = 2^M possible hypotheses, denoted H_0, \ldots, H_{N-1}. The probability of the m-th target being present is assumed to be p_m, independently of the other targets. Let b_{j,m} denote the presence (b_{j,m} = 1) or absence (b_{j,m} = 0) of the m-th target in the j-th hypothesis. Using these, the prior probabilities of the different hypotheses are:

P(H_j) = \prod_{m=1}^{M} p_m^{b_{j,m}} (1 - p_m)^{1 - b_{j,m}}, \qquad j = 0, \ldots, N-1        (1)

where (b_{j,M}, \ldots, b_{j,1}) is the binary representation of the integer j. In this case, H_0 corresponds to no target being present, whereas H_{N-1} corresponds to all targets being present. Table 1 shows this representation for M = 2.

Hj    b2    b1
H0    0     0
H1    0     1
H2    1     0
H3    1     1

Table 1: Hypothesis space for M = 2 targets
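
To make (1) concrete, the following minimal Python sketch (NumPy assumed; the function name is illustrative, not taken from [2]) computes the hypothesis priors from the per-target presence probabilities p_m:

    import numpy as np

    def hypothesis_priors(p):
        """p[m] = probability that target m+1 is present; returns P(H_j) for j = 0..2^M - 1."""
        M = len(p)
        priors = np.empty(2 ** M)
        for j in range(2 ** M):
            # b[m] is the (m+1)-th bit of j, i.e. b_{j,m+1} in the notation of (1)
            b = [(j >> m) & 1 for m in range(M)]
            priors[j] = np.prod([p[m] if b[m] else 1 - p[m] for m in range(M)])
        return priors

    # For M = 2 with p_1 = p_2 = 0.5 all four hypotheses are equally likely:
    print(hypothesis_priors([0.5, 0.5]))  # [0.25 0.25 0.25 0.25]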

The final decision about the correct hypothesis is made at a manager node based on G i.i.d. effective feature vectors z_1, \ldots, z_G collected in G distinct SCR's. Each feature vector is of dimension n. The signal component of the feature vector corresponding to the m-th target is modeled as a zero-mean complex Gaussian vector with covariance matrix \Sigma_m. The energy in each target is assumed to be the same, i.e., \mathrm{tr}(\Sigma_m) = \mathcal{E} for all m. It follows that the signal corresponding to each H_j is also Gaussian, with a covariance matrix equal to the sum of the covariance matrices of the targets present. This is because the sensor receives the sum of the signals from all targets present under H_j, and the sum of independent Gaussian vectors is again Gaussian with covariance equal to the sum of the individual covariances. Consequently, the multi-target classification problem can be stated as the following N-ary hypothesis testing problem

H_j : \; z_k \sim \mathcal{CN}(0, \Gamma_j), \qquad \Gamma_j = \sum_{m=1}^{M} b_{j,m} \Sigma_m + \sigma^2 I, \qquad k = 1, \ldots, G        (2)

where \sigma^2 I is the measurement noise covariance.

Under H_j, the probability density function of the feature vector z_k at the k-th SCR is

p(z_k \mid H_j) = \frac{1}{\pi^{n} \, \lvert \Gamma_j \rvert} \exp\left( -z_k^H \Gamma_j^{-1} z_k \right)        (3)

and the G feature vectors are i.i.d.
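
For concreteness, the sketch below draws one feature vector according to (2); the list Sigma of per-target covariance matrices and the noise variance sigma2 are assumed inputs, since the report does not fix their values:

    import numpy as np

    def hypothesis_covariance(j, Sigma, sigma2):
        """Gamma_j: sum of covariances of the targets present under H_j plus noise covariance."""
        n = Sigma[0].shape[0]
        cov = sigma2 * np.eye(n, dtype=complex)
        for m in range(len(Sigma)):
            if (j >> m) & 1:  # target m+1 present under H_j
                cov = cov + Sigma[m]
        return cov

    def sample_feature(j, Sigma, sigma2, rng):
        """One zero-mean circularly symmetric complex Gaussian draw z_k ~ CN(0, Gamma_j)."""
        cov = hypothesis_covariance(j, Sigma, sigma2)
        L = np.linalg.cholesky(cov)
        w = (rng.standard_normal(len(cov)) + 1j * rng.standard_normal(len(cov))) / np.sqrt(2)
        return L @ w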

Chapter 3

Classification Strategies

In this project, two broad classification strategies are investigated: centralized classifiers and decentralized (decision fusion) classifiers. The generic architecture for the multi-target classification algorithms is shown in Figure 1.

Figure 1. Basic Architecture for Multi-target Classification Algorithms

Centralized Fusion: In centralized fusion, the final decision is made using all the independent feature vectors obtained from the G SCR's. The G independent feature vectors are combined in an optimal fashion to decide on the true hypothesis. The exact equations involved are given in the next section. Since this is the optimal way of making decisions, its performance serves as a benchmark for the other classification algorithms. Note that in this case either all the feature vectors or their sufficient statistics must be transmitted to the final manager node, which places a heavy communication burden on the sensor network.

Decision Fusion: In decision fusion, the manager node in each SCR makes a decision based on the feature vector available at that SCR. For example, the k-th SCR uses the feature vector z_k alone to make a decision U_k, which is just a scalar. The final manager node then combines the individual decisions from the G SCR's to arrive at a final decision. This method is also known as hard decision fusion [3]. Since only scalars are transmitted to the final node, the communication burden is much lower. Figure 1 shows two cases of decision fusion: one where the decisions are transmitted over a noiseless channel and another where the communication channel is noisy.

In this project, the centralized classifier, which acts as a benchmark, is considered first. Then the optimal decision fusion classifier is considered. In both cases the complexity grows exponentially with the number of targets. Therefore, two sub-optimal decision fusion classifiers having linear complexity are also tested. The performance of all the decision fusion classifiers over noisy communication channels is investigated as well.

3.1 Optimal Centralized Classifier

The optimal centralized classifier decides the right hypothesis according to

\hat{j} = \arg\max_{0 \le j \le N-1} P(H_j) \prod_{k=1}^{G} p(z_k \mid H_j)        (4)

due to the independence of the measurements z_1, \ldots, z_G. Taking logarithms, the above problem can be written as

\hat{j} = \arg\max_{0 \le j \le N-1} \left[ \log P(H_j) + \sum_{k=1}^{G} \log p(z_k \mid H_j) \right]        (5)

It should be noted that implementing the optimal centralized classifier requires the k-th SCR to communicate the local log-likelihoods for all N hypotheses, \{\log p(z_k \mid H_j)\}_{j=0}^{N-1}, to the manager node. The manager node then computes the metric in (5) for j = 0, \ldots, N-1 and makes the decision accordingly. Thus, as the number of targets increases, the number of likelihoods to be calculated, both at the individual SCR's and at the final manager node, increases exponentially. In addition, for centralized fusion these likelihoods must be transmitted over the communication channel.
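
A minimal sketch of (4)-(5), assuming covs[j] holds \Gamma_j from (2) and Z holds the G feature vectors; the helper log_likelihood implements the logarithm of the density in (3):

    import numpy as np

    def log_likelihood(z, cov):
        """log p(z | H_j) for the zero-mean complex Gaussian density in (3)."""
        _, logdet = np.linalg.slogdet(cov)
        quad = np.real(z.conj() @ np.linalg.solve(cov, z))
        return -quad - logdet - len(z) * np.log(np.pi)

    def centralized_classifier(Z, covs, priors):
        """Optimal centralized rule (5): sum log-likelihoods over SCRs, add the log-prior."""
        scores = [np.log(priors[j]) + sum(log_likelihood(z, covs[j]) for z in Z)
                  for j in range(len(covs))]
        return int(np.argmax(scores))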

3.1.1 Performance of Optimal Centralized Classifier

The average probability of error for the centralized classifier is given by

P_e(G) = \sum_{m=0}^{N-1} P(H_m) \, P_e(G \mid H_m)        (6)

where P_e(G \mid H_m) is the conditional error probability under H_m, written explicitly as a function of G. Direct computation of this error probability is difficult; hence it is shown in [2] that it can be bounded using the Chernoff bounding technique described in [4] as

P_e(G) \le \sum_{m=0}^{N-1} P(H_m) \sum_{j \ne m} e^{-G \, C(j, m)}, \qquad C(j, m) = -\min_{0 \le s \le 1} \log \int p^s(z \mid H_j) \, p^{1-s}(z \mid H_m) \, dz        (7)

The above equations indicate that, for a given measurement SNR, the probability of error decays exponentially with G. Therefore, the larger the number of SCR's, the faster the probability of error decreases.

The Kullback-Leibler (KL) distance between two pdfs p and q is a measure of the difference between them: the larger the KL distance, the easier it is to discriminate between the two pdfs. It is also shown in [2] that when all the pairwise KL distances between the pdfs of the individual hypotheses are strictly positive, the probability of error goes to zero in the limit of large G. Thus, perfect classification can be achieved asymptotically if the pairwise KL distances are all positive. Poor SNR reduces the values of the KL distances.
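
For the zero-mean complex Gaussian densities used here, the pairwise KL distance has a closed form, D(p_j \| p_m) = \mathrm{tr}(\Gamma_m^{-1} \Gamma_j) - n + \log(\lvert\Gamma_m\rvert / \lvert\Gamma_j\rvert), which the following sketch evaluates:

    import numpy as np

    def kl_zero_mean_cgauss(C0, C1):
        """KL distance D(CN(0, C0) || CN(0, C1)) between zero-mean complex Gaussians."""
        n = C0.shape[0]
        _, logdet0 = np.linalg.slogdet(C0)
        _, logdet1 = np.linalg.slogdet(C1)
        return np.real(np.trace(np.linalg.solve(C1, C0))) - n + (logdet1 - logdet0)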

3.2 Local Classifiers

Local classifiers can be employed to reduce the communication burden that is evident in the centralized classification procedure. Under this strategy, a local decision on the hypothesis is computed in each SCR based on the local measurement z_k. The local decisions \{U_k\} from all SCR's are communicated to the manager node, which makes the final decision. This decision is optimal given the statistics of the local decisions and the nature of the communication link. The local classifier in the k-th SCR makes a decision on the hypothesis using only its local measurement according to a given decision rule:

U_k = f(z_k), \qquad f : \mathbb{C}^{n} \to \{0, 1, \ldots, N-1\}        (8)

Since the \{z_k\} are i.i.d., so are the \{U_k\}. The decision statistics are characterized by the probability mass function (pmf) p_j(u) of the decision U under the different hypotheses

p_j(u) = \Pr\{ U = u \mid H_j \}, \qquad u = 0, \ldots, N-1, \qquad j = 0, \ldots, N-1        (9)

Note that since the \{z_k\} are i.i.d., the pmfs are identical for all k. In the following, three local classifiers are discussed. The first is the optimal local classifier, which has exponential complexity in the number of targets M. The other two are sub-optimal classifiers, based on a re-partitioning of the hypothesis space, which have linear complexity in M.
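
Closed-form expressions for p_j(u) are generally unavailable, so one pragmatic option (an assumption of this sketch rather than a method prescribed in [2]) is to estimate the pmfs by Monte Carlo, drawing features under each H_j with sample_feature from Chapter 2 and applying any of the local rules discussed below:

    import numpy as np

    def estimate_pmf(local_rule, j, Sigma, sigma2, N, trials=10000, seed=0):
        """Monte Carlo estimate of p_j(u) = Pr{U = u | H_j} in (9)."""
        rng = np.random.default_rng(seed)
        counts = np.zeros(N)
        for _ in range(trials):
            z = sample_feature(j, Sigma, sigma2, rng)  # from the Chapter 2 sketch
            counts[local_rule(z)] += 1
        return counts / trials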

3.2.1 Optimal Local Classifier

The optimal local classifier in the k-th SCR makes the decision as

U_k = \arg\min_{0 \le j \le N-1} \left[ -\log P(H_j) - \log p(z_k \mid H_j) \right]        (10)

The optimal classifier is illustrated in Figure 2(a). The pmfs of U under the different hypotheses are characterized by the following probabilities

p_j(u) = \Pr\{ U = u \mid H_j \} = \int_{\{z \, : \, f(z) = u\}} p(z \mid H_j) \, dz        (11)

Thus the individual decision is made by calculating the N values in (10) and choosing the argument of the minimum. Hence, for the optimal local classifier, the number of values to be calculated increases exponentially with M.
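
A sketch of the rule (10), reusing log_likelihood from the Section 3.1 sketch; each SCR evaluates all N = 2^M scores for its own measurement and reports the minimizing index:

    import numpy as np

    def optimal_local_classifier(z, covs, priors):
        """Optimal local rule (10): minimize the negative log-posterior over all N hypotheses."""
        scores = [-np.log(priors[j]) - log_likelihood(z, covs[j])  # Section 3.1 sketch
                  for j in range(len(covs))]
        return int(np.argmin(scores))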

(a) Optimal Local Classifier

(b) Structure of Sub-optimal Local Classifiers

Figure 2. Local Classifier Structures

3.2.2 Sub-optimal Local Classifiers

To circumvent the exponential complexity of the optimal classifier, the sub-optimal classifiers conduct M tests, one per target, to determine the presence or absence of each target. Each test partitions the set of hypotheses into two sets, S_m^1 and S_m^0, where S_m^1 contains those hypotheses in which the m-th target is present, and S_m^0 contains those in which it is absent. Let

S_m^1 = \{ j : b_{j,m} = 1 \}, \qquad S_m^0 = \{ j : b_{j,m} = 0 \}        (12)

Recall that (b_{j,M}, \ldots, b_{j,1}) is the binary representation of the integer j. Then, the two hypotheses for the m-th test are

\tilde{H}_m^i = \bigcup_{j \in S_m^i} H_j, \qquad i = 0, 1        (13)

In the example in Table 1 for M = 2, S_1^1 = \{1, 3\}, S_1^0 = \{0, 2\}, S_2^1 = \{2, 3\}, and S_2^0 = \{0, 1\}.

Under \tilde{H}_m^0 and \tilde{H}_m^1, z_k is distributed as a weighted sum of Gaussians (a mixture Gaussian (MG) density),

p(z \mid \tilde{H}_m^i) = \frac{\sum_{j \in S_m^i} P(H_j) \, p(z \mid H_j)}{\sum_{j \in S_m^i} P(H_j)}, \qquad i = 0, 1        (14)

The two sub-optimal classifiers are based on this re-partitioning of the hypothesis space. The first is the mixture Gaussian classifier (MGC), which is the optimal classifier for the re-partitioned space; the second is the single Gaussian classifier (SGC), which approximates the mixture densities with single Gaussian densities. Essentially, under H_j, the m-th test estimates the value of b_{j,m}. Let \hat{u}_m \in \{0, 1\} denote the outcome of the m-th test. The local decision U_k \in \{0, \ldots, N-1\} in the k-th SCR is the integer representation of the binary decisions (\hat{u}_M, \ldots, \hat{u}_1), as illustrated in Figure 2(b). In both the MGC and the SGC, the pmfs of the i.i.d. local decisions at each SCR are characterized by the following probabilities

p_j(u) = \Pr\{ U = u \mid H_j \} = \Pr\{ \hat{u}_M = u_M, \ldots, \hat{u}_1 = u_1 \mid H_j \}        (15)

where (u_M, \ldots, u_1) is the binary representation of u.

3.2.2.1 Mixture Gaussian Classifier

This is the optimal classifier for the re-partitioned hypothesis space. In this classifier, for m = 1, \ldots, M, \hat{u}_m \in \{0, 1\} denotes the outcome of the m-th binary hypothesis test between \tilde{H}_m^0 and \tilde{H}_m^1 in the k-th SCR, i.e., \hat{u}_m = 1 if

\sum_{j \in S_m^1} P(H_j) \, p(z_k \mid H_j) > \sum_{j \in S_m^0} P(H_j) \, p(z_k \mid H_j)        (16)

and \hat{u}_m = 0 otherwise.

For the simple case of M = 2, \hat{u}_1 = 1 if

(1 - p_2) \left[ p_1 \, p(z_k \mid H_1) - (1 - p_1) \, p(z_k \mid H_0) \right] + p_2 \left[ p_1 \, p(z_k \mid H_3) - (1 - p_1) \, p(z_k \mid H_2) \right] > 0

and \hat{u}_1 = 0 otherwise.

A similar expression can be written for m = 2. It can be observed that the above test is a weighted sum of two tests. The first expression in square brackets is the Bayes test for detecting target 1 given that target 2 is absent, which is weighted by the probability of absence of target 2. Similarly, the second expression in square brackets is the Bayes test for detecting target 1 given that target 2 is present, which is weighted by the probability of the presence of target 2. This clearly reveals that each test optimally detects the target in question by integrating out the other targets.
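
A sketch of the MGC: each of the M binary tests (16) compares prior-weighted mixture likelihoods over the hypotheses with and without the m-th target, and the M bits are packed into the local decision U_k (again reusing log_likelihood from the Section 3.1 sketch):

    import numpy as np

    def mgc_local_classifier(z, covs, priors, M):
        """MGC rule: M binary tests (16), one per target; returns U as an integer in 0..N-1."""
        N = len(covs)
        logw = np.array([np.log(priors[j]) + log_likelihood(z, covs[j]) for j in range(N)])
        w = np.exp(logw - logw.max())  # common scaling cancels in the comparisons below
        u = 0
        for m in range(M):
            present = sum(w[j] for j in range(N) if (j >> m) & 1)
            absent = sum(w[j] for j in range(N) if not (j >> m) & 1)
            if present > absent:
                u |= 1 << m  # declare target m+1 present
        return u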

3.2.2.2 Single Gaussian Classifier

The SGC is obtained by approximating the mixture distributions in (14) by single Gaussian densities, chosen to preserve the first two moments of the mixtures. Thus, we can approximate p(z \mid \tilde{H}_m^i) as

p(z \mid \tilde{H}_m^i) \approx \mathcal{CN}(0, \tilde{\Sigma}_m^i), \qquad \tilde{\Sigma}_m^i = \frac{\sum_{j \in S_m^i} P(H_j) \, \Gamma_j}{\sum_{j \in S_m^i} P(H_j)}, \qquad i = 0, 1        (17)

For m = 1, \ldots, M, the outcome of the m-th test in the k-th SCR is \hat{u}_m = 1 if

p_m \, \tilde{p}\left(z_k \mid \tilde{\Sigma}_m^1\right) > (1 - p_m) \, \tilde{p}\left(z_k \mid \tilde{\Sigma}_m^0\right)        (18)

where \tilde{p}(\cdot \mid \tilde{\Sigma}) denotes the \mathcal{CN}(0, \tilde{\Sigma}) density and p_m = \sum_{j \in S_m^1} P(H_j), and \hat{u}_m = 0 otherwise.

Note that in both the MGC and the SGC, only M binary hypothesis tests are performed, and hence the complexity grows only linearly with M, unlike the optimal classifiers.
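
A sketch of the SGC: moment matching (17) replaces each mixture by a single zero-mean Gaussian whose covariance is the prior-weighted average of the component covariances, and each test (18) compares two single-Gaussian scores:

    import numpy as np

    def sgc_local_classifier(z, covs, priors, M):
        """SGC rule: M binary tests (18) with moment-matched covariances (17)."""
        N = len(covs)
        u = 0
        for m in range(M):
            scores = []
            for bit in (0, 1):  # absent / present mixtures for the m-th test
                idx = [j for j in range(N) if ((j >> m) & 1) == bit]
                pw = np.array([priors[j] for j in idx])
                C = sum(p * covs[j] for p, j in zip(pw / pw.sum(), idx))  # (17)
                scores.append(np.log(pw.sum()) + log_likelihood(z, C))
            if scores[1] > scores[0]:
                u |= 1 << m
        return u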

3.3 Fusion of Local Decisions at the Manager Node

In the preceding discussion, three local classifiers for generating the local hard decisions were presented. This section discusses the communication of these local decisions to the manager node. Both noiseless communication channels and channels corrupted by additive white Gaussian noise (AWGN) are considered. The performance of the final classifier at the manager node can be characterized in a unified fashion for all three local classifiers. In the case of ideal communication links, the performance of the final classifier is governed by the pmfs of the local decisions, which differ across the three local classifiers. In the case of noisy communication links, the performance is governed by the pdfs under the different hypotheses induced by passing the local-decision pmfs through the AWGN channel. With an ideal communication channel, the final decision could simply be a majority vote over the decisions received from the SCR's, but that would not be the optimal way of combining them [5].

3.3.1 Decision Fusion with Ideal Communication Links

With ideal communication links, the final classifier at the manager node is given by

\hat{j} = \arg\max_{0 \le j \le N-1} \left[ \log P(H_j) + \sum_{k=1}^{G} \log p_j(U_k) \right]        (19)

Note that the above expression applies to all three local classifiers; the only difference is that the different local classifiers induce different pmfs.

It is also shown in [2] that the probability of error goes to zero as the number of independent measurements G goes to infinity, provided the pairwise KL distances between the decision pmfs are strictly positive. In spite of making decisions locally and combining only the individual decisions, the classifier is still able to drive the error probability to zero. Although a larger G is required to reduce the error probability below a given threshold than with the centralized classifier, these local classifiers reduce the communication burden enormously, which makes them an attractive choice in practice.
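
A sketch of the fusion rule (19) under ideal links, where pmfs[j][u] is p_j(u) (e.g. the Monte Carlo estimates from Section 3.2) and decisions holds the G received local decisions:

    import numpy as np

    def fuse_decisions(decisions, pmfs, priors):
        """Fusion rule (19): combine log-pmfs of the G local decisions with the log-prior."""
        scores = [np.log(priors[j]) + sum(np.log(pmfs[j][u]) for u in decisions)
                  for j in range(len(priors))]
        return int(np.argmax(scores))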