Sensor Networks for Signal Detection
A Node Model with Algorithms for Centralized and Decentralized Processing
Kevin Duh
Michelle Lloyd
ELEC 531
December 6, 2002
Table of Contents
I. Statistical Model of Sensor Network Nodes
   Motivation
   Source Model
   Hypothesis Testing
   Matlab Simulations
II. Centralized Processing Scheme
   Motivation
   Find Source Center
   Find Source Extent
III. Decentralized Processing Scheme
   Motivation
   Phase I: Compute Across
   Phase II: Compute Up
   Analysis
IV. Comparative Performance Evaluation
V. Conclusion
Appendix A. Code for generating signal model
Appendix B. Code for centralized processing scheme
Appendix C. Code for decentralized processing scheme
Appendix D. Code for generating test data
I. Statistical Model of Sensor Network Nodes
Motivation
The first step in a statistical processing problem is usually the formulation of a model that describes the measured signals. This model should be complex enough to approximate the actual underlying physical processes, but simple enough to avoid bogging us down in details as we develop processing algorithms later. Therefore, we tried to be as comprehensive as possible in formulating the statistical model of our sensor network measurements. At the same time, we recognize that the model may change as we learn more about the problem in the coming weeks.
In the following, we first describe the model for our source. We then discuss hypothesis testing in this context and the measurement model that results for our problem.
Source Model
Our goal is to model the statistics of measurements at the sensor nodes. Let’s start by thinking about the problem at hand. We have M sensors evenly distributed in an NxN square area, where N > √M. This allows us to change the spacing between sensors to determine how close the sensors must be placed to achieve an accurate estimate of source location and cloud extent. An emitting source is situated at an unknown location in this square. Our sensors will sense the intensity of the emissions at different spatial locations. Our ultimate project goal is to locate and track this source and its extent. Our goal for now is to formulate a model of the distribution of the set of intensity measurements at all sensors, given this unknown source location.
First, let us assume that the source is uniformly located in the NxN square. A non-informative prior such as this 2D uniform distribution is reasonable because we have no idea where the source is located. If we have already located the source and are now tracking it, we may want to use a prior distribution that reflects this knowledge (e.g., a Gaussian). But for now, we will use a 2D uniform distribution.
The source emits at some rate and initial amplitude. These values may be random or deterministic, and are either known beforehand from the application at hand or must be estimated from collected data. For now, we assume a known initial amplitude, A, but an unknown, deterministic emission spread, l(t). Furthermore, we assume that the emissions decay exponentially with distance. Thus, the intensity of emission at location (x,y) in the square, I(x,y), can be written as

I(x,y) = A * exp( -d(x,y) / l(t) ),   where d(x,y) = sqrt( (x - xs)^2 + (y - ys)^2 )

and (xs, ys) is the source location.
Basically, the intensity of emission at each location is a function of the distance to the actual source location and of the rate and amplitude of the source. If A is large, the intensity near the source, as well as everywhere surrounding it, will be large; if l(t) is large, the spread of the emission is great, and the emission cloud will therefore cover a large area. Moreover, as time increases, the spread increases. This reflects the fact that the cloud of emissions diffuses over a larger area over time, and can be modeled simply as l(t) = r*t, where r is the rate of diffusion and t is the time since initial emission. Finally, the further the distance from the source, the lower the intensity. These are the properties we assume of the source, and they strongly shape our statistical model.
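For concreteness, the following is a minimal Matlab sketch of this model; the parameter values and variable names (A, r, t, xs, ys) are illustrative only, and Appendix A contains the code actually used to generate the figures below.

% Sketch: evaluate the emission-intensity model on a grid (illustrative values).
N = 16;                                % side length of the square region
A = 10;                                % known initial amplitude
r = 1; t = 3;                          % diffusion rate and time since emission
l = r * t;                             % emission spread l(t) = r*t
xs = 7; ys = 9;                        % source location (unknown in practice)
[x, y] = meshgrid(0:N, 0:N);           % spatial sampling grid
d = sqrt((x - xs).^2 + (y - ys).^2);   % distance from each point to the source
I = A * exp(-d / l);                   % decaying-exponential intensity
Inoisy = I + 0.5 * randn(size(I));     % Gaussian white measurement noise
contour(x, y, Inoisy)                  % compare with Figs. 4-5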
Hypothesis Testing
The location of the source can be formulated as a multiple hypothesis testing problem, in which each hypothesis corresponds to a candidate source location. For example, we can have the following hypotheses in an NxN square:
H(0,0): Source is located at coordinates (0,0)
H(0,1): Source is located at coordinates (0,1)
H(1,2): Source is located at coordinates (1,2)
…..
H(N,N): Source is located at coordinates (N,N)
For each hypothesis H(i,j), there will be an emission cloud centered at (i,j). This emission cloud will increase in extent as time grows; the intensities of points within the cloud are determined by the decaying-exponential equation specified above. Therefore, under each hypothesis we can picture, at each time instant, a cloud centered at (i,j) with some spread l(t). At the next instant, the spread will be bigger. If we assume a non-moving source, the center will remain constant.
With this in mind, we can produce a set of statistics at the measurement nodes. There are only a finite number of sensor nodes in this continuous space of emission diffusion. Thus, these nodes basically “sample” the intensities of the diffusion at their own sensor locations. If we look at the entire set of sensor measurements, we will expect to see a set of discrete values that are somewhat spatially correlated. If we interpolate these values over various sensors, then we should be able to see the continuous emission cloud.
Therefore, the problem we have now is this: given these discrete measurements at the sensor nodes, how do we reconstruct the form of the emission cloud? Then, after we reconstruct this cloud, how do we use hypothesis testing to pinpoint the source location and the cloud spread? This will be the topic of the next project milestone.
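Although the detection algorithms are developed in the following sections, one natural sketch of the hypothesis test under this model (our illustration, not the report's final algorithm) is maximum-likelihood template matching: under Gaussian noise, the most likely hypothesis is the one whose model cloud best matches the measurements in the squared-error sense. Variable names continue from the sketch above; Z is an illustrative name for the matrix of noisy measurements.

% Sketch: maximum-likelihood choice among the hypotheses H(i,j), assuming
% Gaussian noise; Z is the matrix of noisy sensor measurements (illustrative).
best = Inf;
for i = 0:N
    for j = 0:N
        d = sqrt((x - i).^2 + (y - j).^2);   % distance field for hypothesis H(i,j)
        T = A * exp(-d / l);                 % template cloud centered at (i,j)
        err = sum(sum((Z - T).^2));          % squared error ~ negative log-likelihood
        if err < best
            best = err; iHat = i; jHat = j;  % most likely source location so far
        end
    end
end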
Matlab Simulations
We created several different realizations of our statistical model and plotted them as contour maps in Matlab (Figs. 1-5). The code that generated Figs. 1-5 can be found in Appendix A.
Fig. 1. Contour map showing the cloud spread across the sensor grid (in continuous space) at a diffusion rate of 5 without noise.
Fig. 2. Contour map of the cloud at a diffusion rate of 1 without noise.
Fig. 3. Mesh plot of the cloud at a diffusion rate of 1 without noise.
Fig. 1 and Fig. 2 show how changing the diffusion rate changes how quickly the cloud spreads across the sensor network. This confirms that our model for diffusion rate is behaving properly. The mesh plot shown in Fig. 3 helps to give a 3-dimensional view of our cloud model.
Fig. 4. Contour map of the cloud at a diffusion rate of 1 with a noise level of 0.5.
Fig. 5. Mesh plot of the cloud at a diffusion rate of 1 with a noise level of 0.5.
Fig. 4 and Fig. 5 illustrate the noisy observations that the sensors make. In the contour map, the circles become jagged lines, and in the mesh plot, the smooth slope and flat ground become jagged edges. These plots show the results of adding Gaussian white noise to our signal model.
II. Centralized Processing Scheme
Motivation
Our centralized processing scheme was motivated by the concept of sufficient statistics. Instead of trying to find the center and extent by analyzing every piece of data, we add up the columns of the sensor network to create a row vector and the rows of the network to create a column vector. These two vectors are the sufficient statistics for our matrix. The result of our addition looks something like the sketch shown in Figure 6.
Figure 6. Contour map (in center) with representations of vectors (sufficient statistics) to either side.
We can create sufficient statistics like this only with a centralized processing scheme, because doing so requires that all of the data be available at one processing center. This is feasible for sensors connected to a common location by wires, but it is probably not feasible for a wireless network.
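In Matlab, forming these two statistics amounts to a pair of sums over the measurement matrix (a sketch; Z is our illustrative name for the grid of sensor readings):

colStat = sum(Z, 1);   % row vector: each column of sensors summed
rowStat = sum(Z, 2);   % column vector: each row of sensors summed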
Find Source Center
Once we have generated the sufficient statistics for our sensor network, we approach the problem of finding the source center by using the center of gravity equation. This allows us to interpolate between sensors in order to determine the exact location of the source center. To find the “center of gravity” of the sufficient statistic vectors, we multiply the height of the sufficient statistic (the value of the vector) at each index by the sensor location (vector index) and then normalize by the sum of all the heights. Written as a formula, the center of gravity of a statistic vector v can be expressed as

c = ( sum_i i * v(i) ) / ( sum_i v(i) )
By applying this formula to both the column and row statistic, we are able to arrive at a good estimate of both the x and y coordinate of the center.
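A minimal Matlab sketch of this estimate, continuing with the colStat and rowStat names above and assuming a square grid with unit sensor spacing:

idx = 1:numel(colStat);                        % sensor indices along either axis
xCenter = sum(idx .* colStat) / sum(colStat);  % center of gravity of the column sums
v = rowStat(:).';                              % row orientation for the row statistic
yCenter = sum(idx .* v) / sum(v);              % center of gravity of the row sums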
In using this formula, we found that it does not work for asymmetric data. We had to assume that the source would always be in the middle region of the sensor network so that our data would be symmetric. This is a reasonable assumption, because a sensor network should be set up such that the expected sources are within the network and not on the fringe. Otherwise, good estimates cannot be made because not enough data can be collected about the source.
Find Source Extent
To find the extent of the source, we first threshold the data and then average. We keep all values of each sufficient statistic that are above the noise level. We find the extent along the column statistic by subtracting the position of the first above-threshold entry from the estimated center position, and we do the same for the row statistic. Then, we average the extent values from the two sufficient statistics to estimate the extent of the source. Our radially symmetric estimate of extent, as seen in Fig. 7, just touches the edge of where the noise starts.
Fig. 7. Contour map showing estimated center and extent on noisy data at one time-step.
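A sketch of the threshold-and-average computation just described, continuing with the names from the sketches above and assuming the noise level is known:

noiseLevel = 0.5;                                        % assumed known noise floor
firstX = find(colStat > noiseLevel, 1);                  % first above-threshold index along x
firstY = find(rowStat > noiseLevel, 1);                  % first above-threshold index along y
extent = ((xCenter - firstX) + (yCenter - firstY)) / 2;  % average of the two one-sided extents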
This type of extent estimation is very dependent on the radially symmetric property of the source. While this seems to be a reasonable assumption, it does not account for wind or non-stationary diffusion rates throughout the cloud. Further development of this algorithm, however, could provide extensions to cover these more difficult cases.
III. Decentralized Processing Scheme
Motivation
The basic goal of a decentralized processing scheme is to save transmissions. We want to transmit as little data as possible, and yet our sensor network must still produce accurate estimates of both the source center and extent. Sensors in our network can only communicate with their four nearest neighbors: the sensors to the left, right, up, and down. The basic paths of transmission can be seen in Fig. 8. The dotted arrows represent the transmissions across the network in what we call Phase I of processing. The small solid arrows represent transmissions that combine the information from each row. The final decision is then passed to the top right-hand corner of the network, as shown by the large solid arrows.
Fig. 8. Path of information flow in distributed sensor network processing.
To save transmissions, we created a time-synchronized algorithm in which sensors whose readings are below the noise intensity do not transmit data. Because our source is located in the middle region of our sensor network, the sensors on the edges and corners of the network that measure only noise will not transmit. Instead, only the sensors with useful data will be activated.
When the center and extent need to be found, the data at that moment is stored at each sensor and used for computations. The overall time needed to pass data through the sensor network will directly depend on the size of the network.
Phase I: Compute Across
As the sensors transmit from both sides of the network toward the middle, three pieces of data must be computed and passed along: the position of the first sensor to transmit, the current maximum height (sensor value), and the position of the sensor with that maximum height. At each sensor, a check is made to see whether the current sensor is the first one to transmit; this information will be used to calculate the extent. A comparison is then made to determine whether the current sensor’s value is greater than the previous maximum. If so, the sensor transmits its own value and position; otherwise, it relays the previous maximum data. Each of these transmissions incurs a cost of three 32-bit transmissions.
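A sketch of one Phase I update at a single sensor follows; the three-element message format msg = [firstPos, maxVal, maxPos] and the function name are our illustration, and Appendix C contains the actual code. An edge sensor would initialize the message as [NaN, -Inf, NaN].

function msgOut = phaseOneUpdate(msgIn, myVal, myPos, noiseLevel)
% One Phase I step: update [firstPos, maxVal, maxPos] and pass it along the row.
msgOut = msgIn;
if myVal <= noiseLevel
    return;                  % below the noise floor: pass the message through unchanged
end
if isnan(msgOut(1))
    msgOut(1) = myPos;       % this sensor is the first in its row to transmit
end
if myVal > msgOut(2)
    msgOut(2) = myVal;       % new running maximum height ...
    msgOut(3) = myPos;       % ... and the position where it occurred
end
end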
Care must be taken to get the data to the center of the network because the center column will depend on whether the number of sensors in each row is even or odd. If the number of sensors is even, the final left-hand side sensors can simply pass their data to the right by one; then, data will be consolidated in that column. If the number of sensors is odd, both the right and left sides transmit to the center column of sensors. That column is then responsible for making three comparisons instead of just two.
Phase II: Compute Up
The maximum height and corresponding sensor position for the overall grid are transmitted up the center column of sensors, along with the difference between the positions of the first sensors to transmit from the right and from the left. By using the sensors in the middle column to compare data from the right and left sides, we arrive at an overall estimate of the source center. By subtracting the absolute position of the first sensor to transmit on the left from that of the first sensor to transmit on the right, we get an estimate of the diameter of the source. Both the center and extent estimates are accurate only to within a sensor position.
Once we have reached the top of the sensor grid, all of the data in the network has been compared. We then pass our estimates to the far right sensor, where they can be accessed.
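A sketch of one Phase II step at a center-column sensor follows; the four-element message format and the choice to keep the widest row as the diameter estimate are our reading of the scheme, and Appendix C contains the actual code.

function msgOut = phaseTwoUpdate(msgIn, rowMaxVal, rowMaxPos, thisRow, firstLeft, firstRight)
% One Phase II step: fold one row's Phase I results into the upward message
% msg = [maxVal, maxX, maxY, diam], then transmit it one sensor up the column.
msgOut = msgIn;
if rowMaxVal > msgOut(1)
    msgOut(1) = rowMaxVal;                % overall maximum height so far
    msgOut(2) = rowMaxPos;                % its x position within the row
    msgOut(3) = thisRow;                  % its y position (this row)
end
rowDiam = firstRight - firstLeft;         % cloud width along this row
msgOut(4) = max(msgOut(4), rowDiam);      % keep the widest row as the diameter estimate
end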
Analysis
When finding the source center, the decentralized processing scheme does not return the sensor closest to the source center, but rather a sensor within one diagonal inter-sensor distance of it. The major problem is that the direction of the true source center from this estimate is not known; some form of interpolation or finer search would have to be performed in that region of the network to refine the estimate. Because the actual extent of the cloud is not known, it is difficult to determine how well the distributed scheme estimates extent.
The major benefit of distributed processing is the reduction in cost. In our scheme, we have achieved this by only telling a sensor to transmit if it has valuable data. Otherwise, it saves power by not transmitting.
IV. Comparative Performance Evaluation
We compared the performance of our centralized and distributed processing schemes using a noise intensity of 0.5 and a network of 64 sensors. The cost listed in Table 1 is the number of 32-bit floating point numbers that must be transmitted to the upper-right corner of the sensor grid. In the centralized processing scheme, it is assumed that all sensors transmit their value and their position, i.e., 2 floating point numbers, to the upper-right sensor, where all processing then takes place. The results are shown in Table 1 below.
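As a check on the cost accounting (our reading, consistent with Table 1 and Fig. 9): when values are relayed neighbor to neighbor, a sensor at grid position (i, j) is i + j hops from the corner, and summing i + j over an nxn grid (i, j = 0, ..., n-1) gives n^2(n-1) hops. At 2 floating point numbers per hop, the centralized cost is 2n^2(n-1): 2*64*7 = 896 for the 8x8 grid here, and 2*100*9 = 1800 for the 10x10 grid of Fig. 9.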
Table 1. Results of centralized and distributed processing schemes on five iterations of data.
Iter  Quantity     Model             Centralized       Distributed
1     Center       6.4185, 9.7050    6.4193, 9.705     6, 10
      Extent       -                 6.7050            6
      Cost         -                 896               72
      Center MSE   -                 0                 0.087
2     Center       10.8688, 5.1375   10.8667, 5.1509   10, 6
      Extent       -                 4.1375            6
      Cost         -                 896               75
      Center MSE   -                 0                 0.7439
3     Center       10.7514, 9.5144   10.7513, 9.5144   10, 10
      Extent       -                 6.5144            6
      Cost         -                 896               78
      Center MSE   -                 0                 0.2358
4     Center       7.3938, 7.0281    7.3939, 7.0281    8, 8
      Extent       -                 7.0293            6
      Cost         -                 896               75
      Center MSE   -                 0                 0.9446
5     Center       10.0898, 7.1650   10.0878, 7.1652   10, 8
      Extent       -                 5.1650            4
      Cost         -                 896               74
      Center MSE   -                 0                 0.6972
Avg   Cost         -                 896               74.8
      Center MSE   -                 0                 0.5417
(Dashes mark quantities with no model value; the true extent of the cloud is not known.)
The centralized processing scheme is very good at interpolating between sensor data points to determine the exact source center. The distributed scheme, on the other hand, is unable to perform this interpolation. The sensor that it identifies as the source center may not actually be the closest sensor to the source center; however, the true source is not further away than the diagonal distance between sensors.
If power is at all constrained in the sensor network, then distributed processing is clearly the better way to process the data. By preventing sensors from transmitting when their readings are below the noise threshold, our decentralized algorithm gains enormous savings in cost. For 64 sensors, achieving zero mean-squared error in the source-location estimate costs 11.98 times as many transmissions as accepting a mean-squared error of 0.5417.
As the number of sensors in the grid increases, or the grid size increases, the savings increase, as shown in Figure 9.
Figure 9. Performance data for cost as a function of sensor network size.
When there are 100 sensors in the grid, the cost of centralized processing is 1800 transmissions whereas the cost of decentralized processing is only 87. It is possible that mean-squared error also increases as the grid becomes larger; however, this has not yet been tested. It is interesting to note that the general trend for both processing schemes is increasing but the decentralized curve is not monotonic.
V. Conclusion
We have created a model from intuition that appears to capture what happens as a cloud diffuses through space and time. Our equation results in a radially symmetric function that acts as a diffusing bump over the source center, and we modeled sensor measurement noise as Gaussian white noise. Our centralized processing scheme uses methods drawn from physics and statistics, including the center-of-gravity formula and sufficient statistics. Our distributed scheme is likewise based on the concept of sufficient statistics, except that only meaningful data is transmitted; this amounts to a nonlinear, lossy implementation that yields semi-sufficient statistics. Overall, we have shown that a distributed approach to processing provides enormous gains in saving power. Distributed systems should be used unless the incurred error is unacceptable for the desired implementation.
Appendix A: Code for generating signal model
clear;
spacing = 1; % spacing between sensors