Applying Branching Processes Theory for Building a Statistical Model for SEM Signal

Applying Branching Processes Theory for Building a Statistical Model for Scanning Electron Microscope Signal

Ira Cohen (a, b), Rotem Golan (a, b) and Stanley Rotman (b)

a Opal Technologies (A company from the group of Applied Materials), Nes-Ziona, Israel.

b Department of Electrical & Computer Engineering, Ben-Gurion University, Beer-Sheva, Israel.

Abstract: Branching stochastic processes are used to describe random systems such as nuclear chain reactions, population development and gene propagation. In this work we show that the creation of the SEM (Scanning Electron Microscope) signal can be modeled as a branching stochastic process. A statistical model is described step by step, as a function of the physical parameters of the process. Using the model, we propose a method for determining the unknown probability distribution of the secondary electron emission. With this method, a Lognormal distribution is shown to approximate the secondary electron (SE) emission well, whereas a Poisson distribution gives a poor approximation.

1 Introduction

SEM image enhancement and noise reduction algorithms rely on some basic assumptions about the SEM image and the statistical characteristics of its noise. For example, Erasmus (1982) [1] assumes that the noise is additive, independent and zero-mean. Podsiadlo et al. (1995) [2] further assume that this noise is Gaussian. However, a full statistical model for the formation of the SEM signal is not available.

The idea of modeling the statistics of electron emission in the SEM by branching process theory was partially introduced earlier by Reimer [3], who describes the statistics of SE and BSE (backscattered electron) emission in terms of their mean and variance only, using results from branching process theory.

In this work, we use the theory of branching processes to analyze the formation of the SEM signal. As a first step, we present an overview of the theory of branching stochastic processes. In section 3, we apply the theory to the SEM signal formation process and introduce a full statistical model, including the generating function and the moments of the process. In section 4, we present examples of typical simulated processes. In section 5, we introduce a method for finding an approximate distribution of the SE emission: we use the model to synthesize a SEM signal and statistically compare it to a real SEM signal. The synthesized signal is based on the detection of secondary electrons. Since the probability distribution of the SE emission is unknown, we can test several assumptions about it by statistically comparing the synthesized signal with the real signal. We compare two possible distributions, Poisson and Lognormal, and find that the Lognormal distribution is the better fit of the two to the SE emission.

2 Branching Processes: Theoretical Review

A branching process, in general, is a process in which an initial random number of objects ‘create’ more objects of the same or of a different type, and these objects in turn ‘create’ other objects, the system developing according to some probability law. An example of such a process is a nuclear chain reaction: an initial number of neutrons hit nuclei, which split and emit further neutrons, each with a certain probability of creating other neutrons; the process continues statistically.

Another example is the imaging process of the SEM. An initial number of primary electrons hit the specimen, causing the emission of other electrons; some of these are detected by the system’s detectors, where they cause the emission of further particles, electrons or photons, depending on the type of detector used in the particular system.

Figure 1 is a graphic illustration of a general multilevel branching process.

Figure 1: Graphic illustration of a multilevel branching process

First, we describe a two-level branching process; its extension to a multilevel branching process is straightforward.

Let $N$ be a random variable with probability distribution function $g_N(n)=P(N=n)$, mean $\mu_N$ and variance $\sigma_N^2$.

Let $\{X_i\}_{i \ge 1}$ be a sequence of independent identically distributed (i.i.d.) random variables, independent of $N$, with common distribution $f_X(x)$, mean $\mu_X$ and variance $\sigma_X^2$.

The sum of the first $N$ elements of the sequence $\{X_i\}$ is denoted by $V$:

$$V = \sum_{i=1}^{N} X_i \tag{1}$$

The mean of $V$ is given by:

$$\mu_V = E[V] = \mu_N\,\mu_X \tag{2}$$

and the variance of $V$ by:

$$\sigma_V^2 = \mu_N\,\sigma_X^2 + \sigma_N^2\,\mu_X^2 \tag{3}$$

Proofs of equations (2) and (3) can be found in [4].
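A quick Monte Carlo check makes (2) and (3) concrete. This is a minimal sketch, assuming hypothetical distributions chosen only because their moments are easy to state: $N$ uniform on {0, ..., 10} (so $\mu_N = 5$, $\sigma_N^2 = 10$) and $X_i$ uniform on {0, ..., 4} (so $\mu_X = 2$, $\sigma_X^2 = 2$). Equations (2) and (3) then predict $\mu_V = 10$ and $\sigma_V^2 = 50$.

```python
import random

# Hypothetical distributions, chosen only because their moments are easy to state.
N_MAX, X_MAX = 10, 4
MU_N, VAR_N = N_MAX / 2, ((N_MAX + 1) ** 2 - 1) / 12   # discrete uniform on {0..N_MAX}
MU_X, VAR_X = X_MAX / 2, ((X_MAX + 1) ** 2 - 1) / 12   # discrete uniform on {0..X_MAX}

def random_sum(rng):
    """One realization of V = X_1 + ... + X_N, with N independent of the X_i."""
    n = rng.randint(0, N_MAX)
    return sum(rng.randint(0, X_MAX) for _ in range(n))

rng = random.Random(1)  # fixed seed so the run is reproducible
samples = [random_sum(rng) for _ in range(100_000)]

mean_v = sum(samples) / len(samples)
var_v = sum((v - mean_v) ** 2 for v in samples) / (len(samples) - 1)

print(f"mean:     simulated {mean_v:.3f}, equation (2) {MU_N * MU_X:.3f}")
print(f"variance: simulated {var_v:.3f}, equation (3) {MU_N * VAR_X + VAR_N * MU_X**2:.3f}")
```

The second term of (3), $\sigma_N^2 \mu_X^2$, dominates here (40 of the 50), which shows how strongly the randomness of $N$ propagates into $V$.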

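The distribution function $f_V(v)$ of $V$ can be derived from the basic formula for conditional probabilities (the law of total probability), conditioning on $N$:

$$f_V(v) = \sum_{n=0}^{\infty} P\left(X_1 + \dots + X_n = v\right)\, g_N(n) \tag{4}$$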

Let $f_X(x)$ and $\phi_X(s)$ denote the distribution function and generating function of $X_i$, respectively, and $g_N(n)=P(N=n)$ and $\phi_N(s)$ those of $N$. For a fixed $n$, the distribution of the sum $X_1+\dots+X_n$ is the $n$-fold convolution of $f_X(x)$ with itself, because the $X_i$ are independent. Therefore equation (4) can be written more compactly as:

$$f_V(v) = \sum_{n=0}^{\infty} g_N(n)\,\{f_X\}^{n*}(v) \tag{5}$$

where $\{\cdot\}^{n*}$ denotes the $n$-fold convolution (for $n = 0$, the unit mass at zero).

This formula can be simplified by using generating functions. Since the generating function of an $n$-fold convolution is the $n$-th power of the original generating function, (5) gives the generating function $\phi_V(s)$ of $V$:

$$\phi_V(s) = \sum_{n=0}^{\infty} g_N(n)\,\big[\phi_X(s)\big]^n \tag{6}$$

The right-hand side of (6) is the power series $\phi_N(s) = \sum_{n} g_N(n)\,s^n$ with $s$ replaced by $\phi_X(s)$ [5]. This proves that the generating function of the sum $V$ is the compound function:

$$\phi_V(s) = \phi_N\big(\phi_X(s)\big) \tag{7}$$
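Equations (5) to (7) can be checked numerically for small, finitely supported distributions. The sketch below uses hypothetical pmfs `g_N` and `f_X` and helper names of our own choosing: it builds $f_V$ by summing weighted $n$-fold convolutions as in (5), then compares its generating function with the compound function $\phi_N(\phi_X(s))$ of (7).

```python
def convolve(a, b):
    """Discrete convolution of two pmfs stored as lists indexed by value."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

def compound_pmf(g_n, f_x):
    """Equation (5): f_V(v) = sum_n g_N(n) {f_X}^{n*}(v), for finite supports."""
    f_v = []
    power = [1.0]  # {f_X}^{0*}: unit mass at zero
    for p_n in g_n:
        f_v += [0.0] * (len(power) - len(f_v))   # grow to the current support
        for v, p in enumerate(power):
            f_v[v] += p_n * p
        power = convolve(power, f_x)             # next convolution power
    return f_v

def pgf(pmf, s):
    """Generating function sum_k P(k) s^k of a finitely supported pmf."""
    return sum(p * s**k for k, p in enumerate(pmf))

g_N = [0.2, 0.3, 0.5]   # hypothetical P(N=0), P(N=1), P(N=2)
f_X = [0.1, 0.6, 0.3]   # hypothetical P(X=0), P(X=1), P(X=2)
f_V = compound_pmf(g_N, f_X)

for s in (0.0, 0.5, 0.9):
    direct = pgf(f_V, s)                 # phi_V(s) computed from (5)
    compound = pgf(g_N, pgf(f_X, s))     # phi_N(phi_X(s)), equation (7)
    print(f"s = {s}: {direct:.12f}  {compound:.12f}")
```

The two columns agree to rounding error. Note that $f_V(0) = \phi_V(0) = \phi_N(f_X(0))$: the probability of an empty sum includes both $N = 0$ and all $X_i = 0$.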

For multilevel branching processes, the extension is a recursive application of (2), (3), (5) and (7), with the parameters changed in accordance with the probability law relating the previous and new objects in the process.

It should be noted that all of the moments of the process, and not only the first and second moments, can be derived from the generating function described in (7).
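To illustrate this remark, differentiating (7) and evaluating at $s = 1$ recovers (2) and (3). Writing $\phi_N$, $\phi_X$, $\phi_V$ for the generating functions of $N$, $X_i$ and $V$, the worked derivation below is a standard one (not taken from [4]); it only uses $\phi(1) = 1$, $\phi'(1) = \mu$ and $\phi''(1) = E[X(X-1)] = \sigma^2 + \mu^2 - \mu$, which hold for any non-negative integer-valued variable with finite variance.

$$
\begin{aligned}
\phi_V'(s) &= \phi_N'\big(\phi_X(s)\big)\,\phi_X'(s), \\
\phi_V''(s) &= \phi_N''\big(\phi_X(s)\big)\,\big[\phi_X'(s)\big]^2 + \phi_N'\big(\phi_X(s)\big)\,\phi_X''(s), \\
\mu_V &= \phi_V'(1) = \mu_N\,\mu_X, \\
\sigma_V^2 &= \phi_V''(1) + \phi_V'(1) - \big[\phi_V'(1)\big]^2 \\
&= (\sigma_N^2 + \mu_N^2 - \mu_N)\,\mu_X^2 + \mu_N(\sigma_X^2 + \mu_X^2 - \mu_X) + \mu_N\mu_X - \mu_N^2\mu_X^2 \\
&= \mu_N\,\sigma_X^2 + \sigma_N^2\,\mu_X^2 .
\end{aligned}
$$

Higher moments follow in the same way from higher derivatives of (7) at $s = 1$.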

3 The Statistical Model of the SEM signal

The creation of the SEM signal can be divided into three stages. The first stage is the electron beam itself; the electrons in the beam are called primary electrons. The interaction of the primary electron beam with a specimen creates a primary excitation within the specimen, in which electrons are scattered. This scattering may be divided into two types: nearly elastic and inelastic.

In nearly elastic interactions, the electrons involved retain virtually all of their energy. The resulting high energy electrons are termed backscattered electrons if they are emitted back from the specimen surface.
In inelastic interactions, the electrons involved lose much of their energy and hence have low energy. Electrons with energies below 50 eV are termed secondary electrons. Secondary electrons are created throughout the primary excitation. Due to their low energy, most of them are absorbed by adjacent atoms in the specimen. As a result, only those secondary electrons that are created near the surface of the specimen are able to escape, carrying surface topography information. In contrast, backscattered electrons can escape from greater depths within the specimen because of their higher energy.

The emitted electrons are detected by the detectors, with a certain detection efficiency.

3.1 The Beam Distribution

It can be assumed that the number of electrons in the beam follows a Poisson distribution. The pixel time $\tau$ can be divided into a large number $n$ of time intervals, so short that the probability $x$ of observing one electron in an interval is much less than unity and the probability of observing more than one electron per interval is negligible. The number of electrons arriving during $\tau$ is then binomially distributed with parameters $n$ and $x$, which for large $n$ and small $x$ tends to a Poisson distribution with mean $y = nx$.

Using physical parameters, the mean number of electrons per pixel in the primary beam can be expressed as $\mu_N = I_p\,\tau / e$, where $I_p$ is the beam current, $\tau$ is the pixel time and $e$ is the electron charge.

3.2 The Secondary Electron Emission

The emission distribution of the secondary electrons is unknown. We will describe their statistics by the first and second statistical moments and the distribution function of the resulting process.

For each pixel in the SEM image (or for each ‘pixel area’ on the specimen), a random number of primary electrons (PE) hit the specimen.
Let $N$ denote the initial number of PEs in the beam; $N$ is Poisson distributed with mean $\mu_N$. Each of these PEs causes the emission of a random number $X_i$ of secondary electrons from the specimen, with $E[X_i] = \delta$ and $\mathrm{Var}[X_i] = \sigma_\delta^2 = b\,\delta^2$, where $\delta$ is the mean of the SE emission and $b = \sigma_\delta^2/\delta^2$ is the relative variance; $X_i$ is distributed according to some unknown distribution.

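If the total number of SE emitted from the specimen is denoted as $V_1 = \sum_{i=1}^{N} X_i$, then, using equations (2) and (3) with $\sigma_N^2 = \mu_N$ for the Poisson beam, the mean of $V_1$ is given by:

$$\mu_{V_1} = \mu_N\,\delta \tag{8}$$

and the variance of $V_1$ is given by:

$$\sigma_{V_1}^2 = \mu_N\,\sigma_\delta^2 + \mu_N\,\delta^2 = \mu_N\,\delta^2\,(1+b) \tag{9}$$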
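As a worked example of (8) and (9), the sketch below evaluates both for hypothetical values ($\mu_N = 5$, $\delta = 1.2$, $b = 0.5$). The function name is ours, and $b$ is taken as $\sigma_\delta^2/\delta^2$ as defined above.

```python
def se_emission_moments(mu_n, delta, b):
    """Mean and variance of the emitted SE count V1, equations (8) and (9).

    mu_n:  mean number of primary electrons (Poisson, so its variance is also mu_n)
    delta: mean SE yield per primary electron
    b:     relative variance of the SE yield, sigma_delta**2 / delta**2
    """
    mean = mu_n * delta                  # equation (8)
    var = mu_n * delta**2 * (1 + b)      # equation (9)
    return mean, var

# Hypothetical values: 5 PEs per pixel, mean SE yield 1.2, relative variance 0.5.
mean, var = se_emission_moments(5, 1.2, 0.5)
print(mean, var)

# A Poisson SE yield has sigma_delta**2 = delta, i.e. b = 1 / delta.
mean_p, var_p = se_emission_moments(5, 1.2, 1 / 1.2)
print(mean_p, var_p)
```

For a Poisson yield, (9) reduces to $\mu_N\,\delta(1+\delta)$, so two SE distributions with the same mean $\delta$ can still produce different signal variances through $b$.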

The general form of the generating function of V1 is given by equation (7).

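In our case, $N$ follows a Poisson distribution with mean $\mu_N$ and generating function:

$$\phi_N(s) = e^{\mu_N (s-1)} \tag{10}$$

Therefore the generating function of $V_1$ is given by:

$$\phi_{V_1}(s) = \phi_N\big(\phi_X(s)\big) = e^{\mu_N\left(\phi_X(s)-1\right)} \tag{11}$$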

The distribution with this generating function is called the Compound Poisson Distribution.

If the distribution density function of the SE were known, a complete statistical model of the SE emission could be described using (11).
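Once an SE distribution is assumed, (11) can be turned into actual probabilities. One standard way, which the text does not describe, is the Panjer recursion for compound Poisson sums: $f_{V_1}(0) = e^{\mu_N (f_X(0) - 1)}$ and $f_{V_1}(v) = \frac{\mu_N}{v}\sum_{k=1}^{v} k\, f_X(k)\, f_{V_1}(v-k)$. The sketch below assumes, purely for illustration, a Poisson SE yield with $\delta = 1.2$ and $\mu_N = 5$; the function name is ours.

```python
import math

def compound_poisson_pmf(mu_n, f_x, v_max):
    """Distribution of V1 with generating function (11), via the Panjer recursion.

    mu_n:  Poisson mean of the number of primary electrons
    f_x:   list with f_x[k] = P(X = k), the SE count per primary electron
    v_max: largest value of V1 whose probability is computed
    """
    f_v = [0.0] * (v_max + 1)
    f_v[0] = math.exp(mu_n * (f_x[0] - 1.0))   # phi_V1(0)
    for v in range(1, v_max + 1):
        acc = 0.0
        for k in range(1, min(v, len(f_x) - 1) + 1):
            acc += k * f_x[k] * f_v[v - k]
        f_v[v] = mu_n / v * acc
    return f_v

# Hypothetical SE yield: Poisson with mean 1.2, truncated at 30 (negligible tail).
delta, mu_n = 1.2, 5.0
f_x = [math.exp(-delta) * delta**k / math.factorial(k) for k in range(31)]
f_v = compound_poisson_pmf(mu_n, f_x, 80)

mean_v = sum(v * p for v, p in enumerate(f_v))
var_v = sum((v - mean_v) ** 2 * p for v, p in enumerate(f_v))
print(f"total probability {sum(f_v):.9f}, mean {mean_v:.6f}, variance {var_v:.6f}")
```

The recursion costs $O(v_{max}^2)$ operations and uses only non-negative terms, so it is numerically stable; its mean and variance reproduce (8) and (9) with $b = 1/\delta$.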

3.3 Backscattered Electron Emission

In the case of backscattered electrons, each PE from the beam causes the emission of either one or zero backscattered electrons, with probability of success $p_b$ and probability of failure $1-p_b$. The number of backscattered electrons per PE is therefore a Bernoulli variable with mean $p_b$, variance $p_b(1-p_b)$ and generating function:

$$\phi_B(s) = 1 - p_b + p_b\,s \tag{12}$$

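Inserting (12) in place of $\phi_X(s)$ in equation (11) gives:

$$\phi_{V_2}(s) = e^{\mu_N\left(1 - p_b + p_b s - 1\right)} = e^{\mu_N p_b (s-1)} \tag{13}$$

Equation (13) is the generating function of the number $V_2$ of emitted BSE. It has the form of the generating function of a Poisson random variable, which is described entirely by its mean:

$$\mu_{V_2} = \sigma_{V_2}^2 = \mu_N\,p_b \tag{14}$$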

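This result is general for any cascade of a Poisson process with Bernoulli trials: thinning a Poisson-distributed count by independent Bernoulli trials yields another Poisson-distributed count.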
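The thinning property behind (13) and (14) can be checked exactly by conditioning on $N$: $P(V_2 = k) = \sum_n P(N = n)\binom{n}{k} p_b^k (1-p_b)^{n-k}$. The sketch below evaluates this sum for hypothetical values $\mu_N = 5$ and $p_b = 0.2$ and compares it with a Poisson pmf of mean $\mu_N p_b$.

```python
import math

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)

def thinned_pmf(k, mu_n, p_b, n_max=100):
    """P(V2 = k) by conditioning on the number of PEs N ~ Poisson(mu_n).

    The sum is truncated at n_max, far in the tail for small mu_n; larger
    n_max values would overflow the float conversion of n!.
    """
    return sum(
        poisson_pmf(n, mu_n) * math.comb(n, k) * p_b**k * (1 - p_b) ** (n - k)
        for n in range(k, n_max)
    )

mu_n, p_b = 5.0, 0.2   # hypothetical beam mean and backscatter probability
for k in range(6):
    print(k, f"{thinned_pmf(k, mu_n, p_b):.10f}", f"{poisson_pmf(k, mu_n * p_b):.10f}")
```

The same argument applies to every Bernoulli stage in the chain, which is why the detection step in the next subsection keeps the BSE count Poisson.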

3.4 Detection Efficiency

In this model we assume that each electron emitted from the specimen is detected with some probability $p_d$, independently of the other electrons. Therefore the detection efficiency can be described by a Bernoulli model.

As in the case of the backscattered electrons, the generating function of this stage is linear and is given by:

$$\phi_d(s) = 1 - p_d + p_d\,s \tag{15}$$

with mean $p_d$ and variance $p_d(1-p_d)$.

It is reasonable to assume that the detection efficiency $p_d$ differs for BSE and SE. If we denote by $p_{d1}$ the detection efficiency for the SE and by $p_{d2}$ the detection efficiency for the BSE, then, using equations (2), (3), (7) and the results in equations (11), (13) and (15), the generating functions, means and variances of the signals resulting from SE emission and BSE emission can be expressed as:

$$\phi_{Z_1}(s) = \phi_{V_1}\big(1 - p_{d1} + p_{d1}s\big) = e^{\mu_N\left[\phi_X(1 - p_{d1} + p_{d1}s) - 1\right]} \tag{16}$$

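with $Z_1$ being the number of SE that enter the detector, with mean and variance:

$$\mu_{Z_1} = \mu_N\,\delta\,p_{d1} \tag{17}$$

$$\sigma_{Z_1}^2 = \mu_N\,\delta\,p_{d1}(1 - p_{d1}) + \mu_N\,\delta^2(1+b)\,p_{d1}^2 \tag{18}$$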
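A Monte Carlo run of the SE branch of the model gives a check of (17) and (18). This sketch assumes, for illustration only, a Poisson SE yield (so $b = 1/\delta$) with $\delta = 1.2$, $\mu_N = 5$ and $p_{d1} = 0.33$; the helper names are hypothetical, and Poisson draws use Knuth's multiplication method from the standard library's uniform generator.

```python
import math
import random

def poisson_sample(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def detected_se(rng, mu_n, delta, p_d1):
    """One pixel: Poisson PEs -> Poisson SE per PE (assumed) -> Bernoulli detection."""
    n_pe = poisson_sample(rng, mu_n)
    n_se = sum(poisson_sample(rng, delta) for _ in range(n_pe))
    return sum(1 for _ in range(n_se) if rng.random() < p_d1)

mu_n, delta, p_d1 = 5.0, 1.2, 0.33
b = 1.0 / delta  # relative variance of a Poisson SE yield: delta / delta**2

rng = random.Random(7)
z = [detected_se(rng, mu_n, delta, p_d1) for _ in range(50_000)]
m = sum(z) / len(z)
v = sum((x - m) ** 2 for x in z) / (len(z) - 1)

mean_17 = mu_n * delta * p_d1
var_18 = mu_n * delta * p_d1 * (1 - p_d1) + mu_n * delta**2 * (1 + b) * p_d1**2
print(f"mean {m:.4f} vs (17) {mean_17:.4f}; variance {v:.4f} vs (18) {var_18:.4f}")
```

The first term of (18) is the binomial noise added by imperfect detection; the second is the emission variance (9) scaled by $p_{d1}^2$.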

For the BSE emission:

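$$\phi_{Z_2}(s) = \phi_{V_2}\big(1 - p_{d2} + p_{d2}s\big) = e^{\mu_N p_b p_{d2}(s-1)} \tag{19}$$

with $Z_2$ being the number of BSE that enter the detector. $Z_2$ follows a Poisson distribution with a mean value:

$$\mu_{Z_2} = \sigma_{Z_2}^2 = \mu_N\,p_b\,p_{d2} \tag{20}$$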

The total number of electrons that enter the detectors can be written as:

$$Z = Z_1 + Z_2 \tag{21}$$

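Assuming that BSE emission and SE emission are statistically independent, the probability distribution of $Z$, its generating function, its mean and its variance are given by:

$$f_Z = f_{Z_1} * f_{Z_2}, \qquad \phi_Z(s) = \phi_{Z_1}(s)\,\phi_{Z_2}(s), \qquad \mu_Z = \mu_{Z_1} + \mu_{Z_2}, \qquad \sigma_Z^2 = \sigma_{Z_1}^2 + \sigma_{Z_2}^2 \tag{22}$$

where $*$ denotes convolution.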

3.5 The Detection Model

There are two possible approaches which can be used in order to describe the detection model.

The first is through the detector’s gain probability distribution (also known as the pulse height distribution, PHD). The PHD is the distribution of the detector output in response to a single incoming electron. The probability distribution function of the detector’s gain can be measured and is usually given by the manufacturer. An example of the gain distribution function of a microchannel plate (MCP) detector is shown in figure 2.

Figure 2: A typical pulse height distribution of the Opal 7830Si MCP detectors

For a known detector gain probability distribution function, the signal at the output of the detector is also the result of a branching process: the input process is the electrons that enter the detector, and the output is the signal at the detector output (normally a current or voltage).

The second approach is to describe the detection model as a function of the physical processes that occur inside the detector. The emitted electrons enter the detector and excite some type of particle in it. These particles follow some known distribution, depending on the type of detector being used. The signal at the output of the detector is the sum of these particles. For example, in MCP detectors the entering electrons cause electron emission in the detector, and in Everhart-Thornley detectors photons are excited when the electrons hit the detector. The disadvantage of this approach is that it requires full and exact knowledge of the processes in the detector, whereas the first approach does not.

We will use the first approach to describe the detection model.

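Let $f_D(d)$ and $\phi_D(s)$ denote the distribution density function and generating function of the detector gain $D$, with mean $\mu_D$ and variance $\sigma_D^2$. Using all of the results of the model up to the detectors, and equations (2), (3) and (7) with $Z$ in the role of $N$ and $D$ in the role of $X$, we derive the following expressions for the generating function, mean and variance of the signal $S$ at the output of the detectors:

$$\phi_S(s) = \phi_{Z_1}\big(\phi_D(s)\big)\,\phi_{Z_2}\big(\phi_D(s)\big), \qquad \mu_S = (\mu_{Z_1} + \mu_{Z_2})\,\mu_D, \qquad \sigma_S^2 = (\mu_{Z_1} + \mu_{Z_2})\,\sigma_D^2 + (\sigma_{Z_1}^2 + \sigma_{Z_2}^2)\,\mu_D^2 \tag{23}$$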

In systems where the SE emission and/or detection is much greater than that of the BSE, or vice versa, these expressions reduce to simpler forms, since either $Z_1$ or $Z_2$ is negligible compared to the other.
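To illustrate the output moments of (23) in the simplest case mentioned above, the sketch assumes a hypothetical BSE-dominated system: $Z$ is Poisson with mean 1.5, and the gain $D$ is normal with mean 100 and standard deviation 30 (arbitrary units). It compares a simulated output signal with the closed-form mean and variance; all names and values are illustrative.

```python
import math
import random

def output_moments(mu_z, var_z, mu_d, var_d):
    """Mean and variance of S = D_1 + ... + D_Z: (2) and (3) with N -> Z, X -> D."""
    return mu_z * mu_d, mu_z * var_d + var_z * mu_d**2

def poisson_sample(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Hypothetical BSE-dominated case (Z1 negligible): Z Poisson with mean 1.5,
# normal gain with mean 100 and standard deviation 30 (arbitrary units).
MU_Z, MU_D, SD_D = 1.5, 100.0, 30.0
rng = random.Random(3)
signal = [sum(rng.gauss(MU_D, SD_D) for _ in range(poisson_sample(rng, MU_Z)))
          for _ in range(50_000)]

m = sum(signal) / len(signal)
v = sum((s - m) ** 2 for s in signal) / (len(signal) - 1)
mu_s, var_s = output_moments(MU_Z, MU_Z, MU_D, SD_D**2)  # Poisson Z: var_z = mu_z
print(f"mean {m:.1f} vs {mu_s:.1f}; variance {v:.0f} vs {var_s:.0f}")
```

With these numbers the gain spread contributes 1350 of the 16350 output variance; most of the noise comes from the random number of detected electrons.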

4 Simulation Examples

In order to demonstrate the model, we performed simulations of a system using the following parameters:

a. The mean number of electrons in the beam is 5 (i.e. $\mu_N = 5$).

b. Only SE participate in the process.

c. The detection efficiency is 33% (i.e. $p_{d1} = 0.33$).

d. The detector gain distribution is normal, with SNR ≈ 3 (about 5 dB).

Each simulation included 10000 trials.

We performed the simulations for two possible distributions of the SE emission. The first was the Poisson distribution and the second the Lognormal distribution.
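The simulation set-up of this section can be sketched as follows. This is a minimal illustration, not the authors’ simulator. The mean SE yield `DELTA = 1.0`, the lognormal shape parameter, the rounding of lognormal draws to integer counts and the reading of SNR as $\mu_D/\sigma_D$ are all hypothetical choices, since the text does not give them. For the Poisson case, the zero-output probability can also be read off (16) at $s = 0$.

```python
import math
import random

def poisson_sample(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_pixel(rng, se_sampler, mu_n=5.0, p_d=0.33, gain_mean=1.0, gain_snr=3.0):
    """One pixel of the SE-only chain: beam -> SE emission -> detection -> detector gain."""
    n_pe = poisson_sample(rng, mu_n)                              # Poisson beam
    n_se = sum(se_sampler(rng) for _ in range(n_pe))              # SE per PE
    n_det = sum(1 for _ in range(n_se) if rng.random() < p_d)     # Bernoulli detection
    sigma = gain_mean / gain_snr                                  # assumed SNR = mean / sd
    return sum(rng.gauss(gain_mean, sigma) for _ in range(n_det))  # normal gain

DELTA = 1.0  # hypothetical mean SE yield per PE (not given in the text)

def poisson_se(rng):
    return poisson_sample(rng, DELTA)

def lognormal_se(rng, sigma_log=1.0):
    # Hypothetical: lognormal yield with mean DELTA, rounded to an integer count.
    mu_log = math.log(DELTA) - sigma_log**2 / 2
    return round(rng.lognormvariate(mu_log, sigma_log))

rng = random.Random(42)
TRIALS = 10_000
zero_fraction = {}
for name, sampler in (("Poisson", poisson_se), ("Lognormal", lognormal_se)):
    signal = [simulate_pixel(rng, sampler) for _ in range(TRIALS)]
    zero_fraction[name] = sum(1 for s in signal if s == 0) / TRIALS
    print(f"{name}: fraction of pixels with zero output = {zero_fraction[name]:.3f}")

# Poisson SE: P(Z1 = 0) = phi_Z1(0) from (16) with phi_X(s) = exp(DELTA * (s - 1)).
p_zero = math.exp(5.0 * (math.exp(-DELTA * 0.33) - 1))
print(f"analytic P(Z1 = 0), Poisson SE: {p_zero:.3f}")
```

Because the gain is continuous, an output of exactly zero occurs only when no electron is detected, so the zero-output fraction estimates $P(Z_1 = 0)$.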

Figure 3 shows the normalized histogram of the simulated signal under the Poisson assumption for the SE emission. The first bar shows the probability of no detection, i.e. the probability that no electron contributes to the output signal (about 0.25).

Figure 4 shows the normalized histogram of the simulated signal under the Lognormal assumption for the SE emission. Here, the first bar shows the probability of no detection (about 0.35).

Figure 3: Normalized histogram of results based on the SE Poisson assumption

Figure 4: Normalized histogram of results based on the SE Lognormal assumption

5 Approximation of the SE distribution

In the statistical model presented above, the probability distribution of the SE emission is unknown.

By using simulations based on the model and by measuring real signals, the distribution of the SE emission can be approximated by statistically fitting a simulated signal to a real signal. This test gives reliable results only when a large number of observations is available.

In our simulations, we compared two candidate distributions for the SE emission: the Poisson distribution and the Lognormal distribution.

Using the model, we simulated SEM signals. The simulated detectors were MCP detectors with a known gain distribution function (see figure 2). The mean and variance of the simulated signal were matched to the sample mean and variance of the real signal. The real signal was obtained from a CD-SEM tool (Opal 7830Si). The specimen was a homogeneous flat metal surface (to avoid charging effects), which was scanned once at a fast TV rate. This procedure produced a large number of samples (more than 10000), which can be assumed to be independent identically distributed (i.i.d.).

The statistical fit of the real SEM signal to the synthetic signal was assessed using the method known as the Quantile-Quantile (Q-Q) plot test. Plotting the quantiles of one data set against those of another shows whether the two come from the same distribution: if they do, the plot is linear. The idea of the Q-Q plot is to compare pairs of quantiles from the two populations (real and synthetic) with the same cumulative probability. If the synthetic signal comes from the same distribution as the real signal, the quantiles will be approximately linearly related, and the entire plot will therefore be linear.
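A minimal sketch of the Q-Q comparison for two equal-size samples is given below. The deviation criterion (vertical distance from a least-squares line above a tolerance), the tolerance of 0.15 and the test data are hypothetical; the text does not specify how deviating samples were counted.

```python
import random

def qq_pairs(synthetic, real):
    """Empirical Q-Q pairs for equal-size samples: matching order statistics."""
    if len(synthetic) != len(real):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(synthetic), sorted(real)))

def fit_line(pairs):
    """Least-squares line y = slope * x + intercept through the Q-Q points."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

def deviating_fraction(synthetic, real, tol):
    """Fraction of Q-Q points farther than tol (vertically) from the fitted line."""
    pairs = qq_pairs(synthetic, real)
    slope, intercept = fit_line(pairs)
    return sum(1 for x, y in pairs if abs(y - (slope * x + intercept)) > tol) / len(pairs)

# Hypothetical data: "real" and matching synthetic samples from N(1, 1), and a
# mismatched synthetic sample from an exponential law with the same mean and variance.
rng = random.Random(5)
real = [rng.gauss(1.0, 1.0) for _ in range(10_000)]
same = [rng.gauss(1.0, 1.0) for _ in range(10_000)]
other = [rng.expovariate(1.0) for _ in range(10_000)]

frac_same = deviating_fraction(same, real, tol=0.15)
frac_other = deviating_fraction(other, real, tol=0.15)
print(f"deviating points: matching model {frac_same:.2%}, mismatched model {frac_other:.2%}")
```

Matching only the mean and variance, as done for the synthetic signals, does not make the Q-Q plot linear; the mismatched sample here has the same first two moments as the real one and still deviates over most of its range.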

Figures 5a, 5b and 5c show the histogram of the real signal, the synthesized signal using Lognormal SE emission and the synthesized signal using Poisson SE emission, respectively.

Figures 6a and 6b show the results of the Quantile-Quantile plot test for the synthesized signal using Lognormal and Poisson SE emission, respectively. Figure 6a shows that the synthesized signal has a very good statistical fit to the real signal: very few samples (10 out of 10000) deviate from the straight line, i.e. 99.9% of the data show a good fit to the real signal. This indicates that the SE emission is well approximated by a Lognormal distribution.

In contrast to the result in figure 6a, the fit of the real signal to the synthesized signal using Poisson SE emission (Fig. 6b) is very poor: approximately 18% of the quantiles of the synthesized signal deviate from the straight line. The two signals do not come from the same distribution.