LECTURE#5
09/13/04
Probability Distribution of Discrete Random Variables:
Discrete R.V.? - A R.V. that takes only discrete values, say 1, 2, 3, …, or categorical outcomes such as win, lose, etc.
Examples of Discrete Probability Distribution:
(1)Binomial Distribution
Many investigations involving probability distributions are based on attribute measures that result from tallies or counts of one of two outcomes of a categorical variable having only two classifications, arbitrarily called 'success' or 'failure'. When classifying each event studied as either a success or a failure, it is not important which outcome is classified as the success and which as the failure. For example, in the context of quality control, an item that has failed inspection could be classified as a success, since the goal of the quality department may be to study items that fail. In such circumstances, we can apply the binomial probability distribution to analyze the number of successes (or failures).
The (discrete) R.V. X representing the Binomial Distribution is the number of successes in a sample of n trials. Thus the R.V. X takes any value from 0 through n. The Binomial distribution has two parameters: n, the number of trials, and p, the probability of obtaining a success. It is represented as,
P(X = x | n, p) = [n! / (x! (n-x)!)] p^x (1-p)^(n-x)
Here, n = sample size; p = probability of success; 1-p = probability of failure; x = number of successes in the sample (x = 0, 1, 2, …, n).
Remember that the Binomial Distribution is specific to outcomes that are mutually exclusive (each trial is either a success or a failure, never both).
Recall the example given in LECTURE#2. What is the probability of getting 3 heads if a coin was tossed 10 times?
We can use the Binomial Distribution to find this probability:
n = 10, p = 0.5, 1-p = 0.5 and x = 3
Therefore, P(X=3 | 10, 0.5) = [10! / (3! (10-3)!)] 0.5^3 (1-0.5)^(10-3)
= 0.1171875 (Ans)
A few more properties of the Binomial Distribution:
It can be symmetrical or skewed. Whenever p = 0.5, it is always symmetrical. The mean of the Binomial distribution is np, while the standard deviation is sqrt(np(1-p)).
For the previous problem (of tossing the coin 10 times), the mean of X would be 5 and the standard deviation 1.58.
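As a quick check, here is a minimal Python sketch (standard library only; the function name binomial_pmf and the script layout are mine, not part of the lecture) that reproduces the coin-toss probability and the mean and standard deviation above.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x | n, p) = [n!/(x!(n-x)!)] p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
print(binomial_pmf(3, n, p))           # 0.1171875, as computed above
print(n * p, sqrt(n * p * (1 - p)))    # mean = 5.0, standard deviation ≈ 1.58
```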
(2)Poisson Distribution
Many investigations, including those dealing with the quality of products or services, are based on counts of the number of defects per sample continuum, or area of opportunity. An area of opportunity is a continuous unit or interval of time, volume, or area in which more than one occurrence of an event may occur. Examples are the number of pits in a square meter of metal or the number of fleas on the body of a dog. Another example is the number of storm events in a year (or the number of cars passing in one hour on South Willow Avenue). In such circumstances, the Poisson probability distribution provides the underlying basis for calculating these types of probabilities. The distribution is characterized by only one parameter, λ. X refers to the discrete R.V. following a Poisson Distribution, and the parameter λ refers to the expected or average number of successes per area of opportunity.
P(X = x | λ) = e^(-λ) λ^x / x!
The mean and standard deviation of the Poisson Distribution are λ and sqrt(λ), respectively.
Example:
Historically, it has been observed that on an average, 3 hurricanes strike the coast of Florida each year. What is the probability that 5 hurricanes will strike this year?
Here, x =5, λ = 3
Therefore P(X=5 | λ=3) = e^(-3) 3^5 / 5! = 0.1008 (Ans).
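A similar Python sketch (again illustrative only; poisson_pmf is my own name) verifies the hurricane calculation and the mean and standard deviation of the Poisson distribution.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, lam):
    """P(X = x | lambda) = e^(-lambda) lambda^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

lam = 3                        # average number of hurricanes per year
print(poisson_pmf(5, lam))     # ≈ 0.1008, as computed above
print(lam, sqrt(lam))          # mean = 3, standard deviation ≈ 1.73
```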
Parameter Estimation (Finding out the parameters that define a distribution)
The problem of estimation is defined as follows. Assume that a population can be represented by a random variable X whose density is f_X(x; θ), where the form of the density is known except that it contains an unknown parameter set θ. Furthermore, consider that x1, x2, …, xn are observed values of the random variable X. It is desired to estimate the values of the parameter set θ on the basis of the observed xi values.
For example, for the Normal distribution PDF f(x; m_X, σ_X), the unknown parameter set is the mean (m_X) and the standard deviation (σ_X) of variable X.
Parameter estimation is important because often we are required to infer the probability distribution from data (experimental or synthetic) that we generate.
There are basically 3 major methods of estimating parameters:
(1) Method of Moments; (2) Maximum Likelihood Method; and (3) Minimum Chi-square (χ²) Method.
(1) Method of Moments
Let f(x; θ1, θ2, θ3, …, θk) be a PDF of a random variable X.
Let m_r (the rth moment about 0) = E[X^r], which is some function of θ, namely m_r(θ1, …, θk).
Let x1, …, xn be a random sample from the PDF, and let M_j be the jth sample moment; that is:
M_j = (1/n) Σ_{i=1}^{n} x_i^j
From the k equations M_j = m_j(θ1, …, θk), j = 1, …, k, one can get unique solutions for the estimates (θ̂1, …, θ̂k) of θ1, …, θk.
Example:
Let x1, …, xn be a random sample from a Normal distribution N(μ, σ). Estimate the parameters of the distribution.
Let (θ1, θ2) = (μ, σ). Also, recall that for the Normal distribution: m1 = μ, and m2 = σ² + μ².
Therefore, μ̂ = M1 = (1/n) Σ_{i=1}^{n} x_i and σ̂² = M2 − M1² = (1/n) Σ_{i=1}^{n} x_i² − μ̂².
We shall see later that for distributions other than Normal there are better estimators than the ones shown above.
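To make the procedure concrete, here is a small Python sketch of the method-of-moments estimates for a Normal sample; the synthetic data, seed, and function name are my own choices for illustration, not part of the lecture.

```python
import random

def normal_mom_estimates(xs):
    """Method-of-moments estimates for a Normal sample:
    mu_hat = M1, sigma2_hat = M2 - M1^2."""
    n = len(xs)
    m1 = sum(xs) / n                   # first sample moment
    m2 = sum(x * x for x in xs) / n    # second sample moment
    return m1, m2 - m1 * m1            # (mu_hat, sigma^2_hat)

random.seed(1)
sample = [random.gauss(10, 2) for _ in range(1000)]   # true mu = 10, sigma = 2
mu_hat, var_hat = normal_mom_estimates(sample)
print(mu_hat, var_hat)   # should be close to 10 and 4
```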
(2) Maximum Likelihood
If x1, …, xn is a random sample from a density f(x; θ), then the likelihood function L(θ) = f(x1; θ) f(x2; θ) ⋯ f(xn; θ) is a function of θ only. The likelihood function L(θ) gives the likelihood that the random variables assume the particular values x1, …, xn. The likelihood is the value of a density function; for discrete random variables it is a probability.
If θ̂ is the value of θ which maximizes L(θ) for the random sample x1, …, xn, then θ̂ is the maximum-likelihood estimator of θ. The following steps are performed to identify the maximum-likelihood estimator of f(x; θ).
- Derive the likelihood function from the sample: L(θ) = f(x1; θ) ⋯ f(xn; θ)
- Many times L(θ) satisfies regularity conditions, so the maximum-likelihood estimator is the solution of the equation dL(θ)/dθ = 0. Also, L(θ) and log{L(θ)} have their maxima at the same value of θ, and it is sometimes easier to find the maximum of the logarithm of the likelihood.
- If L(θ) contains k parameters (θ1, …, θk), then the maximum-likelihood estimators of the parameters are defined by the following set of k equations: ∂L(θ1, …, θk)/∂θ_j = 0, j = 1, …, k.
Example:
Assume that you have a random sample x1, …, xn from a Normal distribution N(μ, σ). Find the maximum-likelihood estimators of μ and σ.
The logarithm of the likelihood function is
log L(μ, σ²) = −(n/2) log(2πσ²) − [1/(2σ²)] Σ_{i=1}^{n} (x_i − μ)²
To find the location of the maximum, we set its derivatives with respect to μ and σ² to zero:
∂ log L/∂μ = (1/σ²) Σ_{i=1}^{n} (x_i − μ) = 0, which gives μ̂ = (1/n) Σ_{i=1}^{n} x_i
∂ log L/∂σ² = −n/(2σ²) + [1/(2(σ²)²)] Σ_{i=1}^{n} (x_i − μ)² = 0, which gives σ̂² = (1/n) Σ_{i=1}^{n} (x_i − μ̂)²
which turn out to be the sample moments corresponding to μ and σ². This is not always the case!!
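The closed-form estimators above can be checked numerically. The sketch below (illustrative only; the synthetic data and names are my own assumptions) evaluates the Normal log-likelihood and confirms that the maximum-likelihood estimates give a larger log-likelihood than nearby parameter values.

```python
import math, random

def normal_loglik(xs, mu, sigma2):
    """log L(mu, sigma^2) = -(n/2) log(2*pi*sigma^2) - sum((x - mu)^2) / (2*sigma^2)."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

random.seed(2)
xs = [random.gauss(5, 3) for _ in range(500)]

# Closed-form maximum-likelihood estimates derived above
mu_hat = sum(xs) / len(xs)
var_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)

# The log-likelihood at the MLE exceeds that at perturbed parameter values
print(normal_loglik(xs, mu_hat, var_hat))
print(normal_loglik(xs, mu_hat + 0.5, var_hat))
print(normal_loglik(xs, mu_hat, var_hat * 1.2))
```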
(3) Minimum Chi-square (χ²) Method
Let x1, …, xn be a random sample from a density function f_X(x; θ), and let f1, …, fk be a partition of the range of X. The probability that an observation falls in the cell f_j, j = 1, …, k, denoted by p_j(θ), can be found (for a continuous random variable) as:
p_j(θ) = ∫_{f_j} f_X(x; θ) dx.
Note that Σ_{j=1}^{k} p_j(θ) = 1.
Let the random variable N_j denote the number of x_i's in the sample which fall in the cell f_j, where j = 1, …, k; then n = Σ_{j=1}^{k} N_j is the sample size. Form the following summation:
χ² = Σ_{j=1}^{k} [N_j − n p_j(θ)]² / [n p_j(θ)]
The minimum-χ² estimate of θ is that θ̂ which minimizes χ².
Note: the minimum-χ² estimator depends on the selection of the partition f1, …, fk.
Simple example:
Let x1, …, xn be a random sample from a Bernoulli distribution; that is, f_X(x; θ) = θ^x (1−θ)^(1−x), x = 0, 1 [this distribution models the probability of success in a Bernoulli trial (success/failure)]. Take N_j as the number of observations equal to j (i.e., falling in cell f_j), j = 0, 1. So χ² would be:
χ² = [N_0 − n(1−θ)]² / [n(1−θ)] + [N_1 − nθ]² / [nθ], and given that N_0 = n − N_1, χ² = (N_1 − nθ)² / [nθ(1−θ)]
In order that χ² = 0, θ̂ = N_1/n = N_1/(N_0 + N_1).
Often it is difficult to derive analytically the θ which minimizes χ². In that case χ² is treated as a cost function that is minimized using numerical approaches (optimization methods).
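As an illustration of such a numerical approach, the sketch below minimizes the Bernoulli χ² of the simple example by a crude grid search over θ; the observed counts are hypothetical, and a proper optimizer could replace the grid.

```python
def chi_square_bernoulli(theta, n0, n1):
    """Chi-square statistic for the Bernoulli cells {0, 1}:
    sum over cells of (N_j - n*p_j(theta))^2 / (n*p_j(theta))."""
    n = n0 + n1
    expected = [n * (1 - theta), n * theta]
    observed = [n0, n1]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n0, n1 = 38, 62   # hypothetical counts of observed 0's and 1's

# Crude grid search over theta in (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = min(grid, key=lambda t: chi_square_bernoulli(t, n0, n1))
print(theta_hat)   # ≈ 0.62 = N1/n, matching the analytical result
```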
Often the minimum-chi-square estimator is also modified to the following modified minimum-chi-square estimator, in which the expected count n p_j(θ) in the denominator is replaced by the observed count N_j:
χ² = Σ_{j=1}^{k} [N_j − n p_j(θ)]² / N_j
Measure of closeness of an estimator
Unbiased estimator: E[θ̂] = θ.
Minimum variance estimator: among the unbiased estimators of θ, the one with the smallest variance, i.e., Var(θ̂) ≤ Var(θ~) for any other unbiased estimator θ~.
Mean-square error: MSE(T) = E[(T − θ)²] = Var(T) + (E[T] − θ)², where T = the estimator of θ.
Example:
For the Normal distribution we showed that the estimators for the mean and variance are μ̂ = (1/n) Σ_{i=1}^{n} x_i and σ̂² = (1/n) Σ_{i=1}^{n} (x_i − μ̂)². Find if the estimators are unbiased.
E[μ̂] = μ, so the estimator of the mean is unbiased. E[σ̂²] = [(n−1)/n] σ², so the estimator of the variance is biased; it becomes unbiased when n → ∞.
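A short simulation (purely illustrative; the sample size, true variance, and number of repetitions are arbitrary choices of mine) shows this bias: averaging the estimator over many small samples gives roughly [(n−1)/n] σ² rather than σ².

```python
import random

def biased_var(xs):
    """Variance estimator derived above: (1/n) * sum((x - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

random.seed(3)
n, reps = 5, 20000                      # true sigma^2 = 4 (sigma = 2)
avg = sum(biased_var([random.gauss(0, 2) for _ in range(n)]) for _ in range(reps)) / reps
print(avg)                 # ≈ (n-1)/n * 4 = 3.2, not 4.0, i.e. biased
print(avg * n / (n - 1))   # rescaling by n/(n-1) removes the bias
```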
A Few Important Points on Bayesian Statistics (BAYES’ LAW)
Bayes's Theorem is a simple mathematical formula used for calculating conditional probabilities. It figures prominently in subjectivist or Bayesian approaches to epistemology, statistics, and inductive logic. Subjectivists, who maintain that rational belief is governed by the laws of probability, lean heavily on conditional probabilities in their theories of evidence and their models of empirical learning. Bayes's Theorem is central to these enterprises both because it simplifies the calculation of conditional probabilities and because it clarifies significant features of the subjectivist position. Indeed, the Theorem's central insight, that a hypothesis is confirmed by any body of data that its truth renders probable, is the cornerstone of all subjectivist methodology.
Subjectivists think of learning as a process of belief revision in which a "prior" subjective probability P is replaced by a "posterior" probability Q that incorporates newly acquired information. This process proceeds in two stages. First, some of the subject's probabilities are directly altered by experience, intuition, memory, or some other non-inferential learning process. Second, the subject "updates" the rest of her opinions to bring them into line with her newly acquired knowledge.
Many subjectivists are content to regard the initial belief changes as sui generis and independent of the believer's prior state of opinion. However, as long as the first phase of the learning process is understood to be non-inferential, subjectivism can be made compatible with an "externalist" epistemology that allows for criticism of belief changes in terms of the reliability of the causal processes that generate them. It can even accommodate the thought that the direct effect of experience might depend causally on the believer's prior probability.
READ HAND-OUT: "Facts versus Factions: the Use and Abuse of Subjectivity in Scientific Research." ()
A few critical points to remember about Bayes’ theorem:
(A) It is a multiplicative process (see the right-hand side of the Bayes' equation, Lectures 2 and 3). Therefore, it can assign too much weight to the likelihoods over the prior (unconditional) probabilities, and hence to the current observation; the non-Bayesian camp often cites this as a major weakness (see the numerical sketch after this list).
(B) Bayes' Theorem is subjective (see hand-out). It depends on subjective definitions of prior probability, such that two different lines of reasoning can lead to two totally different conditional probabilities.
(C) Like any statistical tool, use Bayes' method with CARE!
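As a small numerical illustration of point (A), the sketch below applies Bayes' law to a single hypothesis H with hypothetical numbers of my own choosing; note how a strong likelihood ratio moves a weak prior of 1% up to roughly 16%.

```python
def bayes_posterior(prior, lik_h, lik_not_h):
    """P(H | D) = P(D | H) P(H) / [P(D | H) P(H) + P(D | not H) P(not H)]."""
    numerator = lik_h * prior
    return numerator / (numerator + lik_not_h * (1 - prior))

# Hypothetical values: weak prior, strong likelihood ratio (19:1)
print(bayes_posterior(prior=0.01, lik_h=0.95, lik_not_h=0.05))   # ≈ 0.16
```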
Trivia#5
Cooperation and Defection in Nature
A wrasse is a small fish that earns its living eating parasites off larger fish. The larger fish not only refrain from eating the wrasse, but they have also been known to wait at an established station for the privilege of being cleaned (remember 'Finding Nemo'?). Things, however, get complicated by another small fish called the blenny, which not only looks like a wrasse but has adapted to mimic one as well. The blenny will approach a larger fish as if it is going to clean it, but instead it will take a bite of the larger fish's tail fin. Thus, nature has a form of defection that is much more subtle than the usual predator-prey type of competition. Moreover, the blenny is partially a victim of its own success, since if its population ever significantly exceeds the wrasse's, it will be more difficult for it to dupe the larger fish. So if you think about it, the probability of a larger fish getting cleaned by a wrasse is actually a dynamic one, varying in time according to the wrasse-blenny competitive structure.