CHAPTER 1

RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

1.1 Random variable

A random variable is the numeric result of operating a non-deterministic mechanism or performing a non-deterministic experiment. For example, rolling a die and recording the outcome yields a random variable with range { 1, 2, 3, 4, 5, 6 }. Tossing a fair coin yields another random variable once numbers are assigned to the outcomes, for example 1 for Head and 0 for Tail.

1.2 Probability Distribution

A probability distribution is the list of all possible outcomes of a random variable and their associated probabilities of occurrence. Recording all the probabilities of output ranges of a real-valued random variable X yields the probability distribution of X. Every random variable gives rise to a probability distribution, and this distribution contains most of the important information about the variable. If X is a random variable, the corresponding probability distribution assigns to the interval [a, b] the probability Pr[a ≤ X ≤ b], i.e. the probability that the variable X will take a value in the interval [a, b]. Probability distributions can either be discrete or continuous.

1.3 Discrete Random Variables and the Probability Mass Function

A random variable is discrete if its probability distribution is discrete; a discrete probability distribution is one that is fully characterized by a probability mass function. In probability theory, a probability mass function (abbreviated pmf) gives the probability that a discrete random variable is exactly equal to some value.

Thus X is a discrete random variable if

∑_u P(X = u) = 1

as u runs through the set of all possible values of the random variable X.

The Poisson distribution, the Bernoulli distribution, the binomial distribution, the geometric distribution, and the negative binomial distribution are among the most well-known discrete probability distributions.

If a random variable is discrete then the set of all possible values that it can assume is finite or countably infinite, because the sum of uncountably many positive real numbers (which is the smallest upper bound of the set of all finite partial sums) always diverges to infinity.

A Discrete Probability Distribution is one where the possible outcomes of a random variable can take only specific values, usually integers. A distribution is called discrete if its cumulative distribution function consists of a sequence of finite jumps, which means that it belongs to a discrete random variable X: a variable which can only attain values from a certain finite or countable set.

Properties of a valid Probability Mass Function

P(X = x) ≥ 0 for all x ------(1)

P(X = x) ≤ 1 for all x ------(2)

0 ≤ P(X = x) ≤ 1 for all x ------(3)

∑_x P(X = x) = 1 ------(4)

You would notice that condition (3) is a combination of (1) and (2). For the purposes of testing the validity of the pmf, condition (4) is the essential property, since it is only satisfied when all the other conditions hold.
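As an informal illustration of these conditions, the short Python sketch below checks (1) to (4) for a candidate pmf stored as a dictionary; the function name is_valid_pmf and the fair-die example are illustrative assumptions, not part of the original notes.

    # A minimal sketch: testing whether a candidate pmf satisfies
    # conditions (1)-(4); names and values are illustrative.
    def is_valid_pmf(pmf, tol=1e-9):
        # Conditions (1)-(3): every probability lies in [0, 1].
        if any(p < 0 or p > 1 for p in pmf.values()):
            return False
        # Condition (4), the essential property: probabilities sum to 1.
        return abs(sum(pmf.values()) - 1.0) < tol

    # Example: the pmf of a fair six-sided die.
    die = {x: 1/6 for x in range(1, 7)}
    print(is_valid_pmf(die))   # True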

1.4 Continuous Random Variables and the Probability Density Function

A probability density function serves to represent a probability distribution in terms of integrals. If a probability distribution has density f(x), then intuitively the infinitesimal interval [x, x + dx] has probability f(x) dx. A probability density function can be seen as a "smoothed out" version of a histogram: if one empirically measures values of a random variable repeatedly and produces a histogram depicting relative frequencies of output ranges, then this histogram will resemble the random variable's probability density (assuming that the variable is sampled sufficiently often and the output ranges are sufficiently narrow).

Formally, a probability distribution has density f(x) if f(x) is a non-negative integrable function such that the probability of the interval [a, b] is given by

Pr[a ≤ X ≤ b] = ∫_a^b f(x) dx

for any two numbers a and b. This implies that the total integral of f must be 1. Conversely, any non-negative integrable function with total integral 1 is the probability density of a suitably defined probability distribution.

Not every probability distribution has a density function; for instance the distributions of discrete random variables do not. A distribution has a density function if and only if its cumulative distribution function F(x) is absolutely continuous. In this case, F is almost everywhere differentiable, and its derivative can be used as the probability density. If a probability distribution admits a density, then the probability of every one-point set {a} is zero. (It is a common mistake to think of f(a) as the probability of {a}, but this is incorrect; in fact, f(a) will often be bigger than 1.)

A continuous random variable is associated with an infinite number of possible outcomes, so that probabilities can only be calculated over a range of values rather than for particular values of x. A distribution is called continuous if its cumulative distribution function is continuous, which means that it belongs to a random variable X for which Pr[ X = x ] = 0 for all x in R. The probability distribution of the variable X can be uniquely described by its cumulative distribution function F(x), which is defined by

F(x) = Pr[X ≤ x]

for any x in R.

The continuous distributions can be expressed by a probability density function: a non-negative integrable function f defined on the reals such that

Pr[a ≤ X ≤ b] = ∫_a^b f(x) dx

for all a and b.

Properties of Valid Probability Density Function

f(x) ≥ 0 for all x ------(1)

f(x) is defined and integrable over the whole real line ------(2)

Pr[a ≤ x ≤ b] = ∫_a^b f(x) dx for all a ≤ b ------(3)

∫_{−∞}^{∞} f(x) dx = 1 ------(4)

In the continuous case, the integral sign takes over from the summation. Again condition (4) is the ‘acid test’ for a genuine pdf.
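As a rough numerical companion to condition (4), the sketch below integrates a candidate density over its support with scipy.integrate.quad (assuming SciPy is available); the exponential density with rate 2 is only an illustrative choice.

    # A minimal sketch: numerically checking the 'acid test'
    # (total integral equal to 1) for a candidate density.
    import math
    from scipy.integrate import quad

    def f(x):
        # Exponential density with rate k = 2, defined for x >= 0.
        return 2 * math.exp(-2 * x)

    total, _ = quad(f, 0, math.inf)
    print(total)   # approximately 1.0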

1.5 Some Special Discrete Distributions

1.5.1 Bernoulli Distribution

The Bernoulli distribution, named after the Swiss scientist James Bernoulli, is a discrete probability distribution which takes value 1 with success probability p and value 0 with failure probability q = 1 − p. The probability mass function f of this distribution is

f(x) = p^x (1 − p)^(1 − x) for x = 0, 1,

that is, f(1) = p and f(0) = q = 1 − p.

The expected value of a Bernoulli random variable is p, and its variance is pq = p(1 − p).

1.5.2 Binomial distribution

The binomial distribution is a discrete probability distribution which describes the number of successes in a sequence of n independent experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment. For instance, in tossing a fair coin, each trial has probability of success one half and probability of failure one half. If the random variable X records the outcome of a single trial, then X takes the value 1 when a success is recorded and 0 when a failure is recorded:

P(X = 1) = p and P(X = 0) = 1 − p.

If n trials are involved, the binomial distribution can be generalised such that the probability of getting exactly x successes is given by:

P(X = x) = C(n, x) p^x (1 − p)^(n − x), for x = 0, 1, 2, ..., n,

where C(n, x) = n! / (x!(n − x)!) is the number of ways of choosing x successes from n trials.

In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p).

Example 1:

You are given that 5% of the population at Midlands State University is HIV-positive. You pick 500 people randomly. How likely is it that you get 30 or more HIV-positives? The number of HIV-positives you pick is a random variable X which follows a binomial distribution with n = 500 and p = 0.05. We are interested in the probability P[X ≥ 30].
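Summing thirty or more binomial terms by hand is tedious, so the sketch below evaluates P[X ≥ 30] numerically with scipy.stats.binom (assuming SciPy is available); it is a verification aid, not part of the original example.

    # A minimal sketch: P[X >= 30] for X ~ B(500, 0.05), using SciPy.
    from scipy.stats import binom

    n, p = 500, 0.05
    # sf(k) gives P[X > k], so P[X >= 30] = P[X > 29].
    print(binom.sf(29, n, p))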

Example 2:

An insurance broker has five contacts and she believes for each that the probability of making a sale is 0.4. The distribution of the number of sales is binomial. Calculate the probability of making zero sales. What is the probability of striking more than four deals?

Answer

Let the random variable X be the number of deals struck, so that X ~ B(5, 0.4).

Then the probability of zero sales becomes:

P(X = 0) = C(5, 0)(0.4)^0 (0.6)^5 = (0.6)^5 = 0.0778 (to 4 d.p.)

The probability of striking more than four deals is:

P(X > 4) = P(X = 5) = C(5, 5)(0.4)^5 (0.6)^0 = (0.4)^5 = 0.0102 (to 4 d.p.)

Expected value

In general, expectation is what is considered the most likely to happen; if something happens that is not at all expected, it is a surprise.

In probability (and especially gambling), the expected value (or expectation) of a random variable is the sum of the probability of each possible outcome of the experiment multiplied by its payoff ("value"). Thus, it represents the average amount one "expects" to win per bet if bets with identical odds are repeated many times. Note that the value itself may not be expected in the everyday sense; it may be unlikely or even impossible.

If X is a discrete random variable with values x1, x2, ... and corresponding probabilities p1, p2, ... which add up to 1, then E(X) can be computed as the sum or series

E(X) = ∑_i x_i p_i = x1 p1 + x2 p2 + ...

If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as

E(X) = ∫_{−∞}^{∞} x f(x) dx

The expected value operator (or expectation operator) E is linear in the sense that

E(aX + bY) = aE(X) + bE(Y)

for any two random variables X and Y (which need to be defined on the same probability space) and any two real numbers a and b.

The expected values of the powers of X are called the moments of X; the moments about the mean of X are also defined as certain expected values.

In general, the expected value operator is not multiplicative, i.e. E(XY) is not necessarily equal to E(X)E(Y), except if X and Y are independent. The difference, in the general case, gives rise to the covariance and correlation.

To empirically estimate the expected value of a random variable, one repeatedly measures values of the variable and computes the arithmetic mean of the results. This estimates the true expected value, and the arithmetic mean has the property of minimizing the sum of the squared deviations of the observations from the estimate itself.
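The sketch below illustrates this empirical estimation for a fair die, whose true expected value is 3.5; the sample size of 100 000 is an arbitrary illustrative choice.

    # A minimal sketch: estimating E(X) for a fair die by simulation.
    import random

    n = 100_000
    rolls = [random.randint(1, 6) for _ in range(n)]
    print(sum(rolls) / n)   # close to the true expected value 3.5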

Variance

In mathematics, the variance of a real-valued random variable is its second central moment, and also its second cumulant (cumulants differ from central moments only at and above degree 4). If μ = E(X) is the expected value of the random variable X, then the variance is

σ² = E((X − μ)²),

i.e., it is the expected value of the square of the deviation of X from its own mean. It is the mean squared deviation.

We can conclude two things:

The variance is never negative because the squares are positive or zero. When any method of calculating the variance results in a negative number, we know that there has been an error, often due to a poor choice of algorithm.

The unit of variance is the square of the unit of observation. Thus, the variance of a set of heights measured in centimeters will be given in square centimeters. This fact is inconvenient and has motivated statisticians to call the square root of the variance the standard deviation, and to quote this value as a summary of dispersion.

This variance is a nonnegative real number.

The expected value of the Binomial Distribution

If X ~ B(n, p), then the expected value of X is

E(X) = np

The variance of the Binomial Distribution

Var(X) = np(1-p).

1.5.3 Poisson distribution

The Poisson distribution is a discrete probability distribution (discovered by Siméon-Denis Poisson, 1781-1840) that describes, among other things, the number of discrete occurrences (sometimes called "arrivals") that take place during a time interval of given length. The Poisson distribution applies to various phenomena of discrete nature (that is, those that may happen 0, 1, 2, 3, ... times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. The probability that there are exactly x occurrences (x being a natural number including 0, x = 0, 1, 2, ...) is:

P(X = x) = (λ^x e^(−λ)) / x!

Where:

  • e is the base of the natural logarithm (e = 2.71828...),
  • x! is the factorial of x,
  • λ is the average number of occurrences during the period of interest. For instance, if the events occur on average every 2 minutes and you are interested in the number of events occurring in a 10-minute interval, you would use as a model a Poisson distribution with λ = 5.

Example 1:

Research has indicated that for a typical plant with 2000 employees in Zimbabwe, the number of strikes in a given year can be represented by the Poisson distribution with mean λ = 0.4 strikes per year.

a) Specify the probability mass function for the number of strikes in a given year.

b) Calculate the probability of at least two and at most four strikes.

Answer

a) P(X = x) = (e^(−0.4) (0.4)^x) / x! for x = 0, 1, 2, ...

b) P(2 ≤ X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4)

= e^(−0.4) [ (0.4)²/2! + (0.4)³/3! + (0.4)⁴/4! ]

= 0.6703 × (0.0800 + 0.0107 + 0.0011)

≈ 0.0615

Example 2

Customers arrive at a photocopying machine at an average rate of 12 per hour. What is the probability of more than two arrivals in a five-minute period?

Answer

The average rate is 12 arrivals per hour, so for a five-minute period λ = 12 × (5/60) = 1.

P(X > 2) = 1 − P(X ≤ 2) = 1 − e^(−1)(1 + 1 + 1/2!) = 1 − 2.5e^(−1) ≈ 1 − 0.9197 = 0.0803
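The two Poisson examples above can be cross-checked numerically with scipy.stats.poisson (assuming SciPy is available); this is only a verification sketch, not part of the original worked solutions.

    # A minimal sketch: checking the two Poisson examples with SciPy.
    from scipy.stats import poisson

    # Example 1(b): P(2 <= X <= 4) with lambda = 0.4.
    p1 = poisson.cdf(4, 0.4) - poisson.cdf(1, 0.4)
    # Example 2: P(X > 2) with lambda = 1 (12 per hour over 5 minutes).
    p2 = poisson.sf(2, 1.0)
    print(p1, p2)   # approximately 0.0615 and 0.0803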

Expected Value for the Poisson Distribution

E(X) = λ

Variance of the Poisson Distribution

Var(X) = λ

The following are examples of occurrences that can be characterised by the Poisson distribution:

  • The number of cars that pass through a certain point on a road during a given period of time.
  • The number of spelling mistakes a secretary makes while typing a single page.
  • The number of phone calls you get per day.
  • The number of times your web server is accessed per minute.

1.6 Some Special Continuous Probability Distributions

1.6.1 Uniform Distribution

This describes an experiment in which the value of the random variable assumes some value in the domain [a, b]. All values of x in the domain are equally likely and the density function is:

f(x) = 1/(b − a)

for a ≤ x ≤ b.

The density function has the value of zero for all values outside the domain. The definite integral of f(x) with the limits of integration a and b is equal to 1, as shown below:

∫_a^b 1/(b − a) dx = [x/(b − a)]_a^b = (b − a)/(b − a) = 1

Genuine pdf!

Example 1:

Buses on a certain route run every thirty minutes. What is the probability that a person arriving at a random time to catch a bus will have to wait at least twenty minutes?

Answer

Let the random variable X be the waiting time (in minutes) for the next bus; it is uniformly distributed on [0, 30], so f(x) = 1/30 for 0 ≤ x ≤ 30. The probability that the person will wait for at least twenty minutes is:

P(X ≥ 20) = ∫_20^30 (1/30) dx = (30 − 20)/30 = 1/3
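A simulation sketch of the bus-waiting example is given below; it uses Python's random.uniform, and the sample size is an arbitrary illustrative choice.

    # A minimal sketch: estimating P(X >= 20) for X ~ Uniform(0, 30)
    # by simulation; the exact answer is 1/3.
    import random

    n = 100_000
    waits = [random.uniform(0, 30) for _ in range(n)]
    print(sum(w >= 20 for w in waits) / n)   # close to 0.3333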

The Expected Value for the Uniform Distribution

We are going to use integration to derive the expected value of the uniform distribution:

E(X) = ∫_a^b x · 1/(b − a) dx = [x²/(2(b − a))]_a^b = (b² − a²)/(2(b − a)) = (a + b)/2

The Variance of the Uniform Distribution

Var(X) = E(X²) − [E(X)]²

E(X²) = ∫_a^b x² · 1/(b − a) dx = (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3

But E(X) = (a + b)/2, so

Var(X) = (a² + ab + b²)/3 − (a + b)²/4 = (b − a)²/12

1.6.2 Exponential distribution

The exponential distribution is a continuous probability distribution with the probability density function:

f(x) = k e^(−kx)

for 0 ≤ x < ∞

Where: x is the time between arrivals.

k is the average number of arrivals per period.

e = 2.71828... and is a constant.

The probability of an arrival within time T is:

P(X ≤ T) = 1 − e^(−kT)

We can prove that the exponential function satisfies the requirements of a genuine probability density function:

∫_0^∞ k e^(−kx) dx = [−e^(−kx)]_0^∞ = 0 − (−1) = 1

Genuine pdf!

Example 1:

Customers are known to arrive at a service station according to the exponential distribution with an average of k = 10 arrivals per hour. Determine the probability of an arrival within 12 minutes.

Answer

12 minutes = 12/60 = 0.2 hours, so

P(X ≤ 0.2) = 1 − e^(−10 × 0.2) = 1 − e^(−2) = 1 − 0.1353 = 0.8647

Example 2:

Aircraft have been observed to arrive at a certain airport according to the exponential distribution with k = 20 arrivals per hour. Determine the probability of an arrival of an aircraft within a three-minute period.

Answer

3 minutes = 3/60 = 0.05 hours, so

P(X ≤ 0.05) = 1 − e^(−20 × 0.05) = 1 − e^(−1) = 1 − 0.3679 = 0.6321

Example 3:

The time between breakdowns of a certain part is determined by the exponential distribution with k = 2 breakdowns per year. Determine the probability of a breakdown within a nine-month period.

Answer

9 months = 9/12 = 0.75 years, so

P(X ≤ 0.75) = 1 − e^(−2 × 0.75) = 1 − e^(−1.5) = 1 − 0.2231 = 0.7769
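The three exponential examples above can be cross-checked with a few lines of Python; the helper function arrival_within is an illustrative assumption, not part of the original notes.

    # A minimal sketch: P(arrival within time t) = 1 - exp(-k*t)
    # for the three worked exponential examples.
    import math

    def arrival_within(k, t):
        # k: average arrivals per period, t: time in the same units.
        return 1 - math.exp(-k * t)

    print(arrival_within(10, 12/60))   # Example 1: ~0.8647
    print(arrival_within(20, 3/60))    # Example 2: ~0.6321
    print(arrival_within(2, 9/12))     # Example 3: ~0.7769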

The expected value of an exponential

The expected value of the exponential distribution can be derived as follows:

E(X) = ∫_0^∞ x k e^(−kx) dx

Let u = x and dv = k e^(−kx) dx, so that du = dx and v = −e^(−kx). Integrating by parts,

E(X) = [−x e^(−kx)]_0^∞ + ∫_0^∞ e^(−kx) dx = 0 + [−e^(−kx)/k]_0^∞ = 1/k

Variance of the Exponential Distribution

Var(X) = E(X²) − [E(X)]²

where E(X) = 1/k and E(X²) = ∫_0^∞ x² k e^(−kx) dx.

Let u = x² and dv = k e^(−kx) dx, so that du = 2x dx and v = −e^(−kx). Integrating by parts,

E(X²) = [−x² e^(−kx)]_0^∞ + ∫_0^∞ 2x e^(−kx) dx = ∫_0^∞ 2x e^(−kx) dx

To further integrate the remaining component, let u = 2x and dv = e^(−kx) dx, so that du = 2 dx and v = −e^(−kx)/k. Then

E(X²) = [−2x e^(−kx)/k]_0^∞ + (2/k) ∫_0^∞ e^(−kx) dx = 0 + (2/k)(1/k) = 2/k²

Therefore Var(X) = 2/k² − (1/k)² = 1/k²

The exponential distribution may be viewed as a continuous counterpart of the geometric distribution, which describes the number of Bernoulli trials necessary for a discrete process to change state. In contrast, the exponential distribution describes the time for a continuous process to change state.

Examples of variables that are approximately exponentially distributed are:

  • the time until you have your next car accident
  • the time until you get your next phone call
  • the distance between mutations on a DNA strand
  • the distance between roadkill

1.7 Median of a Probability Distribution

The median (m) of a probability distribution is the value of the random variable for which P(X ≤ m) = 1/2 and P(X ≥ m) = 1/2. The median is therefore the value of x = m such that:

∫_{−∞}^{m} f(x) dx = 1/2, i.e. F(m) = 1/2

This can also be determined as:

∫_{m}^{∞} f(x) dx = 1/2

Example 1:

Consider the following probability distribution:

for

Determine the median of the distribution.

Answer

Example 2:

A probability density function is described by:

for

Determine the median of the distribution.

Answer

It is also possible to find the first and third quartiles of a distribution. This proceeds as follows:

The first quartile: F(q1) = ∫_{−∞}^{q1} f(x) dx = 1/4

The third quartile: F(q3) = ∫_{−∞}^{q3} f(x) dx = 3/4

Needless to say, the second quartile is the median! EC102 told you so!

1.8 Geometric distribution

The geometric distribution is a discrete probability distribution -- the probability distribution of the number of Bernoulli trials needed to get one success, if the probability of success is p. The probability that the first success is on the nth trial is:

P(X = n) = (1 − p)^(n − 1) p

for n = 1, 2, 3, ....

The expected value of a geometrically distributed random variable is 1/p and the variance is (1 − p)/p².

It is the special case of the negative binomial distribution in which r = 1. Like its continuous analogue (the exponential distribution), the geometric distribution is "memoryless"; in fact, it is the only memoryless discrete distribution.
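The memoryless property mentioned above says that P(X > m + n | X > m) = P(X > n). The sketch below checks this identity numerically; the values of p, m and n are illustrative.

    # A minimal sketch: checking memorylessness of the geometric
    # distribution, P(X > m + n | X > m) = P(X > n).
    p, m, n = 0.3, 4, 6

    def tail(k, p):
        # P(X > k): the first k trials are all failures.
        return (1 - p) ** k

    lhs = tail(m + n, p) / tail(m, p)   # conditional probability
    rhs = tail(n, p)
    print(lhs, rhs)   # the two values agree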

1.9 Normal distribution

The normal distribution is an extremely important probability distribution in many fields. It is also called the Gaussian distribution. It is actually a family of distributions of the same general form, differing only in their location and scale parameters: the mean and standard deviation. The standard normal distribution is the normal distribution with a mean of zero and a standard deviation of one. Because the graph of its probability density resembles a bell, it is often called the bell curve.

The normal distribution was first introduced by de Moivre in an article in 1733 (reprinted in the second edition of his The Doctrine of Chances, 1738) in the context of approximating certain binomial distributions for large n. Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors.

The name "bell curve" goes back to Jouffret who used the term "bell surface" in 1872 for a bivariate normal with independent components. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875 [Stigler]. This terminology is unfortunate, since it reflects and encourages the fallacy that "everything is Gaussian".

The probability density function of the normal distribution with mean μ and standard deviation σ (equivalently, variance σ²) is an example of a Gaussian function,

f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²))
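As an illustration, the sketch below evaluates this density and a normal probability with scipy.stats.norm (assuming SciPy is available); the parameters are arbitrary.

    # A minimal sketch: the normal density and a normal probability,
    # computed with SciPy; mu, sigma and the interval are illustrative.
    from scipy.stats import norm

    mu, sigma = 50, 10
    print(norm.pdf(55, loc=mu, scale=sigma))   # density at x = 55
    # P(40 <= X <= 60), one standard deviation either side of the mean.
    print(norm.cdf(60, mu, sigma) - norm.cdf(40, mu, sigma))   # ~0.6827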

1.10 Central limit theorem

The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the so-called central limit theorem.

The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions.

Central limit theorems are a set of weak-convergence results in probability theory. Intuitively, they all express the fact that a sum of many independent, identically distributed random variables (with finite variance) is approximately normally distributed. These results explain the ubiquity of the normal distribution.
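A simulation sketch of the central limit theorem follows: sums of 30 independent Uniform(0, 1) variables, which individually are far from normal, have a mean and standard deviation that match the normal approximation. All numerical choices are illustrative.

    # A minimal sketch of the central limit theorem: the sum of 30
    # independent Uniform(0, 1) variables is approximately normal with
    # mean 30 * 0.5 = 15 and variance 30 * (1/12) = 2.5.
    import random
    import statistics

    n_terms, n_samples = 30, 20_000
    sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_samples)]
    print(statistics.mean(sums))    # close to 15
    print(statistics.stdev(sums))   # close to sqrt(2.5), about 1.58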