Topic #1: Probability, Mean, Variance, Covariance,
and Correlation
I. Random Variable and Probability
A random variable is a variable whose value has yet to be determined. It may take on various values, and each value has a probability associated with it.
Example: You flip a coin. Let X = 1 for Heads and X = -1 for Tails. If the coin is fair, then the probability of Heads is equal to ½. The same is true for Tails, i.e., P[Tails] = ½. Therefore, f(x) = ½ if x = 1 and f(x) = ½ if x = -1.
Example: Suppose that the probability your next customer will arrive before x minutes is equal to F(x) = 1 − e^(−λx) for x ≥ 0 and λ > 0.
Example: Suppose that the probability that your business will have x customers today is equal to f(x) = (1 − λ)λ^x for x = 0, 1, 2, 3, ... and for 0 ≤ λ < 1.
There are many types of probability functions. Each economic phenomenon will have its own special probability function. These functions can be divided into discrete and continuous functions. A discrete probability function is one where x can take on only countably many values, such as x = 0, x = 1, x = -1, etc. A continuous probability function is one where x can take on any value on the real line.
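To make the two customer examples above concrete, here is a minimal Python sketch (with λ = 0.5 assumed purely for illustration; the notes do not fix a value) checking that each function behaves like a probability:

```python
# Minimal sketch (lambda = 0.5 is an assumed value, chosen only for illustration).
import math

lam = 0.5

# Continuous example: F(x) = 1 - exp(-lam * x) is a probability on x >= 0.
F = lambda x: 1.0 - math.exp(-lam * x)
print(F(0.0), F(10.0))                 # starts at 0 and approaches 1 as x grows

# Discrete example: f(x) = (1 - lam) * lam**x for x = 0, 1, 2, ...
f = lambda x: (1.0 - lam) * lam**x
print(sum(f(x) for x in range(200)))   # the probabilities sum to (essentially) 1
```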
A very commonly used probability function is the normal probability density function. We write this function as
f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²))   for −∞ < x < ∞.
We find that this function has the familiar bell shape we have all seen before. If μ = 0 and σ² = 1 we have the standard normal density.
The area underneath this curve is equal to 1. Therefore, the area under any portion of the curve lies between 0 and 1 and can be thought of as a probability. For example, the probability that x is less than 1.5 is equal to the area under the curve to the left of 1.5.
This makes it clear that we can calculate things such as P[x ≤ 1.5], or even P[-1 ≤ x ≤ 1.5], which is the area under the curve from -1 to 1.5.
Usually, we reserve the capital letter X for the random variable and the small letter x for an observation (or data) on this random variable. From now on, we will follow this convention and let X be random and x be non-random. Therefore, we should write the above probabilities as P [X ≤ 1.5] and P[ -1 ≤ X ≤ 1.5] using the capital (random) X.
This area is equal to a definite integral from -∞ to 1.5, and we write this as
P[X ≤ 1.5] = ∫ from −∞ to 1.5 of (1/√(2π)) e^(−x²/2) dx.
This integral cannot be evaluated in closed form, so we use a numerical table to approximate it. This table is called the standard normal probability table and can be found in any statistics text or on the Internet.
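Alternatively, the integral can be evaluated numerically with software. A minimal Python sketch using scipy:

```python
# Minimal sketch: evaluate P[X <= 1.5] by numerically integrating the
# standard normal density (which has no closed-form antiderivative).
import math
from scipy.integrate import quad

phi = lambda x: math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)  # standard normal density

prob, _ = quad(phi, -math.inf, 1.5)
print(round(prob, 4))    # about 0.9332, matching the table value used below
```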
Example: Using the table above, find P[X ≤ 1.5]. The answer is 0.9332.
Example: Using the table above, find P[X ≤ -1.25]. The answer is 0.1056.
Example: Using the table above, find P[ 1.5 ≤ X ≤ 2.7]. The answer is 0.0633.
Example: Using the table above, find Xα such that P[|X| > Xα] = 0.05. The answer is Xα = 1.96. Draw a graph showing the standard normal density and mark off -Xα and Xα.
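For comparison, the same four answers can be reproduced with scipy.stats.norm in place of the table. A minimal sketch:

```python
# Minimal sketch: the computer plays the role of the standard normal table.
from scipy.stats import norm

print(round(norm.cdf(1.5), 4))                   # P[X <= 1.5]        = 0.9332
print(round(norm.cdf(-1.25), 4))                 # P[X <= -1.25]      = 0.1056
print(round(norm.cdf(2.7) - norm.cdf(1.5), 4))   # P[1.5 <= X <= 2.7] = 0.0633
print(round(norm.ppf(0.975), 2))                 # X_alpha with P[|X| > X_alpha] = 0.05, i.e. 1.96
```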
II. Mean and Variance
The mean of a random variable is its expected value. For symmetric distributions, the mean lies in the middle of the distribution. For example, it is clear that the standard normal density above is symmetric, since the part to the left of zero is the mirror image of the part to the right. However, not all densities are symmetric. The negative exponential density is not symmetric, since if f(x) = λe^(−λx), then f(−x) ≠ f(x).
To compute the theoretical mean of a random variable we write
E[X] = Σ x f(x)   (discrete case)   or   E[X] = ∫ x f(x) dx   (continuous case),
which means simply that you multiply x by f(x) and then add it up (or integrate it).
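For instance, for the coin-flip example in Section I, E[X] = (1)(½) + (−1)(½) = 0, so the expected value of the fair-coin payoff is zero.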
Example: Suppose that f(x) = for 0 ≤ x ≤ 1. Show that E[X] = .
Example: Suppose that f(x) = for 0 ≤ x ≤ 4. Show that E[X] = .
Example: Suppose that f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)) for −∞ < x < ∞. Show that E[X] = μ. Consider where the midpoint of the distribution is located.
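As a quick numerical check of this result, here is a minimal Python sketch (μ = 2 and σ = 1.5 are assumed purely for illustration) that integrates x·f(x) over the real line and recovers μ:

```python
# Minimal sketch (mu = 2, sigma = 1.5 assumed only for illustration):
# integrating x * f(x) over the real line recovers the mean of a normal density.
import math
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 2.0, 1.5                              # assumed values, not from the notes
f = lambda x: norm.pdf(x, loc=mu, scale=sigma)

mean, _ = quad(lambda x: x * f(x), -math.inf, math.inf)
print(round(mean, 4))                             # about 2.0, i.e. E[X] = mu
```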
The variance of a random variable X is defined as var[X] = E[(X − E[X])²]. This can be rewritten in the following way:
var[X] = E[(X − E[X])²]
= E[X² − 2XE[X] + (E[X])²]
= E[X²] − E[2XE[X]] + E[(E[X])²]
= E[X²] − 2E[X]E[X] + (E[X])²
= E[X²] − (E[X])².
To compute a theoretical variance we need to compute the mean E[X] and then compute the second moment about the origin, E[X²]. The latter is computed as follows:
E[X²] = Σ x² f(x)   or   E[X²] = ∫ x² f(x) dx,
which means that we multiply f(x) by x² and then add things up (or integrate). The variance measures the dispersion of probability about the mean. One might say (intuitively) that an increase in the variance of X, without a change in the mean of X, increases the randomness of X.
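For instance, for the coin-flip example, E[X²] = (1)²(½) + (−1)²(½) = 1, so var(X) = E[X²] − (E[X])² = 1 − 0² = 1.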
Example: Suppose that f(x) = for 0 ≤ x ≤ 1. Show that var(X) = .
Example: Suppose that f(x) = for 0 ≤ x ≤ 4. Show that var(X) = .
Example: Suppose that f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)) for −∞ < x < ∞. Show that var(X) = σ².
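A minimal numerical check of this result as well (again with μ = 2 and σ = 1.5 assumed purely for illustration), using var(X) = E[X²] − (E[X])²:

```python
# Minimal sketch (mu = 2, sigma = 1.5 assumed only for illustration):
# var(X) = E[X^2] - E[X]^2 computed by numerical integration for a normal density.
import math
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 2.0, 1.5                              # assumed values, not from the notes
f = lambda x: norm.pdf(x, loc=mu, scale=sigma)

EX,  _ = quad(lambda x: x * f(x),    -math.inf, math.inf)
EX2, _ = quad(lambda x: x**2 * f(x), -math.inf, math.inf)
print(round(EX2 - EX**2, 4))                      # about 2.25, i.e. var(X) = sigma^2
```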
III. Covariance and Correlation
Finally, we turn to a consideration of two random variables, say X and Y. If we can understand the operations on two random variables, then we can operate on any number of them.
Just as X had a probability density function f(x), the two random variables X and Y have a joint probability density function f(x, y). Everything is just about the same as before, except that we now have to add up in two directions – one direction for x and one for y. You already know this. For example, you must count in two directions to count the number of people in the classroom. A double integral is just adding along two directions, x and y. The joint probability density for the random variables X and Y integrates (or sums) to 1. We write this as
1 = ∫∫ f(x, y) dx dy.
Of course, this means that something MUST happen: adding the probabilities over all x and y gives unity.
Expected values and variances are computed as before, except we use a double integral. For example,
E[X] = ∫∫ x f(x, y) dx dy   and   var(X) = ∫∫ (x − E[X])² f(x, y) dx dy.
The mean and variance of Y are computed in the same fashion.
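As a concrete illustration, here is a minimal Python sketch that carries out these double integrals numerically for a hypothetical joint density f(x, y) = x + y on the unit square (this density is assumed only for illustration; it is not the course example used below):

```python
# Minimal sketch: double integrals for the hypothetical density f(x, y) = x + y
# on the unit square (illustrative only; not the example from these notes).
from scipy.integrate import dblquad

f  = lambda y, x: x + y          # dblquad passes the inner variable (y) first
lo = lambda x: 0.0               # lower limit for y
hi = lambda x: 1.0               # upper limit for y

total, _ = dblquad(f, 0, 1, lo, hi)                            # should equal 1
EX,    _ = dblquad(lambda y, x: x * f(y, x), 0, 1, lo, hi)     # E[X]
EX2,   _ = dblquad(lambda y, x: x**2 * f(y, x), 0, 1, lo, hi)  # E[X^2]

print(total)              # 1.0: the joint density integrates to one
print(EX)                 # E[X] = 7/12, about 0.5833
print(EX2 - EX**2)        # var(X) = 11/144, about 0.0764
```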
The covariance of X and Y is defined as follows
Cov(X,Y) = E[(X − E[X])(Y − E[Y])] = ∫∫ (x − E[X])(y − E[Y]) f(x, y) dx dy,
assuming of course that E[X] and E[Y] exist and the integral converges. The covariance of X and Y tells us how X changes when Y changes. If it is positive, then the two random variables tend to move together in the same direction (on average). If the covariance is negative, then X and Y tend to move inversely (again, on average). For example, the covariance of X = economic growth and Y = unemployment rate is generally thought to be negative: higher growth is associated with lower unemployment, theoretically. The data we have appear to confirm this theory as well. But we have not yet reached the stage of working with data; we are speaking of the theoretical covariance now.
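Continuing with the same hypothetical density f(x, y) = x + y on the unit square (again, purely for illustration), a minimal sketch of the covariance computation:

```python
# Minimal sketch: cov(X, Y) = E[(X - E[X])(Y - E[Y])] for the hypothetical
# density f(x, y) = x + y on the unit square (illustrative only).
from scipy.integrate import dblquad

f  = lambda y, x: x + y
lo = lambda x: 0.0
hi = lambda x: 1.0

EX,  _ = dblquad(lambda y, x: x * f(y, x), 0, 1, lo, hi)
EY,  _ = dblquad(lambda y, x: y * f(y, x), 0, 1, lo, hi)
cov, _ = dblquad(lambda y, x: (x - EX) * (y - EY) * f(y, x), 0, 1, lo, hi)

print(cov)   # -1/144, about -0.0069: X and Y move (slightly) in opposite directions
```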
Example: Suppose f(x, y) = () for 0 < x < 1 and 0 < y < 1. Show f(x, y)
integrates to 1. Show E[X] = , E[Y] = 11/18, and E[XY] = . From this
show that cov(X,Y) = E[XY] – E[X]E[Y] = -. The integration here is
very simple, but involves multiple integration along the x and y axes.
The definition of the theoretical correlation coefficient is quite simple and is given by
corr(X,Y) = Cov(X,Y) / √(var(X) var(Y)).
The correlation coefficient normalizes the covariance, so that the correlation always lies between -1 and 1. A correlation of 1 means a perfect positive correlation: X and Y move in the same positive linear direction. A correlation coefficient of -1 means a perfect negative correlation: X and Y move in exactly opposite linear directions. A zero correlation means that X and Y are not linearly related. The correlation coefficient measures the degree and direction of the linear relation between two random variables.
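A minimal sketch of the correlation coefficient for the same hypothetical density f(x, y) = x + y (illustrative only), confirming that it lies between -1 and 1:

```python
# Minimal sketch: corr(X, Y) = cov(X, Y) / sqrt(var(X) * var(Y)) for the
# hypothetical density f(x, y) = x + y on the unit square (illustrative only).
import math
from scipy.integrate import dblquad

f  = lambda y, x: x + y
lo = lambda x: 0.0
hi = lambda x: 1.0

def E(g):
    # expected value of g(X, Y) under the joint density f
    val, _ = dblquad(lambda y, x: g(x, y) * f(y, x), 0, 1, lo, hi)
    return val

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
cov  = E(lambda x, y: (x - EX) * (y - EY))
varX = E(lambda x, y: (x - EX) ** 2)
varY = E(lambda x, y: (y - EY) ** 2)

print(cov / math.sqrt(varX * varY))   # -1/11, about -0.09: a weak negative linear relation
```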
Example: Suppose f(x, y) = () for 0 < x < 1 and 0 < y < 1.
Show var(X) = . Show var(Y) = . Therefore, corr(X,Y) ≈ −0.0338.