Lesson 6-1 Random Variables

A Random Variable is a variable taking numerical values determined by the outcome of a chance process.

A Continuous Random Variable takes all values in some interval of numbers. A density curve describes the probability distribution of a continuous random variable. The probability of any event is the area under the density curve above the values that make up that event.

A Discrete Random Variable has a fixed set of possible values with gaps between them. The probability distribution assigns each of these values a probability between 0 and 1, such that the sum of all the probabilities is exactly 1. The probability of any event (for a discrete random variable) is the sum of the probabilities of all the values that make up that event.

The Expected Value (or mean) of a random variable is the balance point of its probability histogram or density curve. Because the mean is the long-run average value of the variable over many repetitions of the chance process, it is also known as the expected value of the random variable.

Mean and Variance of a Discrete Random Variable

Suppose that X is a discrete random variable whose distribution is

Value of X / x1 / x2 / x3 / … / xn
Probability P(X) / p1 / p2 / p3 / … / pn

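To find the mean of X, also written as E(X), multiply each possible value by its probability, then add all the products.

To find the variance of X, written Var(X), subtract the mean from each possible value, square the difference, multiply by that value's probability, then add all the products. The standard deviation of X is the square root of the variance.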
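The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name mean_and_variance and the fair-die distribution are made up for the example and do not come from the textbook. The variance is computed as the probability-weighted sum of squared deviations from the mean.

```python
from math import sqrt

def mean_and_variance(values, probs):
    """Return (mean, variance) of a discrete random variable."""
    # A legitimate distribution has probabilities between 0 and 1 that sum to 1.
    if any(p < 0 or p > 1 for p in probs) or abs(sum(probs) - 1) > 1e-9:
        raise ValueError("not a legitimate probability distribution")
    # Mean: multiply each value by its probability, then add the products.
    mean = sum(x * p for x, p in zip(values, probs))
    # Variance: weight each squared deviation from the mean by its probability.
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, variance

# Hypothetical example (not from the lesson): one roll of a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6
die_mean, die_var = mean_and_variance(die_values, die_probs)
print(die_mean, die_var, sqrt(die_var))
```

For the fair die the mean is 3.5 and the variance is 35/12, or about 2.92.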

Example (from The Practice of Statistics 4e, p. 345):

On an American roulette wheel, there are 38 slots numbered 1 through 36, plus 0 and 00. Half the slots from 1 to 36 are red; the other half are black. Both the 0 and the 00 slots are green. Suppose that a player places a simple $1 bet on red. If the ball lands in a red slot, the player gets the original dollar back, plus an additional dollar for winning the bet. If the ball lands in a slot of any other color, the player loses the dollar bet to the casino. Define X to be the net gain from a single $1 bet on red. Find the expected value and variance of X.
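One way to check an answer to this example is sketched below. It assumes X takes the value +1 on the 18 red slots and -1 on the other 20 slots, as the problem describes, and it uses exact fractions so that no rounding gets in the way. The variable names are chosen only for this illustration.

```python
from fractions import Fraction

# Net gain X from a $1 bet on red: +1 on 18 red slots, -1 on the other 20.
values = [1, -1]
probs = [Fraction(18, 38), Fraction(20, 38)]

# Mean: sum of value times probability.
mean = sum(x * p for x, p in zip(values, probs))
# Variance: probability-weighted sum of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

print(mean, float(mean))          # -1/19, about -0.0526
print(variance, float(variance))  # 360/361, about 0.9972
```

In the long run the player loses a little over 5 cents for each dollar bet on red.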

Mean and Variance of a Continuous Random Variable

Since a continuous random variable has infinitely many possible values, we cannot list them (and each individual value has probability exactly 0), so we rely on other tools for finding the expected value and variance of a continuous random variable. Usually the continuous random variables we work with will be Normally distributed. When they are, all of our methods for Normal density curve problems still apply.

Example (from The Practice of Statistics 4e, p. 351):

The heights of young women closely follow the Normal distribution with mean 64 inches and standard deviation 2.7 inches. This is a distribution for a large set of data. Now choose one young woman at random. Call her height Y (this will be our continuous random variable). If we repeat the random choice very many times, the distribution of values of Y is the same Normal distribution that describes the heights of all young women. Find the probability that the chosen woman is between 68 and 70 inches tall.
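As a check on a Table A or calculator answer, the area under the Normal curve between 68 and 70 can be computed with Python's standard library. This is a sketch, assuming Python 3.8 or later for statistics.NormalDist; the names heights and p_between are chosen for this illustration.

```python
from statistics import NormalDist

# Y is Normal with mean 64 and standard deviation 2.7 (inches).
heights = NormalDist(mu=64, sigma=2.7)

# P(68 < Y < 70) is the area under the density curve between 68 and 70,
# found as the difference of two cumulative probabilities.
p_between = heights.cdf(70) - heights.cdf(68)
print(round(p_between, 4))
```

The result is about 0.056, so roughly 5.6% of randomly chosen young women would be between 68 and 70 inches tall. A table-based answer may differ slightly in the last digit because z-scores are rounded to two decimal places.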

Homework

Page 353: 3-4, 7, 9-13, 15-17, 20-25, 31-34

Benford’s Law

Faked numbers in tax returns, invoices, or expense account claims often display patterns that aren’t present in legitimate records. Some patterns, like too many round numbers, are obvious and easily avoided by a clever crook. Others are more subtle. It is a striking fact that the first digits of numbers in legitimate records often follow a model known as Benford’s law. Call the first digit of a randomly chosen record X. Benford’s law gives this probability model for X:

X / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
P(X) / .301 / .176 / .125 / .097 / .079 / .067 / .058 / .051 / .046

Find the expected value and variance for the probability distribution (probability model).

Show this is a legitimate probability distribution.

Make a histogram of the probability distribution and describe what you see.

Describe the event X > 6 in words. What is P(X > 6)?

Express the event “first digit is at most 5” in terms of X. What is the probability of this event?

Calculate the standard deviation of X.

If someone were trying to fake financial records without knowing Benford’s law, what might they assume about the distribution of first digits in legitimate records?

Calculate the standard deviation for the distribution you assumed in the previous question.
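After working the questions by hand, a short script like the following can be used to check the arithmetic. It is a sketch, not an answer key: the helper name summarize is made up here, and the equally likely model for first digits is only one plausible guess at what a faker might assume.

```python
from math import sqrt

digits = list(range(1, 10))
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

def summarize(values, probs):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, variance, sqrt(variance)

# Legitimate: every probability is between 0 and 1 and they sum to 1.
is_legitimate = all(0 <= p <= 1 for p in benford) and abs(sum(benford) - 1) < 1e-9

b_mean, b_var, b_sd = summarize(digits, benford)

# Event probabilities: add the probabilities of the values in the event.
p_gt_6 = sum(p for x, p in zip(digits, benford) if x > 6)   # first digit is 7, 8, or 9
p_le_5 = sum(p for x, p in zip(digits, benford) if x <= 5)  # first digit is at most 5

# Assumption for comparison: all nine first digits equally likely.
uniform = [1 / 9] * 9
u_mean, u_var, u_sd = summarize(digits, uniform)

print(is_legitimate, b_mean, b_var, b_sd)
print(p_gt_6, p_le_5)
print(u_mean, u_var, u_sd)
```

Comparing the two means and standard deviations shows how far the equally likely model sits from Benford's law, which is what makes faked first digits detectable.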