7
Collections of Random Variables
The previous chapters have focused on the description of a single random variable and its associated probabilities. In this chapter, we deal with properties of collections of random variables. Special attention is paid to the probability that the sum or average of a group of random variables falls in some range, and important results in this area include the laws of large numbers and the central limit theorem.
7.1 Expected Value and Variance of Sums of Random Variables
Recall from Section 3.1 that two events A and B are independent if P(AB) = P(A)P(B). One similarly refers to random variables as independent if the events related to them are independent. Specifically, random variables X and Y are independent if, for any subsets A and B of the real line,
P(X is in A and Y is in B) = P(X is in A) P(Y is in B).
Technically, the statement above is not exactly correct. For X and Y to be independent, the relation above does not have to hold for all subsets of the real line, but rather for all measurable subsets of the real line. There are some very strange subsets of the real line, such as the Vitali set, which is formed by choosing exactly one representative from each coset of the rationals ℚ in ℝ, and the probabilities associated with random variables taking values in such sets are not generally defined. These types of issues are addressed in measure-theoretic probability courses.
Even if X and Y are not independent, E(X + Y) = E(X) + E(Y), as long as both E(X) and E(Y) are finite. To see why this is true, suppose X and Y are discrete, and note that
P(X = i) = P{∪j (X = i and Y = j)}
= Σj P(X = i and Y = j)
by the third axiom of probability, and similarly
P(Y = j) = Σi P(X = i and Y = j). As a result,
E(X + Y) = Σk k P(X + Y = k)
= Σi Σj (i + j) P(X = i and Y = j)
= Σi Σj iP(X = i and Y = j) + Σi Σj jP(X = i and Y = j)
= Σi i {Σj P(X = i, Y = j)} + Σj j {Σi P(X = i, Y = j)}
= Σi i P(X = i) + Σj j P(Y = j)
= E(X) + E(Y).
A similar proof holds for the case where X or Y or both are not discrete, and it is elementary to see that the statement holds for more than two random variables, i.e., for any random variables X1, X2, …, Xn, as long as E(Xi) is finite for each i, E(X1 + X2 + … + Xn) = E(X1) + E(X2) + … + E(Xn).
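The additivity of expectations can also be checked numerically. The short Python simulation below is an illustrative sketch (the trial count and variable names are ours, not from the text): it deals two card values without replacement, so X and Y are dependent, yet the sample means still add.

```python
import random

# Sketch: E(X + Y) = E(X) + E(Y) even when X and Y are dependent.
# X and Y are the values (2-14) of two cards dealt without replacement.
random.seed(1)
deck = [value for value in range(2, 15) for _ in range(4)]  # 52 card values

n = 200_000
sum_x = sum_y = sum_total = 0
for _ in range(n):
    x, y = random.sample(deck, 2)   # no replacement, so X and Y are dependent
    sum_x += x
    sum_y += y
    sum_total += x + y

# All three sample means should be close to 8, 8, and 16, respectively.
print(sum_x / n, sum_y / n, sum_total / n)
```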
Example 7.1.1 — At a 10-handed Texas Hold’em table, what is the expected number of players who are dealt at least one ace?
Answer — Let Xi = 1 if player i has at least one ace,
and Xi = 0 otherwise.
E(Xi) = P(player i is dealt at least one ace)
= P(player i has 2 aces)+P(player i has exactly one ace)
= {C(4,2) + 4 × 48}/C(52,2) ~ 14.93%.
Σi Xi = the number of players with at least one ace, and
E(Σi Xi) = Σi E(Xi) = 10 × 0.1493 = 1.493. ■
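Example 7.1.1 can be verified both exactly and by simulation; the Python sketch below is illustrative only, with an arbitrary number of trials.

```python
import random
from math import comb

# Exact expected count for Example 7.1.1.
p_ace = (comb(4, 2) + 4 * 48) / comb(52, 2)   # P(a given player has at least one ace)
print(10 * p_ace)                             # about 1.493

# Simulation: deal two cards to each of 10 players from a shuffled deck.
random.seed(1)
deck = ['A'] * 4 + ['x'] * 48
trials = 100_000
total = 0
for _ in range(trials):
    random.shuffle(deck)
    total += sum('A' in deck[2 * i: 2 * i + 2] for i in range(10))
print(total / trials)                         # also about 1.49
```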
Note that, in Example 7.1.1, Xi and Xj are far from independent, for i ≠ j. If player 1 has an ace, the probability that player 2 has an ace drops dramatically. See Example 2.4.10, where the probability that players 1 and 2 both have at least one ace is approximately 1.74%, so the conditional probability
P(player 2 has at least one ace | player 1 has at least one ace)
= P(both players have at least one ace) ÷ P(player 1 has at least one ace)
~ 1.74%/[1 – C(48,2)/C(52,2)] ~ 11.65%,
whereas the unconditional probability
P(player 2 has at least one ace) = 1 – C(48,2)/C(52,2)
~ 14.93%.
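These two probabilities can be checked exactly with a few lines of Python; the sketch below (illustrative, using the math module's comb function) reproduces the 14.93%, 1.74%, and roughly 11.6% figures.

```python
from math import comb

# Exact check of the conditional vs. unconditional probabilities above.
p_one = 1 - comb(48, 2) / comb(52, 2)            # P(a given player has at least one ace)
p_both = (1 - 2 * comb(48, 2) / comb(52, 2)
          + comb(48, 2) * comb(46, 2) / (comb(52, 2) * comb(50, 2)))  # inclusion-exclusion
print(p_one)            # about 0.1493
print(p_both)           # about 0.0174
print(p_both / p_one)   # about 0.116, the conditional probability
```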
While the sum of expected values is the expected value of the sum, the same is not generally true for variances and standard deviations. However, in the case where the random variables Xi are independent, var(X1 + X2 + … + Xn) = var(X1) + var(X2) + … + var(Xn).
Consider the case of two random variables, X and Y.
var(X + Y) = E{(X + Y)²} – {E(X) + E(Y)}²
= E(X²) – {E(X)}² + E(Y²) – {E(Y)}² + 2E(XY) – 2E(X)E(Y)
= var(X) + var(Y) + 2[E(XY) – E(X)E(Y)].
Now, suppose that X and Y are independent and discrete. Then
E(XY) = Σi Σj i j P(X = i and Y = j)
= Σi Σj i j P(X = i) P(Y = j)
= {Σi i P(X = i)} {Σj j P(Y = j)}
= E(X) E(Y).
A similar proof holds even if X and Y are not discrete, and for more than two random variables: in general, if Xi are independent random variables with finite expected values, then E(X1 X2 … Xn) = E(X1) E(X2) … E(Xn), and as a result, var(X1 + X2 + … + Xn) = var(X1) + var(X2) + … + var(Xn) for independent random variables Xi.
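A quick simulation illustrates that variances add for independent random variables but not in general. The Python sketch below is illustrative; die rolls stand in for generic independent random variables, and the fully dependent sum X + X shows where additivity fails.

```python
import random
import statistics

# Sketch: variances add for independent random variables but not in general.
random.seed(1)
n = 200_000
x = [random.randint(1, 6) for _ in range(n)]   # one die
y = [random.randint(1, 6) for _ in range(n)]   # an independent die

print(statistics.pvariance([a + b for a, b in zip(x, y)]))   # about 5.83
print(statistics.pvariance(x) + statistics.pvariance(y))     # about 5.83 as well
# A dependent example: var(X + X) = 4 var(X), not var(X) + var(X).
print(statistics.pvariance([2 * a for a in x]))              # about 11.67
```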
The difference E(XY) – E(X)E(Y) is called the covariance between X and Y, and is labeled cov(X,Y). The quotient cov(X,Y)/[SD(X) SD(Y)] is called the correlation between X and Y, and when this correlation is 0, the random variables X and Y are called uncorrelated.
Example 7.1.2 — On one hand during Season 4 of High Stakes Poker, Jennifer Harman raised all-in with 10♠ 7♠ after a flop of 10♦ 7♣ K♦. Daniel Negreanu called with K♥ Q♥. The pot was $156,100. Harman had a 71.31% chance of winning the pot and Negreanu a 28.69% chance, with no chance of a tie. The two players decided to run it twice, meaning that the dealer would deal the turn and river cards twice (without reshuffling the dealt cards back into the deck between the two deals), and each of the two pairs of turn and river cards would be worth half of the pot, or $78,050. Let X be the amount Harman has after the hand if they run it twice, and let Y be the amount Harman would have after the hand if they had decided to simply run it once. Compare E(X) to E(Y) and compare approximate values of SD(X) and SD(Y). (In approximating SD(X), ignore the small dependence between the two runs.)
Answer — E(Y) = 71.31% × $156,100 = $111,314.90.
If they run it twice, and X1 = Harman’s return from the 1st run and X2 = Harman’s return from the 2nd run, then
X = X1 + X2 , so
E(X) = E(X1) + E(X2)
= $78,050 × 71.31% + $78,050 × 71.31%
= $111,314.90.
Thus, the expected values of X and Y are equal.
For brevity, let B stand for billion in what follows.
E(Y²) = 71.31% × ($156,100)² ~ $17.38B, so
V(Y) = E(Y²) – [E(Y)]² ~ $17.38B – ($111,314.90)² ~ $4.99B, so SD(Y) ~ √$4.99B ~ $70,600.
V(X1) = E(X1²) – [E(X1)]²
= ($78,050)² × 71.31% – [$78,050 × 71.31%]²
~ $1.25B.
Ignoring dependence between the two runs,
V(X) ~ V(X1) + V(X2) ~ $1.25B + $1.25B = $2.5B,
so SD(X) ~ √$2.5B = $50,000.
Thus, the expected values of X and Y are equal ($111,314.90), but the standard deviation of X (about $50,000) is smaller than the standard deviation of Y (about $70,600). ■
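The figures in Example 7.1.2 can be reproduced numerically. The Python sketch below is illustrative and, as in the text, ignores the small dependence between the two runs.

```python
import math

# Numerical check of Example 7.1.2.
p, pot = 0.7131, 156_100

e_y = p * pot                                  # E(Y)
var_y = p * (1 - p) * pot ** 2                 # Y equals the pot with prob. p, else 0
print(e_y, math.sqrt(var_y))                   # about 111,315 and about 70,600

half_pot = pot / 2
var_x1 = p * (1 - p) * half_pot ** 2           # one run is worth half the pot
var_x = 2 * var_x1                             # two approximately independent runs
print(2 * p * half_pot, math.sqrt(var_x))      # about 111,315 and about 49,900
```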
For independent random variables, E(XY) = E(X)E(Y) (see Exercise 7.10), so independent random variables are always uncorrelated. The converse is not always true, as shown in the following example.
Example 7.1.3 — Suppose you are dealt two cards from an ordinary deck. Let X = the number on your 1st card (ace = 14, king = 13, queen = 12, etc.), and let Y = X or –X, respectively, depending on whether your 2nd card is red or black. Thus, for instance, Y = 14 if and only if your 1st card is an ace and your 2nd card is red. Are X and Y independent? Are they uncorrelated?
Answer — Consider for instance the events (Y = 14) and
(X = 2).
P(X = 2) = 1/13, and counting permutations,
P(Y = 14) = P(1st card is a black ace and 2nd card is red)
+ P(1st card is a red ace and 2nd card is red)
= (2 × 26)/(52 × 51) + (2 × 25)/(52 × 51)
= 102/(52 × 51)
= 1/26.
X and Y are clearly not independent, since for instance
P(X = 2 and Y = 14) = 0, whereas
P(X = 2)P(Y = 14) = 1/13 × 1/26.
Nevertheless, X and Y are uncorrelated because
E(X)E(Y) = 8 × 0 = 0, and
E(XY) = 1/26 [(2)(2) + (2)(–2) + (3)(3) + (3)(–3) + …
+ (14)(14) + (14)(–14)]
= 0,
since, as in the calculation of P(Y = 14) above, P(X = v and Y = v) = P(X = v and Y = –v) = 1/26 for each value v. ■
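A simulation sketch of Example 7.1.3 (the deck representation and trial count are ours) estimates the covariance of X and Y and confirms that it is near zero even though the two variables are clearly dependent.

```python
import random

# Sketch for Example 7.1.3: X and Y are dependent but uncorrelated.
random.seed(1)
deck = [(value, color) for value in range(2, 15) for color in ('r', 'r', 'b', 'b')]

xs, ys = [], []
for _ in range(200_000):
    (v1, _), (_, c2) = random.sample(deck, 2)   # value of 1st card, color of 2nd card
    xs.append(v1)
    ys.append(v1 if c2 == 'r' else -v1)

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
print(cov)   # close to 0, so the sample correlation is near 0

# Dependence: the event (X = 2 and Y = 14) never occurs, yet P(X = 2)P(Y = 14) > 0.
print(sum(x == 2 and y == 14 for x, y in zip(xs, ys)))   # 0
```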
7.2 Conditional Expectation
In Section 3.1, we discussed conditional probabilities in which the conditioning was on an event A. Given a discrete random variable X, one may condition on the event {X = j}, for each j, and this gives rise to the notion of conditional expectation. A useful example to keep in mind is where you have pocket aces and go all in, and Y is your profit in the hand, conditional on the number X of players who call you. This problem is worked out in detail, under certain assumptions, in Example 7.2.2 below. First we will define conditional expectation.
If X and Y are discrete random variables, then E(Y|X = j) = Σk k P(Y = k | X = j), and the conditional expectation E(Y | X) is the random variable such that E(Y | X) = E(Y | X = j) whenever X = j. We will only discuss the discrete case here, but for continuous X and Y the definition is similar, with the sum replaced by an integral and the conditional probability replaced by a conditional pdf. Note that
E{E[Y | X]} = Σj E(Y | X = j) P(X = j)
= Σj Σk k P(Y = k | X = j) P(X = j)
= Σj Σk k [P(Y = k and X = j)/P(X = j)] P(X = j)
= Σj Σk k P(Y = k and X = j)
= Σk Σj k P(Y = k and X = j)
= Σk k Σj P(Y = k and X = j)
= Σk k P(Y = k).
Thus, E{E[Y | X]} = E(Y).
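A small numerical check of the identity E{E[Y | X]} = E(Y) can be run in Python. The joint distribution below is made up purely for illustration; it is not from the text.

```python
# Check E{E[Y | X]} = E(Y) for a small, made-up joint pmf of (X, Y).
joint = {  # (x, y): P(X = x and Y = y)
    (0, 0): 0.2, (0, 1): 0.1,
    (1, 0): 0.3, (1, 1): 0.4,
}

e_y = sum(y * p for (x, y), p in joint.items())   # E(Y) directly

p_x = {}
for (x, y), p in joint.items():                   # marginal distribution of X
    p_x[x] = p_x.get(x, 0) + p

# E[Y | X = x] for each x, then average over the distribution of X.
e_y_given_x = {x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]
               for x in p_x}
e_of_cond = sum(e_y_given_x[x] * p_x[x] for x in p_x)

print(e_y, e_of_cond)   # both 0.5
```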
Note that using conditional expectation, one could trivially show that the two random variables in Example 7.1.3 are uncorrelated. Because in this example E[Y | X] is obviously 0 for all X, E(XY) = E[E(XY | X)] = E[X E(Y | X)] = E[0] = 0.
Example 7.2.1 — Suppose you are dealt a hand of Texas Hold’em. Let X = the number of red cards in your hand and let Y = the number of diamonds in your hand. (a) What is E(Y)? (b) What is E[Y | X]? (c) What is P{E[Y | X] = 1/2}?
Answer — (a) E(Y) = (0)P(Y=0) + (1)P(Y=1) + (2)P(Y=2)
= 0 + 13×39/C(52,2) + 2C(13,2)/C(52,2)
= ½.
(b) Obviously, if X = 0, then Y = 0 also, so
E[Y | X = 0] = 0,
and if X = 1, Y = 0 or 1 with equal probability, so
E[Y | X = 1] = 1/2.
When X = 2, we can use the fact that each of the C(26,2) two-card combinations of red cards is equally likely and count how many have 0, 1, or 2 diamonds. Thus,
P(Y = 0 | X = 2) = C(13,2)/C(26,2) = 24%,
P(Y = 1 | X = 2) = 13 × 13/C(26,2) = 52%,
and P(Y = 2 | X = 2) = C(13,2)/C(26,2) = 24%.
So, E[Y | X = 2] = (0)(24%) + (1)(52%) + (2)(24%) = 1. In summary,
E[Y | X] = 0 if X = 0,
E[Y | X] = 1/2 if X = 1,
and E[Y | X] = 1 if X = 2.
(c) P{E[Y | X] = 1/2} = P(X = 1)
= 26 × 26/C(52,2) ~ 50.98%. ■
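The answers to Example 7.2.1 can be double-checked in Python. The sketch below computes parts (a) and (c) exactly and estimates E[Y | X = 2] by simulation; the trial count is arbitrary.

```python
import random
from math import comb

# Checks for Example 7.2.1.
print((13 * 39 + 2 * comb(13, 2)) / comb(52, 2))   # part (a): E(Y) = 0.5
print(26 * 26 / comb(52, 2))                        # part (c): P(X = 1), about 0.5098

# Simulation of E[Y | X = 2]: deal two cards and keep only the all-red hands.
random.seed(1)
deck = ['diamond'] * 13 + ['heart'] * 13 + ['black'] * 26
hands_kept = diamonds = 0
for _ in range(200_000):
    a, b = random.sample(deck, 2)
    if 'black' not in (a, b):                       # X = 2
        hands_kept += 1
        diamonds += (a == 'diamond') + (b == 'diamond')
print(diamonds / hands_kept)                        # about 1, matching E[Y | X = 2]
```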
The conditional expectation E[Y | X] is actually a random variable, a concept that newcomers can sometimes have trouble understanding. It can help to keep in mind a simple example such as the one above. E[Y] and E[X] are simply real numbers. You do not need to wait to see the cards to know what value they will take. For E[Y | X], however, this is not the case, as E[Y | X] depends on what cards are dealt and is thus a random variable. Note that, if X is known, then E[Y | X] is known too. When X is a discrete random variable that can assume at most some finite number k of distinct values, as in Example 7.2.1, E[Y | X] can also assume at most k distinct values.
Example 7.2.2 — This example continues Exercise 4.1, which was based on a statement in Volume 1 of Harrington on Hold’em that, with AA, “you really want to be one-on-one.” Suppose you have AA and go all in pre-flop for 100 chips, and suppose you will be called by a random number X of opponents, each of whom has at least 100 chips. Suppose also that, given the hands that your opponents may have, your probability of winning the hand is approximately 0.8^X. Let Y be your profit in the hand. What is a general expression for E(Y | X)? What is E(Y | X) when X = 1, when X = 2, and when X = 3? Ignore the blinds and the possibility of ties in your answer.
Answer — After the hand, you will profit either 100X chips or –100 chips, so E(Y | X) = (100X)(0.8^X) + (–100)(1 – 0.8^X) = [100(X + 1)]0.8^X – 100. When X = 1, E(Y | X) = 60, when X = 2, E(Y | X) = 92, and when X = 3, E(Y | X) = 104.8. ■
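Evaluating the formula from the answer for a few values of X takes only a few lines of Python; the function name below is ours and purely illustrative.

```python
# Evaluating E(Y | X) = 100(X + 1)(0.8 ** X) - 100 from Example 7.2.2.
def expected_profit(x, stake=100, win_base=0.8):
    """Approximate expected profit with AA against x callers (hypothetical helper)."""
    return stake * (x + 1) * win_base ** x - stake

for x in (1, 2, 3):
    print(x, round(expected_profit(x), 1))   # 60.0, 92.0, 104.8
```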
Notice that the solution to Example 7.2.2 did not require us to know the distribution of X. Incidentally, the approximation P(winning with AA) ~ 0.8^X is simplistic but may not be a terrible approximation. Using the poker odds calculator at www.cardplayer.com, consider the case where you have A♠ A♦ against hypothetical players B, C, D, and E who have 10♥ 10♣, 7♠ 7♦, 5♣ 5♦, and A♥ J♥, respectively. Against only player B, your probability of winning the hand is 0.7993, instead of 0.8. Against players B and C, your probability of winning is 0.6493, while the approximation 0.8^2 = 0.64. Against B, C, and D, your probability of winning is 0.5366, whereas 0.8^3 = 0.512, and against B, C, D, and E, your probability of winning is 0.4348, while 0.8^4 = 0.4096.
7.3 Laws of Large Numbers and the Fundamental Theorem of Poker
The laws of large numbers, which state that the sample mean of iid random variables converges to the expected value, are among the oldest and most fundamental cornerstones of probability theory. The theorems date back to Gerolamo Cardano’s Liber de Ludo Aleae (Book on Games of Chance) in 1525 and Jacob Bernoulli’s Ars Conjectandi in 1713, both of which used gambling games involving cards and dice as their primary examples. Bernoulli called the law of large numbers his “Golden Theorem,” and his statement, which involved only Bernoulli random variables, has been generalized and strengthened to form the following two laws of large numbers. For the following two results, suppose that X1, X2, …, are iid random variables, each with expected value μ < ∞ and variance σ² < ∞.
Theorem 7.3.1 (weak law of large numbers).
For any ε > 0, P(|X̄n – μ| > ε) → 0 as n → ∞, where X̄n = (X1 + X2 + … + Xn)/n denotes the sample mean.
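The weak law can be illustrated by simulation. In the Python sketch below (sample sizes are arbitrary), the Xi are indicators that a fair die shows a six, so μ = 1/6, and the sample mean's deviation from μ tends to shrink as n grows.

```python
import random

# Sketch of the weak law of large numbers: the sample proportion of sixes
# in n rolls of a fair die approaches its expected value 1/6 as n grows.
random.seed(1)
mu = 1 / 6
for n in (100, 10_000, 1_000_000):
    xbar = sum(random.randint(1, 6) == 6 for _ in range(n)) / n
    print(n, xbar, abs(xbar - mu))   # the deviation from 1/6 tends to shrink
```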
Theorem 7.3.2 (strong law of large numbers).