Chapter 2

Probability

Introduction

This chapter introduces the concept of probability. It is a central part of statistics and one that gives many students second thoughts about dropping the course. Some of the basic concepts you will find to be straightforward (e.g., sample space, subjective and objective probabilities, complements of an event, etc.) while other concepts (joint and conditional probabilities and Bayes’ theorem) may prove to be somewhat confusing at first glance. Perseverance on your part will get you through the chapter and on to the more application-oriented topics in the subsequent chapters.

Applicable Excel Templates used in this Chapter:

Bayes Revision.xls

Contingency Table.xls

Probability of at Least 1.xls

Permutation & Combination.xls

Applicable MegaStat commands:

MEGASTAT→ Probability → Counting Rules

Applicable MINITAB commands:

None

2-7.

Sample Space
First Toss / Second Toss > First
1 / 2, 3, 4, 5, 6
2 / 3, 4, 5, 6
3 / 4, 5, 6
4 / 5, 6
5 / 6
6 / none

There are 36 possible outcomes when tossing two dice.

There are 15 possible outcomes where the second toss is greater than the first.

P(Second Toss > First) = 15/36 = 0.417
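The count of 15 can be checked by brute force. The following is a minimal Python sketch (Python is not one of the chapter's templates, and the variable names are only illustrative) that enumerates the sample space and counts the favorable outcomes:

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first toss, second toss) for two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

# Keep only the outcomes where the second toss is strictly greater than the first.
favorable = [(first, second) for first, second in sample_space if second > first]

# Each outcome is equally likely, so the probability is a simple ratio of counts.
probability = Fraction(len(favorable), len(sample_space))
print(len(sample_space), len(favorable), round(float(probability), 3))
```

Using `Fraction` keeps the ratio exact until the final rounding step.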

2-8. Let R be “exposed to radio advertisement.” Let T be “exposed to television advertisement.”

a. Then R ∪ T represents the event that a randomly selected person will be exposed to either a radio or a television advertisement, or both.

b. Then R ∩ T represents the event that a randomly selected person will be exposed to both a radio and a television advertisement.

2-12. We are given that 5 million Blackberry users were unable to use their devices. We also know that there are 18 million users of handheld devices of this kind. If a user is chosen at random, what is the probability that their device will not work?

P(device not working) = 5M / 18M = 0.2778

2-13. Continuing with the Blackberry problem in 2-12, 3 million of the 18 million users could not use their devices as cellphones. An additional 1 million could not use their devices as either a cellphone or a data device. Determine the probability of a randomly selected user not being able to use their device as either a cellphone or a data device.

P(nonfunctioning cell phone) = 3/18 = 0.1667

P(nonfunctioning data device and nonfunctioning cell phone) = 1/18 = 0.0556

P(nonfunctioning data device or nonfunctioning cell phone) = 0.2778 + 0.1667 – 0.0556 = 0.3889

2-18. Given that P(Detect 1st) = .98 and P(Detect 2nd) = .94 and P(Detect 1st and Detect 2nd) = .93, compute the probability that at least one of the two will be detected (that is, the 1st or the 2nd or both will be detected).

In general, P(A ∪ B) = P(A) + P(B) – P(A ∩ B).

Here, P(Detect 1st ∪ Detect 2nd) =

P(Detect 1st) + P(Detect 2nd) – P(Detect 1st ∩ Detect 2nd) = .98 + .94 – .93 = 0.99.
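The rule of unions is easy to wrap in a small helper. The sketch below is illustrative only (the function name `prob_union` is our own); it applies the rule to the numbers from 2-18 and from 2-13:

```python
def prob_union(p_a, p_b, p_a_and_b):
    """Rule of unions: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


# Problem 2-18: at least one of the two detections occurs.
p_detect_either = prob_union(0.98, 0.94, 0.93)
print(round(p_detect_either, 2))

# Problem 2-13: nonfunctioning data device or nonfunctioning cell phone.
p_fail_either = prob_union(5 / 18, 3 / 18, 1 / 18)
print(round(p_fail_either, 4))
```

Subtracting the intersection avoids counting twice the outcomes that belong to both events.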

2-20. We are given age and sex data for 20 managers.

34F, 49M, 27M, 63F, 33F, 29F, 45M, 46M, 30F, 39M, 42M, 30F, 48M, 35F, 32F, 37F, 48F, 50M, 48F, 61F

A manager will be chosen at random.

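a. Compute the probability the manager will be either a woman or over 50 years old, or both.

P(F ∪ >50) = P(F) + P(>50) – P(F ∩ >50)

= 12/20 + 2/20 – 2/20 = 12/20 = 0.60.

(Twelve of the 20 managers are women, two are over 50, and both of those two are women.)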
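The counts behind this answer can be tallied directly from the list of managers. This is a small sketch under the assumption that each entry is an age followed by a one-letter sex code, exactly as printed in the problem:

```python
managers = [
    "34F", "49M", "27M", "63F", "33F", "29F", "45M", "46M", "30F", "39M",
    "42M", "30F", "48M", "35F", "32F", "37F", "48F", "50M", "48F", "61F",
]

# Split each entry into (age, sex); e.g. "34F" -> (34, "F").
records = [(int(entry[:-1]), entry[-1]) for entry in managers]
n = len(records)

women = sum(1 for age, sex in records if sex == "F")
over_50 = sum(1 for age, sex in records if age > 50)
women_over_50 = sum(1 for age, sex in records if sex == "F" and age > 50)

# Rule of unions, then a direct count of "woman or over 50" as a cross-check.
p_union = (women + over_50 - women_over_50) / n
p_direct = sum(1 for age, sex in records if sex == "F" or age > 50) / n
print(women, over_50, women_over_50, p_union, p_direct)
```

Note that the 50-year-old manager is not counted as "over 50."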

2-24. Continuing with problem 2-12, determine the probability that a randomly chosen user could use their Blackberry device.

From 2-12, we know the probability of a nonfunctioning device is 0.2778. To calculate the probability of a functioning device, we need to use the complement rule:

P(functioning data device) = 1 – P(nonfunctioning data device) = 1 – 0.2778 = 0.7222

2-27. If a large competitor buys a small firm, the firm’s stock will increase with probability 0.85. In other words: P(stock will Rise | Bought by large firm) = 0.85.

The purchase of the company has a probability of 0.40 of taking place, or P(being Bought by a large firm) = 0.40.

The probability that the purchase will take place and the firm’s stock will rise is determined by the intersection of the two events, P(R ∩ B), found by:

P(R ∩ B) = P(R | B) P(B) = (.85)(.40) = 0.34

2-28. If interest rates decrease, then the probability the market will go up is 0.80. Using the symbol “|” for “given,” this may be written P(market goes up | interest rates decrease) = 0.80. That is, the conditional part is the interest rates; if they decrease, then the market goes up, so the given part is interest rates decreasing. Also, we assume that the probability is 0.40 that interest rates will decrease, which may be written P(interest rates decrease) = 0.40. Then, we wish to compute the probability that the market will go up and the interest rates go down, so we seek P(market goes up ∩ interest rates go down). The conditional law is

P(A | B) = P(A ∩ B) / P(B),

so P(Market goes up | interest rates decrease) = P(Market goes up ∩ interest rates decrease) / P(interest rates decrease),

or P(M.up | Int.down) = 0.80 = P(M.up ∩ Int.down) / 0.40,

and solving for the intersection, P(M.up ∩ Int.down) = 0.80(0.40) = .32.
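Problems 2-27 and 2-28 both rearrange the conditional law into the multiplication rule. A minimal sketch, with a hypothetical helper name, shows the same arithmetic:

```python
def prob_joint(p_a_given_b, p_b):
    """Multiplication rule: P(A and B) = P(A | B) * P(B)."""
    return p_a_given_b * p_b


# Problem 2-27: the stock rises AND the firm is bought.
p_rise_and_bought = prob_joint(0.85, 0.40)

# Problem 2-28: the market goes up AND interest rates decrease.
p_up_and_rates_down = prob_joint(0.80, 0.40)

print(round(p_rise_and_bought, 2), round(p_up_and_rates_down, 2))
```

The results are rounded before printing because floating-point multiplication can leave tiny trailing digits.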

2-33. Given the following table of counts:

Dividends / Price Increase / No Price Increase / Total
Paid / 34 / 78 / 112
Not Paid / 85 / 49 / 134
Total / 119 / 127 / 246

a. Compute the probability a randomly selected stock increased in price:

P(price increase) = 119/246 = .484.

b. Compute P(paid dividends) = 112/246 = .455.

c. Compute P(price increase ∩ paid dividends)

= 34/246 = .138.

d. Compute P(not paid ∩ no price increase)

= 49/246 = .199.

e. Given a price increase, compute the probability it also paid dividends:

P(paid dividends | price increase) = P(paid dividends ∩ price increase) / P(price increase)

= (34/246) / (119/246) = .138/.484 = .285.

Another way to view this is from a reduced-space perspective.

Compute P(paid dividends | price increase) = 34/119 = .286.

In the previous version we considered the 34 out of 246 compared to 119 out of 246, but in the reduced space version, we recognize that “given price increase” restricts us to the 119 which had a price increase, and then out of this 119, what proportion also paid dividends?

It was 34 out of 119, or a proportion of .286. The viewpoints are equivalent, and the answers differ due to rounding.

f. P(increased in price | paid no dividends)

= P(price increase ∩ not paid) / P(not paid)

= (85/246) / (134/246) = .3455/.5447 = .6343.

g. Compute P(price increase ∪ paid dividends) (i.e., either price increase or paid dividends or both) =

P(price increase) + P(paid dividends) – P(price increase and paid dividends)

= (119 + 112 – 34)/246 = 197/246 = .801.
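All of the parts of 2-33 come from the same four cell counts. The following sketch (an illustration, not the Contingency Table.xls template; the dictionary layout and the helper `marginal` are our own choices) recomputes them:

```python
# Cell counts keyed by (dividend status, price movement).
counts = {
    ("paid", "increase"): 34,
    ("paid", "no increase"): 78,
    ("not paid", "increase"): 85,
    ("not paid", "no increase"): 49,
}
total = sum(counts.values())


def marginal(label):
    """Row or column total: sum the cells whose key contains exactly this label."""
    return sum(value for key, value in counts.items() if label in key)


p_increase = marginal("increase") / total                        # part a
p_paid = marginal("paid") / total                                # part b
p_increase_and_paid = counts[("paid", "increase")] / total       # part c
p_paid_given_increase = counts[("paid", "increase")] / marginal("increase")      # part e
p_increase_given_not_paid = counts[("not paid", "increase")] / marginal("not paid")  # part f
p_increase_or_paid = p_increase + p_paid - p_increase_and_paid   # part g

print(round(p_paid_given_increase, 3), round(p_increase_or_paid, 3))
```

Parts e and f use the reduced-space view: the denominator is the row or column total of the given event rather than the grand total.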

2-36. According to a report, 65% of Americans are overweight or obese. The problem asks you to determine the probability that in a group of five randomly selected Americans at least one is overweight or obese. How do we start? Let’s determine what we do know. First, a random sample of five people is selected, which implies independence. Second, it is given that 65% of all Americans are overweight or obese. If this is so, then we also know that 35% are not overweight or obese.

So how do we proceed? We could set up the entire sample space of all the possible outcomes for five people being overweight or obese, starting at none are overweight up to all are overweight, and then calculate the probability of each outcome and add up the relevant probabilities. This is the more time-consuming way to do it. Instead, let’s first determine the probability that no one in the group of five is overweight or obese, which is one of the outcomes in the sample space. The probability that none of the five selected people are overweight is:

P(none overweight) = (0.35)(0.35)(0.35)(0.35)(0.35) = (0.35)^5 = 0.00525 (approximately).

Since all the possible outcomes in our sample space must add up to 1.00, we can use the complement rule to determine the sum of the probabilities of the other outcomes. The remaining outcomes include one person being overweight, two being overweight, three, and so on. Since the question asks us to determine the probability that at least one is overweight, we simply subtract our probability of none being overweight from 1.00:

P(at least one is overweight) = 1.00 – 0.00525 = 0.9947 (approximately).

Using the template (Probability of at least 1.xls), enter the probability of success (overweight) for the sample of size 5 in column C. The result is shown in cell H4.

Probability of at least one success from many independent trials.
Success Probs
1 / 0.65 / Prob. of at least one success / 0.9947
2 / 0.65
3 / 0.65
4 / 0.65
5 / 0.65
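To see that the "time-consuming way" and the complement rule agree, here is a short sketch that does both. It assumes, as the solution does, that the five people are independent; the variable names are illustrative:

```python
from itertools import product
from math import prod

p_overweight = 0.65
n_people = 5

# Long way: enumerate all 2**5 overweight / not-overweight patterns and add up
# the probabilities of every pattern with at least one overweight person.
p_enumerated = 0.0
for outcome in product([True, False], repeat=n_people):
    p_outcome = prod(p_overweight if is_over else 1 - p_overweight for is_over in outcome)
    if any(outcome):
        p_enumerated += p_outcome

# Short way: complement rule, 1 - P(none are overweight).
p_none = (1 - p_overweight) ** n_people
p_complement = 1 - p_none

print(round(p_none, 5), round(p_enumerated, 4), round(p_complement, 4))
```

The enumeration sums 31 of the 32 outcome probabilities, while the complement rule needs only the one outcome that was left out.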

2-38. We want to be sure that a package is delivered within one day, so we decide to send the same package by three different delivery services. The three delivery firms have different success rates for on-time delivery: Firm A has a 90% success rate [P(A) = 0.90], Firm B has an 88% success rate [P(B) = 0.88], and Firm C has a 91% success rate [P(C) = 0.91]. We want to determine the probability that at least one of the packages arrives on time.

First, we assume independence in the events; i.e., the delivery by one service has no impact on the delivery of the other two services.

Second, let’s determine the probability that none of the three packages are delivered on time. To do this we need to calculate the failure rate for each firm. The failure rate for each firm is found by subtracting their success rate, expressed in decimal format, from 1.00, or:

Firm A failure rate: 1.00 – 0.90 = 0.10

Firm B failure rate: 1.00 – 0.88 = 0.12

Firm C failure rate: 1.00 – 0.91 = 0.09

The probability that none of the packages will be delivered on time is found by the multiplication of the three failure rates, since we assumed the events were independent of each other.

P(none are delivered on time) = (0.10)(0.12)(0.09) = 0.00108

Finally, asking for the probability that at least one of the packages is delivered on time is the same as asking for the probability that one or two or all three packages are delivered on time. The easiest way to find it is to use the “complement rule”: subtract the probability that none of the packages is delivered on time from 1.00.

P(at least one arrives on time) = 1 – P(all three fail to arrive)

= 1.00 – (1 – .90)(1 – .88)(1 – .91) = 1.00 – 0.00108 = 0.99892

Using the template (Probability of at least 1.xls), enter the probability of success for each package delivery company in column C. The result is in cell H4.

Probability of at least one success from many independent trials.
Success Probs
1 / 0.9 / Prob. of at least one success / 0.9989
2 / 0.88
3 / 0.91

There is a 99.89% chance that at least one of the packages will be delivered on time.
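When the success probabilities differ, the same complement-rule calculation can be written as one small function. This is a sketch of the arithmetic shown above, assuming independent trials; it is not a description of how the Excel template is implemented, and `prob_at_least_one` is a name chosen here for illustration:

```python
from math import prod


def prob_at_least_one(success_probs):
    """P(at least one success) = 1 - product of the individual failure probabilities.

    Assumes the trials are independent of each other.
    """
    return 1 - prod(1 - p for p in success_probs)


# Problem 2-38: Firms A, B and C.
p_delivery = prob_at_least_one([0.90, 0.88, 0.91])
print(round(p_delivery, 5))
```

The same call with `[0.65] * 5` reproduces the answer to 2-36.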

2-42. We are given the probabilities that each of three credit derivatives will make a profit, and we want to know the probability that at least one will make a profit.

Using template: Probability of at least 1.xls

Enter the three probabilities in cells: C4:C6

Probability of at least one success from many independent trials.
Success Probs
1 / 0.9 / Prob. of at least one success / 0.9900
2 / 0.75
3 / 0.6

The probability that at least one of the three investments makes a profit is 0.9900.

2-44. This problem pertains to the data of problem 2-31, which provided the following table of the number of claims at an insurance company.

Claim Type / East / South / Midwest / West / Totals
Hospitalization / 75 / 128 / 29 / 52 / 284
Physician’s visit / 233 / 514 / 104 / 251 / 1,102
Outpatient treatment / 100 / 326 / 65 / 99 / 590
Totals / 408 / 968 / 198 / 402 / 1,976

The question is whether the event “hospitalization” is independent of the event “Midwest.” We know that in general (whether the events are independent or not) P(A ∩ B) = P(A | B)P(B), but if A and B are independent, P(A | B) = P(A), since the fact that B is given or came first doesn’t make any difference to A. Therefore, if P(A | B) = P(A), then events A and B are independent, and consequently, P(A ∩ B) = P(A)P(B).

a. One test for independence: The events are independent if

P(hospitalization | Midwest) = P(hospitalization).

1. P(hospitalization | Midwest) = P(hospitalization ∩ Midwest) / P(Midwest)

= (29/1976) / (198/1976) = 0.014676/0.100202

P(hospitalization | Midwest) = 0.1465.

(This can also be computed more directly by 29/198 = 0.1465.)

2. Also, P(hospitalization) = 284/1976 = 0.1437.

3. Then: is P(hospitalization | Midwest) = P(hospitalization)? Here 0.1465 ≠ 0.1437; therefore the two events are not statistically independent. Where the hospitalization occurs does make a difference.

b. Another test for independence: The two events are independent if P(hospitalization ∩ Midwest) = P(hospitalization)P(Midwest).

1. P(hospitalization ∩ Midwest) = 29/1976 = 0.0147.

2. P(hospitalization)P(Midwest) = (284/1976)(198/1976) = 0.0144.

3. Since P(hospitalization ∩ Midwest) ≠ P(hospitalization)P(Midwest), because 0.0147 ≠ 0.0144, the two events are not statistically independent.
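Both independence checks can be reproduced from the claims table. The sketch below uses a nested dictionary of our own design and, like the solution, compares the probabilities after rounding to four decimal places:

```python
claims = {
    "Hospitalization": {"East": 75, "South": 128, "Midwest": 29, "West": 52},
    "Physician's visit": {"East": 233, "South": 514, "Midwest": 104, "West": 251},
    "Outpatient treatment": {"East": 100, "South": 326, "Midwest": 65, "West": 99},
}
total = sum(sum(row.values()) for row in claims.values())

p_hosp = sum(claims["Hospitalization"].values()) / total
p_midwest = sum(row["Midwest"] for row in claims.values()) / total
p_hosp_and_midwest = claims["Hospitalization"]["Midwest"] / total

# Test (a): compare P(hospitalization | Midwest) with P(hospitalization).
p_hosp_given_midwest = p_hosp_and_midwest / p_midwest
test_a_equal = round(p_hosp_given_midwest, 4) == round(p_hosp, 4)

# Test (b): compare P(hospitalization and Midwest) with P(hospitalization) * P(Midwest).
test_b_equal = round(p_hosp_and_midwest, 4) == round(p_hosp * p_midwest, 4)

print(round(p_hosp_given_midwest, 4), round(p_hosp, 4), test_a_equal)
print(round(p_hosp_and_midwest, 4), round(p_hosp * p_midwest, 4), test_b_equal)
```

The two tests are algebraically equivalent, so they always lead to the same conclusion.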

Permutations and Combinations Introduction (Reference 2-52)

To determine how many ways there are to order potential participants, we use either the formula for combinations or the formula for permutations. To decide which, we ask whether the order in which they are selected matters. For instance, if selecting two people from among five (call them A, B, C, D, and E), there are several approaches, depending on whether order makes a difference.

Permutations.

If the order in which they are selected matters, we have a permutation. That is, we count permutations when the order of selection is important, such as “the first person chosen is the chairman and the second person is treasurer.” Here the outcome AB (selecting A first and then B) is different from the outcome BA (selecting B first) since in one case A is the chairman and in the other A is the treasurer.

The formula for permutations is nPr = n!/(n – r)!.

For the example of selecting two from five,

5P2 = 5!/(5 – 2)! = 20 permutations, which may be listed as:

AB / BA / CA / DA / EA
AC / BC / CB / DB / EB
AD / BD / CD / DC / EC
AE / BE / CE / DE / ED

and note that both AB and BA are counted as separate outcomes, because we are assuming order matters, which is what makes this a permutation.

Combinations.

If, however, the order of selection does not matter, as in “the two people selected will be equal members of the committee,” then a combination formula is used instead of a permutation formula. Then, the outcome AB (A selected first) is exactly the same outcome as BA (B selected first), since in both cases A is a member and B is a member. So we cannot count AB and BA as separate outcomes; AB and BA are the same outcome (with slightly different names). The number of outcomes for the combination case is less than for the permutation case.

The combination formula is nCr = n!/(r!(n – r)!).

For the example of selecting two items from five we have

5C2 = 5!/(2!(5 – 2)!) = 5!/(2!3!) = 120/((2)(6)) = 120/12 = 10 combinations, which may be listed as AB AC AD AE BC BD BE CD CE DE.

(Note that BA is not listed, since it would be equivalent to AB, which is listed.)
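The two listings can be generated rather than written out by hand. This is a brief sketch using Python's standard `itertools` module; the labels A through E are the ones from the example:

```python
from itertools import combinations, permutations

people = "ABCDE"

# Order matters: AB and BA are both produced.
ordered_pairs = ["".join(pair) for pair in permutations(people, 2)]

# Order does not matter: only AB is produced, never BA.
unordered_pairs = ["".join(pair) for pair in combinations(people, 2)]

print(len(ordered_pairs), ordered_pairs)
print(len(unordered_pairs), unordered_pairs)
```

Every unordered pair corresponds to 2! = 2 ordered pairs, which is why the permutation count is exactly twice the combination count here.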

2-50. Use Template: Probability of at least 1.xls

Determining the probability that at least one driver makes it home safely is the same as determining the probability that one or two or all three drivers make it home safely. The sample space consists of four possible outcomes: none make it home safely, one makes it home safely, two make it home safely, and all three make it home safely. Since we are interested in all the possible outcomes except the one where none make it home safely, we can approach the problem using the complement rule. First, we convert each probability of a successful trip from the problem into a probability of an unsuccessful trip. Since driver #1 has a probability of 0.50 of making a successful trip, he has a (1 – 0.50 = 0.50) probability of not making it home. The corresponding probabilities for drivers #2 and #3 are (1 – 0.25 = 0.75) and (1 – 0.20 = 0.80), respectively. Using these probabilities of failure gives us:

P(at least one drives home safely) = 1 – P(none drive home safely)

= 1 – [(0.50)(0.75)(0.80)]

= 1 – 0.30 = 0.70

The template approach is much easier. Insert the given probabilities of success in cells C4:C6, and the result is displayed in cell H4.

Success Probs
1 / 0.5 / Prob. of at least one success / 0.7000
2 / 0.25
3 / 0.2

2-52. To select the four representatives (one from each department), we use Rule 8 to give the number of possible sequences. There are 55 to select from in manufacturing, 30 in distribution, 21 in marketing, and 13 in management. The total number of possible outcomes is then

(55)(30)(21)(13) = 450,450 combinations.

(Another approach is to treat each department as a separate task of selecting one representative, and use the combination formula, since the order doesn’t matter:)

Manufacturing: Select 1 from 55; 55C1 = 55!/(1!54!) = 55.

Distribution: Select 1 from 30; 30C1 = 30!/(1!29!) = 30.

Marketing: Select 1 from 21; 21C1 = 21!/(1!20!) = 21.

Management: Select 1 from 13; 13C1 = 13!/(1!12!) = 13.

Use Template: Permutation & Combination.xls

For this problem, we enter the respective department sizes under “n” and the desired number of representatives from each department under “r” (in this case only one representative from each department) in the cells pertaining to combinations (order is not important): F5:G8. The combination values are displayed in cells H5:H8.

Combination
n / r / nCr
55 / 1 / 55
30 / 1 / 30
21 / 1 / 21
13 / 1 / 13

Now, there are 55 combinations from manufacturing, and for each one there are 30 from distribution, and for each of these there are 21 from marketing, and for each of these there are 13 from management. Thus, there are

(55)(30)(21)(13) = 450,450 combinations.

Of course, using Rule 8 is much simpler than using Rule 11 four times, but we present this approach just for practice in using combinations.
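Both routes to 450,450 can be checked in a few lines. The sketch below is illustrative; the dictionary of department sizes simply restates the numbers given in the problem:

```python
from math import comb, prod

department_sizes = {
    "manufacturing": 55,
    "distribution": 30,
    "marketing": 21,
    "management": 13,
}

# Direct multiplication of the number of choices in each department.
ways_direct = prod(department_sizes.values())

# Same count via combinations: choose 1 from each department, then multiply.
ways_via_combinations = prod(comb(n, 1) for n in department_sizes.values())

print(ways_direct, ways_via_combinations)
```

Since nC1 always equals n, the two products are identical term by term.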

2-53. 9! = (9)(8)(7)(6)(5)(4)(3)(2)(1) = 362,880 different orders.

2-54. In this problem, sequence or order is important, so we use the permutation formula. We wish to select 8 out of 15, so:

15P8 = 15!/(15 – 8)! = 15!/7!

= (15)(14)(13)(12)(11)(10)(9)(8) = 259,459,200 different sequences.

Use Template: Permutation & Combination.xls

In this problem, order is important. Therefore, we need to use the permutations section of the template. Data are entered in cells B5:C5 and the permutation result is displayed in cell D5. (To see the full value you may need to widen column D, which requires unprotecting the sheet first: under Tools|Protection select Unprotect Sheet. Then move your cursor to the boundary between the column headings “D” and “E” until it changes to a vertical line with a horizontal double-headed arrow through it. Holding down the left mouse button, drag to the right until the value changes from scientific notation to its full numerical display.)

Permutation
n / r / nPr
15 / 8 / 259459200

2-56. In selecting two from seven, where the order doesn’t matter, we use the formula for combinations

7C2 = 7!/(2!5!) = 21 combinations,

or 21 ways to get pairs out of seven populations.
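The counts in 2-53, 2-54, and 2-56 can be verified with the counting functions in Python's standard `math` module (available in Python 3.8 and later). This is offered only as a cross-check alongside the template:

```python
from math import comb, factorial, perm

orders_of_nine = factorial(9)   # 2-53: 9!
sequences = perm(15, 8)         # 2-54: 15P8
pairs = comb(7, 2)              # 2-56: 7C2

# The same values written out from the formulas nPr = n!/(n-r)! and nCr = n!/(r!(n-r)!).
sequences_from_formula = factorial(15) // factorial(15 - 8)
pairs_from_formula = factorial(7) // (factorial(2) * factorial(7 - 2))

print(orders_of_nine, sequences, pairs)
```

Integer division (`//`) is used so that the large factorials stay exact integers.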

2-61. In setting up this problem we must convert the word statements to probabilities. The value 0.95 is a conditional probability, because the problem says that if tests show no side effects, then the probability is 0.95 that the FDA will approve. So P(approve | no side effects) = 0.95. Also, if the tests indicate side effects, then the probability is 0.50 that the FDA will approve, so P(approve | side effects) = 0.50. Finally, there is a 0.20 probability that the tests will show side effects, so P(side effects) = 0.20. Of course, either a test will indicate side effects or it won’t, so

P(no side effects) = 1 – P(side effects) = 1 – 0.20 = 0.80.

Now, using the law of total probability (Rule 13), we want to find the probability the drug will be approved, which is not a conditional probability.

P(approve) = [P(approve given side effects)] times [the likelihood of tests showing side effects]

+ [P(approve given no side effects)] times [the likelihood of tests showing no side effects].

P(approve) = P(approve | side effects)P(side effects)

+ P(approve | no side effects)P(no side effects)

= 0.50(0.20) + 0.95(0.80) = .86.
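The law of total probability is a weighted sum, which the following sketch makes explicit. The function name and the list-of-pairs input format are our own choices for illustration:

```python
def total_probability(branches):
    """Law of total probability.

    `branches` is a list of (P(A | B_i), P(B_i)) pairs, where the B_i are
    mutually exclusive and together cover every possibility.
    """
    return sum(p_a_given_b * p_b for p_a_given_b, p_b in branches)


p_side_effects = 0.20
p_no_side_effects = 1 - p_side_effects

p_approve = total_probability([
    (0.50, p_side_effects),      # approve given side effects
    (0.95, p_no_side_effects),   # approve given no side effects
])
print(round(p_approve, 2))
```

The second elements of the pairs must sum to 1; here they are 0.20 and 0.80.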

Bayes’ Theorem

Bayes’ Theorem allows us to compute certain probabilities by reversing the order of events.

2-65. This problem pertains to an electronic door lock system. Note that an over-bar, as in S̅O̅, means a logical “not,” which here would be “should not open.”

The first step is to write down what is known and the complements; for some problems these complements will be needed, and it is useful to write them down each time. It is given that 90% of the time the door should open, so

P(should open) = 0.90 and the complement is P(S̅O̅) = 0.10.

Then, given that the door should open, 98% of the time a green light will appear, so

P(green light | should open) = P(GO | SO) = .98 and its complement is

P(no green light | should open) = P(G̅O̅ | SO) = .02.