Practice Problems for Midterm #2

1. According to scientists, asteroids 500 meters in diameter or larger are expected to strike the earth once every 10,000 years (on average). Fill in the blank: “Mathematically, there is a 10% chance that the earth will be struck by a large asteroid within the next ______years.”

Answer: P(x ≤ x0) = 0.1 = 1 - e^(-x0/10,000) → x0 = 1053.6 years.
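The figure can be checked numerically. The sketch below assumes, as the answer does, that the waiting time until the next strike is exponential with a mean of 10,000 years; the function name is illustrative only.

```python
import math

def exponential_quantile(p, mean):
    """Return x0 such that P(x <= x0) = p for an exponential with the given mean."""
    return -mean * math.log(1.0 - p)

# 10% chance of a strike within x0 years, mean waiting time 10,000 years
x0 = exponential_quantile(0.10, 10_000)
print(round(x0, 1))  # 1053.6
```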

2. You have collected the following data from a population that is approximately normal in distribution:

41.19 / 40.04 / 35.27 / 18.91 / 15.60
47.02 / 58.67 / 39.38 / 35.04 / 47.16
30.60 / 45.50 / 64.91 / 39.95 / 20.61

The sample has a mean of 38.66 and a standard deviation of 13.70. Suppose you plan to collect one additional data point.

a) What is the probability that the sample point will be less than 13.452?

Answer: z13.452 = (13.452 - 38.66)/13.70 = -1.84. From the normal table, we see that -1.84 corresponds to P(-1.84 < z < 0) = 0.4671. Therefore, P(x < 13.452) = 0.5 - 0.4671 = 0.0329.
b) Find the point x0 such that P(x > x0) =1.1%.
Answer: We want the upper tail to have a probability of 0.011. We must therefore find 0.5 - 0.011 = 0.489 in the table. This corresponds to a z of 2.29. Therefore x0 = 38.66 + 2.29 × 13.70 = 70.03.
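Both parts can also be done without a table. This is a minimal sketch using Python's standard-library NormalDist, assuming (as the answers do) that the sample mean and standard deviation stand in for the population values. Because the table rounds z to two decimals, the results may differ slightly in the last digit.

```python
from statistics import NormalDist

# Sample statistics used as stand-ins for the population parameters (an assumption)
dist = NormalDist(mu=38.66, sigma=13.70)

p_below = dist.cdf(13.452)     # part a): P(x < 13.452)
x0 = dist.inv_cdf(1 - 0.011)   # part b): point with 1.1% of the area above it

print(round(p_below, 4), round(x0, 2))
```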

3. Suppose there is a test for a disease which affects 1 person out of every 2000. The test correctly identifies a sick person 99% of the time (i.e., 1% of the time, the test says that the sick person is actually healthy). Unfortunately, the test returns false positives (a healthy person is identified as being sick) 2% of the time. A friend tests positive for the disease. What is the probability that your friend actually has the disease?

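Answer: P(sick) = 1/2000 = 0.0005; P(failed test|sick) = 0.99; P(failed test|healthy) = 0.02, where "failed test" means the test came back positive. Using Bayes’ Rule,

P(sick|failed test) = P(failed test|sick) × P(sick) / [P(failed test|sick) × P(sick) + P(failed test|healthy) × P(healthy)]
= 0.99 × 0.0005 / (0.99 × 0.0005 + 0.02 × 0.9995)
= 0.000495 / 0.020485 = 0.0242.

Even after a positive test, there is only about a 2.4% chance that your friend actually has the disease.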
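The Bayes' Rule calculation can be sketched in a few lines. The function and parameter names are illustrative; the inputs are the three probabilities given in the problem.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(sick | positive test) by Bayes' Rule."""
    true_positive = sensitivity * prior                   # sick and tests positive
    false_positive = false_positive_rate * (1.0 - prior)  # healthy but tests positive
    return true_positive / (true_positive + false_positive)

p_sick_given_positive = posterior(prior=1 / 2000, sensitivity=0.99, false_positive_rate=0.02)
print(round(p_sick_given_positive, 4))  # 0.0242
```

The result is small because healthy people vastly outnumber sick ones, so false positives dominate the pool of positive tests.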

4. Serious traffic accidents occur, on average, twice per weekday in a city. The city has emergency services capable of handling four serious accidents in a day. When more than four accidents occur, emergency services are requested from surrounding communities. What percentage of weekdays will the city need help from surrounding communities?

Answer: Using the Poisson distribution with a mean of 2 accidents per weekday, we want P(x>4) = f(5)+f(6)+f(7)+f(8)+… This is an infinite sum, so we use the complement instead: P(x>4) = 1 - f(4) - f(3) - f(2) - f(1) - f(0).
So, P(x>4) = 1-0.0902-0.1804-0.2707-0.2707-0.1353 = 0.0527 = 5.27%
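The complement trick is easy to verify in code. This sketch assumes a Poisson distribution with a mean of 2 accidents per weekday; poisson_pmf is a helper written here for illustration.

```python
import math

def poisson_pmf(k, mu):
    """P(x = k) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# P(x > 4) = 1 - [f(0) + f(1) + f(2) + f(3) + f(4)]
p_more_than_4 = 1.0 - sum(poisson_pmf(k, 2.0) for k in range(5))
print(round(p_more_than_4, 4))  # 0.0527
```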

5. BRIEFLY evaluate the validity of the following statement: “Even if a variable is continuous in nature, we cannot measure it in a continuous way. For example, we cannot measure things out to an infinite number of decimal places. An implication of this is that continuous probability distributions are only useful in theory and are not useful when applied to real-world situations.”

Answer: There are two strong arguments against this statement. First, many distributions have so many discrete possibilities that they are nearly continuous (suppose we measure temperature to three decimal places, for instance). In such cases, a continuous distribution is a reasonable approximation. Second, the Central Limit Theorem says that the sample mean from any distribution (even if it is discrete) will be approximately normal in distribution if n is large. Thus a continuous distribution is the theoretical distribution for sample means. For both these reasons, continuous distributions are of critical importance in real-world settings.

6. You are interested in selecting a simple random sample of 200 households from the phone book. You know the number of entries in the book, but some of them are for businesses. A colleague suggests the following method. You randomly select 200 numbers from a discrete uniform distribution between 1 and the number of entries in the phone book. You then select the entries corresponding to those numbers. If a selection happens to be a business, you simply move to the next item in the phone book and include that in your sample. Does this methodology meet the criteria for a simple random sample? BRIEFLY explain your answer.

Answer: No. In a simple random sample, every household must have the same chance of being selected. Here, households immediately following businesses are more likely to be chosen than those not following businesses, so the strategy does not meet the criteria.

7. Explain the difference between unbiasedness and efficiency. Which is more important in choosing a point estimate? BRIEFLY justify your answer.

Answer: An unbiased estimate is one that is correct in expectation. An estimate is more efficient than another estimate if its standard deviation is lower. Having an unbiased estimate is usually considered more important than having the most efficient estimate.

8. You have collected the following data from an unknown population:

47 / 44 / 42 / 0 / 8 / 4
1 / 49 / 4 / 2 / 43 / 43
42 / 5 / 42 / 46 / 5 / 8
45 / 3 / 49 / 3 / 9 / 46
5 / 48 / 44 / 9 / 1 / 47

BRIEFLY explain how you would estimate the distribution of the sample mean. IN ONE SENTENCE, justify your method. There is no need to do any calculations here.

Answer: The expected value of the sample mean would be the average of the numbers. The standard deviation of the sample mean would be the sample standard deviation divided by the square root of 30. The sample mean would be approximately normal in distribution. This approach is justified by the Central Limit Theorem.

9. Suppose that you plan to flip a fair coin 1000 times and record the number of heads. What is the probability that at least 520 heads will be recorded?

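Answer: By the Central Limit Theorem, the sample proportion will be approximately normal in distribution, with mean 0.5 and standard deviation √(0.5 × 0.5/1000) = 0.0158. Then z0.52 = (0.52 - 0.50)/0.0158 = 1.27. From the normal table, we see that z = 1.27 corresponds to an area of 0.3980 between z = 1.27 and z = 0. P(sample proportion > 0.52) = 0.5 - 0.3980 = 0.1020.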
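As a rough check, the normal approximation can be compared with the exact binomial sum. This sketch assumes a fair coin and independent flips. The approximation below applies no continuity correction, so the two numbers are not expected to agree exactly.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 1000, 0.5

# Normal approximation to the sample proportion (Central Limit Theorem)
sigma = sqrt(p * (1 - p) / n)
p_normal = 1 - NormalDist(mu=p, sigma=sigma).cdf(0.52)

# Exact binomial: count the outcomes with 520 or more heads out of 2**1000
p_exact = sum(comb(n, k) for k in range(520, n + 1)) / 2**n

print(round(p_normal, 4), round(p_exact, 4))
```

Both values land near 10%, which suggests the approximation is adequate for a sample this large.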

10. A company purchases electronic switches from three different suppliers. 55% of the switches come from supplier A, 30% from supplier B, and 15% from supplier C. Supplier A switches are defective with probability 1.0%. Supplier B switches are defective with probability 2.0%. Supplier C switches are defective with probability 3.0%.

a) Assuming that the company manufactures a radio that uses exactly one of the switches, what is the probability that a randomly chosen radio will have a defective switch?
Answer: P(def) = P(A) × P(def|A) + P(B) × P(def|B) + P(C) × P(def|C)
= 0.55 × 0.01 + 0.3 × 0.02 + 0.15 × 0.03
= 0.016
b) Suppose that a radio is found to have a defective switch. Which supplier is most likely to have provided the switch?
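Answer: Using Bayes’ Rule with P(def) = 0.016 from a),
P(A|def) = 0.55 × 0.01/0.016 = 0.344
P(B|def) = 0.30 × 0.02/0.016 = 0.375
P(C|def) = 0.15 × 0.03/0.016 = 0.281
Supplier B is the most likely to have supplied the switch.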
c) What is the probability that the supplier in b) provided the switch?
Answer: See answer to b): P(B|def) = 0.30 × 0.02/0.016 = 0.375.
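All three parts follow from the law of total probability and Bayes' Rule, which can be sketched as below. The dictionary names are illustrative; the numbers come from the problem statement.

```python
shares = {"A": 0.55, "B": 0.30, "C": 0.15}        # P(supplier)
defect_rates = {"A": 0.01, "B": 0.02, "C": 0.03}  # P(def | supplier)

# Part a): law of total probability
p_def = sum(shares[s] * defect_rates[s] for s in shares)

# Parts b) and c): Bayes' Rule for each supplier
posteriors = {s: shares[s] * defect_rates[s] / p_def for s in shares}
most_likely = max(posteriors, key=posteriors.get)

print(round(p_def, 3), most_likely, round(posteriors[most_likely], 3))  # 0.016 B 0.375
```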

11. On average, 10 people go through a supermarket checkout line every hour. The probability of someone entering the checkout line is the same for any two time intervals of equal length.

a) What is the probability that between 2 and 5 (inclusive) people will approach the checkout line during a 30 minute period?
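Answer: A 30-minute period has a mean of μ = 10/2 = 5 arrivals. Using the Poisson distribution, P(2 ≤ x ≤ 5) = f(2) + f(3) + f(4) + f(5) = 0.0842 + 0.1404 + 0.1755 + 0.1755 = 0.5756.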
b) What is the probability that no customers will approach the checkout line in the next 6 minutes?
Answer: A 6-minute period is one tenth of an hour, so μ = 10/10 = 1. P(x = 0) = f(0) = e^(-1) = 0.3679.
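Both parts depend on rescaling the hourly rate to the window in question. The sketch below assumes Poisson arrivals at 10 per hour, so a window of t minutes has mean 10 × t/60 (5 for 30 minutes, 1 for 6 minutes). The helper names are illustrative. Summing four-decimal table values can differ from the computed result in the last digit.

```python
import math

RATE_PER_HOUR = 10  # from the problem statement

def poisson_pmf(k, mu):
    """P(x = k) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def mean_for_minutes(minutes):
    """Expected number of arrivals in a window of the given length."""
    return RATE_PER_HOUR * minutes / 60

# Part a): between 2 and 5 arrivals (inclusive) in 30 minutes
p_2_to_5 = sum(poisson_pmf(k, mean_for_minutes(30)) for k in range(2, 6))

# Part b): no arrivals in 6 minutes
p_none = poisson_pmf(0, mean_for_minutes(6))

print(round(p_2_to_5, 4), round(p_none, 4))
```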

12. You have conducted a study of spring term behavior and have found the following: 60% of all students both attend class regularly and go to Goshen every week. Of the students who do not go to Goshen every week, 85% attend class regularly. 75% of all students go to Goshen every week. What percentage of students attend class regularly?

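Answer: P(class and Goshen) = 0.60; P(class | no Goshen) = 0.85; P(Goshen) = 0.75
⇒ P(class) = P(class and Goshen) + P(class | no Goshen) × P(no Goshen) = 0.60 + 0.85 × 0.25 = 0.8125.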
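The answer splits "attends class" into two disjoint groups: those who go to Goshen and those who do not. A short sketch of that arithmetic, with illustrative variable names:

```python
p_class_and_goshen = 0.60       # attend class AND go to Goshen
p_goshen = 0.75                 # go to Goshen every week
p_class_given_no_goshen = 0.85  # attend class, given no Goshen

# Law of total probability over the two disjoint groups
p_class = p_class_and_goshen + p_class_given_no_goshen * (1 - p_goshen)
print(round(p_class, 4))  # 0.8125
```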

13. As part of a promotion, the supermarket randomly chooses customers who then receive a 20% discount on their purchases. The randomization process is such that the probability that a given customer receives the discount is 0.12. What is the probability that more than 4 of the first 30 customers will receive the discount?
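Answer: Using the Binomial distribution with n = 30 and p = 0.12,
P(x > 4) = 1 - f(0) - f(1) - f(2) - f(3) - f(4) = 1 - 0.0216 - 0.0884 - 0.1747 - 0.2224 - 0.2047 = 0.2882.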
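The binomial tail can be computed directly. This sketch assumes each customer independently receives the discount with probability 0.12; binomial_pmf is a helper defined here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(x > 4) = 1 - P(x <= 4) for the first 30 customers
p_more_than_4 = 1.0 - sum(binomial_pmf(k, 30, 0.12) for k in range(5))
print(round(p_more_than_4, 3))
```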

14. You work in a building that has a notoriously slow elevator. On average, the elevator arrives at a floor 3 minutes after the button is pressed. The probability that the elevator will arrive during a given 5-minute interval is the same as for any other 5-minute interval.

a) What is the probability that you will have to wait longer than 5 minutes for the elevator to arrive?
Answer: The exponential applies, with a mean of 3 minutes. P(x > 5) = e^(-5/3) = 0.1889.
b) What is the probability that you will have to wait for between 1 and 2 minutes?
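Answer: P(x ≤ 1) = 1 - e^(-1/3) = 0.2835 and P(x ≤ 2) = 1 - e^(-2/3) = 0.4866, so P(1 < x < 2) = 0.4866 - 0.2835 = 0.2031.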
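Both waiting-time probabilities come from the exponential cumulative distribution function. A minimal sketch, assuming an exponential wait with a mean of 3 minutes (the helper name is illustrative):

```python
import math

MEAN_WAIT = 3.0  # minutes, from the problem statement

def exponential_cdf(x, mean):
    """P(wait <= x) for an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean)

p_longer_than_5 = 1.0 - exponential_cdf(5, MEAN_WAIT)                              # part a)
p_between_1_and_2 = exponential_cdf(2, MEAN_WAIT) - exponential_cdf(1, MEAN_WAIT)  # part b)

print(round(p_longer_than_5, 4), round(p_between_1_and_2, 4))  # 0.1889 0.2031
```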

15. An oil well has produced an average of 100 barrels per day over the last 30 days. The standard deviation of daily production is 10 barrels per day. The data appears to be normal in distribution.

a) What is the probability that the well will produce at least 110 barrels tomorrow?
Answer: z110 = (110 - 100)/10 = 1 gives an area of 0.3413 between z = 1 and z = 0. P(x > 110) = 0.5 - 0.3413 = 0.1587.

b) What is the probability that the well will produce between 90 and 120 barrels tomorrow?
Answer: z90 = (90 - 100)/10 = -1 and z120 = (120 - 100)/10 = 2 give areas of 0.3413 and 0.4772. So P(90 < x < 120) = 0.3413 + 0.4772 = 0.8185.

c) Suppose that a geologist originally told you that the well should produce an average of 105 barrels per day. Evaluate the validity of his assessment.
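Answer: The standard deviation of a 30-day sample mean is 10/√30 = 1.826, so z = (105 - 100)/1.826 = 2.74 for 105, which would not necessarily be considered an outlier. It is far enough away from the mean that we should be concerned, however.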
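The three parts can be sketched with the standard-library NormalDist. Parts a) and b) treat daily production as normal with mean 100 and standard deviation 10. For part c), the sketch assumes the geologist's figure should be compared with the 30-day sample mean, whose standard deviation is 10/√30; that reading is an assumption rather than something the problem states outright.

```python
from math import sqrt
from statistics import NormalDist

daily = NormalDist(mu=100, sigma=10)

p_at_least_110 = 1 - daily.cdf(110)           # part a)
p_90_to_120 = daily.cdf(120) - daily.cdf(90)  # part b)

# Part c), assumed reading: how many standard errors is 105 from the 30-day average?
standard_error = 10 / sqrt(30)
z_claim = (105 - 100) / standard_error

print(round(p_at_least_110, 4), round(p_90_to_120, 4), round(z_claim, 2))
```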

16. On average, it rains 55 days per year (assume 365 days per year). What is the probability that it will rain 2 days or less over the next 7 days? Be sure to list any assumptions that you make in answering the problem (i.e., What distribution did you use? What assumptions are needed so that it is reasonable to use that distribution?).
Answer: P(rain) = 55/365 = 0.1507. Using the binomial distribution with n = 7 and p = 0.1507,

P(x ≤ 2) = f(0) + f(1) + f(2) = 0.9254

The binomial is a reasonable assumption when we have a two-state outcome (rain or no rain), the probability of rain is the same on every day, and observations are independent.
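The sum f(0) + f(1) + f(2) can be reproduced as follows. The sketch relies on the binomial assumptions: each of the 7 days is an independent trial with the same rain probability of 55/365. In practice rainy days tend to cluster, so this is an approximation.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_rain = 55 / 365
terms = [binomial_pmf(k, 7, p_rain) for k in range(3)]  # f(0), f(1), f(2)
p_two_or_fewer = sum(terms)

print([round(t, 4) for t in terms], round(p_two_or_fewer, 3))
```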

17. On average, the February temperature is 40 degrees with a standard deviation of 8 degrees. What is the probability that the temperature will be between 36 and 50 degrees on a given day in February? Be sure to list any assumptions that you make in answering the problem (i.e., What distribution did you use? What assumptions are needed so that it is reasonable to use that distribution?).
Answer: Using the normal distribution, we have z36 = (36 - 40)/8 = -0.5 and z50 = (50 - 40)/8 = 1.25. From the normal table, we see that z36 corresponds to an area of 0.1915 and z50 to an area of 0.3944. P(36 < x < 50) = 0.1915 + 0.3944 = 0.5859. In using the normal distribution, we assume that the distribution of temperature is shaped, at least approximately, like the bell curve.

18. Consider the following data:

Day: / Monday / Tuesday / Wednesday / Thursday / Friday
Week 1 Sales / 43 / 50 / 57 / 47 / 55
Week 2 Sales / 37 / 58 / 56 / 44 / 42
Week 3 Sales / 34 / 59 / 54 / 44 / 44
Week 4 Sales / 32 / 54 / 59 / 57 / 49
Week 5 Sales / 43 / 54 / 59 / 52 / 41
Week 6 Sales / 31 / 54 / 51 / 58 / 59
Week 7 Sales / 42 / 42 / 55 / 59 / 52
Week 8 Sales / 49 / 45 / 54 / 47 / 42
Week 9 Sales / 48 / 57 / 42 / 42 / 55
Week 10 Sales / 33 / 45 / 41 / 54 / 41

Your boss asks you to calculate the average daily sales. How do you respond? What is the probability of getting sales between 38 and 41 (inclusive) next Monday? Be sure to list any assumptions that you make in answering the problem (i.e., What distribution did you use? What assumptions are needed so that it is reasonable to use that distribution?).

Answer: The mean daily sales during the period is 48.44. This might be misleading because Monday may be an outlier (it is the only day with any sales in the 30s and it is the only day without sales in the 50s). The mean sales figures by day are

Day / Mean Sales
Monday / 39.2
Tuesday / 51.8
Wednesday / 52.8
Thursday / 50.4
Friday / 48.0

One approach in addressing the potential outlier is to calculate the standard deviation of the five daily means. In this case, s = 5.47. Monday is therefore (39.2 - 48.44)/5.47 = -1.69 standard deviations away from the mean. For a small sample, this suggests (although it is far from conclusive) that Monday may be an outlier. Coupling this with the intuition above (which basically says that the ranges are vastly different) leads me to believe that Monday is quite likely an outlier. I would therefore tell my boss that although the mean daily sales is 48.44, Monday averages 39.2 while the rest of the week averages 50.75.
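The summary figures above can be recomputed from the table. In this sketch the data are typed in by weekday, and s is taken to be the sample standard deviation of the five daily means, which is the reading that reproduces 5.47.

```python
from statistics import mean, stdev

sales = {
    "Monday":    [43, 37, 34, 32, 43, 31, 42, 49, 48, 33],
    "Tuesday":   [50, 58, 59, 54, 54, 54, 42, 45, 57, 45],
    "Wednesday": [57, 56, 54, 59, 59, 51, 55, 54, 42, 41],
    "Thursday":  [47, 44, 44, 57, 52, 58, 59, 47, 42, 54],
    "Friday":    [55, 42, 44, 49, 41, 59, 52, 42, 55, 41],
}

day_means = {day: mean(values) for day, values in sales.items()}
means_list = list(day_means.values())

overall_mean = mean(means_list)  # equals the grand mean because every day has 10 weeks
s = stdev(means_list)            # spread of the five daily means
z_monday = (day_means["Monday"] - overall_mean) / s
rest_of_week = mean([m for day, m in day_means.items() if day != "Monday"])

print(round(overall_mean, 2), round(s, 2), round(z_monday, 2), round(rest_of_week, 2))
```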

There are several reasonable approaches to estimating the probability of getting between 38 and 41 sales next Monday. Treating Monday as an outlier, we must use Monday’s mean and standard deviation (6.66) in the analysis. The data is clearly discrete, but we may choose to use the normal distribution as an approximation, extending the interval by half a unit on each side. z37.5 = (37.5 - 39.2)/6.66 = -0.255 and z41.5 = (41.5 - 39.2)/6.66 = 0.345. From the normal table, z37.5 corresponds to an area of 0.10065 and z41.5 to an area of 0.13495 (in both cases, I interpolated between the numbers). P(38 ≤ x ≤ 41) = 0.10065 + 0.13495 = 0.2356.

An alternative approach is to assume that the data follows a discrete uniform distribution. This seems reasonable when we observe that the numbers are spread pretty evenly between 30 and 50. Given a 20 unit distribution and four outcomes of interest (38, 39, 40, and 41), we estimate P(38 ≤ x ≤ 41) = 4/20 = 0.20.

The two estimates differ because the assumptions behind the discrete uniform and normal distributions are so different. Ideally, we would gather as much data as possible to determine what distribution is most appropriate. Note that we might also consider a binomial distribution if we can get data on the number of sales attempted, etc.
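The two approaches can be placed side by side in code. This sketch assumes Monday's mean of 39.2 and standard deviation of 6.66, applies the half-unit continuity correction for the normal case, and uses a 20-value discrete uniform for the alternative. Evaluating the normal curve directly, without table lookups, gives a probability of roughly 0.24.

```python
from statistics import NormalDist

# Normal approximation with continuity correction: P(37.5 < x < 41.5)
monday = NormalDist(mu=39.2, sigma=6.66)
p_normal = monday.cdf(41.5) - monday.cdf(37.5)

# Discrete uniform alternative: 4 favorable outcomes out of 20 equally likely values
p_uniform = 4 / 20

print(round(p_normal, 3), p_uniform)
```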