STAT 11

May 7, 2006

A few sample problems…

1. Confidence interval for a mean.

Mars Rover Two measures 64 daily high temperatures, and finds that the average of the 64 measurements is –42.5 degrees C. Assume that the standard deviation of daily temperatures is exactly 16 degrees C, based on data from Mars Rover One.

a. In this problem, what are n, , , , and s ?

b. What is the standard error of the mean temperature?

c. Give a 95% confidence interval for .

d. Give an 82% confidence interval for .

2. Confidence interval for a proportion.

You randomly selected 100 of your latest homemade hand grenades, and very carefully tested them. Unfortunately(?) 80 of them failed.

a. What is ?

b. What is the SE ?

c. What is a 95% CI for the true failure rate?

d. If you felt like using Wilson’s method for this problem, what would change?

3. Confidence interval for a difference of means.

You found that 30 randomly selected Sunoco stations had an average gas price of $3.10, with s = $ 0.05.

Also, 40 randomly selected Lukoil stations had an average price of $3.12, with s=$0.08.

Let D = (average of all Lukoil prices) minus (average of all Sunoco prices).

What is a 95% confidence interval for D ? What can you conclude about the relative prices at Lukoil and Sunoco stations generally ?

4. Interpreting a scatterplot.

a. Look at this scatterplot, and estimate…

The mean of the “x” variable;

The standard deviation of the “x” variable;

The mean of the “y” variable;

The standard deviation of the “y” variable;

The correlation coefficient (r) for the two variables.

b. Describe the relationship in words.

5. More on standard errors.

Science News reported that the average length of a junk-DNA sequence in Speciesus Inventedus is 128 bases, based on a sample of 100 measurements.

a. What is the SE for , the true average length?

b. Actually, Science News gave the SE: They said it’s 8. What is the standard deviation of the original sample?

6. One-way chi-square problem.

A poker-dealing machine is supposed to deal cards at random, as if from an infinite deck.

In a test, you counted 1600 cards, and observed the following:

Spades404

Hearts420

Diamonds400

Clubs376

Could it be that the suits are equally likely? Or are these discrepancies too much to be random?

7. Another one-way chi-square problem.

Same as before, but this time jokers are included, and you counted 1662 cards, with these

results:

Spades404

Hearts420

Diamonds400

Clubs356

Jokers 82

a. If a deck contains 54 cards and two of them are jokers, what is the probability that any particular randomly-chosen card would be a joker?

b. How many jokers would you expect out of 1662 random cards? How many of each suit?

c. Is it possible that the cards are really random? Or are the discrepancies too large?

------

SOLUTIONS

1a. n = 64

 = the true (long-run) average daily temperature at the Mars Rover Two site

(we don’t know the value)

= –42.5 degrees C

 = 16 degrees C (by assumption)

s = the sample standard deviation – this could be computed from the sample, but it isn’t

given in the problem

1b. SE = exactly

1c. Use z* = 1.96, since you are NOT using s as a substitute for . MOE = 1.96 × 2.00 = call it 3.9 degrees C, sothe CI is

[ -42.5 – 3.9, -42.5 + 3.9 ]

or

[ -46.4, -38.6 ]

1d. The problem here is to compute z* when C = 0.82. You want to leave a probability of

/2 = 0.09 in each tail, so you could either…

Find 0.09 under PHI(z) in the z table, and see that it corresponds to

z = - 1.34, which is –z*, or

Find 0.91 under PHI(z) in the z table, and see that it corresponds to

z = + 1.34, which is +z*.

Using z* = 1.34, we get MOE = 1.34 × 2.00 = call it 2.7 degrees, so the CI is

[ -45.2, -39.8 ] .

2a. = 0.80 ( = 80 divided by 100 )

2b. SE = exactly.

2c. Use z* = 1.96 (always use z* for a proportion) . The MOE is 1.96 × 0.04 = 0.08, so the CI is [ 0.72, 0.88 ].

(Be careful here, if you use percentages. It would be ok to write [ 72%, 88% ], but if you write in percentage points, then be sure to do the same thing with the MOE. In particular, don’t write (WRONG) [ 79.92, 80.08 ]. )

2d. The center of the CI would still be = 0.80 (exactly), but you would also compute

and you would use it to compute SE:

SE = .

The resulting confidence interval would be very slightly larger than the one in 2c.

3. There’s no reason to pool the “s” values here, and that isn’t supposed to be part of the course material anyway. So compute the two SE’s separately:

Sunoco:

n = 30, = 3.10, s = 0.05, SE = s/sqrt(n) = $ 0.00913

Lukoil:

n = 40, = 3.12, s = 0.08, SE = s/sqrt(n) = $ 0.01265

Combined SE:

= $ 0.01560

MOE: Use z* = 1.96 again, so MOE = (1.96) (0.01560) = $ 0.0306…call it 3 cents.

The observed difference is

= 0.02 (that’s 3.12 – 3.10)

so the confidence interval for the difference is

[ –$0.01, +$0.05 ].

You can’t tell from these reports – at least you can’t tell with 95% confidence – whether Sunoco or Lukoil has generally higher prices.

4a. The true values are as follows:

mean of x:2.086

sd of x:2.933(using n-1)

mean of y:4.300

sd of y:1.925

correlation:–0.832

4b. In words: The relationship is STRONG, NEGATIVE, LINEAR.

(Although really, how strong it is depends on context. You might ask, strong compared to what? )

(The same is true of linearity. You might detect a slight downward bending at the left edge of the picture; or, you might decide that the apparent bending is caused by just one point and is probably accidental. In fact, the data were generated using a linear equation with normally distributed errors.)

5a. Trick question – you can’t tell what the SE is, because you would need to know  or s.

5b. You have SE =  / sqrt(100) =  / 10. Since SE = 8, you have 8 =  / 10, so they must be using 80 for . (You still can’t tell whether they got this from the sample or estimated it in some other way.)

6.

expected / expected
observed / (percent) / (counts) / z
404 / 0.25 / 400 / 0.200
420 / 0.25 / 400 / 1.000
400 / 0.25 / 400 / 0.000
376 / 0.25 / 400 / -1.200
chi-square-> / 2.480
critical value-> / 7.815

Compute each z from its own row as (observed-expected)/sqrt(expected). Be sure to use the counts in this formula, not the percentages. The chi-square statistic is the sum of the squares of the z-values.

The number of degrees of freedom is 3 (number of categories minus 1).

The critical value is from a table you’ll have on the exam (using  = 0.05). But you don’t need it in this case. The chi-square value is about what you would expect with 3 degrees of freedom, and none of the z statistics are out of line (not even as large as 2, certainly not beyond 4). So, DO NOT REJECT the null hypothesis. There is no reason to suspect that the cards are not random.

7.

expected / expected
observed / (percent) / (counts) / z
404 / 0.2407 / 400.1 / 0.194
420 / 0.2407 / 400.1 / 0.994
400 / 0.2407 / 400.1 / -0.006
356 / 0.2407 / 400.1 / -2.205
82 / 0.0370 / 61.6 / 2.606
1662 / 1662
chi-square-> / 12.680
critical value-> / 9.488

This time, the chi-square statistic (12.68) is above the =0.05 critical value, so you could reject the null hypothesis and declare that the cards are not random. The problem is clearly that there are too many jokers at the expense of clubs – you can see that from the z statistics.

On the other hand, the p-value is only 0.013 (you can’t compute that on the test) so the test isn’t totally convincing. You wouldn’t want to arrest the machine designer on this evidence.

(end)

1