STAT 200 Final Exam Solutions

Summer 2008 (modified version)

Note: This modified version is slightly shorter than a standard 2.5-hour final exam.

Problem 1

(a)

Since 51+74=125 patients had duration between 1 and 2 days,

and among them 5+10=15 patients got infections,

P(getting an infection given that duration is between 1 and 2 days) = $\frac{15}{125}$ = 0.12

(b)

H0: infection and duration are independent vs. Ha: infection and duration are dependent

Table of expected counts:

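Duration (days) / 1 / 2 / 3 / 4 / Total
Infection / $\frac{41 \times 51}{266} = 7.86$ / $\frac{41 \times 74}{266} = 11.41$ / $\frac{41 \times 47}{266} = 7.24$ / $\frac{41 \times 94}{266} = 14.49$ / 41
No infection / $\frac{225 \times 51}{266} = 43.14$ / $\frac{225 \times 74}{266} = 62.59$ / $\frac{225 \times 47}{266} = 39.76$ / $\frac{225 \times 94}{266} = 79.51$ / 225
Total / 51 / 74 / 47 / 94 / 266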

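Test statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

$= \frac{(5-7.86)^2}{7.86} + \frac{(10-11.41)^2}{11.41} + \frac{(8-7.24)^2}{7.24} + \frac{(18-14.49)^2}{14.49}$

$+ \frac{(46-43.14)^2}{43.14} + \frac{(64-62.59)^2}{62.59} + \frac{(39-39.76)^2}{39.76} + \frac{(76-79.51)^2}{79.51} = 2.536$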

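df = (2-1)(4-1) = 3, and the critical value (the upper 1% point of the chi-square distribution with 3 degrees of freedom) is $\chi^2_{0.01} = 11.34$

Since $\chi^2 = 2.536 < \chi^2_{0.01} = 11.34$, we fail to reject H0 and conclude that there is not enough evidence to say infection and duration of catheterization are associated.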
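The expected counts and the test statistic in part (b) can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the "no infection" row is obtained by subtracting the infection counts from the column totals.

```python
# Observed counts: rows = infection / no infection, columns = duration 1, 2, 3, 4.
# The "no infection" row is each column total minus the infection count.
observed = [
    [5, 10, 8, 18],
    [46, 64, 39, 76],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([round(e, 2) for e in expected[0]])
print(round(chi2, 3), df)
```

The script keeps the expected counts unrounded, so its statistic can differ from the hand-computed 2.536 in the third decimal place.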

(c)

The distribution of duration conditioned on having an infection is:

Duration (days) / 1 / 2 / 3 / 4 / Total
# Infection / 5 / 10 / 8 / 18 / 41
Proportion / $\frac{5}{41} = 0.122$ / $\frac{10}{41} = 0.244$ / $\frac{8}{41} = 0.195$ / $\frac{18}{41} = 0.439$ / 1

(d)

The marginal distribution of duration of catheterization is:

Duration (days) / 1 / 2 / 3 / 4 / Total
Total # / 51 / 74 / 47 / 94 / 266
Proportion / $\frac{51}{266} = 0.192$ / $\frac{74}{266} = 0.278$ / $\frac{47}{266} = 0.177$ / $\frac{94}{266} = 0.353$ / 1
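The conditional distribution in part (c) and the marginal distribution in part (d) are both obtained by dividing a row of counts by its total. A small sketch, with illustrative variable names:

```python
# Counts by duration (1, 2, 3, 4 days) taken from the tables above.
infection_counts = [5, 10, 8, 18]
duration_totals = [51, 74, 47, 94]

# Conditional distribution of duration given infection: divide by 41.
conditional = [c / sum(infection_counts) for c in infection_counts]

# Marginal distribution of duration: divide by 266.
marginal = [t / sum(duration_totals) for t in duration_totals]

print([round(p, 3) for p in conditional])
print([round(p, 3) for p in marginal])
```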

Problem 2

(a)

Let $\mu_1$ be the true mean gene expression for a stem cell

Let $\mu_2$ be the true mean gene expression for a mesoderm cell

Let $\mu_3$ be the true mean gene expression for a neuronal cell

H0: $\mu_1 = \mu_2 = \mu_3$ vs. Ha: $\mu_i \neq \mu_j$ for some $i \neq j$

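Let $n_i$, $\bar{x}_i$ and $s_i$ denote the sample size, sample mean and sample standard deviation of group i.

Since N = 11+9+11 = 31 and I = 3, the overall mean is $\bar{x} = \frac{11\bar{x}_1 + 9\bar{x}_2 + 11\bar{x}_3}{31} = 8.245$

SSG $= \sum n_i(\bar{x}_i - \bar{x})^2 = 11(\bar{x}_1 - 8.245)^2 + 9(\bar{x}_2 - 8.245)^2 + 11(\bar{x}_3 - 8.245)^2 = 66.937$

SSE $= \sum (n_i - 1)s_i^2 = (11-1)s_1^2 + (9-1)s_2^2 + (11-1)s_3^2 = 98.8$

MSG $= \frac{SSG}{I-1} = \frac{66.937}{2} = 33.469$, MSE $= \frac{SSE}{N-I} = \frac{98.8}{28} = 3.529$

$F = \frac{MSG}{MSE} = \frac{33.469}{3.529} = 9.48$, and the critical value is $F^* = F_{0.05}(2, 28) = 3.34$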

Since $F = 9.48 > F^* = 3.34$, we reject H0 and conclude that the true mean gene expression for at least one of the cell types is significantly different at the 5% significance level.
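The arithmetic from the sums of squares to the F statistic can be reproduced as follows. This sketch starts from the SSG and SSE values stated in the solution (the group means and standard deviations are not repeated here), and the critical value is copied from the solution rather than computed.

```python
# Summary values stated in the solution.
n_total = 11 + 9 + 11   # N
n_groups = 3            # I
ssg = 66.937            # sum of squares between groups
sse = 98.8              # sum of squares within groups (error)

# Mean squares divide each sum of squares by its degrees of freedom.
df_groups = n_groups - 1
df_error = n_total - n_groups
msg = ssg / df_groups
mse = sse / df_error

f_stat = msg / mse
f_crit = 3.34  # 5% critical value quoted in the solution (table lookup)

print(round(msg, 3), round(mse, 3), round(f_stat, 2))
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```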

(b)

If the conclusion from part (a) were in fact incorrect, we would have made a type I error: rejecting H0 when H0 is true. In the context of this example, this means we concluded that the true mean gene expression for at least one of the cell types is different, when in reality the true mean gene expressions for all three cell types are equal.

Problem 3

(a)

Let X be the time the 1st runner takes to finish their share of the race

$\mu$ = 10.5 seconds, $\sigma$ = 0.35 seconds, and X ~ N(10.5, 0.35)

P(X < 10) = P(Z < $\frac{10 - \mu}{\sigma}$) = P(Z < $\frac{10 - 10.5}{0.35}$) = P(Z < -1.43) = 0.0764

(b)

P(at least one of the four runners finishes their share of the race in under 10 seconds)

= 1 – P(none of the four runners finishes their share of the race in under 10 seconds)

= 1 – [P(a runner does not finish their share of the race in under 10 seconds)]$^4$

= 1 – (1 - 0.0764)$^4$ [by independence, using the result from part (a)]

= 0.2723
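Parts (a) and (b) can be checked numerically with `statistics.NormalDist` from the Python standard library. This is a sketch with illustrative variable names; it uses the unrounded z-score, so the results differ slightly from the table-based values 0.0764 and 0.2723.

```python
from statistics import NormalDist

# Time for one runner's share of the race: N(mean 10.5 s, sd 0.35 s).
leg_time = NormalDist(mu=10.5, sigma=0.35)

# Part (a): probability that a single runner finishes in under 10 seconds.
p_under_10 = leg_time.cdf(10)

# Part (b): complement rule with four independent runners.
p_at_least_one = 1 - (1 - p_under_10) ** 4

print(round(p_under_10, 4), round(p_at_least_one, 4))
```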

(c)

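Let $X_1$ be the time the 1st runner takes to finish their share of the race

Let $X_2$ be the time the 2nd runner takes to finish their share of the race

Let $X_3$ be the time the 3rd runner takes to finish their share of the race

Let $X_4$ be the time the 4th runner takes to finish their share of the race

and T = $X_1 + X_2 + X_3 + X_4$ = the total time a team takes to finish the race

$\mu_T = \mu_{X_1+X_2+X_3+X_4} = \mu_{X_1} + \mu_{X_2} + \mu_{X_3} + \mu_{X_4}$ = 10.5+10.5+10.5+10.5 = 42 seconds

$\sigma_T^2 = \sigma_{X_1+X_2+X_3+X_4}^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \sigma_{X_3}^2 + \sigma_{X_4}^2 = (0.35)^2 + (0.35)^2 + (0.35)^2 + (0.35)^2 = 0.49$ (by independence)

$\sigma_T = \sqrt{0.49}$ = 0.7 second

So, T ~ N(42, 0.7)

P(T < 40) = P(Z < $\frac{40 - \mu_T}{\sigma_T}$) = P(Z < $\frac{40 - 42}{0.7}$) = P(Z < -2.86) = 0.0021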
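The rule used in part (c), that means add and (for independent variables) variances add, can be sketched in code. Names are illustrative, and the probability is computed from the unrounded z-score.

```python
from math import sqrt
from statistics import NormalDist

leg_mean, leg_sd, n_legs = 10.5, 0.35, 4

# Means add; variances (not standard deviations) add for independent legs.
total_mean = n_legs * leg_mean
total_sd = sqrt(n_legs * leg_sd ** 2)

# Probability that the team's total time is under 40 seconds.
p_under_40 = NormalDist(mu=total_mean, sigma=total_sd).cdf(40)

print(total_mean, round(total_sd, 2), round(p_under_40, 4))
```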

Problem 4

(a)

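Given $n_1$ = 30, $\bar{x}_1$ = 45.8, $s_1$ = 1.2, and $n_2$ = 30, $\bar{x}_2$ = 46.2, $s_2$ = 1.1

Assuming equal variances,

$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(30-1)(1.2)^2 + (30-1)(1.1)^2}{30+30-2} = 1.325$

$SE_{\bar{x}_1-\bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} = \sqrt{1.325\left(\frac{1}{30}+\frac{1}{30}\right)} = 0.2972$

df = $n_1 + n_2 - 2$ = 30+30-2 = 58, and $t^*$ = 1.671

$(\bar{x}_1-\bar{x}_2) \pm t^* \, SE_{\bar{x}_1-\bar{x}_2} = (45.8 - 46.2) \pm 1.671 \times 0.2972 = -0.4 \pm 0.497 = (-0.897, 0.097)$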

Therefore, we are 90% confident that the true mean hardness reading determined by instrument 1 is between 0.897 lower and 0.097 higher than the true mean hardness reading determined by instrument 2.
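The pooled-variance interval in part (a) can be reproduced step by step. In this sketch the critical value 1.671 is copied from the solution's t table rather than computed, since the Python standard library has no t distribution; variable names are illustrative.

```python
from math import sqrt

# Summary statistics for instrument 1 and instrument 2.
n1, mean1, sd1 = 30, 45.8, 1.2
n2, mean2, sd2 = 30, 46.2, 1.1

# Critical value quoted in the solution (table lookup, not computed here).
t_star = 1.671

# Pooled variance and standard error of the difference in sample means.
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))

diff = mean1 - mean2
margin = t_star * std_err
ci = (diff - margin, diff + margin)

print(round(pooled_var, 3), round(std_err, 4))
print(tuple(round(x, 3) for x in ci))
```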

(b)

Based on this 90% interval, we would fail to reject the null hypothesis of equal true means at the 10% significance level, because zero lies within the interval.

(c)

H0: $\mu_1 = \mu_2$ vs. Ha: $\mu_1 \neq \mu_2$

Assuming equal variances, and from part (a), $SE_{\bar{x}_1-\bar{x}_2}$ = 0.2972

Test statistic is $t = \frac{\bar{x}_1-\bar{x}_2}{SE_{\bar{x}_1-\bar{x}_2}} = \frac{45.8-46.2}{0.2972} = -1.346$

df = $n_1 + n_2 - 2$ = 30+30-2 = 58, and $t^*$ = 1.671

p-value = $2P(T \geq |t|) = 2P(T \geq |-1.346|) = 2P(T \geq 1.346)$, and 0.10 < p-value < 0.20

Since p-value > $\alpha$ = 0.10, we fail to reject H0 and conclude that there is not enough evidence to say the true mean hardness readings from the two instruments are significantly different at the 10% significance level.
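The test statistic in part (c) follows from the same pooled standard error. The sketch below recomputes it and applies the equivalent critical-value rule (reject when |t| exceeds the two-sided 10% critical value); the value 1.671 is the table value quoted in the solution, and the names are illustrative.

```python
from math import sqrt

n1, mean1, sd1 = 30, 45.8, 1.2
n2, mean2, sd2 = 30, 46.2, 1.1

# Pooled standard error, as in part (a).
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-sample t statistic for H0: mu1 = mu2.
t_stat = (mean1 - mean2) / std_err

# Two-sided test at the 10% level using the table value from the solution.
t_star = 1.671
reject_h0 = abs(t_stat) > t_star

print(round(t_stat, 3), reject_h0)
```

Comparing |t| with the critical value gives the same decision as comparing the two-sided p-value with 0.10.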

Problem 5

(a)

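n = 32, $\bar{x}$ = 6.15 hours, s = 45 min = 0.75 hour

df = n-1 = 32-1 = 31, and $t^*$ = 2.457

$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 6.15 \pm 2.457 \times \frac{0.75}{\sqrt{32}} = 6.15 \pm 0.326 = (5.824, 6.476)$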

Therefore, we are 98% confident that the true mean lifetime of a fully charged battery is between 5.824 hours and 6.476 hours.
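The one-sample interval in part (a) can be checked as follows. The critical value 2.457 is taken from the solution's t table, not computed; variable names are illustrative.

```python
from math import sqrt

n = 32
sample_mean = 6.15   # hours
sample_sd = 0.75     # hours (45 minutes)
t_star = 2.457       # 98% critical value quoted in the solution

# Margin of error = t* times the standard error of the mean.
margin = t_star * sample_sd / sqrt(n)
ci = (sample_mean - margin, sample_mean + margin)

print(round(margin, 3), tuple(round(x, 3) for x in ci))
```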

(b)

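Margin of error m = 10 min, s = 45 min (used as an estimate of $\sigma$), $z^*$ = 2.326

$n = \left(\frac{z^* s}{m}\right)^2 = \left(\frac{2.326 \times 45}{10}\right)^2 = 109.558$ iPods

Rounding up, the sample size should be n = 110 iPods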
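The sample-size calculation in part (b) always rounds up, because rounding down would give a margin of error slightly larger than the target. A short sketch with illustrative names:

```python
from math import ceil

z_star = 2.326      # 98% critical value from the standard normal table
sd_minutes = 45     # sample standard deviation, used as an estimate of sigma
margin_minutes = 10 # desired margin of error

# n = (z* * s / m)^2, then round up to the next whole unit.
n_raw = (z_star * sd_minutes / margin_minutes) ** 2
n_required = ceil(n_raw)

print(round(n_raw, 3), n_required)
```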

(c)

H0: $\mu$ = 6 hours vs. Ha: $\mu$ > 6 hours

Test statistic is $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{6.15 - 6}{0.75/\sqrt{32}} = 1.131$

df = n-1 = 32-1 = 31, and the one-sided 5% critical value is $t^*$ = 1.697 (nearest table row, df = 30)

p-value = $P(T \geq t) = P(T \geq 1.131)$, and 0.10 < p-value < 0.15

Since p-value > $\alpha$ = 0.05, we fail to reject H0 and conclude that there is not enough evidence to say the true mean lifetime of a fully charged battery is greater than 6 hours at a significance level of 5%.
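The test statistic in part (c) and the bracketing of its p-value can be illustrated as below. The two table entries are upper-tail critical values for df = 30 from a standard t table (the nearest row to df = 31); they are an assumption about which table row is used, and the variable names are illustrative.

```python
from math import sqrt

n = 32
sample_mean = 6.15  # hours
sample_sd = 0.75    # hours
mu_0 = 6            # hypothesized mean under H0

# One-sample t statistic.
t_stat = (sample_mean - mu_0) / (sample_sd / sqrt(n))

# Upper-tail critical values for df = 30 from a standard t table (assumed row).
t_upper_15 = 1.055  # P(T >= 1.055) = 0.15
t_upper_10 = 1.310  # P(T >= 1.310) = 0.10

# t lies between the two entries, so the one-sided p-value lies between 0.10 and 0.15.
p_between_10_and_15 = t_upper_15 < t_stat < t_upper_10

print(round(t_stat, 3), p_between_10_and_15)
```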

Problem 6

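Let X = # of hours per month working as a barista, $\mu_X$ = 40 hours, $\sigma_X$ = 10 hours

Let a = $9/hour

So, $\mu_{aX} = a\mu_X$ = $9/hour × 40 hours = $360, $\sigma_{aX} = a\sigma_X$ = $9/hour × 10 hours = $90

Let Y = # of hours per month working as a tutor, $\mu_Y$ = 15 hours, $\sigma_Y$ = 3 hours

Let b = $25/hour

So, $\mu_{bY} = b\mu_Y$ = $25/hour × 15 hours = $375, $\sigma_{bY} = b\sigma_Y$ = $25/hour × 3 hours = $75

Let E = aX + bY = total earnings for a month

$\mu_E = \mu_{aX+bY} = \mu_{aX} + \mu_{bY}$ = $360+$375 = $735

Assuming X and Y are independent, $\sigma_E^2 = \sigma_{aX+bY}^2 = \sigma_{aX}^2 + \sigma_{bY}^2 = 90^2 + 75^2 = 13725$, so $\sigma_E = \sqrt{13725}$ = $117.1537

So, E ~ N(735, 117.1537)

P(E > 850) = P(Z > $\frac{850 - \mu_E}{\sigma_E}$) = P(Z > $\frac{850 - 735}{117.1537}$) = P(Z > 0.98) = 1 - P(Z $\leq$ 0.98)

= 1-0.8365 = 0.1635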
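The steps of Problem 6 (scale each variable by its hourly wage, add the means, add the variances, then standardize) can be sketched as follows. The calculation assumes the two jobs' hours are independent and that total earnings are normally distributed; names are illustrative, and the unrounded z-score makes the result differ slightly from 0.1635.

```python
from math import sqrt
from statistics import NormalDist

# Hours per month: (mean, sd), and hourly wage in dollars.
barista_mean, barista_sd, barista_wage = 40, 10, 9
tutor_mean, tutor_sd, tutor_wage = 15, 3, 25

# Multiplying by a constant scales both the mean and the sd by that constant.
earn_mean = barista_wage * barista_mean + tutor_wage * tutor_mean
earn_var = (barista_wage * barista_sd) ** 2 + (tutor_wage * tutor_sd) ** 2
earn_sd = sqrt(earn_var)

# Probability that monthly earnings exceed $850.
p_over_850 = 1 - NormalDist(mu=earn_mean, sigma=earn_sd).cdf(850)

print(earn_mean, earn_var, round(earn_sd, 4), round(p_over_850, 4))
```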

Problem 7

(a)

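Let X = # of correct answers, n = 10, p = $\frac{1}{4}$, and X ~ Bin(10, $\frac{1}{4}$)

$P(X \geq 3)$ = 1 – P(X=0) – P(X=1) – P(X=2)

$= 1 - \binom{10}{0}\left(\frac{1}{4}\right)^0\left(\frac{3}{4}\right)^{10} - \binom{10}{1}\left(\frac{1}{4}\right)^1\left(\frac{3}{4}\right)^{9} - \binom{10}{2}\left(\frac{1}{4}\right)^2\left(\frac{3}{4}\right)^{8}$

= 1 – 0.0563 – 0.1877 – 0.2816

= 0.4744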
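The binomial probabilities in part (a) can be computed directly with `math.comb`. This is a minimal sketch; `binom_pmf` is an illustrative helper, not a library function.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.25

# Complement rule: P(X >= 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2).
small = [binom_pmf(k, n, p) for k in range(3)]
p_at_least_3 = 1 - sum(small)

print([round(x, 4) for x in small], round(p_at_least_3, 4))
```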

(b)

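Let X = # of correct answers, n = 100, p = $\frac{1}{4}$, and X ~ Bin(100, $\frac{1}{4}$)

Since np = 100 × $\frac{1}{4}$ = 25 > 10 and n(1-p) = 100(1 - $\frac{1}{4}$) = 75 > 10, we can use the normal approximation to the binomial

$\mu$ = np = 25, $\sigma = \sqrt{np(1-p)} = \sqrt{100 \times \frac{1}{4} \times \frac{3}{4}}$ = 4.33, and X is approximately N(25, 4.33)

$P(X \geq 30) = P(X \geq 29.5)$ [continuity correction] $= P(Z \geq \frac{29.5 - \mu}{\sigma}) = P(Z \geq \frac{29.5 - 25}{4.33}) = P(Z \geq 1.04)$

$= 1 - P(Z < 1.04)$ = 1 – 0.8508 = 0.1492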
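The normal approximation with continuity correction in part (b) can be compared against the exact binomial tail. The exact sum is added here only as a cross-check and is not part of the original solution; names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.25
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Normal approximation with continuity correction: P(X >= 30) ~ P(Y >= 29.5).
approx = 1 - NormalDist(mu=mu, sigma=sigma).cdf(29.5)

# Exact binomial tail probability, summed term by term (cross-check only).
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(30, n + 1))

print(round(sigma, 2), round(approx, 4), round(exact, 4))
```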

(c)

They are neither independent nor disjoint.

By definition, two events are independent if the occurrence of one event gives no information about whether or not the other event will occur. That is, the events have no influence on each other. However, in this case, if we know someone has obtained at least 35 correct answers (event B occurs), then we are sure that he has obtained at least 30 correct answers (event A must also occur). The occurrence of B alters the probability of A to 1. Thus, A and B are not independent.

By definition, two events are disjoint if it is impossible for them to occur together. However, suppose the student got 35 answers correct; then event A and B both occur. Thus, A and B are not disjoint.

Problem 8

(a)

Let X = midterm grade, Y = final exam grade

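Given $\bar{x}$ = 80%, $s_x$ = 16%, $\bar{y}$ = 73%, $s_y$ = 12%, and r = 0.63

$b_1 = r\left(\frac{s_y}{s_x}\right) = 0.63 \times \frac{12}{16} = 0.4725$, and $b_0 = \bar{y} - b_1\bar{x} = 73 - 0.4725 \times 80 = 35.2$%

So $\hat{y} = b_0 + b_1 x = 35.2 + 0.4725x$

When x = 65%, $\hat{y} = 35.2 + 0.4725 \times 65 = 65.9125$%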

That is, we predict a final grade of 65.9125% for a student who scored 65% on the midterm.
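The least-squares line in part (a) is determined entirely by the two means, the two standard deviations and the correlation. A short sketch, where `predict_final` is an illustrative helper name:

```python
# Summary statistics (grades in percent) and the correlation.
mean_x, sd_x = 80, 16   # midterm
mean_y, sd_y = 73, 12   # final exam
r = 0.63

# Slope and intercept of the least-squares regression line.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

def predict_final(midterm_grade):
    """Predicted final exam grade (percent) for a given midterm grade."""
    return intercept + slope * midterm_grade

print(round(slope, 4), round(intercept, 2), round(predict_final(65), 4))
```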

(b)

For every 1 percentage point increase in the midterm grade, we expect the student’s final exam grade to increase by 0.4725 percentage points on average.

(c)

This new observation would decrease the correlation. This is because it lies far from the regression line and is likely to increase the scatter about the line.

(d)

Let Y = final exam grade, $\mu$ = 73%, $\sigma$ = 12%, and Y ~ N(73, 12)

$P(Z \leq z) = 0.25$, so $z$ = -0.67

$z = \frac{y - \mu}{\sigma}$, so $y = \mu + z\sigma$ = 73%+(-0.67)(12%) = 64.96%

So, the first quartile of the final exam grades is 64.96%
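The first quartile in part (d) can also be obtained from the inverse normal CDF. This sketch uses `NormalDist.inv_cdf`, which works with the unrounded z-score (about -0.6745), so it gives a value slightly below the table-based 64.96%.

```python
from statistics import NormalDist

# Final exam grades modelled as N(mean 73, sd 12), in percent.
final_grades = NormalDist(mu=73, sigma=12)

# First quartile: the grade with 25% of the distribution below it.
z_25 = NormalDist().inv_cdf(0.25)
first_quartile = final_grades.inv_cdf(0.25)

print(round(z_25, 4), round(first_quartile, 2))
```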

Problem 9

(a)

The response variable is students' scores on a reading test.

(b)

The factors are teaching methods and teachers.

(c)

A

(d)

A

Problem 10

(a)

B

(b)

D

(c)

C