Statistics and Experimental Design

Practice Final Exam

Time Limit: 3 hours

Name______

1) Dr. Atkins, a proponent of high-protein, high-fat, low-carbohydrate diets, uses the following argument to support his program. Over the last 20 years, the proportion of carbohydrates in the American diet has steadily risen. Over that same period, the proportion of obese Americans has also risen. Therefore, a high-carbohydrate diet - far from being healthy - is actually the cause of much of the obesity observed today. How persuasive do you find Dr. Atkins’ claim? Explain your answer including examples to illustrate your argument. (7 pts)

Dr. Atkins' argument is not persuasive because correlation does not imply causation. It is possible that other factors have also changed over the same 20-year period. For example, perhaps people exercise less than they did 20 years ago. This might be the cause of the increased obesity rather than the dietary changes he examined.

2) Some researchers have found that mood affects the likelihood that a person will help a stranger; people in a good mood are more likely to help others than people in a neutral mood. You are interested in whether the relationship between mood and helping behavior holds equally for both males and females. Briefly describe an experimental design that will help answer this question. Be sure to operationally define the key experimental constructs (the relevant independent and dependent variables). What kind of analysis would you perform and what results from this analysis would be most relevant to the question proposed above? (7 pts)

Here is one example. The first independent variable would be mood. This could be manipulated by standing in the campus center and randomly giving half of the people you see a free candy bar. The people who get the candy bar will be in the good mood condition; the people who don't get a candy bar will be in the neutral mood condition. The second independent variable would be the subject's gender (male vs. female), which is recorded rather than manipulated. Have a confederate stand just outside the campus center and drop their books whenever a subject walks by. The dependent variable would be whether or not the subject helped. I would use a two-way ANOVA (mood x gender) to analyze the data. The mood x gender interaction is the result most relevant to the question: a significant interaction would indicate that the effect of mood on helping differs for males and females.

3) Frito-Lay recently conducted a product preview in which 4 adults were asked to taste and rate 3 new potato chip flavors that are being developed. The subjects rated each flavor on a 10-point scale, with higher values reflecting greater preference. The data are presented below:

Subject / Bacon Cheeseburger / Mocha Frappucino / Cajun / sum
Bob / 2 / 2 / 8 / 12
Carol / 7 / 4 / 6 / 17
Ted / 3 / 5 / 9 / 17
Alice / 4 / 1 / 9 / 14
mean / 4 / 3 / 8
sum(x) / 16 / 12 / 32
sum(x^2) / 78 / 46 / 262

Conduct an ANOVA to determine whether there were differences in the subjects' preferences for the three potential chip flavors (Fcrit = 5.14). Your answer should include a statement of the null and alternative hypotheses, all relevant SS, the observed value of your test statistic, and a decision regarding the null. (18 pts)

SStotal = (78+46+262) - [(16+12+32)^2 / 12]

= 386 - (60^2 / 12)

= 386 - 300 = 86

SSbt = Σ(T^2/n) - (G^2/N)

= (16^2 / 4) + (12^2 / 4) + (32^2 / 4) - 300

= 64 + 36 + 256 - 300 = 56

SSwi = Σ[Σ(x^2) - (T^2/n)]

= [78 - (16^2 / 4)] + [46 - (12^2 / 4)] + [262 - (32^2 / 4)]

= (78-64) + (46-36) + (262-256) = 30

SSbs = Σ(P^2/p) - (G^2/N)

= (12^2 / 3) + (17^2 / 3) + (17^2 / 3) + (14^2 / 3) - 300

= 48 + 96.33 + 96.33 + 65.33 - 300 = 6

SSe = SSwi - SSbs

= 30 - 6 = 24

Anova Table

Source / df / SS / MS / Fobs / Fcrit
Model / 2 / 56 / 28.0 / 7.0 / 5.14
Within / 9 / 30
B/S / 3 / 6
Error / 6 / 24 / 4.0
Total / 11 / 86

Ho: All three chip flavors are preferred equally (the three mean ratings are equal).

Ha: At least one flavor's mean rating differs from the others.

Decision: Reject the null because Fobs> Fcrit. Therefore, at least one of the flavors was rated differently from the others.
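
As a check on the hand calculation, the sums of squares for this repeated-measures design can be recomputed from the raw ratings. The following is a minimal Python sketch; the variable names are illustrative and not part of the exam.

```python
# Ratings from the data table: one row per subject, one column per flavor
# (Bacon Cheeseburger, Mocha Frappucino, Cajun).
ratings = [
    [2, 2, 8],  # Bob
    [7, 4, 6],  # Carol
    [3, 5, 9],  # Ted
    [4, 1, 9],  # Alice
]

n = len(ratings)       # number of subjects
k = len(ratings[0])    # number of flavors (treatments)
N = n * k              # total number of scores
G = sum(sum(row) for row in ratings)
correction = G ** 2 / N  # the G^2/N term

ss_total = sum(x ** 2 for row in ratings for x in row) - correction

# Between treatments: based on the column (flavor) totals T.
flavor_totals = [sum(row[j] for row in ratings) for j in range(k)]
ss_between_treatments = sum(t ** 2 / n for t in flavor_totals) - correction

ss_within = ss_total - ss_between_treatments

# Between subjects: based on the row (person) totals P.
ss_between_subjects = sum(sum(row) ** 2 / k for row in ratings) - correction

ss_error = ss_within - ss_between_subjects

ms_model = ss_between_treatments / (k - 1)
ms_error = ss_error / ((k - 1) * (n - 1))
f_obs = ms_model / ms_error

print(round(ss_total, 2), round(ss_between_treatments, 2),
      round(ss_between_subjects, 2), round(ss_error, 2), round(f_obs, 2))
```

Removing the between-subjects variability from the within-treatments term is what distinguishes this analysis from a one-way between-subjects ANOVA on the same numbers.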

b) Assuming that Tukey’s HSD = 2.25, which levels of the independent variable are significantly different from one another? What does this information mean for Frito-Lay? (4 pts)

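I would conclude that the Cajun flavor (mean = 8) was preferred relative to both Bacon Cheeseburger (mean = 4; difference = 4) and Mocha Frappucino (mean = 3; difference = 5), because both differences exceed the HSD of 2.25. The other two flavors were not significantly different from one another (difference = 1). For Frito-Lay, this suggests that Cajun is the strongest candidate of the three flavors tested.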
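
The pairwise comparison logic can be sketched in a few lines of Python: each pair of flavor means is compared against the HSD value given in the question. This is only an illustration of the decision rule; the dictionary and variable names are assumptions, and the HSD of 2.25 is taken from the question rather than computed.

```python
from itertools import combinations

# Flavor means from the data table and the HSD given in the question.
means = {"Bacon Cheeseburger": 4, "Mocha Frappucino": 3, "Cajun": 8}
hsd = 2.25

# A pair differs significantly when the absolute mean difference exceeds HSD.
significant = {
    (a, b): abs(means[a] - means[b]) > hsd
    for a, b in combinations(means, 2)
}

for (a, b), differs in significant.items():
    diff = abs(means[a] - means[b])
    verdict = "significant" if differs else "not significant"
    print(f"{a} vs {b}: difference = {diff}, {verdict}")
```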

4) An experiment was conducted to examine the effect of distraction on cognitive performance. Subjects in the experiment completed either an easy (single-digit addition) or a difficult (long division) math test. Half of the subjects were distracted during the test; they had to press a button on the keyboard every time a tone sounded. Both test difficulty (easy vs. hard) and distraction (present vs. absent) were between-subjects variables. The dependent measure was the number of math problems solved correctly. Use the data in the table below (n = 5 in each treatment) to conduct a 2-way ANOVA to determine the effects of the two variables and the interaction on performance. Be sure to report AND interpret the results of all appropriate tests. (20 pts)

Difficulty / No Distract / Distract
Hard / Σx = 31, Σ(x^2) = 199 / Σx = 22, Σ(x^2) = 102
Easy / Σx = 39, Σ(x^2) = 307 / Σx = 42, Σ(x^2) = 356

SStotal = (x2) - G2/N

= (199+102+309+356) - [(31+22+39+42)2/20]

= 966 - 897.8= 66.2

SSmodel= [(T2/n)] - G2/N

= (312/5) + (222/5) + (392/5) + (422/5) - 897.8

= 192.2 + 96.8 + 304.2 + 352.8 - 897.8= 48.2

SSE= SST - SSM= 18.0

SSdistract= [(Ta2/n)] - G2/N

= [(31+39)2/10] + [(22+42)2/10] - 897.8

= 490 + 409.6 - 897.8= 1.8

SSease= [(Tb2/n)] - G2/N

= [(31+22)2/10] + [(39+42)2/10] - 897.8

= 280.9 + 656.1 - 897.8= 39.2

SSdxe= SSM - (SSA + SSB)= 7.2

ANOVA table

Source / df / SS / MS / Fobs / Fcrit
Model / 3 / 48.2 / 16.067 / 14.281 / 3.24
Error / 16 / 18.0 / 1.125
Total / 19 / 66.2
Distract / 1 / 1.8 / 1.8 / 1.60 / 4.49
Ease / 1 / 39.2 / 39.2 / 34.84 / 4.49
DxE / 1 / 7.2 / 7.2 / 6.40 / 4.49

I would reject the null for the omnibus test; this indicates that at least one of my independent variables influenced math problem solving. I would fail to reject the null for distraction, meaning that distraction did not influence the number of math problems solved. I would reject the null for ease, meaning that the subjects solved more easy math problems than hard math problems. I would also reject the null for the interaction effect. This indicates that whereas distraction had little effect for easy problems, it significantly reduced the number of difficult problems that the subjects could solve.
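
The two-way ANOVA can also be reproduced directly from the four cell summaries in the data table (Σx and Σx² per cell, n = 5). The sketch below is one way to organize the arithmetic in Python; the dictionary layout and the helper `ss_main` are illustrative assumptions, not part of the exam.

```python
n = 5  # subjects per cell

# (difficulty, distraction) -> (sum of x, sum of x squared), from the data table.
cells = {
    ("hard", "none"): (31, 199),
    ("hard", "distract"): (22, 102),
    ("easy", "none"): (39, 307),
    ("easy", "distract"): (42, 356),
}

N = n * len(cells)
G = sum(sx for sx, _ in cells.values())
correction = G ** 2 / N  # the G^2/N term

ss_total = sum(sxx for _, sxx in cells.values()) - correction
ss_model = sum(sx ** 2 / n for sx, _ in cells.values()) - correction
ss_error = ss_total - ss_model


def ss_main(position):
    """Sum of squares for one factor; position 0 = difficulty, 1 = distraction."""
    totals = {}
    for key, (sx, _) in cells.items():
        totals[key[position]] = totals.get(key[position], 0) + sx
    scores_per_level = N / len(totals)
    return sum(t ** 2 / scores_per_level for t in totals.values()) - correction


ss_ease = ss_main(0)
ss_distract = ss_main(1)
ss_interaction = ss_model - ss_ease - ss_distract

ms_error = ss_error / (N - len(cells))  # df error = 16
f_obs = {
    "model": (ss_model / (len(cells) - 1)) / ms_error,
    "distract": ss_distract / ms_error,     # each effect has df = 1
    "ease": ss_ease / ms_error,
    "interaction": ss_interaction / ms_error,
}

for source, f in f_obs.items():
    print(source, round(f, 2))
```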

5) As part of a larger study recently conducted at Amherst College, data was collected on the relationship between undergraduates’ anxiety levels and the number of assignments due by the end of the semester. Anxiety levels were collected on a 10-point scale, with higher numbers denoting more intense feelings of anxiety. The relevant data are presented below (n=25).

x (assign.) / y (anxiety) / x^2 / y^2 / x*y
Σx = 200 / Σy = 175 / Σ(x^2) = 2024 / Σ(y^2) = 1297 / Σ(x*y) = 1480

a) Calculate the regression equation relating x to y. (10 pts)

SSxx = (x2) - [(x)2/n]SSxy = (x*y) - [(x*y)/n]

= 2024 - 2002/25 = 1480 - [(200*175)/25]

= 424 = 80

1 = SSxy / SSxx=80/424= .19

0 = mean(y) - 1*mean(x)=7 – .19*(8)=5.49

Regression Equation: E(y) = 5.49 + .19x
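
The slope and intercept can be verified from the summary sums alone. A minimal Python sketch, with illustrative variable names, follows; it carries the unrounded slope through the intercept calculation, which is why the intercept comes out as 5.49 rather than 5.48.

```python
# Summary statistics from the data table (n = 25 students).
n = 25
sum_x, sum_y = 200, 175
sum_x2, sum_xy = 2024, 1480

ss_xx = sum_x2 - sum_x ** 2 / n      # 2024 - 1600
ss_xy = sum_xy - sum_x * sum_y / n   # 1480 - 1400

slope = ss_xy / ss_xx                            # beta-1
intercept = sum_y / n - slope * (sum_x / n)      # beta-0 = mean(y) - beta-1 * mean(x)

print(f"E(y) = {intercept:.2f} + {slope:.2f}x")
```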

b) Is there a significant relationship between x and y (tcrit = 2.069)? (5 pts)

SSyy = (y2) - [(y)2/n]

= 1297 - 1752/25

= 72

SSe= SSyy - (SSxy*1)

= 72 - (80*.19)=56.91

s2= SSe / (n-2)=56.91 / (25-2)= 2.47

s= sqrt(s2)=sqrt(2.47)= 1.57

t= 1 - 0=.19 = 2.47

s / sqrt (SSxx)1.57 / sqrt (424)

tcrit = 2.064; because tobs > tcrit, I would reject the null and conclude that 1 is different from zero. Because 1 > 0, I would conclude that there is a direct relationship between anxiety and the number of assignments due; that is, the more assignments one has due, the more anxious that person will be.
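
The test of the slope can be checked numerically. The sketch below starts from the three sums of squares (SSxx = 424, SSxy = 80, SSyy = 72) and uses the critical value supplied in the question (2.069, for 23 degrees of freedom); variable names are illustrative.

```python
import math

n = 25
ss_xx, ss_xy, ss_yy = 424, 80, 72
t_crit = 2.069  # given in the question, df = n - 2 = 23

slope = ss_xy / ss_xx
ss_error = ss_yy - slope * ss_xy         # unexplained variability in y
s = math.sqrt(ss_error / (n - 2))        # standard error of estimate
se_slope = s / math.sqrt(ss_xx)          # standard error of the slope

t_obs = (slope - 0) / se_slope
reject_null = abs(t_obs) > t_crit

print(round(ss_error, 2), round(s, 2), round(t_obs, 2), reject_null)
```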

c) Calculate r. Is this value significantly different from 0? How do you know? (5 pts)

r = SSxy / sqrt(SSxx*SSyy)

= 80 / sqrt(424*72) = 80 / 174.72 = .458

The correlation coefficient is significantly different from 0. I know that based on the test statistic calculated in part b).
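
In simple linear regression, the t statistic for testing r against zero, t = r*sqrt(n-2) / sqrt(1 - r^2), is equal to the t statistic for the slope, which is why the part b) result carries over. A short Python sketch (illustrative names) shows the calculation.

```python
import math

n = 25
ss_xx, ss_yy, ss_xy = 424, 72, 80

r = ss_xy / math.sqrt(ss_xx * ss_yy)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(r, 3), round(t_from_r, 2))
```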

d) Which of the two variables is the dependent variable? (2 pts)

Anxiety level was the dependent variable.

e) How much would you expect a person's stress level to increase if their professor suddenly added one extra assignment to their workload? Base your answer on the regression equation, not your personal opinion about whether the professor should be killed. (2 pts)

Based on the slope of the regression equation, I would expect my stress level to increase by .19 points on the 10-point scale.

6) My brother just took up fly-fishing. He asked me to analyze some data that he collected over the last year. In particular, he would like some help determining what factors influence the size of the fish he catches. The dependent measure was the total pounds of fish caught on a given day. The independent variables were: DEPTH of the stream, speed of the CURRENT, WATER temperature, AIR temperature and % of CLOUD cover (0% is a cloudless day; 100% is a totally overcast day). The results of a multiple regression analysis appear below. Use this information to answer the following questions.

Source / df / SS / MS / F / p-value
Model / 6 / 334.50 / 55.75 / 7.24 / .0257
Error / 41 / 315.70 / 7.70
Total / 47 / 650.20

Variable / df / Param Est. / Std Err / t / p-value
Intercept / 1 / 10.70 / 1.180 / 9.07 / .0032
Depth / 1 / 1.32 / 0.410 / 3.22 / .0679
Current / 1 / -2.15 / 1.060 / -2.03 / .1860
Water / 1 / -0.05 / 0.012 / -4.07 / .0497
Air / 1 / -0.05 / 0.027 / -1.85 / .2963
Cloud / 1 / 0.13 / 0.03 / 4.33 / .0371

a) Which of the identified factors are significant predictors of fishing success? Explain. (6 pts)

The factors with a p-value less than .05: water temperature and cloud cover.

b) What is the regression equation relating the predictor variables to fishing success? (4 pts)

E(y) = 10.70 + 1.32D – 2.15Cu – .05W – .05A + .13Cl

c) Which single factor is the best predictor of fishing success? Explain your answer. (3 pts)

Cloud cover. I know this because it has the smallest p-value / largest tobs.
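
The selection rules used in parts a) and c) can be written out explicitly: keep the predictors with p < .05, and pick the one with the largest absolute t. The Python sketch below applies those rules to the values reported in the parameter table; the dictionary and names are illustrative.

```python
# predictor -> (t statistic, p-value), copied from the parameter table.
predictors = {
    "Depth": (3.22, 0.0679),
    "Current": (-2.03, 0.1860),
    "Water": (-4.07, 0.0497),
    "Air": (-1.85, 0.2963),
    "Cloud": (4.33, 0.0371),
}
alpha = 0.05

# Part a): predictors whose p-value falls below alpha.
significant = [name for name, (_, p) in predictors.items() if p < alpha]

# Part c): the predictor with the largest absolute t statistic.
best = max(predictors, key=lambda name: abs(predictors[name][0]))

print(significant, best)
```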

d) In general, is one likely to catch more fish in a slow-moving, or fast-moving stream? Explain. (3 pts)

The Beta parameter relating current and fish catching is negative. That means there is an inverse relationship between current and fish catching. Consequently, one is more likely to catch fish in a slow-moving stream.

e) I went fishing with my brother last week. The stream was 4 feet deep, the current was 6 knots, the water temperature was 54 degrees, the air temperature was 60 degrees, and the cloud cover was 60%. How many pounds of fish would you expect me to catch? (4 pts)

E(y) = 10.70 + 1.32D – 2.15Cu – .05W – .05A + .13Cl

= 10.70 + 1.32(4) – 2.15(6) – .05(54) – .05(60) + .13(60)=5.18
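
The prediction is simply the intercept plus each coefficient times its observed value. A minimal Python sketch of that substitution, using the parameter estimates from the table and the conditions from the question (names are illustrative):

```python
# Parameter estimates from the regression output.
intercept = 10.70
coefficients = {
    "depth": 1.32,
    "current": -2.15,
    "water": -0.05,
    "air": -0.05,
    "cloud": 0.13,
}

# Conditions on the day described in the question.
conditions = {"depth": 4, "current": 6, "water": 54, "air": 60, "cloud": 60}

predicted_pounds = intercept + sum(
    coefficients[name] * value for name, value in conditions.items()
)

print(round(predicted_pounds, 2))
```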