STA 240 – Homework 6

1. Marijuana and Testosterone

(a) Equality of means among controls, mild users, and heavy users.

H0: mu_control = mu_mild = mu_heavy

HA: not all mu_j are equal

Compare the F-statistic to an F distribution on 2 and 37 df. This yields a p-value < 0.0001. Therefore, we reject H0 and conclude that at least two means are different.

(b) Pairwise comparison of means between users and controls and between mild and heavy users

H0: mu_control = mu_users

HA: mu_control is not equal to mu_users

There are two ways to do this problem. First, the null hypothesis can be written as a contrast, which is the most natural approach following the results in (a). Here we have H0: mu_control - .5(mu_mild + mu_heavy) = 0. The calculated value of T is 8.47, which is compared to the quantiles of the t distribution on 37 df for a 2-sided test. We reject H0 at alpha = .05.

A second way is to use a two-sample t-test to compare the “control” group to the “users” group. Note that this test is based on 38 df, and that s_pooled is the pooled standard deviation of the “control” and “user” columns. This s_pooled will differ from the one found in (a).
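For reference, the contrast t-statistic in the first approach has the standard form sketched below. The symbols are assumptions introduced for illustration and do not appear in the original solution: \bar{y} denotes a sample mean, n a group sample size, and s_p the pooled standard deviation from the one-way ANOVA in (a).

```latex
t \;=\; \frac{\bar{y}_{\text{control}} - \tfrac{1}{2}\left(\bar{y}_{\text{mild}} + \bar{y}_{\text{heavy}}\right)}
             {s_p \sqrt{\dfrac{1}{n_{\text{control}}} + \dfrac{1}{4\,n_{\text{mild}}} + \dfrac{1}{4\,n_{\text{heavy}}}}}
```

The factors of 1/4 come from squaring the contrast coefficients of -1/2 on the two user groups.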

H0:mild=heavy

HA: mild does not equal heavy

Reject H0 if |T|>t0.95,18 which is approximately equal to 2.101

T>t, therefore we can reject H0 and conclude that, based on this sample, mild marijuana users and heavy marijuana users do not have the same mean testosterone levels (p-value < 0.001, 2-sample, 2-sided t-test).

(c) Tukey method to form 95% confidence intervals for all pairwise comparisons of means

control - heavy

CI control - heavy: (311, 555)

control - mild

CI control - mild: (125, 353)

mild - heavy

CI mild - heavy: (57, 331)

We can conclude that a significant difference exists in mean plasma testosterone between (1) control and heavy users, (2) control and mild users, and (3) mild and heavy users, because none of the 3 confidence intervals contains zero. We are 95% confident that all 3 intervals simultaneously capture their true population values, and thus we are 95% confident that all three pairwise differences are greater than zero.
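The reasoning above can be checked with a short sketch in Python. It only re-uses the three Tukey intervals reported in (c); the variable names are illustrative and not part of the original solution.

```python
# Tukey 95% intervals reported in part (c), copied from the text.
tukey_intervals = {
    "control - heavy": (311, 555),
    "control - mild": (125, 353),
    "mild - heavy": (57, 331),
}

# A pairwise difference is declared significant when its interval excludes zero.
excludes_zero = {
    name: (lo > 0 or hi < 0) for name, (lo, hi) in tukey_intervals.items()
}

# Interval midpoints are the estimated differences in means.
centers = {name: (lo + hi) / 2 for name, (lo, hi) in tukey_intervals.items()}

# Consistency check: (control - heavy) - (control - mild) should equal (mild - heavy).
implied_mild_minus_heavy = centers["control - heavy"] - centers["control - mild"]

print(excludes_zero)
print(centers, implied_mild_minus_heavy)
```

The midpoints are mutually consistent, which is a quick way to catch transcription errors in a set of pairwise intervals.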

(d) Scheffe method to obtain simultaneous 95% confidence intervals for all contrasts

mu1 - mu2 | control - mild

CI control - mild: (120, 359)

mu1 - mu3 | control - heavy

CI control - heavy: (305, 561)

mu1 - ½(mu2 + mu3) | control - all users

CI control - all users: (225, 427)

(e) Bonferroni method for planned comparisons

Here k = 3, since we have 3 planned comparisons. The t-statistic to use in the half-width calculation is t(1 - .05/6, 37) = 2.51. The three intervals are: mu1 - mu2: [121, 357]; mu1 - mu3: [307, 559]; mu1 - .5(mu2 + mu3): [236, 436].
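As a rough sketch of the arithmetic behind the Bonferroni intervals, the Python snippet below computes the quantile level 1 - alpha/(2k) and recovers each interval's midpoint and half-width. It uses only the numbers reported in (e); the names are illustrative, and no raw data are assumed.

```python
# Bonferroni intervals reported in part (e), copied from the text.
intervals = {
    "mu1 - mu2": (121, 357),
    "mu1 - mu3": (307, 559),
    "mu1 - 0.5*(mu2 + mu3)": (236, 436),
}

alpha = 0.05
k = 3  # number of planned comparisons

# Each two-sided interval uses the 1 - alpha/(2k) quantile of the t distribution.
quantile_level = 1 - alpha / (2 * k)


def center_and_half_width(interval):
    """Return the midpoint (point estimate) and half-width of an interval."""
    lo, hi = interval
    return (lo + hi) / 2, (hi - lo) / 2


summary = {name: center_and_half_width(ci) for name, ci in intervals.items()}

# The contrast estimate should equal the average of the two pairwise estimates,
# since mu1 - 0.5*(mu2 + mu3) = 0.5*[(mu1 - mu2) + (mu1 - mu3)].
c12 = summary["mu1 - mu2"][0]
c13 = summary["mu1 - mu3"][0]
c_contrast = summary["mu1 - 0.5*(mu2 + mu3)"][0]

print(round(quantile_level, 4), summary)
```

Dividing a half-width by the multiplier 2.51 gives back the approximate standard error of that comparison.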

(f) Results interpretation in context of LA Times article

The data support the LA Times article. Both groups of users have lower testosterone levels than non-users, and heavy users have lower testosterone than mild users. This shows that users of marijuana tend to have lower levels of testosterone than non-users. In addition, there is evidence of a significant difference in the hormone level between mild and heavy users.

2. Black-Legged Kittiwake Study

(a)

This regression shows a positive trend, where population increases as area increases (estimated slope beta_1 = 3.302 breeding pairs/km2 and estimated intercept beta_0 = -735 breeding pairs). However, not all the points fit the line; several points spread out from the line as x and y increase.

(b)

Using the diagnostic plots above, these data appear to violate assumptions of linear regression:

The values of the residuals at each value of y do not have the same standard deviation.

The values of y do not appear normally distributed, and outliers are apparent from the QQ plot.

(c)

Variable / Coefficient / Standard Error / t-Statistic / p-Value
(log) Constant / 2.6272 / 1.2936 / 2.0310 / 0.0558
(log) Area / 0.6783 / 0.2004 / 3.3845 / 0.0029
Estimate of sigma = 1.044 (20 df)

A summary analogous to that on page 180 is as follows:

Log of est. breeding pairs = 2.63 + 0.68 * log of est. foraging area

(standard errors: 1.29 for the constant, 0.20 for the slope)

Estimated SD of log breeding pairs = 1.044 (20 df)

(This writeup format can be seen in published articles.)

(d) H0: beta_1 = 0

HA: beta_1 does not equal 0

Based on the S-plus output above, we can reject H0 under these conditions (alpha = .05). This is convincing evidence that there is a linear relationship between log of population and log of foraging area (p-value = 0.0029, two-sided t-test).

(e) H0: beta_1 = 0

HA: beta_1 > 0

Reject H0 if T > t.95,20 = 1.725

According to the S-plus printout, the estimate of beta_1 is 0.6783 with standard error 0.2004, so T = 3.3845.

T > t.95,20, therefore we can reject H0 under these conditions (alpha = .05). There is strong evidence that log population levels increase with increasing levels of log foraging area (p-value = 0.0015, one-sided t-test).

(f){log (population) log (area)} = 0 + 1 log(area), then median{log (population) log (area)}= exp (0) X1. From page 208, . Then a doubling of the foraging area is associated with a multiplicative change of = 1.6 in the median of breeding pairs. Doubling the foraging area is associated with increasing the median of breeding pairs by a factor of 1.6.

3. Age and Growth Characteristics of Mussel Species

(a)

(b)

The simple regression line appears to fit the data reasonably well. The QQ normal plot shows that the residuals have a slightly heavy right tail; otherwise they are approximately normally distributed. The plot of residuals vs. age shows that most points are evenly spread about zero, although the spread widens somewhat as age increases.

(c) Lack of fit test:

H0: the simple regression model is adequate

HA: the simple regression model is not adequate

Source of Variation / Sum of Squares / df / Mean Square / F-statistic / p-value
Between Groups / 106.295 / 13 / 8.176541 / 11.37358 / 9.840507e-007
Regression / 96.14173 / 1 / 96.14173 / 125.6489 / 8.612e-013
Lack of Fit / 10.153 / 12 / 0.8461 / 1.1769 / > 0.10
Within Groups / 15.097 / 21 / 0.718906
Total / 121.392 / 34

F-statistic = Mean Square lack of fit / Mean Square Within Groups = (0.8461)/(0.7189) = 1.1769

p-value > 0.10

Therefore, we do not reject the null hypothesis that the simple regression model is adequate.
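The lack-of-fit row of the ANOVA table can be rebuilt from the other rows, as sketched below in Python. Only the sums of squares and degrees of freedom printed in the table are used; the p-value is not computed here, since that would require an F-distribution routine outside the standard library.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_between, df_between = 106.295, 13
ss_regression, df_regression = 96.14173, 1
ss_within, df_within = 15.097, 21

# Lack of fit is the between-groups variation not explained by the regression line.
ss_lack_of_fit = ss_between - ss_regression
df_lack_of_fit = df_between - df_regression

ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_within = ss_within / df_within

# F-statistic on (df_lack_of_fit, df_within) degrees of freedom.
f_stat = ms_lack_of_fit / ms_within

print(df_lack_of_fit, round(ms_lack_of_fit, 4), round(ms_within, 4), round(f_stat, 4))
```

An F-statistic this close to 1 on 12 and 21 df is well within what the simple regression model would produce by chance, which matches the conclusion above.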