STA 240 – Homework 6
1. Marijuana and Testosterone
(a) Equality of means among controls, mild users, and heavy users.
H0: mu_control = mu_mild = mu_heavy
HA: not all mu_j are equal
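Compare the F statistic to an F distribution on 2 and 37 df (the within-group df is 37, the same df used for the t quantiles in (b) and (e)). This yields a p-value < 0.0001. Therefore, we reject H0 and conclude that at least two of the means are different.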
(b) Pairwise comparison of means between users and controls and between mild and heavy users
H0: mu_control = mu_users
HA: mu_control is not equal to mu_users
There are two ways to do this problem. First, the null hypothesis can be written as a contrast, which is the most logical approach following the results in (a). Here H0: mu_control - .5 (mu_mild + mu_heavy) = 0. The calculated value of T is 8.47, which is compared to the quantiles of the t distribution on 37 df for a 2-sided test. We reject H0 at alpha = .05.
A second way is to use a two-sample t-test to compare the “control” group to the “users” group. Note that this test is based on 38 df, and that s_pooled is the pooled standard deviation of the “control” and “user” columns. This s_pooled will differ from the one found in (a).
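The contrast approach can be sketched in a few lines of Python. This is only an illustration of the general formula T = sum(c_j * ybar_j) / (s_pooled * sqrt(sum(c_j^2 / n_j))). The function name `contrast_t` and every number passed to it are hypothetical placeholders, not the homework data.

```python
import math

def contrast_t(means, sizes, coefs, s_pooled):
    """t statistic for testing sum(c_j * mu_j) = 0 with a pooled SD."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    std_err = s_pooled * math.sqrt(sum(c * c / n for c, n in zip(coefs, sizes)))
    return estimate / std_err

# Hypothetical summary statistics, for illustration only.
means = [10.0, 8.0, 6.0]      # control, mild, heavy (made-up values)
sizes = [20, 10, 10]          # made-up group sizes
coefs = [1.0, -0.5, -0.5]     # control minus the average of the two user groups
t_stat = contrast_t(means, sizes, coefs, s_pooled=2.0)
print(round(t_stat, 3))
```

With the real group means, group sizes, and the s_pooled from (a), the same function would give the T that is compared to the t distribution on 37 df.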
H0: mu_mild = mu_heavy
HA: mu_mild is not equal to mu_heavy
Reject H0 if |T| > t(.975, 18), which is approximately equal to 2.101 (2-sided test at alpha = .05).
|T| > 2.101, therefore we reject H0 and conclude that, based on this sample, mild marijuana users and heavy marijuana users do not have the same mean testosterone levels (p-value < 0.001, 2-sample, 2-sided t-test).
(c) Tukey method to form 95% confidence intervals for all pairwise comparisons of means
control - heavy: CI (311, 555)
control - mild: CI (125, 353)
mild - heavy: CI (57, 331)
We can conclude that a significant difference exists in mean plasma testosterone between (1) control and heavy users, (2) control and mild users, and (3) mild and heavy users. All 3 confidence intervals indicate that there are differences between each of the pairs of groups. We are 95% confident that all 3 intervals capture their true population values, and thus we are 95% confident that all three pairwise differences are greater than zero.
(d) Scheffe method to obtain simultaneous 95% confidence intervals for all contrasts
mu1 - mu2 | control - mild: CI (120, 359)
mu1 - mu3 | control - heavy: CI (305, 561)
mu1 - ½ (mu2 + mu3) | control - all users: CI (225, 427)
(e) Bonferroni method for planned comparisons
Here k = 3, since we have 3 planned comparisons. The t quantile to use in the half-width calculation is t(1-.05/6, 37) = 2.51. The three intervals are:
mu1 - mu2: [121, 357]
mu1 - mu3: [307, 559]
mu1 - .5(mu2 + mu3): [236, 436]
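As a quick check on how the three multiple-comparison methods differ, the sketch below takes the control-minus-mild intervals reported in (c), (d), and (e) and computes each interval's center and half-width. Only the reported endpoints are used. The helper name `center_and_half_width` is an illustrative choice.

```python
# Reported 95% intervals for mu_control - mu_mild from parts (c), (d), (e).
intervals = {
    "Tukey": (125, 353),
    "Scheffe": (120, 359),
    "Bonferroni": (121, 357),
}

def center_and_half_width(lower, upper):
    """Return (midpoint, half-width) of an interval."""
    return (lower + upper) / 2, (upper - lower) / 2

summary = {name: center_and_half_width(lo, hi) for name, (lo, hi) in intervals.items()}
for name, (center, half) in summary.items():
    print(f"{name}: center = {center}, half-width = {half}")
```

The centers agree to within rounding, as they should, since all three methods are centered on the same estimated difference. Only the multiplier in the half-width changes, and for this pairwise comparison the Tukey interval is the narrowest and the Scheffe interval the widest.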
(f) Results interpretation in context of LA Times article
The data support the LA Times article. Both groups of users have lower testosterone levels than non-users, and heavy users have lower testosterone than mild users. This shows that users of marijuana tend to have lower levels of testosterone than non-users. In addition, there is evidence of a significant difference in the hormone level between mild and heavy users.
2. Black-Legged Kittiwake Study
(a)
This regression shows a positive trend, where population increases as area increases (estimated slope beta_1 = 3.302 breeding pairs/km^2 and estimated intercept beta_0 = -735 breeding pairs). However, not all the points lie close to the line; several points spread away from it as x and y increase.
(b)
Using the diagnostic plots above, these data appear to violate the following assumptions of linear regression:
The residuals do not have the same standard deviation at each value of y.
The values of y do not appear normally distributed, and outliers are apparent from the QQ plot.
(c)
Variable / Coefficient / Standard Error / t-Statistic / p-Value
(log) Constant / 2.6272 / 1.2936 / 2.0310 / 0.0558
(log) Area / 0.6783 / 0.2004 / 3.3845 / 0.0029
Estimate of sigma = 1.044 (20 df)
A summary analogous to that on page 180 is as follows:
Log of est. breeding pairs=2.63+0.68* log of est. foraging area
(1.29) (0.20)
Estimated SD of log breeding pairs = 1.044 (20 df)
(this writeup format can be seen in published articles.)
(d) H0: beta_1 = 0
HA: beta_1 is not equal to 0
Based on the S-plus output above, we reject H0 at alpha = .05. This is convincing evidence that there is a linear relationship between log of population and log of foraging area (p-value = 0.0029, two-sided t-test).
(e) H0: beta_1 = 0
HA: beta_1 > 0
Reject H0 if T > t(.95, 20) = 1.725
According to the S-plus printout, the estimate of beta_1 is 0.6783 with standard error 0.2004, so T = 0.6783/0.2004 = 3.38.
T > 1.725, therefore we reject H0 at alpha = .05. There is strong evidence that log population levels increase with increasing levels of log foraging area (p-value = 0.0015, one-sided t-test).
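The arithmetic behind the one-sided test can be reproduced from the regression table in (c). The sketch below uses only the reported coefficient, standard error, critical value, and two-sided p-value; the variable names are illustrative. Halving the two-sided p-value is valid here because the estimated slope lies in the direction of the alternative (positive).

```python
# Values taken from the regression table in (c) and the rejection rule in (e).
beta1_hat = 0.6783      # estimated slope for log(area)
se_beta1 = 0.2004       # its standard error
t_crit = 1.725          # t(.95, 20), as stated in the rejection rule
two_sided_p = 0.0029    # two-sided p-value from the table

t_stat = beta1_hat / se_beta1       # test statistic for H0: beta_1 = 0
one_sided_p = two_sided_p / 2       # valid because beta1_hat > 0
reject = t_stat > t_crit

print(f"T = {t_stat:.2f}, one-sided p = {one_sided_p:.5f}, reject H0: {reject}")
```

The ratio of the rounded table entries differs from the printed t-statistic only in the fourth decimal place, presumably because the software used unrounded values.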
(f) If mean{log(population) | log(area)} = beta_0 + beta_1 log(area), then median{population | area} = exp(beta_0) × area^beta_1. From page 208, a doubling of X is associated with a multiplicative change of 2^beta_1 in the median of Y. Here 2^0.6783 = 1.6, so doubling the foraging area is associated with increasing the median number of breeding pairs by a factor of 1.6.
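The factor of 1.6 can be checked directly. Under the log-log model, multiplying the area by a factor k multiplies the median number of breeding pairs by k raised to the slope. The function name `median_multiplier` is a hypothetical helper for this check.

```python
beta1_hat = 0.6783  # estimated slope from the log-log regression in (c)

def median_multiplier(area_factor, slope):
    """Multiplicative change in the median response when area is scaled by area_factor."""
    return area_factor ** slope

doubling = median_multiplier(2, beta1_hat)
print(f"Doubling the area multiplies the median by {doubling:.2f}")
```

The same helper gives the effect of any other scaling of area, for example `median_multiplier(10, beta1_hat)` for a tenfold increase.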
3. Age and Growth Characteristics of Mussel Species
(a)
(b)
The simple regression line appears to have a reasonably good fit to the data. The QQ normal plot shows that the residuals have a slightly heavy right tail, otherwise they are approximately normally-distributed. The plot of residuals vs. age shows that most points are evenly spread about zero, although there is a general widening of the spread as age increases.
(c)Lack of fit test:
H0: the simple regression model is adequate
HA: the simple regression model is not adequate
Source of Variation / Sum of Squares / df / Mean Square / F-statistic / p-value
Between Groups / 106.295 / 13 / 8.176541 / 11.37358 / 9.840507e-007
Regression / 96.14173 / 1 / 96.14173 / 125.6489 / 8.612e-013
Lack of Fit / 10.153 / 12 / 0.8461 / 1.1769 / > 0.10
Within Groups / 15.097 / 21 / 0.718906
Total / 121.392 / 34
F-statistic = Mean Square Lack of Fit / Mean Square Within Groups = (0.8461)/(0.7189) = 1.1769
p-value > 0.10
Therefore, we do not reject the null hypothesis that the simple regression model is adequate.
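The lack-of-fit row can be rebuilt from the other rows of the table: the lack-of-fit sum of squares is the between-groups sum of squares minus the regression sum of squares, and likewise for the degrees of freedom. The sketch below uses only the table entries; the variable names are illustrative. It stops at the F statistic, since the p-value requires an F(12, 21) distribution function that is not in the Python standard library.

```python
# Sums of squares and df from the ANOVA table above.
ss_between, df_between = 106.295, 13
ss_regression, df_regression = 96.14173, 1
ss_within, df_within = 15.097, 21

# Lack of fit = between-groups variation not explained by the regression line.
ss_lof = ss_between - ss_regression
df_lof = df_between - df_regression

ms_lof = ss_lof / df_lof
ms_within = ss_within / df_within
f_lof = ms_lof / ms_within

print(f"SS = {ss_lof:.3f}, df = {df_lof}, MS = {ms_lof:.4f}, F = {f_lof:.4f}")
```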