1. A study investigating the association between spontaneous abortion and various risk factors reported the following data on spontaneous abortion among women with different levels of alcohol use:

Alcohol use (drinks per week) / Number of pregnancies / Spontaneous abortions
0 / 33164 / 6793
1-2 / 9099 / 2068
3-6 / 3069 / 776
7-20 / 1527 / 456
21+ / 287 / 98

a) For each level of alcohol consumption, estimate the rate of spontaneous abortion among women who become pregnant.

b) If spontaneous abortion is independent of alcohol use, how many spontaneous abortions would you expect among the pregnancies of women who have 7-20 drinks per week?

c) Do the rates of spontaneous abortion among women who become pregnant differ by level of alcohol use? Perform a hypothesis test using α = 0.01.

d) Suppose that the investigator is interested in comparing the rates of spontaneous abortion for pregnant women with no alcohol use and for those with very high alcohol consumption (21+ drinks per week).

  i) Construct a 95% confidence interval for the difference in the rates of spontaneous abortion between these two groups of women.
  ii) Based on the confidence interval, do you think the rates of spontaneous abortion are different for these two groups of women? Justify your answer.
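
The calculations in parts a) through d) can be set up in Python with only the standard library. The following is a minimal sketch, not a prescribed solution. It assumes that a chi-square test of homogeneity and a Wald (normal-approximation) interval are the intended methods, all variable and function names are illustrative, and the closed-form p-value is valid only for 4 degrees of freedom.

```python
import math
from statistics import NormalDist

# Data from the table: drinks per week -> (pregnancies, spontaneous abortions)
data = {
    "0": (33164, 6793),
    "1-2": (9099, 2068),
    "3-6": (3069, 776),
    "7-20": (1527, 456),
    "21+": (287, 98),
}

# a) Estimated rate of spontaneous abortion in each group
rates = {level: x / n for level, (n, x) in data.items()}

# b) Expected count under independence: group size times the pooled rate
total_n = sum(n for n, _ in data.values())
total_x = sum(x for _, x in data.values())
pooled = total_x / total_n
expected_7_20 = data["7-20"][0] * pooled

# c) Pearson chi-square statistic over the 5 x 2 table
chi2 = 0.0
for n, x in data.values():
    cells = ((x, n * pooled), (n - x, n * (1 - pooled)))
    for observed, expected in cells:
        chi2 += (observed - expected) ** 2 / expected
df = len(data) - 1  # (5 - 1) * (2 - 1) = 4
# Upper-tail chi-square probability; this closed form holds for df = 4 only
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)


# d) Wald interval for the difference of two proportions (assumed method)
def diff_ci(n1, x1, n2, x2, level=0.95):
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Difference taken as (21+ drinks) minus (no alcohol use)
lower, upper = diff_ci(*data["21+"], *data["0"])
```

Printing `rates`, `expected_7_20`, `chi2`, `p_value`, and `(lower, upper)` gives the quantities each part asks about; interpreting them against α and against zero is still left to the reader.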
2. FEV (forced expiratory volume) is an index of pulmonary function that measures the volume of air expelled after one second of constant effort. The data file fev.csv contains FEV measurements for 654 children aged 3-19 who were seen in the Childhood Respiratory Disease Study in East Boston, Massachusetts. The variables in the data include:

StudyID: subject’s study ID number

Age: age in years

FEV: FEV in liters

Height: height in inches

Sex: Male or Female

Smoker: non = nonsmoker, Current = current smoker

a) The investigator is interested in examining whether FEV is associated with age.

  i) Make a boxplot of FEV for children aged 3-7 years, 8-10 years, 11-12 years, and 13 years or above. Sort the boxes on the plot in increasing order of age. Does FEV appear to be the same for children in these age groups?
  ii) Is FEV the same across the four age groups? Perform a hypothesis test to answer the question. Use α = 0.05.
  iii) Is it appropriate to perform the hypothesis test in ii)? Justify your answer.
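
The age grouping and the test statistic for part a) can be sketched as follows. This is a hedged outline only: it assumes a one-way ANOVA is the intended test, that ages are recorded in whole years, and that fev.csv has a header row with the column names `Age` and `FEV` from the variable list. The function names are illustrative. The standard library has no F distribution, so the sketch stops at the F statistic and its degrees of freedom, which would then be compared with a critical value from a table or statistical software.

```python
import csv
from statistics import mean

AGE_GROUPS = ["3-7", "8-10", "11-12", "13+"]  # in increasing order of age


def age_group(age):
    """Map an age in whole years to one of the four age groups."""
    if age <= 7:
        return "3-7"
    if age <= 10:
        return "8-10"
    if age <= 12:
        return "11-12"
    return "13+"


def group_fev(rows):
    """Collect FEV values by age group from rows with 'Age' and 'FEV' keys."""
    groups = {g: [] for g in AGE_GROUPS}
    for row in rows:
        groups[age_group(float(row["Age"]))].append(float(row["FEV"]))
    return groups


def anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    samples = [s for s in groups.values() if s]  # ignore empty groups
    all_values = [y for s in samples for y in s]
    grand_mean = mean(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for s in samples:
        m = mean(s)
        ss_between += len(s) * (m - grand_mean) ** 2
        ss_within += sum((y - m) ** 2 for y in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def load_rows(path="fev.csv"):
    """Read the data file; the path and header names are assumptions."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

With the data file in the working directory, `anova_f(group_fev(load_rows()))` returns the statistic for the four groups. The same `group_fev` output, keyed in increasing order of age, can be passed to any plotting tool to draw the boxplot.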

b) The investigator is also interested in how height is associated with age for females and males.

  i) Construct a scatter plot of height against age, with different symbols for males and females. What is the relationship between height and age for males? For females? What are the similarities and differences in this relationship between males and females?
  ii) Regardless of what you observed in i), fit the regression model with height as the response and with age, sex, and the interaction between age and sex as the independent variables. What is the fitted regression equation for males? For females?
  iii) Are the regression lines of height on age parallel for males and females? Perform a hypothesis test to answer the question. Use α = 0.05.
  iv) Is it appropriate to use the above regression model? Justify your answer. If the regression model is not appropriate, describe how you might modify the model so that it better describes the relationship between height and age in the data.
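
A model with age, sex, and their interaction produces the same fitted lines as two separate simple regressions of height on age, one per sex, and the interaction coefficient equals the difference between the two slopes. The sketch below uses that fact to cross-check the fitted equations for males and females. It is only an illustration: the function names are hypothetical, the rows are assumed to be dictionaries keyed by the column names `Sex`, `Age`, and `Height` from the variable list, and it does not carry out the hypothesis test for parallel lines, which needs the standard error of the interaction term.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def lines_by_sex(rows):
    """Fit height on age separately for each value of 'Sex'."""
    by_sex = {}
    for row in rows:
        ages, heights = by_sex.setdefault(row["Sex"], ([], []))
        ages.append(float(row["Age"]))
        heights.append(float(row["Height"]))
    return {sex: fit_line(a, h) for sex, (a, h) in by_sex.items()}
```

Each entry of the returned dictionary is an `(intercept, slope)` pair, so the fitted equation for a given sex reads height = intercept + slope × age.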

c) The investigator is also interested in examining whether FEV is more strongly related to sex, smoking status, age, or height. Carry out an appropriate statistical analysis to answer the question.
