Data Analysis (Math 206) Exam 2          Name

Spring 2008 - Hartlaub

Solve all of the problems below and be careful not to spend too much time on a particular part. The point values for each part are in parentheses. The SAS program is located in the folder p:\data\math\hartlaub\dataanalysis.

1. The computer output below provides a summary of data on ear pierces and tattoos for a sample of 678 college women. The ear pierce response is the total number of pierces for a woman, and this variable has been categorized.

Tabulated statistics: Pierces, Tattoo

Rows: Pierces Columns: Tattoo

                  No     Yes     All

2 or less        245      19     264
               42.91   17.76   38.94
               222.3    41.7   264.0

3 or 4           210      26     236
               36.78   24.30   34.81
               198.8    37.2   236.0

5 or 6            91      32     123
               15.94   29.91   18.14
               103.6    19.4   123.0

7 or more         25      30      55
                4.38   28.04    8.11
                46.3     8.7    55.0

All              571     107     678
              100.00  100.00  100.00
               571.0   107.0   678.0

Cell Contents:  Count
                % of Column
                Expected count

Pearson Chi-Square = 90.544, DF = 3, P-Value = 0.000

Likelihood Ratio Chi-Square = 74.137, DF = 3, P-Value = 0.000
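
For reference, the expected counts and the Pearson chi-square statistic reported in the output above can be reproduced from the observed counts alone. The following is a minimal Python sketch, not part of the course's SAS program; the variable names are illustrative assumptions. Each expected count is (row total × column total) / n, and the statistic sums (observed − expected)² / expected over the eight cells.

```python
# Observed counts from the table above.
# Rows: pierce categories (2 or less, 3 or 4, 5 or 6, 7 or more).
# Columns: Tattoo = No, Tattoo = Yes.
observed = [
    [245, 19],
    [210, 26],
    [91, 32],
    [25, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print([[round(e, 1) for e in row] for row in expected])
print(round(chi_square, 3), df)
```

The sketch stops at the test statistic and degrees of freedom; the p-value is taken from the output above rather than recomputed.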

a. Describe the relationship between these two variables. (10)

b. Write the null and alternative hypotheses for the researchers. (10)

c. Conduct a hypothesis test for the hypotheses you specified in part (b). Use for the level of significance. (10)

d. An article on the Gallup Poll website entitled “SOCIAL AUDIT, Gambling in America” included a table of summary responses for a sample of teenagers and a sample of adults. Each individual was asked to answer the question, “Generally speaking, do you approve or disapprove of legal gambling or betting?” Would you recommend the chi-square test of association for these data? Explain. (10)

2. A poll of 811 adults aged 18 or older asked about purchases that they intended to make for the upcoming holiday season. One of the questions asked what kind of gift they intended to buy for the person on whom they intended to spend the most. Clothing was the first choice of 487 people.

a. What proportion of adults said that clothing was their first choice? (5)

b. What are the odds that an adult will say that clothing is his or her first choice? (5)

c. What proportion of adults said that something other than clothing was their first choice? (5)

d. What are the odds that an adult will say that something other than clothing is his or her first choice? (5)

e. How are your answers to parts (a) and (d) related? (5)
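
As a reminder of the definitions used in this problem, a proportion is successes / total and the corresponding odds are p / (1 − p). The short Python sketch below illustrates both; the function name and the counts (30 out of 50) are hypothetical and are not the poll data.

```python
from fractions import Fraction


def proportion_and_odds(successes, total):
    """Return the proportion p = successes / total and the odds p / (1 - p)."""
    p = Fraction(successes, total)
    odds = p / (1 - p)
    return p, odds


# Hypothetical counts for illustration only (not the poll data).
total = 50
chose = 30

p, odds = proportion_and_odds(chose, total)
q, odds_other = proportion_and_odds(total - chose, total)

print(float(p), float(odds))
print(float(q), float(odds_other))
```

Exact fractions are used so that the two proportions and the two odds can be compared without rounding error.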

Researchers collected a random sample of 500 birth records from the North Carolina State Center for Health and Environmental Statistics in 2005. The variables of most interest are:

Variable  / Label or description
fage      / Age of father (years)
mage      / Age of mother (years)
weeks     / Completed weeks of gestation (weeks)
visits    / Number of prenatal visits to doctor
marital   / Marital status (1 = married, 2 = not married)
gained    / Weight gained during pregnancy (pounds)
lowbw     / 0 = infant was not low birth weight
            1 = infant was low birth weight (2500 grams or less – about 5.5 lbs or less)
tpounds   / Weight of child (pounds)
premie    / 0 = infant was not premature
            1 = infant was premature
            (premature is defined as 36 weeks or sooner)
fewvisits / 0 = 6 or more prenatal doctor visits
            1 = 0 to 5 prenatal doctor visits

Use the SAS program births.sas to answer the following questions.

3. One analyst was interested in a possible relationship between the ages of the parents.

a. Is the mother’s age (mage) useful for predicting the father’s age (fage)? Test the appropriate hypotheses and comment on the fit of the model. (20)

b. Identify the 95% confidence interval for the mean change in the father’s age for a 1-year increase in the mother’s age. (5)

4. Another analyst was interested in predicting the weight of the child.

a. Do the automatic search procedures (forward, backward, and stepwise) identify the same set of predictor variables for the final model? Explain. (10)

b. Identify the best two-variable model based on these data for predicting the weight of a baby. (10)

c. What model would you recommend for predicting the weight of a baby? (5)

d. Would you recommend any additional steps to the analyst, or do you think the SAS code provides a complete analysis for predicting the weight of a baby? (5)

5. A medical doctor is most interested in predicting the probability of a low birth weight baby.

  1. Fit the logistic regression of low birth weight (lowbw) on weight gained during pregnancy (gained). Formally identify the regression model and report estimated coefficients and their standard errors. (10)
  2. Test whether the coefficient of weight gained during pregnancy is 0. Report the p-value and your conclusion. (10)
  3. Provide a point and interval estimate for the odds ratio. Be sure to identify the odds ratio and interpret your interval for the medical doctor. (10)
  4. Consider the multiple logistic regression model of low birth weight (lowbw) on weight gained during pregnancy (gained), age of the father (fage), age of the mother (mage), completed weeks of gestation (weeks), number of prenatal visits (visits), and whether or not the baby was premature (premie). How many parameters does this logistic regression model include? (10)
  5. Write down the equation for the fitted values, that is, the estimated logistic probabilities. (5)
  6. Assess the significance of the slope coefficients for the variables in the model using the likelihood ratio test. (10)
  7. Use the Wald statistics to assess the significance of the individual slope coefficients for the variables in the model. Would you suggest using a reduced model that eliminates certain variables? Explain. (15)

6. Another analyst is interested in considering an ANCOVA model to examine the possible effects of marital status and premature birth on the total weight of the baby, using weight gained during pregnancy as a covariate. Specify this model. (10)