PubH 6414 Fall 2011 Homework 3. (20 points)

We encourage you to work together on the computations and to discuss the problems. However, each student is expected to write up the submitted assignment independently, using her or his own computations and giving explanations in her or his own words. Identical or nearly identical homework submissions will not receive credit.

·  Turn in this completed Word document in class by the homework due date.

·  You may use R commander to do the calculations needed for each question. Paste in ONLY the parts of the output needed to answer the question. (You may use another statistical software package to do the calculations, if you prefer, but the instructor and TAs cannot provide assistance with other packages.)

·  Data needed for this homework assignment are available at http://www.biostat.umn.edu/~susant/FALL11PH6414HMK.html.

Problem 1: Multiple Choice Questions. (2 points)

Part A. Give the appropriate range for the following statistics.

Choices:

a) Any number between -1 and +1

b) Any number between 0 and 1

c) Any non-negative whole number (0, 1, 2, 3, …)

d) Any positive number between 0 and infinity

e) Any positive or negative number

A1. Mean:

A2. Variance:

A3. Frequency (count):

A4. Proportion:

A5. Correlation coefficient (r):

A6. Coefficient of determination (r²):

A7. Experimental event rate (EER):

A8. Relative Risk:

A9. Odds of disease:

A10. Odds ratio:

Part B. Select the appropriate answer for each question.

Choices:

a)  0.0

b)  any number above 0.0

c)  1.0

d)  any number above 1.0

e)  -1.0

B1. What value of the correlation coefficient indicates that there is no linear association between X and Y?

B2. What value of the relative risk indicates that there is no association between exposure and outcome?

B3. What value of the odds ratio indicates that there is no association between exposure and outcome?

B4. What value of the standard deviation indicates that there is no variation at all in the dataset (highly suspicious)?

Problem 2: Odds Ratios. (4 points)

From their own experiences, Kaku and Lowenstein (1990) noted that stroke seemed to be occurring more frequently in young people who used recreational drugs than in those who did not. To investigate this problem, they identified all stroke patients between 15 and 44 years of age admitted to a given hospital, and selected sex- and age-matched controls from patients admitted to the hospital with other acute medical or surgical conditions. Of the 214 stroke patients, 73 had used recreational drugs, and of the 214 controls, 18 had used recreational drugs.

Part A. What design was used in this study?

Part B. Fill in the 2 x 2 table describing this study:

Exposure / Stroke / Control / Total
Drug Use / ___ / ___ / ___
No Drug Use / ___ / ___ / ___
Total / ___ / ___ / ___

Part C. Calculate the odds of exposure (use of recreational drugs is the exposure) for the cases (stroke patients). Calculate the odds of exposure for the controls. Calculate the exposure odds ratio for cases compared to controls. Provide an interpretation of the odds ratio. (Note: In statistics, interpreting a result means stating the result and its meaning in a sentence.)

Part D. Calculate the exposure odds ratio (use of recreational drugs is the exposure) for controls compared to cases. Provide an interpretation for this odds ratio. How does it differ from the odds ratio you calculated in Part C?

Part E. Calculate the odds of stroke for the exposed (to recreational drugs). Calculate the odds of stroke for the unexposed. Calculate the DISEASE odds ratio for the exposed compared to the unexposed. Provide an interpretation for this odds ratio. How does it differ from the odds ratios calculated in Parts C and D above? Why?
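The odds and odds ratios in Parts C through E follow directly from the four cells of a 2 x 2 table. As an optional way to check hand calculations, here is a minimal Python sketch (the course itself uses R Commander). The counts are hypothetical and are NOT the Kaku and Lowenstein data; the helper name `odds` is also just an illustration. Exact fractions are used so the relationships between the three odds ratios are easy to see.

```python
from fractions import Fraction

def odds(with_characteristic, without_characteristic):
    """Odds = (number with the characteristic) / (number without it)."""
    return Fraction(with_characteristic, without_characteristic)

# Hypothetical 2 x 2 case-control table (NOT the stroke data):
#               Case    Control
#   Exposed     a = 30  b = 10
#   Unexposed   c = 70  d = 90
a, b, c, d = 30, 10, 70, 90

# Odds of exposure among cases and among controls
odds_exp_cases = odds(a, c)
odds_exp_controls = odds(b, d)

# Exposure odds ratio, cases compared to controls: (a/c) / (b/d) = ad / bc
exposure_or = odds_exp_cases / odds_exp_controls

# Reversing the comparison (controls compared to cases)
exposure_or_reversed = odds_exp_controls / odds_exp_cases

# Odds of disease among exposed (a/b) and unexposed (c/d), and their ratio
disease_or = odds(a, b) / odds(c, d)

print(float(exposure_or), float(exposure_or_reversed), float(disease_or))
```

Comparing the three printed values with the cross-product ad / bc is a quick consistency check on the hand calculations.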

Part F. Do the study results confirm the observations made by Kaku and Lowenstein? Why or why not?

Problem 3. More Odds Ratios. (3 points)

Postmenopausal women who develop endometrial cancer tend to be heavier than women who do not develop the disease. One proposed explanation is that heavier women produce more endogenous estrogens, and estrogen exposure is known to be associated with endometrial cancer. To test this proposed explanation, a case-control study was designed to explore whether the association between endometrial cancer and exogenous estrogens from hormone replacement therapy (HRT) differed for women in different weight groups. Data from the study are in the table below.

Weight (kg) / Group / HRT: Yes / HRT: No
< 57 / Cases / 20 / 12
< 57 / Controls / 61 / 183
57 - 75 / Cases / 37 / 45
57 - 75 / Controls / 113 / 378
> 75 / Cases / 9 / 42
> 75 / Controls / 23 / 140

Part A. Calculate the exposure odds ratio for cases compared to controls for each weight group.
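A stratum-specific odds ratio is the ordinary 2 x 2 cross-product ratio computed separately within each weight group. The following is a minimal Python sketch of that repetition; the stratum labels, the helper `odds_ratio`, and all counts are hypothetical and are NOT the HRT data in the table.

```python
from fractions import Fraction

def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio, cases vs. controls: (a/c) / (b/d) = ad / bc."""
    return Fraction(exp_cases * unexp_controls, unexp_cases * exp_controls)

# Hypothetical strata:
# (exposed cases, unexposed cases, exposed controls, unexposed controls)
strata = {
    "low": (10, 20, 30, 120),
    "middle": (15, 15, 40, 60),
    "high": (8, 24, 10, 30),
}

# One odds ratio per stratum
stratum_or = {name: odds_ratio(*counts) for name, counts in strata.items()}

for name, value in stratum_or.items():
    print(f"{name}: OR = {float(value):.2f}")
```

Listing the odds ratios side by side makes it straightforward to see which stratum has the largest one.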

Part B. Provide an interpretation of the odds ratio for each weight group.

Part C. Which weight group has the highest odds ratio of exposure to HRT for cases compared to controls?

Part D. Are these results consistent with the proposed theory (mentioned above) that heavier women produce more endogenous estrogens and thus have a higher risk of endometrial cancer?

Problem 4: Relative Risks. (3 points)

Two clinics are participating in a clinical trial. A new drug is being tested to see if it can reduce allergy symptoms. Patients were given either the new treatment drug or a placebo, and the outcome was recorded. A success is defined as a considerable decrease in allergy symptoms, as determined by a physician according to the study protocol. Use the data below to answer the questions.

Clinic 1 / Success / Failure / Total
Treatment drug / 22 / 14 / 36
Placebo / 10 / 18 / 28
Total / 32 / 32 / 64

Clinic 2 / Success / Failure / Total
Treatment drug / 16 / 18 / 34
Placebo / 19 / 17 / 36
Total / 35 / 35 / 70

Part A. What design was used in this study?

Part B. Calculate the relative risk of successful allergy treatment for the treatment drug compared to the placebo for each clinic.
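The relative risk compares the proportion of successes in the treatment group with the proportion in the placebo group: RR = [successes on drug / total on drug] / [successes on placebo / total on placebo]. A minimal Python sketch, using hypothetical clinic names, helper names, and counts (NOT the data in the tables):

```python
from fractions import Fraction

def risk(events, total):
    """Risk (proportion) of the outcome within one group."""
    return Fraction(events, total)

def relative_risk(events_trt, total_trt, events_ctl, total_ctl):
    """Relative risk = risk in treated group / risk in comparison group."""
    return risk(events_trt, total_trt) / risk(events_ctl, total_ctl)

# Hypothetical counts per clinic:
# (successes on drug, total on drug, successes on placebo, total on placebo)
clinics = {
    "Clinic X": (18, 30, 9, 30),
    "Clinic Y": (12, 40, 12, 40),
}

rr = {name: relative_risk(*counts) for name, counts in clinics.items()}

for name, value in rr.items():
    print(f"{name}: RR of success = {float(value):.2f}")
```

Note that the denominators are the row totals (everyone in each treatment group), not the counts of failures; that is what distinguishes a risk from an odds.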

Part C. Provide an interpretation of each relative risk you calculated.

Part D. Does it appear that there are differences in response to the drug between Clinic 1 and Clinic 2? Do you think this is a potential problem for the study? Why or why not?

Problem 5: More Relative Risks. (4 points)

A study was conducted to evaluate the effects of oral contraceptive (OC) use on heart disease in women aged 40 to 44 years. At baseline, 5,000 women were OC users and 10,000 women did not use OC. The women were followed for 3 years. During the 3 years, 13 of the women in the OC users group developed a myocardial infarction (MI), and 7 women in the non-OC users group developed an MI.

Part A. What design was used in this study?

Part B. Fill in the 2 x 2 table describing this study:

Exposure / MI / No MI / Total
OC Use / ___ / ___ / ___
No OC Use / ___ / ___ / ___
Total / ___ / ___ / ___

Part C. Calculate the relative risk of MI for OC users compared to non-OC users. Provide an interpretation of this result.

Part D. Calculate the Absolute Risk Increase (ARI) for MI. Provide an interpretation of this result.

Part E. Calculate the Number Needed to Harm (NNH) for one MI. Provide an interpretation of this result.
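The ARI is the difference between the risk in the exposed group and the risk in the unexposed group, and the NNH is its reciprocal; by common convention the NNH is rounded up to the next whole person. A minimal Python sketch with hypothetical cohort counts (NOT the OC study data); the variable names are illustrative only.

```python
import math
from fractions import Fraction

# Hypothetical cohort (NOT the oral contraceptive study):
events_exposed, n_exposed = 30, 2000       # outcome among exposed
events_unexposed, n_unexposed = 12, 2000   # outcome among unexposed

risk_exposed = Fraction(events_exposed, n_exposed)
risk_unexposed = Fraction(events_unexposed, n_unexposed)

# Absolute risk increase: risk in exposed minus risk in unexposed
ari = risk_exposed - risk_unexposed

# Number needed to harm: 1 / ARI, rounded up to a whole number of people
nnh = math.ceil(1 / ari)

print(f"ARI = {float(ari):.4f}, NNH = {nnh}")
```

Here 1 / ARI is not a whole number, so rounding up matters: exposing that many people is expected to produce at least one additional outcome.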

Part F. Is it reasonable to calculate the Relative Risk Reduction for this study? Explain why or why not.

Problem 6: Correlation. (4 points)

Researchers are searching for a predictor of total body fat. They are first interested in the relationship between percent body fat and tricep skinfold thickness, midarm circumference, and thigh circumference. Percent body fat was measured by a water immersion method. Data were collected on 20 healthy women aged 25 to 34.

§  Data on percent body fat, tricep skinfold thickness, midarm circumference, and thigh circumference for the 20 healthy women can be found in the ‘HW3 Fall2011 Datasets.txt’ file. Source: Kutner, Nachtsheim, Neter, and Li. Applied Linear Statistical Models, 5th ed. McGraw-Hill/Irwin, 1996, p. 257.

Part A. What design was used in this study?

Part B. Graph the relationship between percent body fat and each of the three remaining variables. In each graph, put percent body fat on the y-axis and one of the other variables (tricep skinfold thickness, midarm circumference, or thigh circumference) on the x-axis. You should have 3 graphs total. Paste the graphs here.

Part C. Comment on the direction and strength of the relationship in each graph. Compare the relationships in the three graphs.

Part D. Calculate the correlation coefficient between percent body fat and each of the three remaining variables. How do the calculated correlation coefficients compare with your answers in Part C?
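The Pearson correlation coefficient is r = Sxy / sqrt(Sxx · Syy), where Sxy is the sum of the products of the deviations of x and y from their means, and Sxx and Syy are the sums of squared deviations. A minimal Python sketch of that formula follows; the function name `pearson_r` and the data values are hypothetical (NOT the body fat data set). In R, the built-in `cor(x, y)` computes the same Pearson coefficient by default.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements (NOT the body fat data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Squaring r gives the coefficient of determination, which is one way to compare how strongly each candidate variable is linearly associated with percent body fat.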

Part E. Based on your answers in Parts C and D, which variables may be of most interest to the investigators in their search for a predictor for body fat percentage? Which variables may be of the least interest? Why?

Part F. If a high correlation coefficient exists between body fat and tricep skinfold thickness, midarm circumference, and/or thigh circumference, can we say that tricep skinfold thickness, midarm circumference, and/or thigh circumference lead to high percent body fat? Why or why not?