Chapter 3-15. Homework Problems

Logging Results

These homework problems need to be turned in if you are taking the course for credit. Please send the log file as an e-mail attachment to the course TA:

@hsc.utah.edu (if taking class at U of Utah)

That is, log the contents of the Results window while you work on the problems, and then e-mail the log file.

To begin logging:

Click on the scroll icon (4th from left on 2nd row) on the menu bar

File name: homework1.smcl (for example)

Save

This begins logging. When you exit Stata, everything you did in that session will be saved in this file. (Note: graphs will not appear in the log file, but the commands that created them will, and that is all that is needed to tell that you did the problem correctly.)
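The toolbar steps above can also be done with typed commands, which is handy in a do-file. A minimal sketch, assuming the example file name homework1.smcl; the replace option overwrites an existing file of that name, so use append instead if you want to add to a log from an earlier session.

```stata
* Open a log of the Results window in SMCL format
* (replace overwrites an existing homework1.smcl)
log using homework1.smcl, replace

* ... run the homework commands here ...

* Stop logging and close the file
log close
```

If a plain-text file is easier for the TA to open, log using homework1.log, text replace writes the log in text format instead of SMCL.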

(Homework #1)

3-1. Introduction to epidemiologic thinking

Problem 1) Read article.

Read the article, Gary Taubes, Epidemiology faces its limits, Science 1995;269(Jul 14):164-169, which is on the course CD in the articles subdirectory.

This is a fun article to get you thinking about epidemiology. One thing you will notice is the importance that epidemiologists frequently place on the size of the effect, even though Rothman has repeatedly reminded epidemiologists that “strength of a cause cannot be equated to the biology of causation.”

Problem 2) Email the course TA simply stating you read the assigned article.

3-2. Sufficient/component cause theory of disease

No problems.

3-3. Hill’s causal criteria

No problems.


Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah School of Medicine, 2010.


3-4. Logic and errors

No problems.

(Homework #2)

3-5. Effect measures

Problem 1) Read in the data file

Open the file evans.dta in Stata; this is the same dataset used in Chapter 3-5 (we did this in Chapter 3-5, p. 33).

Problem 2) Compute a risk ratio

Compute the risk ratio with high blood pressure (hpt) as the exposure variable and CHD (chd) as the disease variable. (Hint: we did something similar in Chapter 3-5, p. 34, and the variable descriptions are on p. 32.)

Problem 3) Compute an attributable fraction

Using the display command, compute the attributable fraction exposed, to gain some practice with the formula. Your answer should agree with the “Attr. frac. ex.” line of the output for Problem 2. (hint: the formula for AF is shown in Chapter 3-5, on page 28. An example of a display command, although not the one you need for this problem, is shown in Chapter 3-5, on page 34.)
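As a reminder of what the display command should compute, the attributable fraction among the exposed can be written with the risks in the exposed (R_1) and unexposed (R_0), or equivalently with the risk ratio. This is the standard textbook form; check it against the version on page 28 of Chapter 3-5. The second line uses a hypothetical RR of 2.0, not the Evans County result.

```latex
\begin{aligned}
\mathrm{AF}_e &= \frac{R_1 - R_0}{R_1} = \frac{\mathrm{RR} - 1}{\mathrm{RR}},
  \qquad \mathrm{RR} = \frac{R_1}{R_0} \\
\mathrm{RR} = 2.0 \;&\Rightarrow\; \mathrm{AF}_e = \frac{2.0 - 1}{2.0} = 0.50
\end{aligned}
```

In Stata the hypothetical case is display (2.0 - 1)/2.0, which prints .5; replace 2.0 with the risk ratio from your Problem 2 output.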

(Homework #3)

3-6. Study designs

Problem 1) Computing a Prevalence Odds Ratio

Look at Table 3 in Lee et al. (2006) [on course CD]. Compute a prevalence odds ratio (POR) for increased LV mass (LV hypertrophy), using the line showing 177 (17.0) and 123 (26.9), with “no parental heart failure” as the nonexposed, or referent, group.

[hint: this is done the same way that an odds ratio is calculated for a case-control study, Chapter 3-6, page 19]

Just to give you immediate feedback, your answer should be POR = 1.79.
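The odds-ratio arithmetic the hint refers to can be written out as follows. The cell labels are assumptions for illustration: a and b are the subjects with parental heart failure with and without increased LV mass, c and d are the corresponding counts in the no-parental-heart-failure (referent) group, and p_1 and p_0 are the prevalences of increased LV mass in the two groups.

```latex
\mathrm{POR} = \frac{a/b}{c/d} = \frac{ad}{bc}
             = \frac{p_1/(1 - p_1)}{p_0/(1 - p_0)}
```

Because Table 3 rounds the percentages to one decimal, plugging 26.9% and 17.0% into the last form gives about 1.80 rather than 1.79, so work from the counts and group sizes.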

3-7. Randomization using Excel

No problems.

(Homework #4)

3-8. Bias and confounding

Problem 1) Assessing Confounding

Using the evans.dta dataset, determine if the smoking-CHD association is confounded by cholesterol (simply using cholesterol as a continuous variable). Do this by computing an unadjusted logistic regression and then an adjusted logistic regression. Finally, use a display command to see if the effect changed by more than 10%.

[hint: something similar was done in Chapter 3-8, pages 21 and 22]
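The 10% change-in-estimate rule compares the crude and adjusted odds ratios. A sketch of one common form, which uses the adjusted estimate as the reference; some texts divide by the crude estimate instead, so use whichever convention Chapter 3-8, page 22, uses.

```latex
\%\,\text{change} = 100 \times
  \frac{\lvert \mathrm{OR}_{\text{crude}} - \mathrm{OR}_{\text{adj}} \rvert}{\mathrm{OR}_{\text{adj}}},
  \qquad \text{confounding if } \%\,\text{change} > 10
```

With hypothetical crude and adjusted ORs of 1.50 and 1.35, display 100*abs(1.50 - 1.35)/1.35 gives about 11.1, which would exceed the 10% cutoff.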

3-9. Random error and statistics

No problems.

3-10. Crude analysis

No problems.

(Homework #5)

3-11. Stratified analysis

Evans County Dataset (evans.dta)


Source

dataset to accompany Kleinbaum and Klein (K&K chapter 2)

http://www.sph.emory.edu/~dkleinb/logreg2.htm#data

Brief Description

Data are from a cohort study in which n=609 white males were followed for 7 years, with coronary heart disease as the outcome of interest.

Codebook

outcome

chd coronary heart disease (1=presence, 0=absence)

predictors

cat catecholamine level (1=high, 0=normal)

age age in years (continuous)

chl cholesterol (continuous)

smk smoker (1=ever smoked, 0=never smoked)

ecg electrocardiogram abnormality (1=presence, 0=absence)

dbp diastolic blood pressure (continuous)

sbp systolic blood pressure (continuous)

hpt high blood pressure (1=presence, 0=absence)

defined as: SBP ≥ 160 or DBP ≥ 95

data management

id subject identifier

Problem 1) Compute the Mantel-Haenszel summary risk ratio with multiple stratification variables

Although not done in Chapter 3-11, you can ask for a set of stratification variables, not just one, when computing the summary risk ratio. This gives the Mantel-Haenszel summary risk ratio, along with the estimates for all of the strata formed by the possible combinations of categories of the stratification variables. Compute the summary risk ratio for the smk-CHD association, controlling for cat, hpt, and ecg. [Hint: an example is given in Chapter 3-11, page 5, where one stratification variable was specified; you will simply need to provide three stratification variables.]
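A minimal sketch of the multiple-stratification syntax, assuming evans.dta is in Stata's working directory: the by() option of cs accepts a list of variables, and with three binary stratifiers the output can contain up to 2 × 2 × 2 = 8 strata before the combined estimate. Compare it with the single-variable example on page 5 of Chapter 3-11.

```stata
* Load the Evans County data (assumes evans.dta is in the working directory)
use evans, clear

* Cohort-study risk ratio for smk (exposure) and chd (outcome), stratified
* jointly on cat, hpt, and ecg. cs lists each stratum (each combination of
* the by() variables) and then the Mantel-Haenszel (M-H) combined estimate.
cs chd smk, by(cat hpt ecg)
```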

Problem 2) Compute a Modified Poisson Regression

Fit a modified Poisson regression to get the adjusted risk ratio for the smk-CHD association, controlling for (adjusting for) cat, hpt, and ecg. [Hint: an example is given in Chapter 3-11, page 12, where one covariate, or potential confounder, was specified.]
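A sketch of the adjusted model, again assuming evans.dta is in the working directory. Modified Poisson regression is ordinary Poisson regression of the 0/1 outcome with robust (sandwich) standard errors; the irr option reports exponentiated coefficients, which for a binary outcome are risk ratios. The example on page 12 of Chapter 3-11 may use equivalent syntax (for example, glm with family(poisson) link(log) and robust standard errors), which gives the same point estimates.

```stata
use evans, clear

* Modified Poisson regression: Poisson model for the 0/1 outcome chd with
* robust (sandwich) standard errors. irr displays exp(coefficients), which
* here are risk ratios adjusted for the other covariates.
poisson chd smk cat hpt ecg, vce(robust) irr
```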

Problem 3) Assessing confounding

Fit a modified Poisson regression to get the risk ratio for the smk-CHD association, without the three covariates.

Look at the crude and adjusted RRs, using either the two Poisson models or the table from Problem 1. Using the 10% change-in-estimate rule, decide whether the smk-CHD association was confounded by cat, hpt, and ecg. [Hint: something like this was done in Chapter 3-8, page 22.]
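The crude-versus-adjusted comparison can also be done entirely in Stata by saving each model's risk ratio as a scalar. A sketch assuming evans.dta is in the working directory; the scalar names rr_crude and rr_adj are hypothetical, and the percent change uses the adjusted RR as the reference, which is one common form of the 10% rule (follow Chapter 3-8, page 22, if it differs). exp(_b[smk]) converts the stored log-scale coefficient into a risk ratio.

```stata
use evans, clear

* Crude (unadjusted) risk ratio for smk
quietly poisson chd smk, vce(robust)
scalar rr_crude = exp(_b[smk])

* Risk ratio for smk adjusted for cat, hpt, and ecg
quietly poisson chd smk cat hpt ecg, vce(robust)
scalar rr_adj = exp(_b[smk])

* Percent change, with the adjusted RR as the reference
display "crude RR    = " %6.3f scalar(rr_crude)
display "adjusted RR = " %6.3f scalar(rr_adj)
display "% change    = " %6.1f 100*abs(scalar(rr_crude) - scalar(rr_adj))/scalar(rr_adj)
```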

Enter your decision into the Stata log file by adding a comment to the log, done by running either of the following comment lines in the Command window.

* Yes, confounded

or

* No, not confounded

3-12. Standardization

No problems.

(Homework #6)

3-13. Sensitivity (bias) analysis

We will use the Evans County Dataset (evans.dta) to conduct a sensitivity analysis. The codebook is shown above for the Chapter 3-11 homework problem.

Problem 1) logistic regression model

Suppose, or pretend, we are interested in how important age and high blood pressure are as predictors of an electrocardiogram abnormality. Open the data file evans.dta in Stata. Fit the following logistic regression model.

logistic ecg age hpt

Problem 2) logistic regression model with sensitivity analysis

Suppose we are not confident that the ecg was read correctly. As one scenario of a sensitivity analysis, we are interested in the effect of misclassification error for the ecg, where we consider that 10% of the time the ecg reader did not detect an abnormality when it was actually present, and 5% of the time the ecg reader detected an abnormality when it was actually absent.

Translate these numbers into a sensitivity and specificity and re-fit the logistic regression model that adjusts for these misclassification errors. That is, fit the following model with the ?’s replaced with proportions representing sensitivity and specificity of the ecg for this scenario.

logitem ecg age hpt ,sens(?) spec(?)

Hints: 1) Use the definitions of sensitivity and specificity for disease misclassification (Chapter 3-13, page 1) to help you think about it.

2) Review the Magder and Hughes example at the top of page 10 in Chapter 3-13.
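Written as probabilities, sensitivity and specificity are the complements of the two reading errors described in the scenario, which is the mapping needed for the sens() and spec() options.

```latex
\begin{aligned}
\text{Sens} &= P(\text{read abnormal} \mid \text{abnormality present})
             = 1 - P(\text{read normal} \mid \text{abnormality present}) \\
\text{Spec} &= P(\text{read normal} \mid \text{abnormality absent})
             = 1 - P(\text{read abnormal} \mid \text{abnormality absent})
\end{aligned}
```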

(Homework #7)

3-14. Case-cohort study design

Case-Cohort Study with Time to Event Data (density case-control design)

The assignment is to conduct a case-cohort study of the Framingham Heart Study dataset, 2.20.Framingham.dta.

The dataset comes from a long-term follow-up study of cardiovascular risk factors on 4699 patients living in the town of Framingham, Massachusetts. The patients were free of coronary heart disease at their baseline exam (recruitment of patients started in 1948).


Data Codebook

Baseline exam:

sbp systolic blood pressure (SBP) in mm Hg

dbp diastolic blood pressure (DBP) in mm Hg

age age in years

scl serum cholesterol (SCL) in mg/100ml

bmi body mass index (BMI) = weight/height² in kg/m²

sex gender (1=male, 2=female)

month month of year in which baseline exam occurred

id patient identification variable (numbered 1 to 4699)

Follow-up information on coronary heart disease:

followup follow-up in days

chdfate CHD outcome (1=patient develops CHD at the end of follow-up, 0=otherwise)

Problem 1) read in data

Read in the dataset 2.20.Framingham.dta, which is in the datasets & do-files subdirectory of the course CD.

Problem 2) Cox regression model using total dataset (no case-cohort sampling)

Fit a Cox regression to the total sample using,

followup as the time variable

chdfate as the event variable

age and sbp as the list of predictor variables

[hint: see page 13 of Chapter 3-14]
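A sketch of the full-cohort model, assuming 2.20.Framingham.dta is in Stata's working directory (otherwise add the path to the course CD). The stset line is the same declaration used for the case-cohort analysis in Problem 4, and stcox reports hazard ratios by default. Compare with the example on page 13 of Chapter 3-14.

```stata
* Load the Framingham data (assumes the file is in the working directory)
use "2.20.Framingham.dta", clear

* Declare survival data: time is followup (days), event is chdfate==1
stset followup, failure(chdfate==1) id(id)

* Cox proportional hazards model on the full cohort (hazard ratios shown)
stcox age sbp
```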

Problem 3) update Stata to get needed commands for this study design

While connected to the Internet, update your Stata to include the commands needed for case-cohort analysis [see page 31 of Chapter 3-14]

Problem 4) case-cohort sampling and analysis

We can see from the following frequency table that there are 1,473 cases and 3,226 controls in the full cohort.

. tab chdfate

   Coronary |
      Heart |
    Disease |      Freq.     Percent        Cum.
------------+-----------------------------------
   Censored |      3,226       68.65       68.65
        CHD |      1,473       31.35      100.00
------------+-----------------------------------
      Total |      4,699      100.00
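The alpha(.3135) value used in this problem is the case fraction from this table. Sampling the subcohort at that rate yields, in expectation, about as many subcohort members as there are cases, which is the approximately one-to-one ratio described next.

```latex
\alpha = \frac{\text{number of cases}}{\text{cohort size}}
       = \frac{1473}{4699} \approx 0.3135,
\qquad \alpha \times 4699 \approx 1473
```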

Sample 31.35% of the total sample for controls (a one-to-one sampling ratio of cases to controls), and fit a Cox regression model similar to that computed above, but with appropriate variance estimators. The required commands are:

stset followup , failure(chdfate==1) id(id)

stcascoh, alpha(.3135) seed(999)

stselpre age sbp

Notice how close the case-cohort estimates are to the full-cohort estimates from Problem 2.

Chapter 3-15 (revised 16 May 2010)