Stat 921 Notes 1

Reading:

Observational Studies, Chapter 1, 2.1.

I. Definition of Observational Studies; Examples; Goals of Course.

Observational Study (Cochran, 1965): An observational study is an empiric investigation in which “… the objective is to elucidate cause-and-effect relationships … [in which] it is not feasible to use controlled experimentation, in the sense of being able to impose the procedures or treatment whose effects it is desired to discover, or to assign subjects at random to different procedures.”

Examples:

Long-term psychological effects of the death of a close relative: In an attempt to estimate the long-term psychological effects of bereavement, Lehman, Wortman and Williams (Journal of Personality and Social Psychology, 1987) collected data following the sudden death of a spouse or a child in a car crash. They matched 80 bereaved spouses and parents to 80 controls drawn from 7581 individuals who came to renew a driver's license. Specifically, they matched for gender, age, family income before the crash, education level, and number and ages of children. Their outcome was the depression level of the subject five years after the crash.
Effects on criminal violence of laws limiting access to handguns: Do laws that ban purchases of handguns by convicted felons reduce violence? Wright, Wintemute, and Rivara (American Journal of Public Health, 1999) compared two groups of individuals in California: (a) individuals who attempted to purchase a handgun but whose purchase was denied because of a prior felony conviction, and (b) individuals whose purchase was approved because their prior felony arrest had not resulted in a conviction. The comparison looked forward in time from the attempt to purchase a handgun, recording arrest charges for new offenses in the subsequent three years.
Effects on children of occupational exposure to lead: Morton et al. (American Journal of Epidemiology, 1982) asked whether children were harmed by lead brought home in the clothes and hair of parents who were exposed to lead at work. They matched 33 children whose parents worked in a battery factory to 33 unexposed control children of the same age and neighborhood. Their outcome was the level of lead found in the child’s blood.

Features of the definition of an observational study:

An observational study concerns treatments, interventions or policies and the effects they cause and in this respect it resembles an experiment. A study without a treatment is neither an experiment nor an observational study. Most public opinion polls, most forecasting efforts and many other important empirical studies are neither experiments nor observational studies.

In an experiment, the assignment of treatments is controlled by the experimenter, who ensures that subjects receiving different treatments are comparable. In an observational study, this control is lacking. Although an experiment is ideal, often only an observational study is possible, for reasons such as the following:

The treatment, perhaps cigarette smoking or radon gas, may be harmful and cannot ethically be given to human subjects for experimental purposes.
The treatment may be controlled by a political process that, perhaps quite appropriately, will not yield control merely for an experiment, as is true of much of macroeconomic and fiscal policy.
The treatment may be beyond the legal reach of experimental manipulation even by a government, as is true of many management decisions.
Experimental subjects may have such strong attachments to particular treatments that they refuse to cede control to an experimenter, as is sometimes true in areas ranging from diet and exercise to bilingual education.

Two more examples of observational studies:

1. Effect of smoking on lung cancer. By the mid-1940s, it had been observed that lung cancer cases had tripled over the previous three decades, but the cause of the increase was unclear and disputed. Besides smoking, possible explanations included changes in air quality due to the introduction of the automobile, the widespread expansion of paved roads that contained many carcinogens, aging of the population, the advent of radiography, better clinical awareness of lung cancer, and better diagnostic methods in general.

At the 1947 American Medical Association (AMA) convention in Atlantic City, doctors formed long lines to get free cigarettes.

At the very moment that doctors were lining up to get their free cigarettes, researchers in the United Kingdom and the United States were launching observational studies of the effect of smoking on lung cancer.

Wynder and Graham (Journal of the American Medical Association, 1950) and Doll and Hill (British Medical Journal, 1950) conducted retrospective studies that compared the smoking habits of patients with lung cancer to those of cancer-free controls, and found that lung cancer patients were much more likely to smoke than cancer-free controls.

2. Effect of vitamin C on cancer. Linus Pauling (1901-1994) is the only person ever to win two unshared Nobel prizes (chemistry, 1954; peace, 1962). In his later years, Pauling advocated the health benefits of high doses of vitamin C. Pauling himself reportedly took at least 12,000 mg daily of vitamin C (an 8 oz. glass of orange juice has about 100 mg of vitamin C) and raised the amount to 40,000 mg if symptoms of a cold appeared. In 1993, after undergoing radiation therapy for prostate cancer, Pauling said that vitamin C had delayed the cancer’s onset for twenty years.

In a 1976 study, Cameron and Pauling presented observational data concerning the use of vitamin C as a treatment for advanced cancer. They gave vitamin C to 100 patients believed to be terminally ill from advanced cancer and studied subsequent survival. For each such patient, 10 historical controls were selected of the same age and gender, the same site of primary cancer and the same histological tumor type. The vitamin C patients were reported to have a mean survival time about 10 months longer than that of the controls. Cameron and Pauling concluded that “there is strong evidence that treatment…[with vitamin C]…increases the survival time.”

Criticisms of the studies:

Scientific evidence is commonly and properly greeted with objections, skepticism and doubt.

Among the criticisms of the lung-cancer-smoking studies were:

R.A. Fisher’s criticism that they did not control for a gene which might be associated with both lung cancer and smoking.
Lung cancer patients might exaggerate their past smoking habits and interviewers might unconsciously or consciously skew their questions to exaggerate the past smoking of lung cancer patients.

Among the criticisms of the Vitamin C-cancer studies were:

William D. DeWys, chief of the clinical investigations branch of the National Cancer Institute’s cancer therapy program, raised concerns about the comparability of the treated and control groups. Specifically, he observed that no data had been published to demonstrate that the patients had been matched by stage of their disease, functional ability, weight loss and sites of metastasis, all of which have an important impact on survival.
The control group was formed from records of patients already dead, while the treated group were alive at the start of the study. The argument was that the treated patients were terminally ill, that they all would be dead shortly, so the recent records of apparently similar patients, now dead, could reasonably be used to indicate the duration of survival absent treatment with vitamin C. Nonetheless, when the results were analyzed, some patients given vitamin C were still alive; that is, their survival times were censored. This might reflect the dramatic effects of vitamin C but it might instead reflect some imprecision in judgments about who is terminally ill and how long a patient is likely to survive, that is, imprecision about the initial prognosis of patients in the treated group.
Time from “untreatability by standard therapies” to death depends on the time of “untreatability.” Treated patients were judged, at the start of treatment with vitamin C, to be untreatable by other therapies. For controls, a date of untreatability was determined from records. It is possible that these two different processes would produce the same number, but it is by no means certain.

In the case of smoking and lung cancer, the original studies and later studies addressed some of these criticisms. Doll and Hill assessed the reliability of interviews by reinterviewing a group of controls six months later. Doll and Hill and Hammond and Horn conducted large prospective studies.

Later Evidence:

1. Effect of smoking on lung cancer:

The 1964 United States Surgeon General’s Advisory Committee Report, Smoking and Health, which reviewed a vast literature, concluded:

“Cigarette smoking is causally related to lung cancer in men: the magnitude of the effect of cigarette smoking far outweighs all other factors. The data for women, though less extensive, points in the same direction.”

Though there had been some experiments confined to laboratory animals, the direct evidence linking smoking with human health came from observational studies.

2. Effect of vitamin C on cancer: To test the claim that vitamin C is an effective treatment for advanced cancer, the Mayo Clinic (Moertel et al., 1985, New England Journal of Medicine) conducted a double-blind randomized controlled experiment comparing vitamin C to placebo for patients with advanced cancer of the colon and the rectum. They found no indication that vitamin C prolonged survival, with the placebo group surviving slightly but not significantly longer on average (median survival time was 4.1 months in the placebo group compared to 2.9 months in the vitamin C group). Today, few scientists claim that vitamin C holds promise as a treatment for cancer.

These two examples illustrate (1) the importance of observational studies in providing evidence on critical questions that cannot be addressed by experiments and (2) the danger of observational studies in leading investigators to advocate harmful policies or ineffective treatments.

Goals of the course:

1. Study methods for designing and analyzing observational studies to make them provide more reliable evidence.

2. Study methods for assessing the weight of evidence provided by observational studies that acknowledge the uncertainty inherent in them.

We will start by studying how to design and make inferences in randomized experiments.

The planner of an observational study should always ask himself the question, ‘How would the study be conducted if it were possible to do it by controlled experimentation?’

-- William G. Cochran, attributing the point to H. F. Dorn.

II. Potential outcomes model for defining the meaning of the effect caused by a treatment

The model was developed by Neyman (1923) and Rubin (1974). See P. Holland (1986), “Statistics and causal inference”, Journal of the American Statistical Association, 945-970, for more background and history.

Consider a set of treatments $\mathcal{T}$, for example $\mathcal{T} = \{T, C\}$, where T denotes parent works in a battery factory and C denotes parent does not work in a battery factory.

For unit $i$, the potential response (outcome) $r_{ti}$ is the outcome that unit would experience if exposed to (takes) treatment $t \in \mathcal{T}$ at a specific time or in a specific time period, e.g.,

$r_{Ti}$ = Lead level child $i$ would experience if parent worked in a battery factory.

$r_{Ci}$ = Lead level child $i$ would experience if parent did not work in a battery factory.

Causal effect of T compared to C for unit $i$ = $r_{Ti} - r_{Ci}$.

Note that the causal effect of a treatment can only be defined in comparison to another (possibly inactive) treatment.

Example: The statement, “A study on the benefits of vitamin C showed that 90% of the people suffering from a cold who take vitamin C get over their cold within a week” is meaningless without knowing what would happen if those people didn’t take vitamin C.

The fundamental problem of causal inference (Holland):

It is impossible to observe the values of both $r_{Ti}$ and $r_{Ci}$, and therefore it is impossible to observe the causal effect $r_{Ti} - r_{Ci}$ of the active treatment compared to the control for unit $i$.
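The fundamental problem can be made concrete with a small sketch in R. The numbers below are invented for illustration only (they are not from Morton et al.), and the names r_T, r_C, Z and R are assumed labels for the two potential responses, the treatment actually received and the observed response.

```r
# Hypothetical potential outcomes (blood lead levels) for four children.
# These values are invented for illustration; they are not real data.
r_T <- c(30, 22, 41, 18)        # lead level if parent works in a battery factory
r_C <- c(16, 20, 25, 18)        # lead level if parent does not
Z   <- c("T", "C", "T", "C")    # treatment each child actually received

# Unit-level causal effects: well defined, but never observable in practice
effect <- r_T - r_C

# What the investigator actually sees: one potential outcome per child
R <- ifelse(Z == "T", r_T, r_C)
observed <- data.frame(
  Z   = Z,
  r_T = ifelse(Z == "T", r_T, NA),  # missing for children who received C
  r_C = ifelse(Z == "C", r_C, NA),  # missing for children who received T
  R   = R
)
print(effect)
print(observed)
```

In every row of the observed table exactly one of the two potential responses is missing, so no unit-level effect can be computed from the observed data alone.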

Examples:

(1) Unit $i$ is a specific fourth grader, T represents a novel year-long program of study of arithmetic, C represents a standard arithmetic program and $r$ is a score on a test at the end of the year. We can observe $r_{Ti}$ or $r_{Ci}$ but not both.

(2) Unit $i$ is the output on a computer screen after a user is asked to input “Which team will win the Super Bowl this year?” T represents the user inputting “Philadelphia” and C represents the user inputting “Chicago.”

III. Randomized Experiments and How They Solve the Fundamental Problem of Causal Inference

Instead of trying to estimate the individual causal effect of the treatment T for a particular individual $i$, estimate the average treatment effect (ATE):

$\text{ATE} = E[r_{Ti} - r_{Ci}] = E[r_{Ti}] - E[r_{Ci}]$

The second equality means that we do not need to consider the joint distribution of $(r_{Ti}, r_{Ci})$ but only the marginal expectations $E[r_{Ti}]$ and $E[r_{Ci}]$.
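A small numerical check may help show why only the marginal expectations matter. The sketch below uses invented potential responses (r_C, r_T_a and r_T_b are hypothetical names) for two populations that share the same marginal values but pair them differently across units.

```r
# Two hypothetical populations with identical marginal distributions of the
# potential responses but very different pairings (joint distributions).
r_C   <- c(10, 20, 30, 40)
r_T_a <- c(15, 25, 35, 45)   # every unit gains exactly 5
r_T_b <- rev(r_T_a)          # same values, reversed pairing across units

ate_a <- mean(r_T_a - r_C)   # average of unit-level effects, population a
ate_b <- mean(r_T_b - r_C)   # average of unit-level effects, population b

# Difference of marginal means, which is the same for both populations
diff_of_means <- mean(r_T_a) - mean(r_C)

print(c(ate_a = ate_a, ate_b = ate_b, diff_of_means = diff_of_means))
print(r_T_b - r_C)           # unit-level effects in population b vary widely
```

The unit-level effects in the second population range from large gains to large losses, yet the average treatment effect is the same in both, because the expectation of a difference is the difference of expectations.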

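Let $Z_i$ denote the treatment unit $i$ actually receives. The observed response is $R_i = r_{Z_i i}$, that is, $R_i = r_{Ti}$ if $Z_i = T$ and $R_i = r_{Ci}$ if $Z_i = C$.

The observed data provide information about

$E[R_i \mid Z_i = T] = E[r_{Ti} \mid Z_i = T]$

and

$E[R_i \mid Z_i = C] = E[r_{Ci} \mid Z_i = C]$.

It is important to recognize that $E[r_{Ti}]$ and $E[r_{Ti} \mid Z_i = T]$ are not the same thing and need not be equal in general [similarly for $E[r_{Ci}]$ and $E[r_{Ci} \mid Z_i = C]$].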

Example: Let $r$ = person’s earnings at age 40 and $Z$ = whether or not the person graduated from college, with T denoting graduating. We might expect that $E[r_{Ti} \mid Z_i = T] > E[r_{Ti}]$ if more motivated and more academically able people are more likely to graduate from college.
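The earnings example can be illustrated with a simulation. This is only a sketch under invented assumptions: the data-generating model, the variable `ability`, and all numerical values are hypothetical, chosen so that more able people both earn more and graduate more often.

```r
set.seed(921)
n <- 100000

# Hypothetical unobserved trait driving both earnings and graduation
ability <- rnorm(n)

# Potential earnings at age 40 (in thousands); the true effect is 10 by design
r_T <- 50 + 10 * ability + rnorm(n, sd = 5)   # if the person graduates
r_C <- 40 + 10 * ability + rnorm(n, sd = 5)   # if the person does not

# Self-selection: higher-ability people are more likely to graduate
Z <- rbinom(n, 1, plogis(ability))            # 1 = graduated (T), 0 = not (C)
R <- ifelse(Z == 1, r_T, r_C)                 # observed earnings

mean_rT_all   <- mean(r_T)                    # estimates E[r_T]
mean_rT_grads <- mean(r_T[Z == 1])            # estimates E[r_T | Z = T]
true_ate      <- mean(r_T - r_C)
naive_diff    <- mean(R[Z == 1]) - mean(R[Z == 0])

print(c(mean_rT_all = mean_rT_all, mean_rT_grads = mean_rT_grads))
print(c(true_ate = true_ate, naive_diff = naive_diff))
```

Under these assumptions the graduates' mean potential earnings exceed the population mean, and the naive comparison of graduates with non-graduates overstates the average treatment effect.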

Independence: There is a condition that makes $E[r_{Ti} \mid Z_i = T] = E[r_{Ti}]$ and $E[r_{Ci} \mid Z_i = C] = E[r_{Ci}]$:

$(r_{Ti}, r_{Ci})$ independent of $Z_i$.

We can ensure that $(r_{Ti}, r_{Ci})$ is independent of $Z_i$ (over repetitions of an experiment) by randomly assigning $Z_i$ to the units. The value of a randomized experiment is that it physically creates independence.
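The link between independence and the average treatment effect can be written out in one short derivation. The symbols are assumed notation: $r_{Ti}$ and $r_{Ci}$ for the potential responses, $Z_i$ for the treatment received and $R_i$ for the observed response.

```latex
\begin{align*}
E[R_i \mid Z_i = T] - E[R_i \mid Z_i = C]
  &= E[r_{Ti} \mid Z_i = T] - E[r_{Ci} \mid Z_i = C]
     && \text{(definition of } R_i\text{)} \\
  &= E[r_{Ti}] - E[r_{Ci}]
     && \text{(independence of } (r_{Ti}, r_{Ci}) \text{ and } Z_i\text{)} \\
  &= \text{ATE}.
\end{align*}
```

The left-hand side involves only observable quantities, so under independence the difference between the treated and control group means estimates the average treatment effect.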

In a randomized experiment, any covariate that is recorded before the random assignment, including the potential responses $(r_{Ti}, r_{Ci})$, is independent of $Z_i$ and, in a large enough experiment, should be close to balanced between the treatment and control groups by the law of large numbers.
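A simulation sketch can illustrate both points: under random assignment the difference in observed group means is close to the average treatment effect, and a pre-treatment covariate is close to balanced. The population model, the covariate `ability` and all numbers are hypothetical.

```r
set.seed(921)
n <- 100000

# Hypothetical pre-treatment covariate and potential responses
ability <- rnorm(n)
r_T <- 50 + 10 * ability + rnorm(n, sd = 5)
r_C <- 40 + 10 * ability + rnorm(n, sd = 5)

# Completely randomized assignment: exactly half the units receive T,
# chosen without any reference to ability or to the potential responses
Z <- sample(rep(c(1, 0), each = n / 2))
R <- ifelse(Z == 1, r_T, r_C)

true_ate    <- mean(r_T - r_C)
diff_means  <- mean(R[Z == 1]) - mean(R[Z == 0])
ability_gap <- mean(ability[Z == 1]) - mean(ability[Z == 0])

print(c(true_ate = true_ate, diff_means = diff_means, ability_gap = ability_gap))
```

Replacing the `sample()` line with an assignment rule that depends on `ability` would break the independence and, in general, the agreement between the difference in means and the average treatment effect.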

Example – Randomized Experiment of Job Training.

Controversy about the effects of job training programs:

Job training programs don’t work well:

“Those who don't get those early skills are unlikely to benefit much from short-term public training programs later on. Public job training also has a bad record because it does not use market incentives. A lot of public programs train people at tasks and skills that are obsolete.”

-- James Heckman, Nobel Prize winning economist

Job training programs work:

"Job Corps took me from the mean streets and out of a nightmare lifestyle into a mode where the most incredible of dreams came true."

-- George Foreman, two-time heavyweight boxing champion

The National Supported Work Demonstration was a randomized experiment of a job training program conducted in the mid-1970s. The program was a temporary employment program designed to help disadvantaged workers lacking basic job skills move into the labor market by giving them work experience and counseling in a sheltered environment. Those assigned to the treatment group received all the benefits of the program, while those assigned to the control group were left to fend for themselves.

We consider male participants in the program. 297 men were assigned to the treatment group and 425 to the control.

# Use NSW data
# The variables are treatment (1 if treated, 0 if not treated), age, education,
# black (1 if black, 0 otherwise), hispanic (1 if Hispanic, 0 otherwise),
# married (1 if married, 0 otherwise), nodegree (1 if no high school degree,
# 0 otherwise), earnings75 (earnings in 1975), and earnings78 (earnings in 1978).
# The last variable is the outcome. Other variables are pre-treatment.
nswdata <- read.table("nswdata.txt", header = TRUE)
attach(nswdata)

# Balance of covariates: two-sample (Welch) t-test of each pre-treatment
# covariate, treated group versus control group
t.test(age[treatment == 1], age[treatment == 0])
t.test(education[treatment == 1], education[treatment == 0])
t.test(black[treatment == 1], black[treatment == 0])
t.test(hispanic[treatment == 1], hispanic[treatment == 0])
t.test(married[treatment == 1], married[treatment == 0])
t.test(nodegree[treatment == 1], nodegree[treatment == 0])
t.test(earnings75[treatment == 1], earnings75[treatment == 0])

Table 1: Baseline Comparison of Treated and Control Groups

| Covariate | Treatment Group | Control Group | p-value for testing $H_0$: means in treatment and control groups are the same |
| --- | --- | --- | --- |
| Age | 24.63 | 24.44 | 0.72 |
| Education (years of schooling) | 10.38 | 10.19 | 0.14 |
| Black | 0.80 | 0.80 | 0.96 |
| Hispanic | 0.09 | 0.11 | 0.42 |
| Married | 0.17 | 0.16 | 0.70 |
| No High School Degree | 0.73 | 0.81 | 0.01 |
| Earnings in 1975 (before the treatment) | 3,066 | 3,027 | 0.92 |
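A table of this kind can be assembled with a small helper rather than by copying numbers from separate t-test outputs. The function `balance_table` and the simulated data frame `fake` below are hypothetical illustrations; the simulated values are stand-ins and are not the NSW data.

```r
# Hypothetical helper: one row per covariate with the two group means and the
# p-value from a Welch two-sample t-test (the default for t.test)
balance_table <- function(data, treat, covariates) {
  rows <- lapply(covariates, function(v) {
    x1 <- data[[v]][data[[treat]] == 1]   # treated group values
    x0 <- data[[v]][data[[treat]] == 0]   # control group values
    data.frame(covariate    = v,
               treated_mean = mean(x1),
               control_mean = mean(x0),
               p_value      = t.test(x1, x0)$p.value)
  })
  do.call(rbind, rows)
}

# Demonstration on simulated stand-in data (NOT the NSW data)
set.seed(1)
n <- 722
fake <- data.frame(
  treatment = sample(rep(c(1, 0), times = c(297, 425))),
  age       = round(rnorm(n, mean = 24.5, sd = 6.5)),
  married   = rbinom(n, 1, 0.16)
)
tab <- balance_table(fake, "treatment", c("age", "married"))
print(tab)
```

With the real data loaded as `nswdata`, the same call would take the form `balance_table(nswdata, "treatment", c("age", "education", "black", "hispanic", "married", "nodegree", "earnings75"))`.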

Table 1 shows that the treated and control groups were similar in many important ways prior to the start of treatment, so that comparable groups were being compared. For 6 of the 7 covariates considered, the difference between the groups was not significant at the 0.05 level, but for no high school degree, the difference was significant (p-value 0.01). This is in line with what one would expect from 7 significance tests if the only differences were due to chance, that is, due to the choice of random numbers in assigning treatments.
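The claim about seven tests can be checked with a short calculation. The sketch below assumes, for simplicity, that all seven null hypotheses are true and that the seven tests are independent; the covariates in Table 1 are in fact correlated, so the probabilities are only a rough guide.

```r
alpha <- 0.05   # significance level of each test
k     <- 7      # number of covariates tested

# Expected number of significant results when every null hypothesis is true
# (this expectation does not require independence)
expected_significant <- k * alpha

# Probability of at least one significant result, assuming independent tests
p_at_least_one <- 1 - (1 - alpha)^k

# Probability of exactly one significant result, assuming independent tests
p_exactly_one <- dbinom(1, size = k, prob = alpha)

print(c(expected_significant = expected_significant,
        p_at_least_one = p_at_least_one,
        p_exactly_one = p_exactly_one))
```

Under these assumptions, roughly three randomized experiments in ten would show at least one of seven covariates differing significantly at the 0.05 level by chance alone.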

For us, Table 1 is important for two reasons.

1. Table 1 is an example showing that randomization tends to produce relatively comparable or balanced treatment groups in large experiments.

2. The seven covariates in Table 1 were not used in assigning treatments. There was no deliberate balancing of these variables. Rather, the balance we see was produced by the random assignment, which made no use of the variables themselves. This gives us some reason to hope and expect that other variables, not measured, are similarly balanced. Had the study not used random assignment, had it instead assigned people one at a time to balance these seven covariates, then the balance might well have been better than in Table 1, but there would have been no basis for expecting other unmeasured variables to be similarly balanced.

The statement that randomization tends to balance covariates is at best imprecise; taken too literally, it is misleading. For instance, in Table 1, the groups do differ slightly in terms of high school degree. Presumably, there are other variables, not measured, exhibiting imbalances similar to if not greater than that for high school degree. What is precisely true is that random assignment of treatments can produce imbalances by chance, but common statistical methods, properly used, suffice to address the uncertainty introduced by these chance imbalances. We will now study these statistical methods.