Establishing a Cause-Effect Relationship

Format for the rest of the class: two projects in which you act as experimenters. I will do the design; you will learn to follow it and act as experimenters.

Next lecture: Jenny Preece

On Friday: Experimental design continued

Survey Project: Please read a few survey reports so that you know how to write one, and follow the guideline as much as you can. The guideline was designed for an experiment, so it might need adaptation.

Also, I will give you some sort of web template once the second phase begins. One person from each group will be able to access a part of the class website where you can put in your reports for the final report.

Final Projects: Start thinking about it. I will expect a substantial project. Give me a good research proposal. I will start assigning deadlines soon, but read up on the literature and do the background work. It's going to be much more rigorous than the survey project.

Give me a defensible proposal. I will do my best to shoot it down.

I will give you details about the structure of the project and the deadlines related to it next week. But here’s my expectation: this is pretty much the one large requirement for this class. All of you are advanced graduate students. For some of you this is closely tied to your research. There are no excuses for anything less than a great project.

Suggestions for projects:

Type I and II Errors

Experimental Notation

How to depict Main Effects, Interactions

Show them one relationship: Barnes and Noble.com did a site redesign. Afterwards they noticed that both the time people spent on the site and sales increased.

Can they conclude that the site redesign caused sales to increase?

Establishing a Cause-Effect Relationship

How do we establish a cause-effect (causal) relationship? What criteria do we have to meet? Generally, there are three criteria that you must meet before you can say that you have evidence for a causal relationship:

Temporal Precedence

First, you have to be able to show that your cause happened before your effect. Real-life relationships between variables are never simple. Sometimes there are cyclical situations involving ongoing processes that interact, where each may cause and, in turn, be affected by the other. This makes it very hard to establish a causal relationship.

Covariation of the Cause and Effect

What does this mean? Before you can show that you have a causal relationship you have to show that you have some type of relationship. For instance, consider the syllogism:

if X then Y

if not X then not Y

If you observe that whenever X is present, Y is also present, and whenever X is absent, Y is too, then you have demonstrated that there is a relationship between X and Y. I don't know about you, but sometimes I find it's not easy to think about X's and Y's. Let's put this same syllogism in program evaluation terms:

Better website, more visitors

Bad website, fewer visitors
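The "if X then Y, if not X then not Y" pattern can be checked on a set of observations with a simple 2x2 count. The sketch below is only an illustration: the observation lists are made up, and the phi coefficient is one common way (not the only one) to summarize covariation between two yes/no variables.

```python
from math import sqrt

def phi_coefficient(observations):
    """Phi coefficient for a list of (x_present, y_present) boolean pairs."""
    a = sum(1 for x, y in observations if x and y)          # X and Y
    b = sum(1 for x, y in observations if x and not y)      # X but not Y
    c = sum(1 for x, y in observations if not x and y)      # Y but not X
    d = sum(1 for x, y in observations if not x and not y)  # neither
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # X or Y never varies, so there is nothing to covary
    return (a * d - b * c) / denom

# Hypothetical observations: (site_is_better, visitors_are_high)
perfect = [(True, True)] * 5 + [(False, False)] * 5
unrelated = [(True, True), (True, False), (False, True), (False, False)] * 3

print(phi_coefficient(perfect))    # X and Y always go together
print(phi_coefficient(unrelated))  # no relationship at all
```

A value near 1 says X and Y go together; a value near 0 says there is no relationship to explain. Note that even a perfect value only satisfies the covariation criterion, not the other two.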

No Plausible Alternative Explanations

Covariation does not imply causation. Part of establishing causality is ruling out alternative explanations. It's possible that there is some other variable or factor that is causing the outcome. This is sometimes referred to as the "third variable" or "missing variable" problem and it's at the heart of the issue of internal validity. What are some of the possible plausible alternative explanations? Just go look at the threats to internal validity (see single group threats, multiple group threats or social threats) -- each one describes a type of alternative explanation.

It is possible that some other factor (a better company, more marketing) led to both the better site and more visitors.
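The third-variable problem can be illustrated with a small simulation. In this sketch, which uses entirely made-up data, a hypothetical `marketing` variable drives both `site_quality` and `visitors`, and the site has no direct effect on visitors at all. The two still correlate strongly until the third variable is controlled for with a first-order partial correlation.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed data-generating process: marketing causes both variables;
# site_quality has NO direct effect on visitors.
marketing = [rng.gauss(0, 2) for _ in range(5000)]
site_quality = [m + rng.gauss(0, 1) for m in marketing]
visitors = [m + rng.gauss(0, 1) for m in marketing]

raw = pearson(site_quality, visitors)
adjusted = partial_corr(site_quality, visitors, marketing)
print(f"raw correlation:      {raw:.2f}")
print(f"controlling for mktg: {adjusted:.2f}")
```

The raw correlation looks like strong evidence for "better site, more visitors", yet it nearly vanishes once marketing is held constant. In a real study you can only adjust for third variables you thought to measure, which is why design matters more than after-the-fact statistics.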

Establishing causation in social sciences

In most applied social research that involves evaluating programs, temporal precedence is not a difficult criterion to meet because you administer the program before you measure effects. And, establishing covariation is relatively simple because you have some control over the program and can set things up so that you have some people who get it and some who don't (if X and if not X). Typically the most difficult criterion to meet is the third -- ruling out alternative explanations for the observed effect. That is why research design is such an important issue and why it is intimately linked to the idea of internal validity.

EXTERNAL VALIDITY

External validity is related to generalizing. External validity is the degree to which the conclusions in your study would hold for other persons in other places and at other times.

Sampling Model: In the sampling model, you start by identifying the population you would like to generalize to. Then, you draw a fair sample from that population and conduct your research with the sample. Finally, because the sample is representative of the population, you can automatically generalize your results back to the population. There are several problems with this approach. First, perhaps you don't know at the time of your study who you might ultimately like to generalize to. Second, you may not be easily able to draw a fair or representative sample. Third, it's impossible to sample across all times that you might like to generalize to (like next year).
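As a rough sketch of the sampling model, the code below builds a made-up population of task-completion times, draws a simple random sample, and compares the sample mean with the population mean. The population, its size, and the sample size are all assumptions for illustration; in a real study you never see the whole population, which is exactly why the sample has to be drawn fairly.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: task-completion times (seconds) for 10,000 users.
population = [rng.gauss(60, 15) for _ in range(10_000)]

# Simple random sample: every user has the same chance of being chosen.
sample = rng.sample(population, 400)

population_mean = mean(population)
sample_mean = mean(sample)
print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

The sketch says nothing about the three problems listed above: it assumes you already know the population, can reach every member of it, and only care about the present.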

Proximal Similarity Model: 'Proximal' means 'nearby' and 'similarity' means... well, it means 'similarity'. The term proximal similarity was suggested by Donald T. Campbell as an appropriate relabeling of the term external validity. Identify other contexts to which your current results might apply. For example: if you tested a Palm Pilot, your results might apply to the Handspring.

We can place different contexts (times, places, people, objects) in terms of their similarity to the current context. This placement is referred to as a gradient of similarity. We conclude that we can generalize the results of our study to other persons, places or times that are more like (that is, more proximally similar to) our study. Notice that here, we can never generalize with certainty -- it is always a question of more or less similar.

Threats to External Validity

A threat to external validity is an explanation of how you might be wrong in making a generalization. For instance, you conclude that the results of your study (which was done in a specific place, with certain types of people, and at a specific time) can be generalized to another context (for instance, another place, with slightly different people, at a slightly later time). There are three major threats to external validity because there are three ways you could be wrong -- people, places or times.

People: Your critics could come along, for example, and argue that the results of your study are due to the unusual type of people who were in the study.

Places: Or, they could argue that it might only work because of the unusual place you did the study in (perhaps you did your educational study in a college town with lots of high-achieving educationally-oriented kids).

Time: Or, they might suggest that you did your study in a peculiar time.

Objects: In HCI your results might be extendable to only similar objects.

The concept of Reliability:

Reliability is necessary but not sufficient for validity.

VALIDITY TYPES

Validity pertains to the operationalization and measurement of concepts. Any time you translate a concept or construct into a functioning and operating reality (the operationalization), you need to be concerned about how well you did the translation.

Internal Validity

Internal validity is the approximate truth of inferences regarding cause-effect or causal relationships. Thus, internal validity is only relevant in studies that try to establish a causal relationship. It's not relevant in most observational or descriptive studies, for instance. But for studies that assess the effects of certain changes to websites or to products, internal validity is an important consideration. For example: recently Amazon.com increased the number of tabs on its home page. Let's assume they were all scientific and did a study that showed it was the increase in the number of tabs that led to easier navigation. Suppose that at the same time Amazon.com launched the site redesign, they also launched a marketing campaign. Now, is the increase in traffic due to the increase in the number of tabs or to the marketing campaign?

The key question in internal validity is whether observed changes can be attributed to your intervention (i.e., the cause) and not to other possible causes (sometimes described as "alternative explanations" for the outcome).

Note. Internal Validity has little to do with external or construct validity.

EXTERNAL VALIDITY

External validity is the degree to which the conclusions in your study would hold for other persons in other places and at other times.

Two major approaches to establishing generalization:

Sampling Model: Identify the population you would like to generalize to. Then, you draw a fair sample from that population and conduct your research with the sample. Finally, because the sample is representative of the population, you can automatically generalize your results back to the population.

Proximal Similarity Model: The term proximal similarity was suggested by Donald T. Campbell as an appropriate relabeling of the term external validity (although he was the first to admit that it probably wouldn't catch on!). Under this model, we begin by thinking about different generalizability contexts and developing a theory about which contexts are more like our study and which are less so. We conclude that we can generalize the results of our study to other persons, places or times that are more like (that is, more proximally similar to) our study. Notice that here, we can never generalize with certainty -- it is always a question of more or less similar.

Threats to External Validity

Errors in generalizing: generalizing to other groups of people, other places, other times

There are three major threats to external validity because there are three ways you could be wrong --

people,

places or

times

Improving External Validity

Making sure samples are random

Use the theory of proximal similarity more effectively. Do a better job of describing the ways your contexts and others differ, providing lots of data about the degree of similarity between various groups of people, places, and even times.

Repeat the study, with different subjects and different material

Perhaps the best approach to criticisms of generalizations is simply to show them that

CONSTRUCT VALIDITY
INTERNAL VALIDITY

Comparison of Internal, External and Construct Validity

Internal validity is only relevant to the specific study in question. That is, you can think of internal validity as a "zero generalizability" concern. All that internal validity means is that you have evidence that what you did in the study (i.e., the program) caused what you observed (i.e., the outcome) to happen. It doesn't tell you whether what you did for the program was what you wanted to do or whether what you observed was what you wanted to observe -- those are construct validity concerns. It is possible to have internal validity in a study and not have construct validity.

For instance, imagine a study looking at the effect of a new kind of pen-based website design tool on the design of websites. Imagine that the tool does lead to more efficiency in the design of websites. Now further imagine that the reason you see an improved design process in your study is that whenever people experiment with a new tool, they are excited, pay more attention, and consequently do a better job. This study has internal validity: you have established a cause and effect, namely that this tool leads to better design. However, the pen-based nature of the tool has nothing to do with the improvement. Therefore the study does not have construct validity.

CONSTRUCT VALIDITY

Construct validity refers to the degree to which you can generalize back to the theoretical construct you started from. Like external validity, construct validity is related to generalizing. But, where external validity involves generalizing from your study context to other people, places or times, construct validity involves generalizing from your program or measures to the concept of your program or measures. You might think of construct validity as a "labeling" issue. Suppose you decide on a site redesign to make the site "easier to use", and make a number of changes to implement the theoretical construct ease of use. Now you have done the study, and you want to claim that your actions did actually make it easier to use. Your numbers might tell you that something about your site changed, but you cannot be sure what did change.

The multitrait-multimethod approach, or MTMM for short: In order to argue that your measures had construct validity under the MTMM approach, you had to demonstrate that there was both convergent and discriminant validity in your measures. You demonstrated convergent validity when you showed that measures that are theoretically supposed to be highly interrelated are, in practice, highly interrelated. And, you showed discriminant validity when you demonstrated that measures that shouldn't be related to each other in fact were not. While the MTMM did provide a methodology for assessing construct validity, it was a difficult one to implement well, especially in applied social research contexts and, in fact, has seldom been formally attempted.
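The convergent/discriminant logic can be sketched with simulated scores. Everything here is an assumption made for illustration: two unrelated traits ("ease of use" and "visual appeal"), each measured by two methods (a questionnaire and an observer rating), with no shared method bias. A full MTMM analysis inspects the whole correlation matrix; this sketch looks at only two of its entries.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical latent traits, generated independently of each other.
ease = [rng.gauss(0, 1) for _ in range(n)]
appeal = [rng.gauss(0, 1) for _ in range(n)]

def measure(trait):
    """One noisy measurement of a latent trait (assumed noise sd of 0.5)."""
    return [t + rng.gauss(0, 0.5) for t in trait]

ease_questionnaire = measure(ease)
ease_observer = measure(ease)
appeal_questionnaire = measure(appeal)

# Convergent: same trait, different methods -> should be high.
convergent = pearson(ease_questionnaire, ease_observer)
# Discriminant: different traits, same method -> should be near zero.
discriminant = pearson(ease_questionnaire, appeal_questionnaire)

print(f"convergent:   {convergent:.2f}")
print(f"discriminant: {discriminant:.2f}")
```

With real data the picture is rarely this clean, because measures taken with the same method tend to correlate whatever they measure. That is part of why the approach is hard to implement well.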

KINDS OF CONSTRUCT VALIDITY

Face Validity

In face validity, you look at the operationalization and see whether "on its face" it seems like a good translation of the construct. This is probably the weakest way to try to demonstrate construct validity. For instance, you might look at a measure of math ability, read through the questions, and decide that yep, it seems like this is a good measure of math ability (i.e., the label "math ability" seems appropriate for this measure). Or, you might observe a teenage pregnancy prevention program and conclude that, "Yep, this is indeed a teenage pregnancy prevention program." Of course, if this is all you do to assess face validity, it would clearly be weak evidence because it is essentially a subjective judgment call. (Note that just because it is weak evidence doesn't mean that it is wrong. We need to rely on our subjective judgment throughout the research process. It's just that this form of judgment won't be very convincing to others.) We can improve the quality of face validity assessment considerably by making it more systematic. For instance, if you are trying to assess the face validity of a math ability measure, it would be more convincing if you sent the test to a carefully selected sample of experts on math ability testing and they all reported back with the judgment that your measure appears to be a good measure of math ability.