T-Tests Lab 5

T-Tests

Experimental and quasi-experimental designs have in common the goal of comparing scores on a dependent variable between two conditions of the independent variable. The independent variable is always defined by two discrete (categorical) levels; the dependent variable is continuous. The question addressed in both designs is whether there is a difference in dependent variable scores between conditions that is larger than would be expected by chance. For example, you may want to compare males to females on their spatial problem solving ability.

When you design an experiment or a quasi-experiment, you always strive to obtain as much internal validity as possible. Internal validity refers to the degree to which all alternative explanations for differences in the dependent variable between conditions, other than the independent variable, have been controlled for. Whenever we compare groups, we try to make them as similar as possible. In an experiment, we achieve internal validity by manipulating the independent variable while keeping all other potential confounds constant. A confound is any variable, other than your independent variable, that systematically differs between conditions.

One of the biggest potential confounds in any study is individual differences. These are characteristics of the participants themselves. When comparing means between conditions, differences may arise simply because different people participated in each condition. This can be controlled for in two ways. The strongest method is to have all subjects participate in all conditions of the study. This is called a within subjects design because each person’s score is being compared to their own score in a different condition. In this design, individual differences between conditions are controlled for because the same participants serve in each condition. By definition, there will be equal numbers of males and females in each condition, the ages of participants in each condition will be the same, as will all other individual difference characteristics.

Recall from previous discussions that there are still potential confounds with this design - most notably carryover and order effects. Carryover effects occur when participation in one condition affects performance in subsequent conditions. Participants may become more skilled at doing the task; the skill they gained in one condition carries over to performance in subsequent conditions. For example, a researcher is running a study on the effects of background music on learning. He has three conditions: A, B, and C. If he always runs subjects using this order and he finds a significant difference between the three conditions, we will not know if it is due to carryover effects (e.g., perhaps they get better at the task over time), order effects (perhaps they get tired and bored), or the independent variable. One way to control for this is to counterbalance for order: equal numbers of participants could participate under all possible condition orders. This is not a very practical solution, however, when the carryover effects are relatively permanent.
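To see what “all possible condition orders” means concretely, here is a small Python sketch (the condition labels A, B, and C are the ones from the background-music example; everything else is illustrative):

```python
from itertools import permutations

# The three conditions from the background-music example.
conditions = ["A", "B", "C"]

# Every possible presentation order. Full counterbalancing runs
# equal numbers of participants under each of these orders.
orders = list(permutations(conditions))

# With 3 conditions there are 3! = 6 orders, so the total number
# of participants must be a multiple of 6.
number_of_orders = len(orders)
```

Note how quickly this grows: four conditions already require 24 orders, which is one reason full counterbalancing is often impractical.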

Order effects occur due to repetition of the same task and are not due to the task itself (for example, learning or increased skill). Order effects include influences such as boredom or fatigue, or differences in demand characteristics across conditions. Counterbalancing for order can control for these influences. Although using a within subjects design is the most powerful tool we have for controlling for individual differences, it is not always a practical or even possible solution.

When a within subjects design cannot be used, a between subjects design can be. In this design, different participants serve in each of the conditions. This, of course, leaves individual differences as a potential confound. There are two ways that we can control for individual differences in a between subjects design. One is to match the conditions on the variable of concern by ensuring that equal numbers of subjects with various characteristics are assigned to each of the conditions. A very large problem with matching is determining what characteristics to match. For example, if your dependent variable involves fine motor skills, a potential confound is sex. In general, males have more difficulty with these types of tasks than females do. You could ensure that all three conditions have equal numbers of males, but this does not ensure that the three conditions are matched for other potential individual differences such as experience with motor tasks. Perhaps more people who have experience with fine motor control tasks (carvers, sewers, artists) end up in one condition than in another. The list of individual characteristics you would need to match could be endless. The preferred way to control for individual differences is to trust the gods of randomness. Using a procedure in which participants are randomly assigned to the conditions of the experiment will, over reasonably large numbers of subjects, ensure that there are no systematic individual differences between conditions.

In an experiment, a comparison is made on measures of a dependent variable between discrete levels of an independent variable. All possible confounds are eliminated. Confounds are eliminated when all differences between conditions are held constant except for the independent variable, which is manipulated by the experimenter. Since the researcher has caused the difference in the independent variable, and only in the independent variable, differences in the dependent variable can be said to be caused by differences in the independent variable. All other possible explanations have been eliminated (except one that we will get to in a minute). If some variable other than the independent variable differs between conditions, the experiment is confounded. A confound is a threat to internal validity.

Quasi-experiments also involve making comparisons between conditions; however, the researcher does not have control over the manipulation of the independent variable. Because they did not create the difference under controlled conditions, other variables that may be related to the one they are studying are possible alternative explanations for differences in the dependent variable. For example, a study conducted last year looked at developmental changes in gender attitudes. The researchers measured gender attitudes in grade 10, grade 12, college freshmen, sophomores, juniors, and seniors, as well as graduate students. While these subjects span a wide age range, they differ in many other respects as well (education, experience, life skills, hours they work). If the researchers find differences, these differences might be due to age, but they may also be due to any of these confounds. They could conclude that there are differences associated with age, but they cannot claim to have evidence that age caused the difference. Even though they cannot make a causal conclusion, knowing that people’s gender attitudes are related to their age is still useful information. Quasi-experiments, although valuable, are inherently confounded and have lower levels of internal validity.

In designing your study (whether it is an experiment or a quasi-experiment), you attempt to eliminate confounds. There is one confound you cannot eliminate no matter how well controlled your study is: differences between conditions could arise simply due to chance. We can never completely eliminate this possible explanation. We can, however, limit it. We do this through the magic of inferential statistics. Inferential statistics allow us to estimate how likely it is that the difference between conditions we obtained is due to pure chance. The type of inferential statistic used depends on the parameter of the population you are comparing (e.g., mean, median, variance, correlation) and the number of conditions you are comparing. Since most studies compare the means of samples (to infer a difference in the means of populations), we will be focusing on inferential statistics that compare means. We will start with the simplest case, the t-test.

A t-test is the appropriate inferential statistic to use when:

•	You have conducted either an experiment or a quasi-experiment.

•	You are using either a between or a within subjects design.

•	You are comparing the means of conditions.

•	You are comparing two conditions (not more).
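For a sense of what the statistic actually computes, here is a pooled-variance independent-samples t written out in plain Python (this is a sketch, not part of the lab; the two sets of scores are invented):

```python
import math
from statistics import mean, variance

def independent_t(x, y):
    """Pooled-variance t statistic for two independent samples."""
    n1, n2 = len(x), len(y)
    # Pool the two sample variances, weighted by their degrees of freedom.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    # Difference between the means, scaled by its estimated standard error.
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical scores from two conditions of a between subjects study.
condition_1 = [12, 15, 14, 10, 13, 16, 11, 14]
condition_2 = [18, 17, 19, 16, 20, 15, 18, 17]

t = independent_t(condition_1, condition_2)
```

The larger t is (in absolute value), the less plausible it becomes that the two samples came from populations with the same mean; comparing t to its sampling distribution gives the probability of a difference this large arising by chance.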

The logic underlying a t-test is not complex. It relies on the same logic that we used for z-tests. Recall that with z-tests, if we know the mean and the standard deviation of a distribution and the distribution is roughly normal in shape, we can estimate the percentage of scores that will fall above or below any score in the distribution. This is because with a normal distribution a known percentage of scores will fall within a given number of standard deviations of the mean. The z-score is simply a measure of how many standard deviations a score is away from the mean. The t-test just extends this logic to samples instead of individual scores.
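The z-score computation is just a one-liner (the population mean and standard deviation below are made-up numbers for illustration):

```python
# A z-score is simply distance from the mean, in standard-deviation units.
def z_score(score, mu, sigma):
    return (score - mu) / sigma

# Hypothetical example: a 74-inch student in a population with mean
# height 69 inches and standard deviation 2.5 is 2 SDs above the mean.
z = z_score(74, 69, 2.5)  # 2.0
```

From a normal table, a z of 2.0 tells us that only about 2% of scores fall that far above the mean; the t-test applies the same reasoning to sample means.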

Assume we have a population of people (let’s say all the students currently enrolled at UWP). We randomly select a sample of 100 of them and measure their height. The mean of the sample is an estimate of the mean of the population. If I take a second sample of 100 students and measure their heights, I would not expect to get exactly the same mean. If I took hundreds of samples of 100 students from the same population, I would expect to get slightly different mean heights for each sample. Why do I not get the same mean height with each sample if they are randomly drawn from the same population? The answer is pure chance. Not all samples will be perfect estimates of the population mean. Most will be close; some will vary a bit, and some will vary a great deal from the population mean, by pure chance. Chance is predictable. Most samples will give relatively good estimates of the population mean, but some will be influenced by individuals who are extremely tall or extremely short, producing a sample mean that is higher or lower than the actual population mean. The more the sample mean deviates from the population mean, the less likely we are to obtain it. The distribution of sample means is normally distributed, and the mean of this distribution is an estimate of the population mean.

We do not, however, know the standard deviation of this distribution of means. We can estimate it from the standard deviations of our samples. First, we need to understand that the variability of our sample means is going to be affected by the size of the samples. Larger samples give us better estimates of the mean of the population. Assume that instead of drawing samples of 100 students from the population of UWP students, I draw samples of only 10. The distribution of these means will still be normally distributed, but they will have a lot more variation. In a sample of 10, drawing one extremely tall or extremely short student will influence the mean of the sample more than that same score would in a sample of 100. So the variance (or standard deviation, if you prefer) of the means of smaller samples is larger.
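This effect of sample size is easy to see in a quick simulation (everything here is invented for illustration: the population of 10,000 heights, its mean of 69 inches, and its SD of 3):

```python
import random
import statistics

random.seed(1)  # fixed seed only so the illustration is reproducible

# A hypothetical population of 10,000 student heights (inches).
population = [random.gauss(69, 3) for _ in range(10_000)]

def spread_of_sample_means(n, reps=1000):
    """Standard deviation of the means of `reps` random samples of size n."""
    means = [statistics.mean(random.sample(population, n))
             for _ in range(reps)]
    return statistics.stdev(means)

# Means of samples of 10 vary considerably more than means of samples of 100.
small, large = spread_of_sample_means(10), spread_of_sample_means(100)
```

With samples of 10 the spread of the means comes out near 0.95 inches; with samples of 100 it shrinks to roughly 0.3, even though every sample came from the same population.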

We estimate the standard deviation of the sample means (which we call the standard error of the mean) from the sample standard deviation (s) and the sample size (n):

s_M = s / √n

Where s_M is the notation we use to stand for the standard error of the mean.

Don’t worry. You will not be asked to calculate this. What you should know is:

1) The standard error of the mean can be estimated from a sample’s variance.

2) The larger the sample size, the lower the standard error will be.
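Both points can be checked with a few lines of Python (the ten heights below are made-up numbers):

```python
import math
from statistics import stdev

# A hypothetical sample of ten heights (inches).
sample = [64, 70, 68, 66, 72, 69, 67, 71, 65, 68]

s = stdev(sample)        # sample standard deviation
n = len(sample)
se = s / math.sqrt(n)    # standard error of the mean: s / sqrt(n)

# Because n sits under a square root, quadrupling the sample size
# only halves the standard error (sqrt(4) = 2).
```

The standard error is always smaller than the sample standard deviation, and it keeps shrinking as n grows, which is point 2 above.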

Assume that I want to compare the height of women on campus to that of males. We can think of these as two different populations, one of females and one of males. I think we would all agree that on average males are taller than females and that in general if I drew a sample of males and a sample of females from these two populations, the male samples would generally be taller than the female samples. Because of sampling error, this will not always be true. In fact, in some cases, I could even get a sample of females who were extremely tall, and a sample of males that were extremely short, so that the mean of the female sample is actually higher than the mean of the male sample. This would not reflect a true difference between the populations; it would simply be a chance occurrence.

Let’s say that instead of taking samples from males and females, I take samples from English majors and Psychology majors. Is there a real difference in their heights? I do not know. Even if there is no difference between the heights of these two populations of students, it is likely that simply due to sampling error (chance) the mean heights of my samples will not be exactly the same. This is the same situation we have when we are comparing two means we have drawn from any two populations. We don’t know how large the difference needs to be before we are fairly sure it is a real difference and not a chance difference.
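A short simulation makes the point that two samples from one population rarely have identical means (all numbers are hypothetical; by construction there is no real group difference here):

```python
import random
import statistics

random.seed(0)  # fixed seed only so the illustration is reproducible

# One hypothetical population of student heights: any two groups drawn
# from it come from the *same* population, so there is no true difference.
population = [random.gauss(68, 3) for _ in range(5_000)]

sample_a = random.sample(population, 50)
sample_b = random.sample(population, 50)

# The two sample means will almost never match exactly; the gap
# between them is pure sampling error.
diff = statistics.mean(sample_a) - statistics.mean(sample_b)
```

Inferential statistics exist precisely to judge whether an observed difference like this one is small enough to be explained by sampling error alone.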

Think about the samples from the two conditions as potentially representing two different populations. When we analyze the differences between conditions, we are determining whether the scores on the dependent variable are or are not likely to have been drawn from the same population.

When doing an experiment, we state these two possible conclusions as hypotheses.

Scientific Hypothesis: There is a real (non-chance) difference between the two conditions we are comparing.