A within-subjects design is an experiment in which the same group of subjects serves in more than one treatment. Note that I’m using the word "treatment", rather than "group", to refer to levels of the independent variable. It’s generally better to use "treatment", because the term "group" can be very misleading in a within-subjects design: the same "group" of people is often in more than one treatment.

As an example of a within-subjects design, let’s say that we are interested in the effect of different types of exercise on memory. We decide to use two treatments, aerobic exercise and anaerobic exercise. In the aerobic condition we will have participants run in place for five minutes, after which they will take a memory test. In the anaerobic condition we will have them lift weights for five minutes, after which they will take a different memory test of equivalent difficulty. Since we are using a within-subjects design, we have all participants begin by running in place and taking the test, after which we have the same group of people lift weights and then take the second test. We then compare the memory test scores obtained by all participants after each type of exercise to find out which type of exercise aids memory the most.

Strengths

There are two fundamental advantages of the within-subjects design: a) power and b) reduction in error variance associated with individual differences. A fundamental principle of inferential statistics is that, as the number of subjects increases, statistical power increases and the probability of a Type II error (not finding an effect when one "truly" exists) decreases. This is why, other things being equal, it is better to have more subjects, and why, if you look at a significance table such as the t-table, the t value necessary for statistical significance decreases as the number of subjects increases. This matters for the within-subjects design because, by using it, you have in effect increased the number of "subjects" relative to a between-subjects design. For example, in the exercise experiment, since you have the same subjects in both treatments, you will have twice as many "subjects" as you would have had if you had used a between-subjects design. If ten students sign up for the experiment and you use a between-subjects design with equal-size groups, you will have five subjects in the aerobic condition and five in the anaerobic condition. However, if you use a within-subjects design, you will in effect have ten subjects in each condition. Just as with the terms "groups" vs. "treatments", instead of using the term "subjects" it’s better to speak of "observations", since the term "subjects" is misleading in a within-subjects design, where the same person may effectively be more than one "subject".

The reduction in error variance occurs because much of the error variance in a between-subjects design is due to the fact that, even though you randomly assigned subjects to groups, the two groups may still differ with regard to important individual difference factors that affect the dependent variable. With within-subjects designs, the conditions are always exactly equivalent with respect to individual difference variables, since the participants are the same in the different conditions. So, in our exercise example above, any stable participant characteristic that may affect performance on the dependent variable (memory), such as sleep the night before, intelligence, or memory skill, will be exactly the same for the two conditions, because the two conditions contain the exact same group of people.
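To make the variance-reduction argument concrete, here is a minimal Python sketch that uses only the standard library. The scores are made-up illustrative numbers, not data from any study, and the function names paired_t and independent_t are hypothetical. It computes a t statistic on the same ten pairs of scores twice: once treating them as paired (within-subjects) observations, and once as if they had come from two separate groups.

```python
import math
import statistics

# Hypothetical memory scores for 10 participants (illustrative only).
# Participants differ a lot from one another, but each one scores
# a little higher after aerobic exercise than after anaerobic exercise.
aerobic   = [12, 18, 25, 9, 30, 21, 15, 27, 11, 23]
anaerobic = [10, 17, 22, 8, 27, 20, 12, 25, 10, 20]

def paired_t(x, y):
    """t statistic for a within-subjects (paired) comparison."""
    diffs = [a - b for a, b in zip(x, y)]
    # Individual differences cancel out inside each difference score.
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def independent_t(x, y):
    """Pooled-variance t statistic, treating x and y as separate groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    # Here the spread between people stays in the error term.
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se

t_within = paired_t(aerobic, anaerobic)
t_between = independent_t(aerobic, anaerobic)
print(f"paired t = {t_within:.2f}, independent t = {t_between:.2f}")
```

With these numbers the mean difference between conditions is the same in both analyses; only the error term changes. Because each participant is compared with himself or herself, the large differences between people drop out of the paired error term, and the paired t comes out much larger.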

Weaknesses

There is also a fundamental disadvantage of the within-subjects design, which can be referred to as "carryover effects". In general, this means that participation in one condition may affect performance in other conditions, thus creating a confounding extraneous variable that varies with the independent variable. To counter potential carryover effects, counterbalancing is used: participants are randomly assigned to the order in which they receive the treatments.
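As a rough illustration of counterbalancing, the following Python sketch randomly assigns participants to treatment orders while keeping the orders equally represented. The participant IDs, the function name counterbalance, and the seed are all hypothetical choices made for this example; a real study would follow its own assignment procedure.

```python
import random

def counterbalance(participants, orders, seed=None):
    """Randomly assign participants to treatment orders in equal numbers."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random order decides who gets which sequence
    # Cycle through the orders so each one is used (nearly) equally often.
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}

participants = [f"P{i}" for i in range(1, 11)]  # hypothetical participant IDs
orders = [("aerobic", "anaerobic"), ("anaerobic", "aerobic")]

assignment = counterbalance(participants, orders, seed=42)
for p in participants:
    print(p, "->", " then ".join(assignment[p]))
```

With ten participants and two orders, five people run in place first and five lift weights first, so any effect of going first or second is spread evenly over the two treatments.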

Two basic types of carryover effects are practice and fatigue. As you read about the hypothetical exercise and memory experiment, you may have recognized one problem with it: participating in one exercise condition first, followed by the memory test, may inadvertently affect performance in the second condition. First of all, participants may be more tired after both running in place and weight lifting than they are after just running in place, so that they perform worse on the second memory test. If this is the case, they would not do worse on the second test because aerobic exercise is better for memory than anaerobic exercise. Rather, they would do worse because they were more worn out after exercising for ten minutes in total than after exercising for only five. When one within-subjects treatment negatively affects performance on a later treatment, this is referred to as a fatigue effect. On the other hand, in the exercise experiment the second memory test may be very similar to the first, so that, having practiced with the first test, participants perform much better the second time. Again, the difference between the two conditions would not be due to the independent variable (aerobic vs. anaerobic). Rather, it would be due to practice with the test. When a within-subjects treatment positively affects performance on a later treatment, this is referred to as a practice effect.
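The confound described above can be sketched with a small, fully deterministic Python example. All numbers here are assumptions chosen for illustration: the exercise type is given no real effect at all, and the second test is simply made 3 points easier by practice. The sketch compares a fixed order (everyone runs first) with a counterbalanced order.

```python
TRUE_AEROBIC_BONUS = 0.0      # assumed: exercise type has no real effect
PRACTICE_BONUS = 3.0          # assumed: practice adds 3 points on the second test
baseline = [10, 14, 18, 22]   # hypothetical baseline memory scores

def mean_difference(orders):
    """Mean (aerobic - anaerobic) score difference for the given orders."""
    diffs = []
    for base, order in zip(baseline, orders):
        scores = {}
        for position, treatment in enumerate(order):
            # position 0 = first test, position 1 = second test
            score = base + PRACTICE_BONUS * position
            if treatment == "aerobic":
                score += TRUE_AEROBIC_BONUS
            scores[treatment] = score
        diffs.append(scores["aerobic"] - scores["anaerobic"])
    return sum(diffs) / len(diffs)

AB = ("aerobic", "anaerobic")
BA = ("anaerobic", "aerobic")

fixed_order = mean_difference([AB, AB, AB, AB])
counterbalanced = mean_difference([AB, BA, AB, BA])
print("fixed order:", fixed_order)          # practice masquerades as a treatment effect
print("counterbalanced:", counterbalanced)  # practice cancels across the two orders
```

In the fixed-order case the anaerobic condition appears better purely because it always comes second. Counterbalancing does not remove the practice effect from any individual score; it distributes it equally over both treatments so that it no longer varies with the independent variable.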

Adapted from -- http://web.mst.edu/~psyworld/experimental/within_subjects.html

In sum, in between-groups experimental designs, participants must be randomly assigned to the various levels of the IV (i.e., treatments). Randomization is done to minimize the impact of individual differences among participants (e.g., attitudes, knowledge, personality) on the outcome variable, which would otherwise be a threat to the internal validity of the study.

In within-group experimental designs, each participant receives all levels of the IV and serves as his or her own control. The possibility that individual differences affect the outcome of the study is not present, because the same individuals receive all levels of the IV (treatments). Therefore, not randomly assigning participants to treatments does not present a threat to the study’s internal validity. However, the potential confound is order or carryover effects, as explained above.

Sometimes the carryover effect may be of such a nature that within-group designs are ruled out-