Cal Poly A.P. Statistics Workshop – February 16, 2013
DESIGN OF EXPERIMENTS (DOE)
Karen McGaughey and Soma Roy
The Language of DOE
Example #1: Suppose a group of researchers wishes to test the effectiveness of three different formulations of sunscreen: SPF 15, 30, and 45. The researchers gather a volunteer sample of 75 U.S. adults and randomly assign each participant to one of the formulations in such a way that 25 participants are assigned to each formulation. After 6 weeks of using the sunscreen, the researchers will measure the change in skin pigmentation at 3 different locations on the body (face, arm, leg).
a) Explanatory Variable or Factor:
o Variable the researchers are studying the effect of
o Does this variable explain the outcomes observed in the study?
o Variable which defines the treatment groups (often categorical, but does not have to be)
b) Treatments:
o The ‘levels’ of the explanatory variable which describe the treatment groups
c) Response Variable:
o Variable which is ‘measured’ or recorded on each experimental unit in the study
o Outcome variable
d) Experimental Unit:
o The individual or object to which a treatment is applied
Example #2: An animal scientist is interested in studying the effect of protein and lysine levels on the growth of turkeys. The scientist has 60 Bourbon breed turkeys available for the study, located at a turkey farm in Northern California. As a matter of convenience for feeding and care, the turkeys are housed 10 turkeys to a pen, and feed is placed in a trough inside each pen. The animal scientist is planning to use 3 different levels of protein (A, B, C) and two different lysine levels (low and high), which will be added to the feed. After 3 weeks, the weight gain (in pounds) will be recorded for each turkey.
a) Explanatory Variable or Factor:
b) Treatments:
c) Response Variable:
d) Experimental Unit:
Example #3: Jimmy’s Bakery is tinkering with how it bakes chocolate cakes. Jimmy likes his grandmother’s chocolate cake recipe, but he’s trying to work out the ideal temperature (325°F or 350°F). Jimmy decides to bake a total of 30 cakes, five at a time (since the oven is big enough to hold five cakes). He will randomly assign each of 6 sets of 5 cakes to one of the temperatures, using a method that will result in 3 sets of cakes assigned to each temperature. Once baked, he will have a food scientist determine the moisture level of each cake, since moisture is one way to gauge quality. Note that for moisture, higher values are generally better than lower values.
a) Explanatory Variable or Factor:
b) Treatments:
c) Response Variable:
d) Experimental Unit:
The issue with identification of the wrong experimental unit:
The experimental unit is the unit which is randomly assigned to a treatment. In this case, Jimmy is randomly assigning a set of 5 cakes to a temperature, so a set of 5 cakes is the experimental unit. Moisture measurements will be taken on each cake, so cake is considered the sampling/measurement unit.
Suppose Jimmy analyzes his data (using a two sample independent t-test) using the sampling unit as his unit of analysis. This will give him a sample size of 15 (3 x 5 cakes) for each of his temperatures.
If Jimmy correctly analyzes his data (using a two sample independent t-test) using the experimental unit as the analysis unit, he has a sample size of only 3 for each of his temperatures.
e) What are the implications in using the wrong unit for analysis?
Misidentification of the experimental unit happens often in research studies, and can lead to inaccurate conclusions.
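To see why the choice of analysis unit matters, here is a minimal simulation sketch that is not part of the original handout. It assumes there is no real temperature effect, that cakes baked together share a common set effect (standard deviation 1), and that each cake adds its own independent variation (standard deviation 1). These values, the baseline moisture of 50, and the function names are illustrative assumptions. The sketch uses the pooled form of the two-sample t statistic, which equals the unpooled form when the two groups are the same size, and compares it with the standard two-sided 5% critical values (2.048 for 28 degrees of freedom, 2.776 for 4 degrees of freedom).

```python
import math
import random
import statistics

T_CRIT_DF28 = 2.048  # two-sided 5% critical value, 15 + 15 - 2 degrees of freedom
T_CRIT_DF4 = 2.776   # two-sided 5% critical value, 3 + 3 - 2 degrees of freedom


def pooled_t(group1, group2):
    """Pooled two-sample t statistic for two lists of numbers."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / std_error


def bake_sets(rng, n_sets=3, cakes_per_set=5):
    """Simulate moisture for the sets baked at one temperature (no temperature effect)."""
    sets = []
    for _ in range(n_sets):
        set_effect = rng.gauss(0, 1)  # assumed variation shared by cakes baked together
        sets.append([50 + set_effect + rng.gauss(0, 1) for _ in range(cakes_per_set)])
    return sets


def rejection_rates(n_sims=2000, seed=1):
    """Return (cake-level, set-level) proportions of simulated studies that reject."""
    rng = random.Random(seed)
    cake_rejections = 0
    set_rejections = 0
    for _ in range(n_sims):
        low, high = bake_sets(rng), bake_sets(rng)
        # Wrong unit: treat all 15 cakes per temperature as independent.
        cakes_low = [cake for s in low for cake in s]
        cakes_high = [cake for s in high for cake in s]
        if abs(pooled_t(cakes_low, cakes_high)) > T_CRIT_DF28:
            cake_rejections += 1
        # Correct unit: one mean per set, 3 sets per temperature.
        means_low = [statistics.mean(s) for s in low]
        means_high = [statistics.mean(s) for s in high]
        if abs(pooled_t(means_low, means_high)) > T_CRIT_DF4:
            set_rejections += 1
    return cake_rejections / n_sims, set_rejections / n_sims


cake_rate, set_rate = rejection_rates()
print(f"False positive rate, cake as unit: {cake_rate:.3f}")
print(f"False positive rate, set as unit:  {set_rate:.3f}")
```

Under these assumptions, the cake-level analysis rejects far more often than the nominal 5% even though temperature has no effect, while the analysis of set means stays near 5%.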
Comments from the FDA on actual studies run by an ‘unnamed’ pharma company:
“Please be advised that if you have multiple horses per pen and treatments are randomly assigned to pens, the pen will be the experimental unit of analysis…..The number of experimental units per treatment group will be considerably reduced if horses are multiply penned.”
“If the cows are individually fed the additive, then the individual cow is the experimental unit. If the additive is provided on a group basis, the treatment group will be considered the experimental unit with more than one group per treatment (replicated in time or location) needed for valid statistical analysis.”
Replication vs. Sub-sampling/Repeated Measurements:
Reconsider Example #1 on Sunscreen:
Why did the researchers not use just one volunteer for each formulation of sunscreen? (This would be a much cheaper and easier way to carry out the study.)
The independent replication of the treatment allows the researcher to determine how much the response will vary within a treatment. That is, the researcher will be able to measure how experimental units given the same treatment will vary.
Recall: Two sample Independent t-test
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
$s_1$ is the standard deviation of independent experimental units given treatment 1
$s_2$ is the standard deviation of independent experimental units given treatment 2
We need independent replication of the treatment to get a measure of the variability of experimental units ‘treated’ exactly alike for our statistical test.
Note: The test above is called the independent two sample t-test. Independence in the name of the test refers to the independence of experimental units given treatment 1 vs. those given treatment 2.
When talking about replication, we mean independent experimental units within a treatment.
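As a small illustration of why replication is needed, the sketch below computes the two-sample t statistic in its unpooled form, which is an assumption about the version intended here. The data values and the names `two_sample_t`, `spf15`, and `spf30` are hypothetical and chosen only for the example. With a single experimental unit per treatment, the standard deviation cannot be computed at all, so the statistic is undefined.

```python
import math
import statistics


def two_sample_t(sample1, sample2):
    """Unpooled two-sample t statistic; each sample needs at least 2 units."""
    n1, n2 = len(sample1), len(sample2)
    s1 = statistics.stdev(sample1)  # variability among units given treatment 1
    s2 = statistics.stdev(sample2)  # variability among units given treatment 2
    std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (statistics.mean(sample1) - statistics.mean(sample2)) / std_error


# Hypothetical changes in skin pigmentation, one value per participant.
spf15 = [41.0, 43.0, 45.0]
spf30 = [38.0, 40.0, 42.0]
t_stat = two_sample_t(spf15, spf30)
print(f"t = {t_stat:.3f}")

# One participant per formulation: no way to estimate within-treatment variability.
try:
    two_sample_t([41.0], [38.0])
except statistics.StatisticsError as err:
    print("Cannot compute t with one unit per treatment:", err)
```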
Example #4: The New England Journal of Medicine (Feb 9, 2012) reported the results of a study on the effectiveness of Tai Chi on postural stability of patients with Parkinson’s disease. In the study 130 Parkinson’s patients (referred by their physicians) were randomly assigned to do Tai Chi or a resistance training routine 2 times per week. After 6 months, the change from baseline was measured in each person’s functional reach (measured in cm). Positive changes indicate improvement. (Functional reach is how far a person can lean over to reach for something without losing their balance.)
a) Explanatory Variable or Factor:
b) Treatments:
c) Response Variable:
d) Experimental Unit:
e) What additional information, if any, would be provided by a control group in this study? (What is the purpose of including a control group in a study?)
f) Considering just the two treatments (Tai Chi and resistance training), the Tai Chi group saw larger (statistically significant) gains in functional reach compared to the resistance training group. Would it be reasonable to generalize this finding to all Parkinson’s patients? Explain.
g) Reconsider the previous question. Suppose that the Tai Chi group saw larger (statistically significant) gains in functional reach compared to the resistance training group. Would it be reasonable to conclude that there is evidence that Tai Chi improves functional reach, compared to resistance training? Why or why not?
Ideal studies make use of two kinds of randomness:
· Random sampling/selection from the population allows for generalizing results from the sample to the larger population.
· Random assignment/allocation of experimental units to treatment groups permits cause-and-effect conclusions to be drawn.
Example #5: An article about handwriting appeared in the October 11, 2006 issue of the Washington Post. The article mentioned that among students who took the essay portion of the SAT exam in 2005-06, those who wrote in cursive style scored significantly higher on the essay, on average, than students who used printed block letters.
a) Experiment or Observational study?
b) Would you conclude from this study that using cursive style causes students to score better on the essay? If so, explain why. If not, identify a potential confounding variable, and explain how it provides an alternative explanation for why the cursive writing group would have a significantly higher average essay score.
Question: So, how can we legitimately draw cause-and-effect conclusions?
Answer: We can legitimately draw cause-and-effect conclusions by assigning subjects to treatment (explanatory variable) groups in such a way that the groups are likely to be as similar as possible on all characteristics except the explanatory variable. Then, if we see a significant difference in the response variable between the groups, we can conclude that the explanatory variable is causing the difference in the response.
Random assignment is the preferred method of assigning experimental units to treatments (explanatory variable groups) in an experiment. This gives each subject an equal chance of being assigned to any of the treatment groups, and in doing so attempts to create experimental groups that are as alike as possible.
Example #5 (contd.): The previously mentioned article in the Washington Post also mentioned a different study in which one essay was given to many graders. Some graders were shown a cursive version of the essay and the other graders were shown a version with printed block letters. Researchers randomly decided which version the grader would receive. The average score assigned to the essay with the cursive style was significantly higher than the average score assigned to the essay with the printed block letters.
What conclusion would you draw from this second study? Be clear about how this conclusion would differ from that of the first study, and why that conclusion is justified.
Why does random assignment permit cause-and-effect conclusions to be drawn?
Random assignment
· Attempts to balance potential confounding variables across the treatment groups, and
· So attempts to create experimental groups that are as alike as possible, except for the treatment of interest.
COMPLETELY RANDOMIZED DESIGN (CRD)
· Is the simplest experimental design
· Treatments (i.e. treatment combinations) are assigned completely at random to experimental units – that is, there is no restriction on how the random assignment is performed.
Example #6: Suppose that a team of researchers wants to test the effectiveness of two different acne medications (Formula A and Formula B) on teenagers, and recruits a representative sample of 20 teenagers. As statisticians, we will help the researchers randomly assign each of these 20 teenagers to one of the two acne medications:
Amar, Becky, Carol, Danny, Elle, Frank, Grant, Heather, Ian, Jake,
Katie, Lucy, Maria, Nick, Oscar, Pedro, Rosa, Sarah, Tim, Uma
CRD example: Randomly assign 10 teens to receive Formula A, and the remaining 10 to receive Formula B.
Restricted randomization example: We have prior knowledge that boys and girls are going to react differently to the acne medications, so we separate the 10 boys and 10 girls. Then, we randomly assign 5 boys to receive Formula A and 5 boys to receive Formula B. Repeat for the 10 girls.
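The restricted randomization described above can be sketched as follows. This is an illustration rather than a prescribed procedure: the split into boys and girls follows the genders listed for the 20 teens in this example, while the function name `assign_within_blocks` and the seed are hypothetical.

```python
import random

BOYS = ["Amar", "Danny", "Frank", "Grant", "Ian",
        "Jake", "Nick", "Oscar", "Pedro", "Tim"]
GIRLS = ["Becky", "Carol", "Elle", "Heather", "Katie",
         "Lucy", "Maria", "Rosa", "Sarah", "Uma"]


def assign_within_blocks(blocks, rng):
    """Randomly split each block in half: first half to Formula A, rest to Formula B."""
    assignment = {"A": [], "B": []}
    for block in blocks:
        shuffled = list(block)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        assignment["A"].extend(shuffled[:half])
        assignment["B"].extend(shuffled[half:])
    return assignment


rng = random.Random(2013)  # fixed seed only so the example is reproducible
assignment = assign_within_blocks([BOYS, GIRLS], rng)
for formula, teens in assignment.items():
    print(f"Formula {formula}: {', '.join(sorted(teens))}")
```

Because the shuffle happens separately inside each gender group, every run places exactly 5 boys and 5 girls on each formula; only which boys and which girls is left to chance.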
Let us look at ways to create CR designs, and what random assignment attempts to achieve.
a) But, before we get into that, let us list some extraneous/nuisance variables that may be related to having acne.
GOAL: We want to use a method that will ensure that the assignments are made randomly and also that there are an equal number of subjects receiving Formula A and Formula B.
Method 1 – Coin Flipping Method: One way to randomly determine whether a subject should receive Formula A or Formula B is to flip a coin, and
· If the outcome is “Heads”, the subject is assigned to Formula A
· If the outcome is “Tails”, the subject is assigned to Formula B
b) Explain why there is a deficiency in strictly using the flip of a coin to determine random assignments. Describe the problem that is keeping us from achieving our main goal.
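One way to explore this question is by simulation. The sketch below flips a fair coin for each of 20 subjects, repeats the whole process many times, and records how often the two groups end up with exactly 10 subjects each. The function name, number of repetitions, and seed are illustrative assumptions.

```python
import math
import random
from collections import Counter


def coin_flip_group_sizes(n_subjects=20, n_sims=10000, seed=1):
    """Count how many subjects land on Formula A (heads) in each simulated study."""
    rng = random.Random(seed)
    sizes = Counter()
    for _ in range(n_sims):
        n_heads = sum(rng.random() < 0.5 for _ in range(n_subjects))
        sizes[n_heads] += 1
    return sizes


sizes = coin_flip_group_sizes()
share_equal = sizes[10] / sum(sizes.values())
exact_equal = math.comb(20, 10) / 2 ** 20  # exact chance of a 10/10 split

print(f"Simulated share of 10/10 splits: {share_equal:.3f}")
print(f"Exact probability of a 10/10 split: {exact_equal:.3f}")
print("Smallest and largest Formula A group seen:", min(sizes), max(sizes))
```

The exact probability of a 10/10 split works out to about 0.176, so coin flipping alone leaves the groups unequal in size most of the time.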
Method 2: We will now look at a tactile method involving cards that we can use to randomly assign subjects to treatments.
Step 1: You should have received a stack of 20 cards, each with a participant’s name.
Step 2: Shuffle these 20 cards well, and randomly distribute them into two piles – with 10 cards in each pile. Let the first pile be the Formula A recipients and the second pile be the Formula B recipients.
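The card method can also be mimicked in software. The sketch below shuffles the 20 names and deals the first 10 into the Formula A pile and the remaining 10 into the Formula B pile; the function name `deal_cards` and the seed are hypothetical.

```python
import random

NAMES = ["Amar", "Becky", "Carol", "Danny", "Elle", "Frank", "Grant",
         "Heather", "Ian", "Jake", "Katie", "Lucy", "Maria", "Nick",
         "Oscar", "Pedro", "Rosa", "Sarah", "Tim", "Uma"]


def deal_cards(names, rng):
    """Shuffle the deck of names and deal two equal piles: A first, then B."""
    deck = list(names)
    rng.shuffle(deck)
    half = len(deck) // 2
    pile_a = set(deck[:half])
    # Report the allocation in the original (alphabetical) order of the subjects.
    return {name: ("A" if name in pile_a else "B") for name in names}


rng = random.Random(16)  # fixed seed only so the example is reproducible
allocation = deal_cards(NAMES, rng)
for name, formula in allocation.items():
    print(f"{name:8s} Formula {formula}")
```

Unlike coin flipping, this method always produces two groups of 10, because the group sizes are fixed before the shuffle.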
c) Based on your results, determine the treatment allocation in the following table:
Subject / Gender / Age (yrs) / Diet / Formula A or B
Amar / Male / 17 / Healthy /
Becky / Female / 14 / Healthy /
Carol / Female / 16 / Healthy /
Danny / Male / 15 / Not healthy /
Elle / Female / 17 / Healthy /
Frank / Male / 17 / Not healthy /
Grant / Male / 19 / Healthy /
Heather / Female / 13 / Healthy /
Ian / Male / 17 / Healthy /
Jake / Male / 15 / Healthy /
Katie / Female / 14 / Healthy /
Lucy / Female / 18 / Not healthy /
Maria / Female / 16 / Healthy /
Nick / Male / 14 / Healthy /
Oscar / Male / 16 / Healthy /
Pedro / Male / 15 / Healthy /
Rosa / Female / 18 / Not healthy /
Sarah / Female / 18 / Not healthy /
Tim / Male / 17 / Healthy /
Uma / Female / 17 / Not healthy /
Recall that we said earlier that random assignment attempts to create treatment groups that are as alike as possible except for the treatment they are receiving. Did your random assignment create experimental groups that are similar? Let us explore that.
There are 10 males and 10 females among the subjects.
d) Ideally, how many of the males would you like to have present in treatment group A, and how many in treatment group B?
e) According to your random assignment in 3(c), how many males are assigned to Formula A?
f) Combine your answer from 3(e) with the other workshop participants’ results and create a dotplot for the number of males on Formula A on the whiteboard. Sketch the dotplot here, and carefully label the axis.
g) Regarding the dotplot created in 3(f):
· At what number does this dotplot center? Does that make sense?
· Does random assignment always result in an equal number of males in both treatment groups A and B?
· What does this dotplot tell you about random assignment and gender distribution?
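If a whiteboard full of results is not available, the dotplot can be approximated by simulation. The sketch below repeats the shuffle-and-deal assignment many times, counts the males on Formula A each time, and prints a rough text dotplot. The number of repetitions, the seed, and the scale of one dot per 10 assignments are illustrative assumptions.

```python
import random
from collections import Counter

GENDERS = ["M"] * 10 + ["F"] * 10  # 10 males and 10 females among the subjects


def males_on_formula_a(rng):
    """Shuffle the 20 subjects and count males among the first 10 (Formula A)."""
    deck = list(GENDERS)
    rng.shuffle(deck)
    return deck[:10].count("M")


def simulate_male_counts(n_assignments=1000, seed=1):
    rng = random.Random(seed)
    return Counter(males_on_formula_a(rng) for _ in range(n_assignments))


counts = simulate_male_counts()
total = sum(counts.values())
mean_males = sum(k * n for k, n in counts.items()) / total

# Text dotplot: one dot per 10 simulated assignments (rounded down).
for k in range(11):
    print(f"{k:2d} | {'.' * (counts[k] // 10)}")
print(f"Average number of males on Formula A: {mean_males:.2f}")
```

The simulated counts center near 5 males per group, but individual assignments regularly give other values, which is the pattern to look for in the workshop dotplot.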
h) What about age? How does the average age of those in Formula A compare with those in Formula B? Using your random assignment in 3(c) and the information available on the cards about the teens’ ages, compute the following: