Worksheet 6 Single Factor ANOVA

Bio286: Worksheet 5 – Single factor ANOVA

1. There is concern about the recruitment failure of Valley Oak in the Central Valley. One hypothesis is that grazers greatly affect new seedlings. An experiment was done to test the effect of grazing on the establishment of new seedlings. Twenty-five plots were randomly assigned to 5 treatments (5 plots for each treatment). Open the file “Grazers and Oak Seedlings.jmp”. The five treatments were:

Full Cage – both deer and rabbits excluded
Open – no manipulation
Cage Control – a partial cage that does not exclude either deer or rabbits, but which should have all other attributes of a cage (that could cause artifacts)
Deer – cages that exclude rabbits
Rabbits – cages that exclude deer

The variable N_SEEDLINGS is the number of seedlings in the plot after the recruitment season. The following hypotheses were posed, and they must be tested sequentially:

H1: There are no cage artifacts. If there are artifacts, then no further tests should be done because the experiment may be confounded. If there is no evidence of cage artifacts, go to H2.
H2: Grazers affect seedling establishment. Here the intent is to compare total exclusion to no exclusion (what is the best estimate of the latter treatment?). If this is not true, stop. If it is true, go on to H3.
H3: Total exclusion of grazers is more effective than partial exclusion. Think carefully about this comparison. If H3 is correct, go to H4.
H4: Rabbits inhibit seedling establishment more than deer.

First produce a figure that shows the average number of seedlings as a function of treatment. Try to arrange the treatments in an order that makes sense. By default, character variables are ordered alphabetically, which is often not an appropriate order for graphs and tables. In JMP you can modify the order by right-clicking on the variable name in the data window; here do this with TREATMENT. Then click on COLUMN PROPERTIES, then ROW ORDER LEVEL. Make sure VALUE ORDERING is in the window below Column Properties. Click on VALUE ORDERING. A new window will open; sort the treatments in the following order: Full Cage, Deer, Rabbit, Open, Cage Control. Once you have done this, click OK and make a bar graph.
Run an Analysis of Variance (I advise using the FIT MODEL platform) and then test each of the posed hypotheses. Think carefully about each hypothesis and what the best test of it is.
Make sure the data meet the assumptions of ANOVA: use box plots (GRAPH BUILDER) and probability plots (DISTRIBUTION), and calculate variances as a function of ‘TREATMENT’. You may be confused by the output; ask if you have questions.
Run the analysis. What are the results? Here use FIT MODEL, put ‘TREATMENT’ in the MODEL EFFECTS window and ‘N_SEEDLINGS’ in the Y window, and make sure the PERSONALITY is STANDARD LEAST SQUARES and the EMPHASIS is EFFECT LEVERAGE. Run the model.
Test the posed hypotheses:
Indicate how you tested each hypothesis. You should use CONTRASTS! Click on TREATMENT in the output window, then on LSMEANS CONTRAST. Now you need to look at the contrast syntax from the lecture, or ask. Repeat the contrast procedure for all hypotheses. Think carefully about the hypotheses and what groups should be included in each comparison.
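
To see what a contrast computes behind the JMP dialog, here is a minimal sketch in plain Python. The three groups, their values, and the contrast coefficients are invented for illustration and are not the oak seedling data or the answer to any hypothesis above. The sketch assumes a one-way design and the usual formulas: the contrast estimate is the sum of coefficient times group mean, and its standard error uses the pooled error mean square.

```python
from math import sqrt

# Invented toy data for three generic groups (NOT the oak seedling data).
groups = {
    "A": [4.0, 6.0, 5.0],
    "B": [7.0, 9.0, 8.0],
    "C": [1.0, 3.0, 2.0],
}

def contrast_test(groups, coeffs):
    """Test a linear contrast of group means after a one-way ANOVA.

    Returns (estimate, standard error, t, F, error df).
    """
    # Contrast coefficients must sum to zero.
    assert abs(sum(coeffs.values())) < 1e-12
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    sizes = {g: len(v) for g, v in groups.items()}
    # Pooled within-group (error) sum of squares and its degrees of freedom.
    sse = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_error = sum(sizes.values()) - len(groups)
    mse = sse / df_error
    # Groups left out of coeffs implicitly get a coefficient of zero.
    estimate = sum(c * means[g] for g, c in coeffs.items())
    se = sqrt(mse * sum(c ** 2 / sizes[g] for g, c in coeffs.items()))
    t = estimate / se
    return estimate, se, t, t ** 2, df_error

# Hypothetical contrast: group C versus the average of groups A and B.
coeffs = {"A": -0.5, "B": -0.5, "C": 1.0}
estimate, se, t, F, df_error = contrast_test(groups, coeffs)
print(f"estimate={estimate:.2f}  SE={se:.3f}  t={t:.2f}  F(1,{df_error})={F:.1f}")
```

The F ratio has 1 numerator degree of freedom and the ANOVA error degrees of freedom in the denominator; JMP reports the matching p-value, which this stdlib-only sketch does not compute.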

c. Now run pairwise comparisons using the Tukey all-pairwise option. In the ANOVA output window (FIT LEAST SQUARES) click on TREATMENT, then on LSMEANS TUKEY HSD. This will run a Tukey test on all pairwise comparisons of treatments. The top table gives the difference in means, the standard error and the confidence interval. The bottom table gives the groups of treatments (by letter) that are not statistically different at the prescribed critical alpha.

Would this method have been able to test the posed hypotheses?
Would your results have differed if you had used the pairwise protocol vs. contrasts?
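
As background for these two questions, the short sketch below counts the pairwise comparisons among the five treatments and gives a rough familywise error rate if each were tested at alpha = 0.05 with no adjustment. The formula assumes independent tests, which is not strictly true for pairwise comparisons that share group means, so treat the number as an approximation only.

```python
from itertools import combinations

# Treatment labels as listed in the worksheet.
treatments = ["Full Cage", "Open", "Cage Control", "Deer", "Rabbits"]

# Every unordered pair of treatments is one pairwise comparison.
pairs = list(combinations(treatments, 2))

alpha = 0.05
# Rough chance of at least one false positive across all pairs,
# ASSUMING the tests are independent (an approximation here).
familywise = 1 - (1 - alpha) ** len(pairs)

print(f"{len(pairs)} pairwise comparisons")
print(f"approximate unadjusted familywise error rate: {familywise:.3f}")
```

Tukey's HSD adjusts for this inflation across all pairs, whereas a small set of planned contrasts involves far fewer tests.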

2. Give me an example of

a. a random effects model

b. a fixed effects model

For both, tell me what the question is, what the predictor groups are (the treatments), what the scale of inference is, and what counts as a replicate observation.

Randomized block (RB) or completely randomized (CR) design. In the absence of other information (like a preliminary survey) would you use a randomized block or completely randomized design? Why? Remember the tradeoffs.
We are interested in determining whether oak seedling growth is higher with supplemental water. We will have two treatments: with and without supplemental water. We will be using potted seedlings in a greenhouse. We only have room for 20 pots. We need to choose between a CR design (10 pots of each treatment randomly placed in the greenhouse) and an RB design (10 blocks of 2 pots, one watered and one not).
Based on the results of the greenhouse experiment, we want to assess the question outdoors. We decide to do the experiment in a field near the campus. Again we have room for only 20 planted seedlings and need to decide between a CR design (10 plants of each treatment randomly placed in the field) and an RB design (10 blocks of 2 planted seedlings, one watered and one not).
We have found the same results in the experiments done in the greenhouse and field. We are now ready to try the experiment in more realistic settings. We will again use 20 seedlings, but now want to plant them near adult trees. The first option is to plant one seedling near each of 20 randomly selected trees and then randomly assign 10 to receive water and 10 not to. The second option is to plant two seedlings near each of 10 randomly selected trees and then randomly assign one of each pair to receive supplemental water.
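
The two layouts can be sketched in a few lines of Python. This is only an illustration of how the random assignment differs between designs; the function names and the random seed are hypothetical, and the error degrees of freedom use the standard formulas for a one-factor CR design and a randomized block design without replication inside blocks.

```python
import random

def completely_randomized(n_units, treatments, rng):
    """Assign treatments to n_units positions with equal replication."""
    per_treatment = n_units // len(treatments)
    labels = [t for t in treatments for _ in range(per_treatment)]
    rng.shuffle(labels)  # one shuffle across the whole space
    return labels

def randomized_block(n_blocks, treatments, rng):
    """Each block holds every treatment once, in random order."""
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # randomize only within the block
        layout.append(order)
    return layout

rng = random.Random(286)  # arbitrary seed so the layout is repeatable
treatments = ["water", "no water"]

cr = completely_randomized(20, treatments, rng)
rb = randomized_block(10, treatments, rng)

# Error degrees of freedom for the test of the treatment effect.
df_error_cr = 20 - len(treatments)              # N - a
df_error_rb = (10 - 1) * (len(treatments) - 1)  # (b - 1)(a - 1)

print("CR layout:", cr)
print("RB layout:", rb)
print("error df: CR =", df_error_cr, " RB =", df_error_rb)
```

Blocking costs error degrees of freedom, so it pays off only when the blocks absorb a meaningful share of the variation among units.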

Randomized block design – open ‘sea star colors –two sample’. This file contains the density of orange and purple sea stars sampled in central California. Sites were selected randomly; hence we are dealing with a random factor. Since both purple and orange individuals were collected at all sites, sites can be considered blocks containing both color groups (treatments). Our null hypothesis is that the density of purple stars = density of orange stars. We really do not have a hypothesis concerning particular sites (if we did, then site would be considered a fixed effect). What we really want to do is account for the variability associated with sites to allow a better assessment of the effect of color. We know that density data are often log normal, and indeed these data are also. You should use the variable ‘LNNUMBER’ for all analyses.
You will need to stack the dataset because it is constructed in a way that is intuitive but not appropriate for analysis. Go to TABLES, STACK and put L_Orange and L_Purple in the STACK COLUMNS window, type “LNNUMBER” in the STACKED DATA COLUMN window, and type “COLOR” in the SOURCE LABEL COLUMN window. Now click on SELECT in the lower left, in the area titled NON-STACKED COLUMNS. Select “Site” and “Latitude”. Use the CONTROL button to select multiple variables.
First run the model without incorporation of the random effect. Use ANALYZE, FIT MODEL. Put LNNUMBER in the Y window and COLOR in the MODEL EFFECTS window. Make sure the personality is STANDARD LEAST SQUARES and the EMPHASIS is EFFECT LEVERAGE. Run the model. What is the result with respect to the hypothesis?
Now run the same model (note that you can use RECALL to bring up the former analysis in the FIT MODEL window) but add ‘SITE’; then, while ‘SITE’ is highlighted, click ATTRIBUTES and then RANDOM. The model will automatically shift to METHOD = REML. Change it back to EMS. Run the model. What is the result with respect to the hypothesis? Also, how much of the variability in the data is accounted for by the random term SITE?
Repeat b but change METHOD to REML (restricted maximum likelihood). What is the result with respect to the hypothesis? Also how much of the variability in the data is accounted for by the random term SITE?
Repeat c but change PERSONALITY to MIXED MODEL. This will change the model window. Make sure SITE has moved to the RANDOM EFFECTS window. What is the result with respect to the hypothesis? Also, how much of the variability in the data is accounted for by the random term SITE?
What is your conclusion concerning (1) the incorporation of random effects in a statistical model and (2) the use of EMS, REML and mixed model approaches when random effects are present? Note that the answer to #2 will likely change if the design is unbalanced.
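
The stacking step and the effect of adding SITE as a block can be illustrated outside JMP with a small stdlib-only sketch. The four sites and their values are invented (they are not the sea star file), and the column name L_Purple is assumed to match the file. The variance component for SITE is the simple method-of-moments (EMS-style) estimate for a balanced design; JMP's REML and mixed model output is not reproduced here.

```python
# Invented wide-format rows (NOT the sea star file): one row per site.
wide = [
    {"Site": "S1", "L_Orange": 1.0, "L_Purple": 1.5},
    {"Site": "S2", "L_Orange": 2.0, "L_Purple": 2.4},
    {"Site": "S3", "L_Orange": 3.0, "L_Purple": 3.6},
    {"Site": "S4", "L_Orange": 4.0, "L_Purple": 4.5},
]

# Stack: one row per site-by-color combination.
stacked = [
    {"Site": r["Site"], "COLOR": col, "LNNUMBER": r[col]}
    for r in wide
    for col in ("L_Orange", "L_Purple")
]

def mean(xs):
    return sum(xs) / len(xs)

def level_means(rows, key):
    """Mean of LNNUMBER for each level of the column named by key."""
    levels = sorted({r[key] for r in rows})
    return {lv: mean([r["LNNUMBER"] for r in rows if r[key] == lv]) for lv in levels}

grand = mean([r["LNNUMBER"] for r in stacked])
color_mean = level_means(stacked, "COLOR")
site_mean = level_means(stacked, "Site")
n_colors, n_sites = len(color_mean), len(site_mean)

ss_total = sum((r["LNNUMBER"] - grand) ** 2 for r in stacked)
ss_color = n_sites * sum((m - grand) ** 2 for m in color_mean.values())
ss_site = n_colors * sum((m - grand) ** 2 for m in site_mean.values())
ms_color = ss_color / (n_colors - 1)

# Model 1: COLOR only, so site-to-site variation stays in the error term.
df_err1 = n_sites * n_colors - n_colors
f_color_only = ms_color / ((ss_total - ss_color) / df_err1)

# Model 2: COLOR plus SITE as a block, which removes site variation from error.
df_err2 = (n_sites - 1) * (n_colors - 1)
ms_err2 = (ss_total - ss_color - ss_site) / df_err2
f_color_blocked = ms_color / ms_err2

# Method-of-moments variance component for SITE (balanced design).
ms_site = ss_site / (n_sites - 1)
var_site = max(0.0, (ms_site - ms_err2) / n_colors)
pct_site = 100 * var_site / (var_site + ms_err2)

print(f"F for COLOR, no SITE term: {f_color_only:.3f} (error df {df_err1})")
print(f"F for COLOR, SITE as block: {f_color_blocked:.1f} (error df {df_err2})")
print(f"SITE share of random variation: {pct_site:.1f}%")
```

In this toy example the color difference is small relative to the spread among sites, so it is only detectable once the site variation is taken out of the error term.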