Bio 286 Worksheet 4 – Multiple Regression, Non-Linear Regression

1) Multiple regression – we are interested in assessing whether there is a relationship between a focal species of limpet and other attributes of an intertidal area. Specifically, we want to know if the density of our focal species is related to the food resource (diatom abundance), the density of other limpets (which may be competitors for food), tide height, and the density of predators. We randomly select 19 plots that are each 20 by 20 cm and then assess the variables described above. You may assume that the survey was done correctly and that replicate plots are independent of each other.

  1. Open the file ‘Limpet_MR’. There should be 5 variables:
     a. Diatom abundance: amount of food available in the plot
     b. Focal-limpet: the abundance of the focal limpet species in the plot
     c. Other limpets: abundance of other limpets in the plot
     d. Tide Height (cm): tide height (in cm) of the experimental plot
     e. Predators: estimated number of predators per plot
  2. We want to examine the individual relationships between each independent variable and the putative response variable (Focal-limpet). Because all the variables are continuous, simple regression seems to be the proper way to do this.
  3. What is the null hypothesis?
  4. Go to ANALYZE, FIT Y BY X and put ‘Focal-limpet’ in the Y window and all the other variables in the X window. This will produce 4 graphs, each with Focal-limpet in the Y position and one of the independent variables in the X position.
  5. Look carefully at each graph to assess whether the relationship between Y and X is linear. Note that if there is no obvious relationship, then the best fit line to the data is simply the mean of Y (Focal-limpet).
  6. Now click on the red triangle for BIVARIATE FIT… while holding down the CTRL key (CMD on a Mac), so that the command is applied to all 4 graphs at once. Then click on FIT MEAN. This will add a line at the mean of Focal-limpet.
  7. Repeat the previous step, this time choosing FIT LINE. This will produce the best linear fit of Y to X.
  8. Below each graph will be the reports for the analyses. Examine the MEANS, which should be the same for all 4 reports. Also look at the p-values for all models. Note that the slope of the relationship is shown in PARAMETER ESTIMATES as the value next to the name of the independent variable (e.g. other limpets). The intercept is just above this estimate. Also note that the p-value for the slope is the same as the p-value for the model fit in the ANALYSIS OF VARIANCE table.
  9. Fill out the table below:

Independent Variable | Slope | P-Value | r² for model
Diatom abundance     |       |         |
Other limpets        |       |         |
Tide Height          |       |         |
Predators            |       |         |
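
If you want to check where the numbers in the table come from, the quantities JMP reports for a simple regression can be sketched by hand. The Python below is only an illustration, not JMP's implementation: the function name `simple_regression` and the toy numbers are made up for this example and are not the limpet data. It returns the slope, intercept, r² and the t statistic for the slope; turning the t statistic into a p-value needs a t distribution with n − 2 degrees of freedom (for example from a statistics library), which is left out here.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n                      # this is the FIT MEAN line
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)  # variation around the mean of Y
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variation around the fitted line (the FIT LINE model).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - sse / syy
    df = n - 2                               # error degrees of freedom
    se_slope = math.sqrt(sse / df / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": r2,
        "t_slope": slope / se_slope,         # compare to a t distribution with df
        "df": df,
        "mean_y": mean_y,
    }

# Toy numbers for illustration only (NOT the Limpet_MR data).
toy_x = [1, 2, 3, 4, 5]
toy_y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_regression(toy_x, toy_y)
print(fit)
```

Note that `mean_y` does not depend on which X variable is used, which is why the MEANS are identical across the 4 reports.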
  1. What is the conclusion with respect to your null hypotheses?
  2. Upon reflection, we think that we may not have done the most powerful analysis. We now think that a multiple regression analysis would have been better.
  3. First go to the PREFERENCES menu (FILE, PREFERENCES on a PC; JMP > PREFERENCES or CMD-, on a Mac). Then go to PLATFORMS, FIT LEAST SQUARES. Check SHOW VIF and AICc. This will make your default output include VIF (variance inflation factor) values and AICc (corrected Akaike Information Criterion) values.
  4. Go to ANALYZE, FIT MODEL. This is the most useful and flexible platform in JMP. Put Focal-limpet in the Y window and all the independent variables in the CONSTRUCT MODEL EFFECTS window. Do this using the ADD button. Do not CROSS or NEST anything. Make sure the PERSONALITY (upper right) is set to STANDARD LEAST SQUARES (although you should have a look at all the options) and set the EMPHASIS to EFFECT LEVERAGE. Run the model.
  5. If you ever need to rerun a model in any platform, you can hit the RECALL button on the platform and the last model run will come up. You can run it again or modify it first.
  6. Check assumptions (particularly collinearity, using VIF; values above 10 indicate a problem).
  7. Compare the output to the table you made above.
  8. One of the very nice things about simple regressions is that the plots of Y by X make sense (e.g. Focal-limpet vs diatom abundance). This is not the case for multiple regression. If you want to plot data to show the relationship that was assessed in the multiple regression analysis, you need to show what are called partial residuals. The way to do this is described in the next two steps.
  9. Click on the red triangle, then on SAVE COLUMNS, then on EFFECT LEVERAGE PAIRS. This step will add a series of columns to your data set that show the scaled residual error after accounting for all the independent variables in the model except the one of interest. For example, let's assume that you were interested in the effect of other limpets on the focal limpet after accounting for the other independent variables (i.e. diatom abundance, tide height and predators). The p-values in the multiple regression allow assessment of this relationship, and the new columns allow you to look at this relationship. Note that these graphs are not the same as plots that depict the contribution to the full multiple regression model.
  10. The easiest platform to use is GRAPH BUILDER. You only want to use the new columns that begin with ‘Y leverage’ (float over the variable name to see the full name). These variables should be dragged to the Y axis. Once you do this, look at the rest of the name of the variable. For example, if the name of the variable is ‘Y leverage of other limpets for Focal-limpet’, then drag ‘other limpets’ to the X axis. Click on the icon that looks like a trendline (linear) to see the relationship. What this shows is the relationship between Focal-limpet and other limpets after accounting for diatom abundance, tide height and predators. These graphs should make sense with respect to the p-values and the signs of the estimated slopes in the multiple regression.
  11. Finally, I just went to another sample area and came up with the following data:
      a. Diatom abundance = 125
      b. Tide Height = 55
      c. Other limpets = 43
      d. Predators = 5.3

What is the predicted number of Focal-limpets?
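
The ideas behind the multiple regression steps above (VIF, partial residuals and prediction for a new plot) can be sketched outside JMP. The Python below is a rough illustration under stated assumptions, not what JMP does internally: all function names are made up for this example, the data are SYNTHETIC random numbers standing in for the 19 plots, and the column order (diatoms, other limpets, tide height, predators) is my own choice. JMP's saved leverage columns may be scaled or shifted differently from the raw residuals used here, but the slope through them should tell the same story.

```python
import numpy as np

def fit_ols(X, y):
    """Coefficients of y = b0 + X @ b by least squares (intercept first)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, x_new):
    """Predicted response for one new row of predictor values."""
    return coef[0] + np.asarray(x_new, dtype=float) @ coef[1:]

def residuals(X, y):
    """What is left of y after regressing it on the columns of X."""
    coef = fit_ols(X, y)
    return y - (coef[0] + X @ coef[1:])

def vif(X):
    """VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing
    predictor j on all the other predictors."""
    values = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        res = residuals(others, X[:, j])
        total = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - (res @ res) / total
        values.append(1.0 / (1.0 - r2))
    return np.array(values)

def partial_residual_pair(X, y, j):
    """Residuals of predictor j and of y after removing the other predictors."""
    others = np.delete(X, j, axis=1)
    return residuals(others, X[:, j]), residuals(others, y)

# SYNTHETIC stand-in for the 19 plots (NOT the Limpet_MR data).
# Assumed column order: diatoms, other limpets, tide height, predators.
rng = np.random.default_rng(0)
X = rng.normal(size=(19, 4))
y = 10 + X @ np.array([0.5, -1.0, 0.2, -0.3]) + rng.normal(scale=0.5, size=19)

coef = fit_ols(X, y)      # intercept, then one slope per predictor
vifs = vif(X)             # values above 10 would flag collinearity

# Partial residuals for 'other limpets' (column 1): the slope through these
# points equals the multiple regression slope for that predictor.
x_res, y_res = partial_residual_pair(X, y, 1)
partial_slope = (x_res @ y_res) / (x_res @ x_res)
print(coef, vifs, partial_slope)
```

With coefficients fitted to the real data, the prediction for the new sample area would be `predict(coef, [125, 43, 55, 5.3])` in the assumed column order, which is just the intercept plus each slope times the corresponding value. In JMP you read the same intercept and slopes from PARAMETER ESTIMATES.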

2) Non-Linear Regression – open the file ‘SpeciesArea’

  1. The data here are measurements of the number of bacterial species found as a function of area sampled (mm²). It is thought that the relationship should be a saturating one (a rise to an asymptote), but we want to assess 4 different models for fit. A really good question here is what hypothesis or set of hypotheses is to be tested (if any). Perhaps a better question is which of the models assessed fits the data best. This is the question we are going to ask in the worksheet. This is NOT to say there are no good inferential questions that could be addressed (meaning using a p-value), but these should reflect specific hypotheses concerning, for example, slopes or tension functions. To assess model fit you should use the corrected Akaike Information Criterion (AICc). Remember that this is an assessment of the fit of the model after accounting for the number of fitted parameters, which allows a fair comparison of models with different numbers of parameters. The lower the value, the better. An important question is what degree of difference is big enough to lead to the conclusion that the lower value model is superior. This is a huge question in statistical model selection; for this worksheet we simply assume that the model with the lowest AICc value is the best. I am giving you one value, for the MEAN model, but you need to fill in the rest of the table to make your final decision. I want you to assess three additional models: (1) a linear fit, (2) a Gompertz 3 parameter model, and (3) a Biexponential 5 parameter model.
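
To make "fit after accounting for the number of fitted parameters" concrete, here is a minimal Python sketch of AICc for a least-squares fit. It assumes normally distributed errors and counts the error variance as one extra parameter, which I believe matches the usual convention but is not guaranteed to reproduce JMP's reported value exactly. The function name and the residual sums of squares below are hypothetical and not taken from the SpeciesArea file. Only differences in AICc between models fitted to the same data matter.

```python
import math

def aicc_least_squares(rss, n, n_coef):
    """AICc for a least-squares fit with Gaussian errors.

    rss    : residual sum of squares of the fitted model
    n      : number of observations
    n_coef : number of fitted coefficients (mean model = 1, line = 2, ...)
    """
    k = n_coef + 1                      # coefficients plus the error variance
    if n - k - 1 <= 0:
        raise ValueError("too few observations for this many parameters")
    # -2 * log-likelihood evaluated at the maximum likelihood estimates
    neg2_loglik = n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = neg2_loglik + 2 * k           # penalty for the number of parameters
    return aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction

# Hypothetical residual sums of squares for two models on the same 15 points.
n_obs = 15
candidates = {
    "mean": aicc_least_squares(rss=120.0, n=n_obs, n_coef=1),
    "linear": aicc_least_squares(rss=45.0, n=n_obs, n_coef=2),
}
best = min(candidates, key=candidates.get)  # lowest AICc wins
print(candidates, best)
```

A model with more parameters only wins if it reduces the residual sum of squares enough to pay for the extra penalty terms.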

  1. First go to GRAPH BUILDER and plot the relationship so you know what it looks like (Area on the X axis and Species on the Y axis).
  2. Start with the linear regression. Use ANALYZE, FIT MODEL. Put Species in the Y box and Area in the CONSTRUCT MODEL EFFECTS box. Make sure the PERSONALITY and EMPHASIS are STANDARD LEAST SQUARES and EFFECT LEVERAGE, respectively. Run the model. Look down the output and copy the AICc score to the table above. Remember that we don’t care about p-values here; all we want to do is compare models.
  3. Now go to ANALYZE, MODELING, NONLINEAR. Put Species in the Y box and Area in the X box. In the new graph window, click on the red triangle for FIT CURVE, then SIGMOID CURVES, then GOMPERTZ 3P (a three parameter, asymptotic Gompertz model). Now repeat the series of steps, but this time use EXPONENTIAL GROWTH AND DECAY, then BIEXPONENTIAL 5P (a five parameter model). You should have a figure with two curves, with the fit characteristics (including AICc) on top. Copy those values to the table above.
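
For reference, the two curves fitted in the last step can be written out as equations. The forms below are the parameterizations as I understand JMP's Fit Curve platform to use them; treat them as an assumption and check them against the formula shown in your own JMP report, where x is Area and y is Species.

```latex
% Gompertz 3P: a = asymptote, b = growth rate, c = inflection point
y = a \exp\!\bigl(-\exp\bigl(-b\,(x - c)\bigr)\bigr)

% Biexponential 5P: a = asymptote, b and d = scales, c and f = decay rates
y = a + b \exp(-c\,x) + d \exp(-f\,x)
```

The Gompertz curve has 3 fitted parameters and the biexponential has 5, and that difference is what AICc penalizes when the two fits are compared.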

The key take home messages from the nonlinear portion of the worksheet are as follows. (1) Model selection should usually be done using information criteria like AICc. The main reason is that you often have a general idea about the shape of the curve, but the equation itself is often not part of the hypotheses because many equations can produce similar curves. Note that if you do have specific hypotheses concerning values of the parameters, you should use the confidence interval approach. For example, the output below is from the Gompertz 3P model, where a = asymptote, b = growth rate and c = inflection point, from the model shown in the table above. Assume you had a null hypothesis that the growth rate was 0.002. Using the confidence intervals below you could reject this hypothesis, because the interval for the growth rate does not contain 0.002.

(2) There is no clear general null hypothesis for nonlinear regressions. This is an element of these models that is unlike most other ordinary statistical procedures you will use. (3) In order to test hypotheses, you will need to craft them carefully and, in particular, craft a reasonable null model that is testable (for example, is the relationship equal to a linear fit?).