G632, 2002: MULTIVARIATE PROJECT IN PALEOECOLOGY

SOFTWARE

In this course we will primarily use two statistical packages that run in a Windows PC environment. These are NTSYS-pc, the Numerical Taxonomy System for the PC, and MINITAB for Windows, an all-purpose computational and statistical software package. NTSYS is the best program available for Cluster Analysis and Multidimensional Scaling. MINITAB has limited options for Cluster Analysis and it will not do Multidimensional Scaling. We will use MINITAB primarily for Principal Components Analysis, Factor Analysis, and Discriminant Analysis. You may also find the spreadsheet program Excel useful for managing data sets and for making various plots not available in MINITAB or NTSYS.

DATA SETS

Each student must find their own multivariate data set for the project. Your data set should be suitable for applying all the multivariate techniques we will cover in class. Ideally the data set should be too large for patterns to be readily recognized without multivariate analysis. Generally this means more than 20 observations and more than 5 variables. There must be more observations than variables. The data must be interval data or possibly numerical rankings. The data cannot be exclusively categorical or presence/absence (binary). Your data set must be approved by the instructor.
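
As a rough illustration only, the size and data-type requirements above can be screened with a few lines of Python before a data set is submitted for approval. The function name check_data_set and the example matrices are invented for this sketch, and passing the check does not replace approval by the instructor.

```python
import numpy as np

def check_data_set(data, min_obs=20, min_vars=5):
    """Return a list of problems with a samples-by-variables matrix (empty if none)."""
    data = np.asarray(data, dtype=float)
    n_obs, n_vars = data.shape
    problems = []
    if n_obs <= min_obs:
        problems.append(f"only {n_obs} observations (need more than {min_obs})")
    if n_vars <= min_vars:
        problems.append(f"only {n_vars} variables (need more than {min_vars})")
    if n_obs <= n_vars:
        problems.append("observations do not outnumber variables")
    # A matrix containing only 0 and 1 is presence/absence (binary) data.
    if np.isin(data, (0.0, 1.0)).all():
        problems.append("data are exclusively presence/absence")
    return problems

rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(30, 8))   # made-up abundance counts
binary = (counts > 4).astype(int)         # same matrix reduced to presence/absence

print(check_data_set(counts))
print(check_data_set(binary))
```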

EXAMPLE DATA SETS

Kammer, T.W. and W.I. Ausich. 1987. Aerosol suspension feeding and current velocities: distributional controls for late Osagean crinoids. Paleobiology, 13:379-395. Provides examples of Multi-Dimensional Scaling, Principal Components Analysis, and Discriminant Analysis.

Kammer, T.W., T.K. Baumiller, and W.I. Ausich. 1998. Evolutionary significance of differential species longevity in Osagean-Meramecian (Mississippian) crinoid clades. Paleobiology, 24:155-176. See data in Table 1 and discriminant analysis in Figure 3.

Kammer, T.W. and A.M. Lake. 2001. Salinity ranges of Late Mississippian invertebrates of the central Appalachian basin. Southeastern Geology, 40:99-116.

Savarese, M., L.M. Gray, and C.E. Brett. 1986. Faunal and lithologic cyclicity in the Centerfield Member (Middle Devonian: Hamilton Group) of western New York: a reinterpretation of depositional history. Bulletin of the New York State Museum, 457:32-56.

SOME DATA SETS AVAILABLE FOR STUDENT USE

Baarli, B.G. 1987. Benthic faunal associations in the Lower Silurian Solvik Formation of the Oslo-Asker districts, Norway. Lethaia, 20:75-90. Transcribe data from Fig. 4 for a data set.

Brower, James. Data sets for the Middle Devonian Hamilton Group from New York state. Unpublished data available from Dr. Kammer.

Fursich, F.T. et al. 2001. Comparative ecological analysis of Toarcian (Lower Jurassic) benthic faunas from southern France and east-central Spain. Lethaia, 34:169-199. Data set in Appendix.

Holterhoff, P.F. 1996. Crinoid biofacies in Upper Carboniferous cyclothems, midcontinent North America: faunal tracking and the role of regional processes in biofacies recurrence. Palaeogeography, Palaeoclimatology, Palaeoecology, 127:47-81. Data set on p. 78.

Holterhoff, P.F. 1997. Filtration models, guilds, and biofacies: Crinoid paleoecology of the Stanton Formation (Upper Pennsylvanian), midcontinent, North America. Palaeogeography, Palaeoclimatology, Palaeoecology, 130:177-208. The data set on p. 205 is a subset of the data from the 1996 paper by Holterhoff.

Imbrie, J. and E.G. Purdy. 1962. Classification of modern Bahamian carbonate sediments. American Association of Petroleum Geologists Memoir 1, p. 253-272. Data in Table 1.

Li, C. and B. Jones. 1997. Comparison of foraminiferal assemblages in sediments on windward and leeward shelves of Grand Cayman, British West Indies. Palaios, 12:12-26.

McGhee, G.R. 1976. Late Devonian benthic marine communities of the central Appalachian Allegheny Front. Lethaia, 9:111-136. Raw data in Table 2.

Miller, A.I. 1988. Spatial resolution in subfossil molluscan remains: implications for paleobiological analyses. Paleobiology, 14:91-103. Dr. Kammer has a copy of the raw data (may have problems with highly correlated variables when doing discriminant analysis).

Nuhfer, A.T. 1979. Lateral variations in lithofacies and biofacies of the Ames Member (Conemaugh, Pennsylvanian) near Morgantown, West Virginia. M.S. thesis, WVU, 180 p. Data in Table 6. This data set should be collapsed into supergeneric groupings.

Patzkowsky, M.E. 1995. Gradient analysis of Middle Ordovician brachiopod biofacies: biostratigraphic, biogeographic, and macroevolutionary implications. Palaios, 10:154-179. Raw data in Table 1. (Very difficult to interpret without detailed knowledge of brachiopods.)

Patzkowsky, M.E., and S.M. Holland. 1999. Biofacies replacement in a sequence stratigraphic framework: Middle and Upper Ordovician of the Nashville Dome, Tennessee, USA. Palaios, 14:301-323. Data set of 100 samples with 35 genera in Appendix 1.

MULTIVARIATE TECHNIQUES: REFERENCES

GENERAL

Davis, J.C. 1986. Statistics and Data Analysis in Geology. John Wiley and Sons, New York, 646 p. QE48.8 .D38 1986 (Kammer has a copy)

Matrix Algebra, p. 107-140.

Discriminant Functions, p. 478-491

Cluster Analysis, p. 502-515

Eigenvector Methods, Principal Components and Factor Analysis, p. 515-562

Dillon, W.R. and M. Goldstein. 1984. Multivariate Analysis, Methods and Applications. John Wiley and Sons, New York, 587 p. See Chapter 4 for discussion of Multi-Dimensional Scaling. (Kammer has a copy)

Gauch, H.G. 1982. Multivariate Analysis in Community Ecology. Cambridge University Press, Cambridge, 298 p. (Kammer has a copy)

Introduction, p. 1-42

Ordination, p. 109-172

MINITAB Reference Manual. Use the Help function, including the StatGuide.

NTSYS Reference Manual. Use the Help function, including Contents and Index.

Pielou, E.C. 1984. The Interpretation of Ecological Data. John Wiley and Sons, New York, 263 p. QH541.15 .S72P54

Reyment, R.A. 1991. Multidimensional Paleobiology. Pergamon Press, Oxford, 377 p. (Kammer has a copy)

Swan, A.R.H. and M. Sandilands. 1995. Introduction to Geological Data Analysis. Blackwell Science. 446 p. QE33.2.S82 S93 1995 (Kammer has a copy)

CLUSTER ANALYSIS

Archer, A.W. and C.G. Maples. 1987. Monte Carlo simulation of selected binomial similarity coefficients (I): effect of number of variables. Palaios, 2:609-617.

Archer, A.W. and C.G. Maples. 1988. Monte Carlo simulation of selected binomial similarity coefficients (II): effect of sparse data. Palaios, 3:95-103.

Archer, A.W. and C.G. Maples. 1989. Response of selected binomial coefficients to varying degrees of matrix sparseness and to matrices with known data interrelationships. Mathematical Geology, 21:741-753.

Cheetham, A.H. and J.E. Hazel. 1969. Binary (presence-absence) similarity coefficients. Journal of Paleontology, 43:1130-1136.

Clifford, H.T. and W. Stephenson. 1975. An Introduction to Numerical Classification. Academic Press, 229 p. QH83 .C565

Hazel, J.E. 1970. Binary coefficients and clustering biostratigraphy. Geological Society of America Bulletin, 81:3237-3252.

Hohn, M.E. 1976. Binary coefficients: a theoretical and empirical study. Mathematical Geology, 8:137-150.

Mello, J.F. and M.A. Buzas. 1968. An application of cluster analysis as a method of determining biofacies. Journal of Paleontology, 42:747-758.

Romesburg, H.C. 1984. Cluster Analysis for Researchers. Lifetime Learning Publications, 334 p. QA278 .R66

Valentine, J.W. and R.G. Peddicord. 1967. Evaluation of fossil assemblages by cluster analysis. Journal of Paleontology, 41:502-507.

Zenetos, A. 1991. Re-evaluation of numerical classification methods for delimiting biofacies and biotopes in an estuarine environment. Lethaia, 24:13-26.

MULTI-DIMENSIONAL SCALING

Dillon, W.R. and M. Goldstein. 1984. Multivariate Analysis, Methods and Applications. John Wiley and Sons, New York, 587 p. See Chapter 4 for discussion of Multi-Dimensional Scaling.

Kenkel, N.C. and L. Orloci. 1986. Applying metric and nonmetric multidimensional scaling to ecological studies: some new results. Ecology, 67:919-928.

Minchin, P.R. 1987. An evaluation of the relative robustness of techniques for ecological ordination. Vegetatio, 69:89-107.

PRINCIPAL COMPONENTS/FACTOR ANALYSIS

Gould, S.J. 1981. Correlation, cause, and factor analysis, p. 239-255. In The Mismeasure of Man. W.W. Norton and Co., New York, 352 p. Excellent discussion of factor analysis.

Joreskog, K.G., J.E. Klovan, and R.A. Reyment. 1976. Geological Factor Analysis. Elsevier Publishing Co., 178 p. QE33.3 .M3J67

Klovan, J.E. 1968. Selection of target areas by factor analysis. Western Miner, February 1968, p. 44-53.

Klovan, J.E. 1975. R and Q-mode factor analysis, and a primer on matrix algebra, p. 21-69. In R.B. McCammon (ed.), Concepts in Geostatistics. Springer-Verlag Publishing Co. QE33.2 .M3C66

PROJECT ON MULTIVARIATE ANALYSIS OF PALEOECOLOGIC DATA

Each student will work with a real data set of their own choosing. Use MINITAB and NTSYS-pc to perform the following analyses of your data set.

1. Perform a Q-mode (samples) cluster analysis of the data set to initially classify your data. Study the results and define as many groupings as you think are geologically significant. How would you interpret these sample groupings? You may need to try a variety of similarity coefficients, or even transform the data set, to get interpretable results. Explain which similarity coefficient works best and why. If you transformed the data set first, explain what you did and your justification for doing so.
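
The logic of this step can be illustrated outside the course software with a short Python sketch. It assumes the SciPy library is available. The abundance matrix is invented, and the percent transformation, the Bray-Curtis coefficient, and the average-linkage (UPGMA) method are only one possible combination, chosen for illustration rather than recommended.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# Made-up abundance matrix: 12 samples (rows) by 6 taxa (columns),
# built from two invented assemblages so that two groups should emerge.
assemblage_a = rng.poisson([20, 15, 10, 1, 1, 1], size=(6, 6))
assemblage_b = rng.poisson([1, 1, 1, 20, 15, 10], size=(6, 6))
counts = np.vstack([assemblage_a, assemblage_b]).astype(float)

# Optional transformation: convert raw counts to row percentages so that
# sample size does not dominate the similarity measure.
percent = 100.0 * counts / counts.sum(axis=1, keepdims=True)

# Q-mode: distances are computed between rows (samples).
# Swap in "euclidean", "correlation", etc. to compare coefficients.
distances = pdist(percent, metric="braycurtis")
tree = linkage(distances, method="average")   # average linkage = UPGMA

# Cut the dendrogram into two groups; one group label per sample.
groups = fcluster(tree, t=2, criterion="maxclust")
print(groups)
```

Rerunning the sketch with a different metric or without the percent transformation shows how much the groupings depend on those choices, which is the comparison the step asks you to explain.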

2. Perform Q-mode multidimensional scaling (MDS) on the samples. Use the same similarity coefficients and data transformation you used for the Q-mode cluster analysis. Does the Q-mode MDS provide a better, or poorer, way to discover patterns in your data set as compared to cluster analysis?
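
For illustration, the sketch below ordinates samples from a distance matrix using classical (metric) scaling, which has a closed-form solution. This is a simpler stand-in, not the nonmetric MDS procedure, which needs an iterative routine. The function name classical_mds and the data are invented for the example.

```python
import numpy as np

def classical_mds(dist, n_axes=2):
    """Classical (metric) MDS of a square distance matrix via eigen-decomposition."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a cross-product matrix.
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_axes]   # largest eigenvalues first
    # Clip small negative eigenvalues that arise with non-Euclidean distances.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

rng = np.random.default_rng(2)
samples = rng.normal(size=(10, 4))               # made-up data: 10 samples, 4 variables
diff = samples[:, None, :] - samples[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))          # Euclidean distances between samples

coords = classical_mds(dist, n_axes=2)           # one row of plot coordinates per sample
print(coords.shape)   # (10, 2)
```

Plotting the two columns of coords against each other gives the ordination diagram. Unlike a dendrogram, it does not force samples into discrete groups.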

3. Perform an R-mode (variables) cluster analysis of the data set. Use the Pearson correlation coefficient for similarity. Study the results and define as many groupings as you think are geologically significant. How would you interpret these variable groupings? How well does cluster analysis handle negative correlations between variables? Is this technique appropriate if there are strong negative correlations?
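
The question about negative correlations can be explored with a small Python sketch, assuming SciPy is available. The data and the helper name cluster_variables are invented. The sketch clusters the variables twice: once with dissimilarity 1 - r, which treats a strong negative correlation as maximal dissimilarity, and once with 1 - |r|, which treats it as a strong relationship.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(3)
gradient = rng.normal(size=40)                   # made-up environmental gradient

def noisy(signal):
    return signal + rng.normal(scale=0.3, size=signal.size)

# Variables 0 and 1 increase along the gradient, variable 2 decreases,
# and variable 3 is unrelated noise.
data = np.column_stack([noisy(gradient), noisy(gradient),
                        noisy(-gradient), rng.normal(size=40)])

r = np.corrcoef(data, rowvar=False)              # R-mode: correlations between columns

def cluster_variables(dissimilarity, n_groups=2):
    """Average-linkage clustering of variables from a square dissimilarity matrix."""
    condensed = squareform(dissimilarity, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_groups, criterion="maxclust")

signed = cluster_variables(1.0 - r)              # r = -1 treated as most dissimilar
unsigned = cluster_variables(1.0 - np.abs(r))    # r = -1 treated as strongly related
print("1 - r  :", signed)
print("1 - |r|:", unsigned)
```

With 1 - r, variable 2 is separated from variables 0 and 1 even though all three respond to the same gradient. With 1 - |r| the three are grouped together and only the noise variable stands apart.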

4. Perform R-mode multidimensional scaling (MDS) on the variables. Use the Pearson correlation coefficient for similarity. Compare the results of this ordination technique with the results from R-mode clustering. Does the R-mode MDS provide a better, or poorer, way to discover patterns in your data set?

5. Generate a correlation matrix of the variables. Which variables show strong positive or negative correlations? Use PLOT to make bivariate plots of those variable pairs with strong positive or negative correlations. Interpret your results and comment about the relationships between the variable pairs. This correlation matrix is the starting point for Principal Components Analysis (PCA) and Factor Analysis.
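
As an illustration, the screening of a correlation matrix for strong pairs can be sketched in Python. The cutoff of |r| >= 0.7, the function name strong_pairs, and the variable names and values are all assumptions made for this example. The handout does not define what counts as a strong correlation.

```python
import numpy as np

def strong_pairs(data, names, threshold=0.7):
    """List variable pairs whose Pearson |r| meets a chosen threshold."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):       # upper triangle only
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 2)))
    return pairs

rng = np.random.default_rng(4)
depth = rng.uniform(0, 50, size=30)              # made-up variables
data = np.column_stack([
    depth,
    2.0 * depth + rng.normal(scale=5.0, size=30),    # rises with depth
    -1.0 * depth + rng.normal(scale=5.0, size=30),   # falls with depth
    rng.normal(size=30),                             # unrelated
])
names = ["depth", "mud_pct", "grain_size", "noise"]

for a, b, r in strong_pairs(data, names):
    print(f"{a:>10s}  {b:<10s}  r = {r:+.2f}")
```

Each pair listed is a candidate for a bivariate plot. The plot should always be checked, because a single outlier can produce a large r.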

6. Calculate the principal components of the data set. How many components have an eigenvalue greater than 1.0 (or almost 1.0)? List the eigenvalues. How much variance is accounted for by each component? What is the total variance explained by all the components with eigenvalues greater than 1.0?
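
The arithmetic behind this step can be sketched in Python as follows, using invented data. The principal components are taken from the correlation matrix, so the eigenvalues sum to the number of variables, and each eigenvalue divided by that number is the proportion of variance for that component.

```python
import numpy as np

rng = np.random.default_rng(5)
# Made-up data: 40 observations of 6 variables driven by two hidden factors.
factors = rng.normal(size=(40, 2))
pattern = np.array([[1, 1, 1, 0, 0, 0],
                    [0, 0, 0, 1, 1, 1]], dtype=float)
data = factors @ pattern + rng.normal(scale=0.5, size=(40, 6))

r = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(r)             # eigh returns ascending order
order = np.argsort(eigvals)[::-1]                # re-sort, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

proportion = eigvals / eigvals.sum()             # share of total variance
retained = eigvals > 1.0                         # the eigenvalue > 1.0 rule

for k, (ev, p) in enumerate(zip(eigvals, proportion), start=1):
    flag = "*" if ev > 1.0 else " "
    print(f"PC{k}: eigenvalue = {ev:6.3f}  variance = {100 * p:5.1f}% {flag}")
print(f"Components with eigenvalue > 1.0: {retained.sum()}")
print(f"Variance they explain: {100 * proportion[retained].sum():.1f}%")
```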

7. Perform Factor Analysis:

A. Principal Components Mode: Produce plots of the loadings of the original variables on those principal components with eigenvalues greater than one. Provide interpretations of these plots.

B. Rotation: Rotate the principal components using the VARIMAX method. What are the eigenvalues for the rotated principal components? Compare these to the eigenvalues of the unrotated principal components; comment. Plot the loadings of the original variables on the rotated factors.

C. Plot the rotated factor scores of each observation against the rotated factors. You have just ordinated your data. Interpret the results.
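
Steps 7A through 7C can be illustrated with the Python sketch below, using invented data. The varimax function is a common SVD-based formulation of raw varimax without Kaiser row normalization, so its output may differ slightly from what a statistical package reports. The function and variable names are assumptions for this example.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation; returns (rotated loadings, rotation matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)      # column sums of squares
        u, s, vt = np.linalg.svd(loadings.T @ (rotated ** 3 - rotated * col_ss / p))
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):      # no further improvement
            break
        criterion = s.sum()
    return loadings @ rotation, rotation

rng = np.random.default_rng(6)
# Made-up data: 40 observations of 6 variables driven by two hidden factors.
factors = rng.normal(size=(40, 2))
pattern = np.array([[1, 1, 1, 0, 0, 0],
                    [0, 0, 0, 1, 1, 1]], dtype=float)
data = factors @ pattern + rng.normal(scale=0.5, size=(40, 6))

z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardise variables
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = int((eigvals > 1.0).sum())                   # keep components with eigenvalue > 1
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k]) # 7A: unrotated loadings
rotated, rotation = varimax(loadings)            # 7B: rotated loadings

# Variance carried by each axis = column sum of squared loadings.
print("unrotated:", (loadings ** 2).sum(axis=0).round(3))
print("rotated  :", (rotated ** 2).sum(axis=0).round(3))

# 7C: factor scores on the rotated axes, one row per observation.
scores = (z @ eigvecs[:, :k] / np.sqrt(eigvals[:k])) @ rotation
```

Because the rotation matrix is orthogonal, the total variance carried by the retained axes and the communality of each variable are unchanged. Rotation only redistributes variance among the axes.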

8. Perform an iterative discriminant analysis on your data set. Test the groupings defined by Q-mode cluster analysis, Q-mode MDS, and ordination of samples/observations from factor analysis. There may be more than one way to subdivide the data set into groups or classes. Do the different models of groupings all make sense, or are there some sets of groupings, or even one set of groupings that make the most sense?

Perform discriminant analysis on each of your models of groupings that make sense. Are any of the samples/observations misclassified? If so, should you redefine your model of groupings?

Try to determine which variables are the best predictors of correct groupings. Through an iterative process, try different combinations of variables to see which ones discriminate between groups and which ones do not. What is the minimum set of variables needed to discriminate between the groups you have defined? In trying to determine which variables are the best predictors, you might test the variables that have the highest loadings on each of the rotated factors from factor analysis.

After you have chosen the variables that are the best predictors of groupings, explain their significance. One goal of discriminant analysis is to determine the minimum number of variables needed to classify observations into a given model of groupings. This can save a lot of work, because variables that do not help to discriminate no longer need to be measured.
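
The iterative search described in step 8 can be illustrated with the Python sketch below. It assumes a simple linear discriminant rule (pooled covariance, equal priors), which may differ from the options used in the course software. The data, group labels, and the function name lda_predict are invented for this example.

```python
import numpy as np
from itertools import combinations

def lda_predict(x, groups):
    """Assign each row to the group with the nearest mean, using the
    pooled-covariance Mahalanobis distance and equal priors."""
    labels = np.unique(groups)
    means = np.array([x[groups == g].mean(axis=0) for g in labels])
    # Pooled within-group covariance matrix.
    resid = x - means[np.searchsorted(labels, groups)]
    pooled = resid.T @ resid / (len(x) - len(labels))
    inv = np.linalg.pinv(pooled)     # pinv tolerates highly correlated variables
    diff = x[:, None, :] - means[None, :, :]
    d2 = np.einsum("ngi,ij,ngj->ng", diff, inv, diff)
    return labels[d2.argmin(axis=1)]

rng = np.random.default_rng(7)
groups = np.repeat([1, 2], 15)       # e.g. groups taken from a cluster analysis
x = rng.normal(size=(30, 4))         # made-up measurements
x[groups == 2, 0] += 3.0             # variable 0 separates the groups
x[groups == 2, 1] += 1.5             # variable 1 helps somewhat
names = ["v0", "v1", "v2", "v3"]

# Try every subset of variables and count the misclassified samples.
results = []
for size in range(1, len(names) + 1):
    for subset in combinations(range(len(names)), size):
        predicted = lda_predict(x[:, list(subset)], groups)
        wrong = int((predicted != groups).sum())
        results.append((wrong, size, [names[i] for i in subset]))

for wrong, size, subset in sorted(results)[:5]:
    print(f"{wrong} misclassified using {subset}")
```

The counts are resubstitution counts, since the same samples are used to build and to test the rule, so they tend to understate the true error rate. An exhaustive search over subsets is practical only for a small number of variables.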

9. SYNTHESIS. Use all the results from your multivariate analysis to create a list of comprehensive conclusions. Integrate your results into a coherent explanation of the underlying factors controlling the structure of your data set.

10. WRITTEN REPORT. Present your analysis in a written report including the following sections:

A. Abstract

B. Introduction - A brief description of the source of the data set and the geologic/geographic setting.

C. Presentation of plots from computer runs - Clearly label what each plot represents (Principal Components; Q-mode cluster analysis; etc.). Also label each point on the plots where relevant. Write/draw interpretations directly on the plots.

D. Discussion of the results from the various techniques.

E. Conclusions

F. References cited

Due: April 1, 2002