Worksheet 9 Principal Components Analysis and Discriminant Function Analysis

Bio 286 Worksheet 10 – PCA Regression and Clustering

Worksheet 10 – PCA Regression and Cluster Analysis

1) PCA Regression – Here you will use PCA regression to test hypotheses about the relationship between the degree of urbanization (as a continuous variable) and population size, military spending, gross national product, birth rates, and death rates.

Bring up your saved data table from the last lab.
First, run a multiple regression of Urban_Metric on POP_1983, POP_1986, POP_1990, BIRTH_82, DEATH_82, GNP_82, and MIL. The Urban_Metric score indicates the degree of urbanization in a country: higher scores are more urbanized.
Use ANALYZE, FIT MODEL
Put the predictor variables in the CONSTRUCT MODEL EFFECTS window and Urban_Metric in the Y window
PERSONALITY should be STANDARD LEAST SQUARES
Run the model
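Look at the PARAMETER ESTIMATES table, specifically at the VIF (variance inflation factor) scores. If the VIF column is not shown, right-click the table and choose COLUMNS, VIF. As a rule of thumb, any VIF greater than 10 indicates serious collinearity. Is there any evidence of collinearity?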
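JMP computes the VIF column for you. As a rough illustration of what that number measures, the minimal NumPy sketch below regresses each predictor on all the other predictors (with an intercept) and reports 1 / (1 - R²). The data are made up, with two near-duplicate columns standing in for the population variables; they are not the lab data set.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows x p predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])        # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical data: two columns that track each other closely plus one unrelated column
rng = np.random.default_rng(0)
base = rng.normal(size=50)
X = np.column_stack([
    base + rng.normal(scale=0.05, size=50),   # stands in for POP_1983
    base + rng.normal(scale=0.05, size=50),   # stands in for POP_1986
    rng.normal(size=50),                      # an unrelated predictor
])
print(np.round(vif(X), 1))
```

The two near-duplicate columns get VIFs far above 10, while the unrelated column stays close to 1.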
Given that there is evidence of collinearity, one solution is to use PCA regression. From the results of question 1, we know that the original variables load on two principal components. Principal components are also uncorrelated with one another (orthogonal), so collinearity will not be an issue if we use the component scores as predictor variables.
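The claim that principal components are uncorrelated can be checked numerically. This sketch assumes a PCA on the correlation matrix (that is, on z-scored variables) and uses hypothetical data: it computes component scores and prints their correlation matrix, whose off-diagonal entries come out as zero up to rounding. JMP's saved Prin columns may be scaled differently, but rescaling a column does not change its correlation with the others.

```python
import numpy as np

def pc_scores(X):
    """Principal component scores from the correlation matrix of X.

    Columns are standardized to z scores, then projected onto the
    eigenvectors of the correlation matrix, sorted by decreasing eigenvalue.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Z @ eigvecs, eigvals, eigvecs

# Hypothetical data: three strongly correlated columns plus one unrelated column
rng = np.random.default_rng(1)
base = rng.normal(size=60)
X = np.column_stack([base + rng.normal(scale=0.1, size=60) for _ in range(3)]
                    + [rng.normal(size=60)])
scores, eigvals, loadings = pc_scores(X)
C = np.corrcoef(scores, rowvar=False)
print(np.round(C, 6))   # identity matrix: the scores are mutually uncorrelated
```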
Use ANALYZE, FIT MODEL
Put Prin1 and Prin2 in the CONSTRUCT MODEL EFFECTS window and Urban_Metric in the Y window
PERSONALITY should be STANDARD LEAST SQUARES
Run the model
Look at the PARAMETER ESTIMATES table again, specifically at the VIF scores (again, any value greater than 10 indicates serious collinearity). Is there any evidence of collinearity now?
Now interpret the results in terms of the original variables.
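A sketch of the same PCA regression outside JMP, under the same assumptions (hypothetical data, PCA on correlations, first two components kept). With only two predictors, each VIF equals 1 / (1 - r²), where r is the correlation between them, so uncorrelated component scores give VIFs of exactly 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
base = rng.normal(size=n)
# Hypothetical predictors: four strongly correlated columns plus one unrelated column
X = np.column_stack([base + rng.normal(scale=0.1, size=n) for _ in range(4)]
                    + [rng.normal(size=n)])
# Hypothetical response standing in for Urban_Metric
y = 2.0 * base + rng.normal(scale=0.5, size=n)

# Principal component scores from the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]
scores = Z @ eigvecs[:, order[:2]]          # keep the first two components (Prin1, Prin2)

# Ordinary least squares of y on an intercept plus the two component scores
A = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Two predictors: VIF = 1 / (1 - r^2), r = correlation between the predictors
r = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
vif_pcs = 1.0 / (1.0 - r ** 2)
print(np.round(coef, 3), round(vif_pcs, 6))
```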
Use GRAPH, GRAPH BUILDER
Put Prin2 on the X axis and Urban_Metric on the Y axis
Click the upper icon that shows a straight line fitted to the data (the line of fit)
Interpret these results in terms of the original variables that load on Prin2, and also offer a general interpretation of the relationship. Remember that the points are countries.
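One way to translate the Prin1 and Prin2 slopes back to the original variables: if the scores are Z V_k (Z the z-scored variables, V_k the first k eigenvectors), then the fitted model b0 + (Z V_k) g equals b0 + Z (V_k g), so V_k g gives one implied slope per standardized original variable. The sketch below assumes a PCA on correlations and uses hypothetical data in which the response depends only on the last two columns; the function name `pcr_coefficients_on_z` is made up for this illustration.

```python
import numpy as np

def pcr_coefficients_on_z(X, y, k):
    """Regress y on the first k PC scores of X, then map the PC slopes
    back to slopes on the standardized (z-score) original variables."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    V_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # loadings of the first k PCs
    scores = Z @ V_k
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return V_k @ coef[1:], V_k                         # implied slope per z-scored variable

rng = np.random.default_rng(3)
n = 100
size = rng.normal(size=n)     # hypothetical "size" factor
rate = rng.normal(size=n)     # hypothetical "rate" factor
X = np.column_stack([
    size + rng.normal(scale=0.1, size=n),
    size + rng.normal(scale=0.1, size=n),
    rate + rng.normal(scale=0.1, size=n),
    rate + rng.normal(scale=0.1, size=n),
])
y = -1.5 * rate + rng.normal(scale=0.3, size=n)   # response depends only on the "rate" columns
beta_z, loadings = pcr_coefficients_on_z(X, y, k=2)
print(np.round(beta_z, 2))
```

The implied slopes on the first two columns come out near zero and the last two are clearly negative, which is the kind of statement ("countries high on these variables tend to be less urbanized") the interpretation step asks for. Because V_k g depends only on which components are kept, flipping the sign of a component does not change it.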

2) Cluster Analysis – This exercise shows how different clustering methods, which measure the distance between clusters in different ways, affect the clustering pattern.

Use ANALYZE, MULTIVARIATE METHODS, CLUSTER
Put POP_1983, POP_1986, POP_1990, BIRTH_82, DEATH_82, GNP_82, and MIL in the Y window
Put URBAN (red icon) in the LABEL window.
Under OPTIONS, choose HIERARCHICAL
Click on STANDARDIZE DATA (which transforms the variable values to z scores)
Try the AVERAGE method first
Below the cluster diagram (dendrogram) are a SCREE plot and a small diamond icon that you can drag to show the number of clusters at a particular distance. Try it out.
If you want, you can click the CLUSTERING HISTORY icon to see the order in which cases were joined as clustering progressed.
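The steps above can be mimicked in a few lines to see what the clustering history and the diamond cut represent. This sketch assumes Euclidean distance on z scores and average linkage (the mean of all pairwise distances between two clusters). It records each join in order, and counting the joins at or below a chosen distance gives the number of clusters at that cut. JMP's own distance scaling in the scree plot may differ, and the six data points are hypothetical.

```python
import numpy as np

def zscore(X):
    """Standardize each column to mean 0, standard deviation 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def average_linkage(X):
    """Agglomerative clustering with average linkage on Euclidean distances.

    Returns the clustering history: (members_a, members_b, distance)
    for each join, in the order the joins happened.
    """
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(len(X))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # mean of all pairwise distances between the two clusters
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

def n_clusters_at(history, n_points, cut):
    """Number of clusters left when the tree is cut at distance `cut`."""
    return n_points - sum(1 for _, _, d in history if d <= cut)

# Hypothetical data: two well-separated groups of three cases
X = zscore([[1.0, 2.0], [1.2, 2.1], [0.9, 1.8],
            [5.0, 8.0], [5.3, 7.9], [4.8, 8.2]])
hist = average_linkage(X)
for a, b, d in hist:
    print(a, b, round(d, 3))
print(n_clusters_at(hist, len(X), cut=1.0))
```

Each cluster's rows are joined with one another at small distances first; the two groups join only at the final, much larger distance, which is the big jump a scree plot makes visible.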
Now repeat the process for the CENTROID, WARD, SINGLE, and COMPLETE methods. These methods are all described in the help file for CLUSTER.
Notice how the clustering changes with the method used to measure the distance between clusters. The differences can be subtle or very large (especially for SINGLE linkage, which tends to string cases together into long chains).
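To see why the methods disagree, the sketch below implements all five with the Lance-Williams update formula, which recomputes the distance from every other cluster to a newly merged one. Single linkage keeps the smaller of the two old distances, complete the larger, and average a size-weighted mean; the centroid and Ward updates are applied to squared Euclidean distances. This is a from-scratch illustration, not JMP's code, and the seven one-variable values are hypothetical.

```python
import numpy as np

def lance_williams(method, ni, nj, nk):
    """Coefficients (alpha_i, alpha_j, beta, gamma) for the distance from
    cluster k to the merged cluster i+j, given the three cluster sizes."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":
        return ni / (ni + nj), nj / (ni + nj), 0.0, 0.0
    if method == "centroid":
        s = ni + nj
        return ni / s, nj / s, -ni * nj / s ** 2, 0.0
    if method == "ward":
        s = ni + nj + nk
        return (ni + nk) / s, (nj + nk) / s, -nk / s, 0.0
    raise ValueError(f"unknown method: {method}")

def cluster_labels(X, method, k):
    """Standardize X, merge clusters until k remain, return one label per row."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)        # z scores
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # The centroid and Ward updates are exact for squared Euclidean distances
    D = sq if method in ("centroid", "ward") else np.sqrt(sq)
    active = {i: [i] for i in range(len(X))}                # cluster id -> member rows
    while len(active) > k:
        ids = list(active)
        # closest pair of clusters still active
        i, j = min(((a, b) for n, a in enumerate(ids) for b in ids[n + 1:]),
                   key=lambda pair: D[pair[0], pair[1]])
        ni, nj = len(active[i]), len(active[j])
        for m in ids:
            if m in (i, j):
                continue
            a_i, a_j, beta, gamma = lance_williams(method, ni, nj, len(active[m]))
            D[i, m] = D[m, i] = (a_i * D[i, m] + a_j * D[j, m]
                                 + beta * D[i, j] + gamma * abs(D[i, m] - D[j, m]))
        active[i] = active[i] + active.pop(j)                # cluster i now holds i and j
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(active.values()):
        labels[members] = label
    return labels

# Hypothetical one-variable data: a chain of six values plus one more distant value
X = np.array([[0.0], [1.1], [2.3], [3.6], [5.0], [6.5], [10.0]])
for method in ("average", "centroid", "ward", "single", "complete"):
    print(f"{method:>8}: {cluster_labels(X, method, k=2)}")
```

Cut into two clusters, single linkage keeps the whole chain of six values together and isolates 10.0, whereas complete linkage splits the chain into {0, 1.1, 2.3, 3.6} and {5.0, 6.5, 10.0}.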