Bio 286 Worksheet 10 – PCA Regression and Clustering

Worksheet 10 – PCA Regression and Cluster Analysis

1) PCA Regression – In this exercise you will use PCA regression to test hypotheses about the relationship between the degree of urbanization (as a continuous variable) and military spending, gross national product, birth rates, and death rates.

  1. Bring up your saved data table from the last lab
  2. First, run a multiple regression of Urban_Metric on POP_1983, POP_1986, POP_1990, BIRTH_82, DEATH_82, GNP_82, and MIL. The Urban_Metric score indicates the degree of urbanization in a country: higher scores are more urbanized.
  3. Use ANALYZE, FIT MODEL
  4. Put the predictor variables in the CONSTRUCT MODEL EFFECTS window and Urban_Metric in the Y window
  5. PERSONALITY should be STANDARD LEAST SQUARES
  6. Run the model
  7. Look at the PARAMETER ESTIMATES table, specifically the VIF scores (if the VIF column is not shown, right-click the table and choose COLUMNS, VIF). A common rule of thumb is that any VIF greater than 10 indicates serious collinearity. Is there any evidence of collinearity?
  8. Given that there is evidence of collinearity, one solution is PCA regression. From the results of question 1, we know that the original variables load on two principal components. We also know that the factors (PCs) are uncorrelated with one another, so collinearity will not be an issue if we use the factors as predictor variables.
  9. Use ANALYZE, FIT MODEL
  10. Put Prin1 and Prin2 in the CONSTRUCT MODEL EFFECTS window and Urban_Metric in the Y window
  11. PERSONALITY should be STANDARD LEAST SQUARES
  12. Run the model
  13. Look at the PARAMETER ESTIMATES table again, specifically the VIF scores. Is there any evidence of collinearity now?
  14. Now interpret the results in terms of the original variables, using the variables that load on each principal component
  15. Use GRAPH, GRAPH BUILDER
  16. Put Prin2 on the X axis and Urban_Metric on the Y axis
  17. Click the upper icon that fits a straight line to the data (LINE OF FIT)
  18. Interpret these results in terms of the original variables that load on Prin2, and also offer a general interpretation of the relationship. Remember that each point is a country
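The VIF check in steps 7 and 13 can also be reproduced outside JMP. The sketch below is a minimal illustration, assuming Python with NumPy and using simulated (hypothetical) data rather than the lab's data table: three nearly identical columns stand in for the three population variables, plus one unrelated column. It computes each VIF two equivalent ways: from the definition VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on all the others, and as the diagonal of the inverse correlation matrix.

```python
import numpy as np

def vif(X):
    """VIF of each column of X, read off the inverse correlation matrix."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

def vif_by_regression(X):
    """VIF from the definition VIF_j = 1 / (1 - R_j^2), regressing column j
    (with an intercept) on all other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        # 1 / (1 - R^2) simplifies to total variance / residual variance
        out[j] = y.var() / resid.var()
    return out

# Simulated, hypothetical data: three near-copies of one "population" signal
# plus one unrelated predictor.
rng = np.random.default_rng(0)
n = 60
pop = rng.normal(size=n)
X = np.column_stack([pop + 0.05 * rng.normal(size=n) for _ in range(3)]
                    + [rng.normal(size=n)])

print(np.round(vif(X), 1))
print(np.round(vif_by_regression(X), 1))
```

The three correlated columns should show VIFs well above 10, while the unrelated column stays close to 1.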
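The logic of PCA regression (regress the response on principal-component scores, then translate the slopes back to the original variables) can be sketched as follows. This is a hypothetical NumPy illustration on simulated data with a built-in two-component structure, not the lab's data set. It assumes the PCs come from the correlation matrix (z-scored predictors); JMP's saved Prin columns may differ from these scores by sign or scale, which changes the PC coefficients but not R^2. Multiplying the eigenvector matrix by the PC slopes gives the implied effect of a one-SD change in each original variable, which is one way to read the results in terms of the original variables.

```python
import numpy as np

def pca_regression(X, y, k=2):
    """Regress y on the first k principal components of the predictors X.

    PCs are taken from the correlation matrix (i.e. from z-scored columns)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # z-scores
    R = np.corrcoef(X, rowvar=False)                     # correlation matrix
    eigval, eigvec = np.linalg.eigh(R)                   # eigh returns ascending order
    order = np.argsort(eigval)[::-1]                     # largest eigenvalue first
    V = eigvec[:, order[:k]]                             # eigenvectors of the first k PCs
    scores = Z @ V                                       # play the role of Prin1..Prink
    A = np.column_stack([np.ones(len(y)), scores])       # intercept + PC scores
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    beta_std = V @ coef[1:]                              # PC slopes mapped back to original variables
    return {"scores": scores, "coef": coef, "r2": r2,
            "eigenvalues": eigval[order], "beta_std": beta_std}

# Simulated, hypothetical data: 3 columns driven by latent factor f1 and
# 4 columns (two with reversed sign) driven by latent factor f2.
rng = np.random.default_rng(1)
n = 80
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

def noisy(f):
    return f + 0.3 * rng.normal(size=n)

X = np.column_stack([noisy(f1), noisy(f1), noisy(f1),
                     -noisy(f2), -noisy(f2), noisy(f2), noisy(f2)])
y = 2.0 * f2 + 0.5 * rng.normal(size=n)   # response depends only on f2

fit = pca_regression(X, y, k=2)
print("PC coefficients:", np.round(fit["coef"], 2))
print("R^2:", round(fit["r2"], 3))
print("effect per SD of each original column:", np.round(fit["beta_std"], 2))
```

In this simulation the response depends only on the second latent factor, so the back-transformed effects for the first three columns come out near zero. The two score columns are uncorrelated up to rounding, which is why their VIFs equal 1.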

2) Cluster Analysis – This exercise shows how different ways of measuring the distance between clusters (clustering methods) change the clustering pattern.

  1. Use ANALYZE, MULTIVARIATE METHODS, CLUSTER
  2. Put POP_1983, POP_1986, POP_1990, BIRTH_82, DEATH_82, GNP_82, MIL in the Y window
  3. Put URBAN (red icon) in the LABEL window.
  4. For OPTIONS, choose HIERARCHICAL
  5. Click on STANDARDIZE DATA (this transforms each variable to z scores, so that variables measured on large scales do not dominate the distances)
  6. Try the AVERAGE method first
  7. Below the cluster diagram is a SCREE plot, plus a small diamond icon that you can drag to show the number of clusters at a particular distance. Try it out
  8. Optionally, click the CLUSTERING HISTORY icon to see the order in which cases were joined as clustering progressed
  9. Now repeat the process for the CENTROID, WARD, SINGLE, and COMPLETE methods. All of these methods are described in the help file for CLUSTER
  10. Notice how the clustering changes with the method. The differences can be subtle or very large, especially for SINGLE linkage, which tends to chain cases one at a time onto a growing cluster
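Standardizing the data matters because Euclidean distance adds up raw differences, so a variable measured in large units can swamp the others. The small NumPy sketch below uses four hypothetical rows (column 0 on a large scale such as currency, column 1 on a small scale such as a rate), assuming z scores based on the sample standard deviation, to show the nearest neighbor of row 0 changing once both columns are standardized.

```python
import numpy as np

def zscore(X):
    """Convert each column to z-scores using the sample SD."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def euclidean(a, b):
    """Straight-line distance between two rows."""
    return float(np.sqrt(((np.asarray(a) - np.asarray(b)) ** 2).sum()))

# Hypothetical rows: column 0 on a large scale, column 1 on a small scale.
X = np.array([[1000.0, 10.0],
              [1200.0, 10.0],   # differs from row 0 only in column 0 (by 200)
              [1000.0, 40.0],   # differs from row 0 only in column 1 (by 30)
              [5000.0, 25.0]])
Z = zscore(X)

raw_d = (euclidean(X[0], X[1]), euclidean(X[0], X[2]))
std_d = (euclidean(Z[0], Z[1]), euclidean(Z[0], Z[2]))
print("raw distances from row 0 to rows 1 and 2:", raw_d)
print("standardized distances:", std_d)
```

In raw units row 2 looks closer to row 0, because its 30-unit difference in column 1 is small next to row 1's 200-unit difference in column 0. After standardizing, that 30-unit difference is about two standard deviations of column 1, and row 1 becomes the nearer neighbor.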
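The comparison of clustering methods can be sketched with a naive agglomerative clustering in NumPy (fine for teaching-sized data, not efficient). It uses the Lance-Williams update formula to implement average, centroid, Ward, single, and complete linkage. This only follows JMP in spirit: JMP's distance scaling for each method may differ, so merge heights will not match its dendrogram. Cutting at `n_clusters` plays the role of dragging the diamond, and `history` corresponds loosely to the CLUSTERING HISTORY report. For real analyses, `scipy.cluster.hierarchy.linkage` implements these same methods.

```python
import numpy as np

def zscore(X):
    """Convert each column to z-scores (mean 0, SD 1), like STANDARDIZE DATA."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def lance_williams(method, ni, nj, nk):
    """Coefficients (alpha_i, alpha_j, beta, gamma) for merging clusters i and j
    (sizes ni, nj) and updating their distance to cluster k (size nk)."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":
        return ni / (ni + nj), nj / (ni + nj), 0.0, 0.0
    if method == "centroid":
        s = ni + nj
        return ni / s, nj / s, -ni * nj / s ** 2, 0.0
    if method == "ward":
        s = ni + nj + nk
        return (ni + nk) / s, (nj + nk) / s, -nk / s, 0.0
    raise ValueError(f"unknown method: {method}")

def hierarchical(X, method="average", n_clusters=1):
    """Agglomerate the rows of X until n_clusters remain.

    Returns (clusters, history): clusters is a sorted list of row-index lists,
    history lists each merge as (merged row indices, merge distance)."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    if method in ("centroid", "ward"):
        D = D ** 2  # these two updates are defined on squared Euclidean distance
    clusters = {i: [i] for i in range(len(X))}
    dist = {(i, j): D[i, j] for i in range(len(X)) for j in range(i + 1, len(X))}
    history = []
    next_id = len(X)
    while len(clusters) > n_clusters:
        (a, b), h = min(dist.items(), key=lambda item: item[1])  # closest pair
        na, nb = len(clusters[a]), len(clusters[b])
        updated = {}
        for k in clusters:
            if k in (a, b):
                continue
            dak = dist[(min(a, k), max(a, k))]
            dbk = dist[(min(b, k), max(b, k))]
            ai, aj, beta, gamma = lance_williams(method, na, nb, len(clusters[k]))
            updated[k] = ai * dak + aj * dbk + beta * h + gamma * abs(dak - dbk)
        merged = sorted(clusters.pop(a) + clusters.pop(b))
        history.append((merged, float(h)))
        # drop distances involving the two old clusters, add ones to the new cluster
        dist = {key: d for key, d in dist.items() if a not in key and b not in key}
        for k, d in updated.items():
            dist[(k, next_id)] = d
        clusters[next_id] = merged
        next_id += 1
    return sorted(clusters.values()), history

# Hypothetical 1-D data whose gaps grow steadily: 1.0, 1.1, 1.2, 1.3, 1.4.
x = np.array([[0.0], [1.0], [2.1], [3.3], [4.6], [6.0]])
for m in ["average", "centroid", "ward", "single", "complete"]:
    groups, _ = hierarchical(zscore(x), method=m, n_clusters=2)
    print(m, groups)
```

On this example, single linkage keeps chaining the next point onto the growing cluster and leaves the last point on its own, while complete linkage splits the data into groups of four and two.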