Factor Analysis (Wilson 2003)

Lab Activities

This lab exercise continues with the multivariate analysis of the data presented by Davis in Table6-7.MTW (available from the folder on your H:\ drive). Start MiniTab and open Table6-7.MTW. Today we'll return to cluster analysis and undertake an R-mode analysis of the water quality data (clustering by variables).

Cluster Analysis:

1)From the tool bar select Stat, then Multivariate, then Cluster Variables.

2)Delete rows 21 through 46 so that only the observations associated with group A remain.

3)Add all variables into the variables (or distance) matrix.

4)Compute using the correlation distance measure.

5)Use a complete linkage approach.

6)Partition by clusters and let the number of clusters equal 2 (this setting doesn't actually seem to affect the outcome).

7)Show dendrogram

8)OK.

9)Repeat for at least one other linkage approach.

10)Reopen the worksheet and delete rows 1 through 20 and then the last 6 observations (those associated with the reconnaissance samples). The remaining observations were all obtained from the non-productive mining areas. Repeat the same cluster variables analysis that you ran above and compare the results.

11)What are the distinguishing differences between the variable interrelationships associated with the productive and non-productive mining areas?
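
For anyone who wants to cross-check the MiniTab dendrograms outside the GUI, the cluster-by-variables steps above can be sketched in Python. This is only a sketch under stated assumptions, not MiniTab's implementation: it assumes NumPy and SciPy are installed, it takes the correlation distance between two variables to be 1 - r, and it uses synthetic data with hypothetical variable names (v1 to v4) in place of Table6-7.MTW.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.spatial.distance import squareform

def cluster_variables(X, names, method="complete", n_clusters=2):
    """Cluster the columns (variables) of X using the distance 1 - r."""
    r = np.corrcoef(X, rowvar=False)      # variable-by-variable correlations
    d = 1.0 - r                           # assumed correlation distance
    d = (d + d.T) / 2.0                   # remove tiny floating-point asymmetry
    np.fill_diagonal(d, 0.0)
    Z = linkage(squareform(d, checks=False), method=method)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # no_plot=True returns the leaf order without needing a plotting library
    order = dendrogram(Z, no_plot=True, labels=list(names))["ivl"]
    membership = {name: int(lab) for name, lab in zip(names, labels)}
    return Z, membership, order

# Synthetic stand-in for the group A rows: two pairs of related variables.
rng = np.random.default_rng(0)
base1 = rng.normal(size=(20, 1))
base2 = rng.normal(size=(20, 1))
X = np.hstack([base1 + 0.1 * rng.normal(size=(20, 2)),
               base2 + 0.1 * rng.normal(size=(20, 2))])
names = ["v1", "v2", "v3", "v4"]          # hypothetical variable names

Z, membership, order = cluster_variables(X, names)                 # complete linkage
Z_avg, _, _ = cluster_variables(X, names, method="average")        # step 9 repeat
print(membership)
print(order)
```

In this sketch the merge history `Z` is computed before the partition is requested, so changing `n_clusters` changes only the membership labels and not the tree itself.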

Optional exercise - Basic Statistics - Correlation:

1)To examine the correlation matrix directly use MiniTab's Stat - Basic Statistics - Correlation option.

2)Run this on the collection of samples from group A. Between which variables do you find the highest correlations (choose 4 or 5)? Consider how these correlations relate to the dendrogram you generated above.
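
The same inspection of the correlation matrix can be sketched in Python. The example below is an illustration only: it assumes NumPy is available, the helper name `top_correlations` is made up for this sketch, and the data are synthetic with hypothetical variable names rather than the group A samples.

```python
import numpy as np

def top_correlations(X, names, k=5):
    """Return the k variable pairs with the largest |r|, strongest first."""
    r = np.corrcoef(X, rowvar=False)
    i, j = np.triu_indices(len(names), k=1)      # each pair once, no diagonal
    strongest = np.argsort(-np.abs(r[i, j]))[:k]
    return [(names[i[m]], names[j[m]], float(r[i[m], j[m]])) for m in strongest]

# Synthetic stand-in: v2 tracks v1, v4 is roughly the negative of v3, v5 is unrelated.
rng = np.random.default_rng(1)
a, b, c = rng.normal(size=(3, 20))
noise = 0.1 * rng.normal(size=(2, 20))
X = np.column_stack([a, a + noise[0], b, -b + noise[1], c])
names = ["v1", "v2", "v3", "v4", "v5"]           # hypothetical variable names

pairs = top_correlations(X, names, k=3)
for first, second, r in pairs:
    print(f"{first}-{second}: r = {r:+.2f}")
```

Ranking by the absolute value keeps strong negative correlations in view; under the assumed 1 - r distance, negatively correlated variables would sit far apart in the dendrogram even though they are strongly related.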

Factor Analysis:

1)From the tool bar select Stat, then Multivariate, then Factor Analysis.

2)Select all variables - Ti through Au - for analysis.

3)Under number of factors to extract enter 2.

4)Use the principal components method of extraction.

5)Use the varimax rotation approach (see the definition under Terms).

6)Accept the defaults under Options.

7)Under graphs select all three plots.

8)Under storage place the scores in columns c15-c16.

9)OK

10)Generate the score plot using MiniTab’s Graph option. Label points using the Group column.
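
The factor analysis steps above can also be sketched in Python to see what each menu choice corresponds to numerically. This is a sketch under assumptions, not a reproduction of MiniTab: it assumes NumPy is available, extracts factors from the correlation matrix, applies raw varimax without Kaiser normalization, and computes regression-type scores, any of which may differ from MiniTab's defaults. The data are synthetic, and the A/B/C layout (20, 20, and 6 rows) is the one implied by the row ranges in the cluster exercise.

```python
import numpy as np

def pc_loadings(X, n_factors=2):
    """Principal components extraction from the correlation matrix."""
    r = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(r)               # eigenvalues in ascending order
    keep = np.argsort(eigval)[::-1][:n_factors]      # indices of the largest ones
    return eigvec[:, keep] * np.sqrt(eigval[keep])   # loadings = vector * sqrt(value)

def varimax(L, max_iter=100, tol=1e-8):
    """Raw varimax rotation; returns rotated loadings and the rotation matrix."""
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        B = L @ R
        U, s, Vt = np.linalg.svd(L.T @ (B**3 - B * (B**2).sum(axis=0) / p))
        R = U @ Vt                                   # nearest orthogonal matrix
        if s.sum() < crit * (1 + tol):               # no further improvement
            break
        crit = s.sum()
    return L @ R, R

def factor_scores(X, L):
    """Regression-type scores: standardized data times inv(corr) times loadings."""
    Zs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return Zs @ np.linalg.solve(np.corrcoef(X, rowvar=False), L)

# Synthetic stand-in for the 46-row worksheet with 6 hypothetical variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(46, 6))
X[:, 1] += X[:, 0]                                   # make some variables correlated
X[:, 3] += X[:, 2]
groups = np.array(["A"] * 20 + ["B"] * 20 + ["C"] * 6)

L = pc_loadings(X, n_factors=2)
L_rot, R = varimax(L)
scores = factor_scores(X, L_rot)
for g in "ABC":                                      # mean score per group on each factor
    print(g, scores[groups == g].mean(axis=0).round(2))
```

Plotting `scores[:, 0]` against `scores[:, 1]` with each point labeled by its entry in `groups` gives the counterpart of the labeled score plot requested in step 10.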

Do the following:

  1. Cluster Analysis: Hand in dendrograms from the cluster by variables exercise and note unique differences in the associations between productive and non-productive areas. You can write your comments directly on the dendrograms.
  2. Factor analysis: Make score plot for factors 1 and 2. Use A, B, C labels to distinguish between scores associated with productive, non-productive, and unknown areas.
  3. Does factor analysis suggest that certain members of Group C may be productive heavy metal mining areas and that others may not be? (Refer to labeled figure).
  4. Are the results of factor analysis consistent with those obtained from discriminant and cluster analysis?

Terms

Factor Loadings represent the degree to which each of the variables correlates with each of the factors.

Scores represent the value of the sample (object, individual, area, etc.) on each of the derived factors. Factors can be thought of as additional variables. An object’s score on a factor represents a weighted combination of its scores on each of the input variables. Its usage is similar to that in discriminant analysis.

Communality is the proportion of the total variance associated with each variable that is accounted for by the factors.
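
As a small worked example of communality, assuming standardized variables and uncorrelated (orthogonal) factors, the communality of a variable is the sum of its squared loadings across the factors. The loadings below are hypothetical numbers chosen for illustration, not values from the lab data.

```python
import numpy as np

# Hypothetical loadings: two variables (rows) on two factors (columns).
loadings = np.array([[0.8, 0.3],
                     [0.1, 0.9]])

# Communality of each variable = sum of its squared loadings.
communality = (loadings ** 2).sum(axis=1)
print(communality)   # about 0.73 and 0.82
```

Here the two factors account for roughly 73% of the variance of the first variable and 82% of the variance of the second.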

Equimax rotation rotates the loadings so that a variable loads high on one factor and low on the others.

Varimax rotation rotates the loadings to maximize the variance of the squared loadings within each factor, so that each factor has a few high loadings and the rest near zero.
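
One common way to write the quantity that varimax maximizes is the raw varimax criterion shown below. The notation is an assumption introduced here for illustration: $\ell_{ij}$ is the rotated loading of variable $i$ on factor $j$, $p$ is the number of variables, and $k$ is the number of factors. MiniTab may apply a normalized variant.

```latex
V = \sum_{j=1}^{k} \left[ \frac{1}{p} \sum_{i=1}^{p} \ell_{ij}^{4}
    - \left( \frac{1}{p} \sum_{i=1}^{p} \ell_{ij}^{2} \right)^{2} \right]
```

Each bracketed term is the variance of the squared loadings in one column of the loading matrix, so V is largest when every factor has a few loadings near plus or minus 1 and the remainder near 0.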