* Lab 1: Principal Components Analysis
* Multivariate analysis @Department of Statistics, SU;
* Chengcheng Hao;
* ;
* 9-12 2011;
* Instructions:
* In each lab, one example and one exercise from the textbook will be introduced;
* The exercise should be done in SAS during class, individually or in groups.
You may of course discuss it with each other or ask the lab consultant for help,
but do not copy code from anyone else;
* The results of the exercise should be outlined and commented on, and
shown to the lab consultant in some way;
* A solution (which may not be the standard solution) will be distributed and discussed
at the end of the lab;
*******************************************************
* Chapter 4 Principal components analysis *
*******************************************************;
ods html body='HT11_MM1_report.html' style=sasweb;
ods graphics on;
*******************************************
* Example: 4.3 on page 67                 *
* Data set: Table 4.7, page 71            *
* Results: Exhibits 4.2 and 4.3           *
*******************************************;
* This example aims to develop a Consumer Price Index (CPI) for the US based
on the estimated retail food prices by city. For each of 23 cities, the
prices of five kinds of food are measured. The developed CPI should reflect
most of the information given by the five prices, but use fewer
variables;
* The data, reported by the US Bureau of Labor Statistics (1973),
are read into SAS by the following code;
data table4_7;
input city $13. Bread Burger Milk Oranges Tomatoes;
datalines;
ATLANTA       24.5 94.5 73.9 80.1 41.6
BALTIMORE     26.5 91.0 67.5 74.6 53.3
BOSTON        29.7 100.8 61.4 104.0 59.6
BUFFALO       22.8 86.6 65.3 118.4 51.2
CHICAGO       26.7 86.7 62.7 105.9 51.2
CINCINNATI    25.3 102.5 63.3 99.3 45.6
CLEVELAND     22.8 88.8 52.4 110.9 46.8
DALLAS        23.3 85.5 62.5 117.9 41.8
DETROIT       24.1 93.7 51.5 109.7 52.4
HONOLULU      29.3 105.9 80.2 133.2 61.7
HOUSTON       22.3 83.6 67.8 108.6 42.4
KANSAS CITY   26.1 88.9 65.4 100.9 43.2
LOS ANGELES   26.9 89.3 56.2 82.7 38.4
MILWAUKEE     20.3 89.6 53.8 111.8 53.9
MINNEAPOLIS   24.6 92.2 51.9 106.0 50.7
NEW YORK      30.8 110.7 66.0 107.3 62.6
PHILADELPHIA  24.5 92.3 66.7 98.0 61.7
PITTSBURGH    26.2 95.4 60.2 117.1 49.3
ST LOUIS      26.5 92.4 60.8 115.1 46.2
SAN DIEGO     25.5 83.7 57.0 92.8 35.4
SAN FRANCISCO 26.3 87.1 58.3 101.8 41.5
SEATTLE       22.5 77.7 62.0 91.1 44.9
WASHINGTON DC 24.2 93.8 66.0 81.6 46.2
;
proc princomp data=table4_7
              out=pci_mc
              cov
              plots=(score(ncomp=2) scree);
   title 'PCA of Food Price Data: mean-corrected data (covariance matrix)';
   var Bread Burger Milk Oranges Tomatoes;
   id city;
run;
/* COV: computes the principal components from the covariance matrix (mean-corrected data).
   If you omit the COV option, the correlation matrix is analyzed (standardized data).
   PLOTS=SCORE: requests a scatter plot of the first principal component
   against the second principal component.
   PLOTS=SCREE: requests a scree plot. */
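* Optional check (an illustrative addition, not part of the textbook example):
  the scores saved in PCI_MC should be mutually uncorrelated, and with the COV
  option the variance of each score should equal the corresponding eigenvalue.
  PROC CORR with the COV option prints the covariance matrix of the scores, so
  both properties can be inspected in one table;
proc corr data=pci_mc cov nosimple;
   var prin1-prin5;
   title2 'Covariance matrix of the principal component scores';
run;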
* A problem with the above analysis is that the first principal component, PRIN1,
is dominated by Oranges, the variable with the largest variance. One way to fix this
is to compute the principal components from the correlation matrix instead;
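* A quick way to compare the scales of the variables (a sketch added for
  illustration): PROC MEANS prints the standard deviation and variance of each
  price. A variable whose variance is much larger than the others tends to
  dominate the first principal component of a covariance-matrix analysis;
proc means data=table4_7 n mean std var maxdec=2;
   var Bread Burger Milk Oranges Tomatoes;
   title2 'Means and variances of the five food prices';
run;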
proc princomp data=table4_7
              out=pci_sd
              plots=(score(ncomp=2) scree);
   title 'PCA of Food Price Data: standardized data (correlation matrix)';
   var Bread Burger Milk Oranges Tomatoes;
   id city;
run;
proc print data=pci_sd;
run;
proc sort data=pci_sd;
   by prin1;
run;
proc print data=pci_sd;
   id city;
   var prin1 prin2;
   title2 'Rankings by the First Principal Component (correlation matrix)';
run;
proc sort data=pci_sd;
   by prin2;
run;
proc print data=pci_sd;
   id city;
   var prin1 prin2;
   title2 'Rankings by the Second Principal Component (correlation matrix)';
run;
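* Illustration (an added sketch. The data set names TABLE4_7_STD and PCI_CHECK
  are hypothetical and used only here): analyzing the correlation matrix is
  equivalent to first standardizing each variable to mean 0 and standard
  deviation 1 and then analyzing the covariance matrix. The eigenvalues and
  eigenvectors printed by this step should therefore agree with those of the
  correlation-matrix analysis of TABLE4_7;
proc standard data=table4_7 mean=0 std=1 out=table4_7_std;
   var Bread Burger Milk Oranges Tomatoes;
run;
proc princomp data=table4_7_std cov out=pci_check;
   title2 'Check: covariance-matrix PCA of the standardized prices';
   var Bread Burger Milk Oranges Tomatoes;
   id city;
run;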
*************
* Exercise *
*************;
* Read and answer Question 4.4 on page 82. Comment on your SAS results;
* The data have already been entered below;
data ex4_4;
input City $14. Bread Hamburger Butter Apples Tomatoes;
datalines;
Anchorage      70.9 135.6 155.0 63.9 100.1
Atlanta        36.4 111.5 144.3 53.9 95.9
Baltimore      28.9 108.8 151.0 47.5 104.5
Boston         43.2 119.3 142.0 41.1 96.5
Buffalo        34.5 109.9 124.8 35.6 75.9
Chicago        37.1 107.5 145.4 65.1 94.2
Cincinnati     37.1 118.1 149.6 45.6 90.8
Cleveland      38.5 107.7 142.7 50.3 83.2
Dallas         35.5 116.8 142.5 62.4 90.7
Detroit        40.8 108.8 140.1 39.7 96.1
Honolulu       50.9 131.7 154.4 65.0 93.9
Houston        35.1 102.3 150.3 59.3 84.5
Kansas City    35.1 99.8 162.3 42.6 87.9
Los Angeles    36.9 96.2 140.4 54.7 79.3
Milwaukee      33.3 109.1 123.2 57.7 87.7
Minneapolis    32.5 116.7 135.1 48.0 89.1
New York       42.7 130.8 148.7 47.6 92.1
Philadelphia   42.9 126.9 153.8 51.9 101.5
Pittsburgh     36.9 115.4 138.9 43.8 91.9
St. Louis      36.9 109.8 140.0 46.7 79.0
San Diego      32.5 84.5 145.9 48.5 82.3
San Francisco  40.0 104.6 139.1 59.2 81.9
Seattle        32.2 105.4 136.8 54.0 88.6
Washington     31.8 116.7 154.81 57.6 86.6
;
run;
* Insert your code here;
ods graphics off;
ods html close;