Q4.4 The data were analyzed using PROC PRINCOMP in SAS. The scree plots of the eigenvalues of the covariance matrix (see Fig 1) and of the correlation matrix (see Fig 2) indicate that two principal components should be retained.

Fig 1: Scree plot for the eigenvalues of the covariance matrix (MEAN-CORRECTED DATA)

Fig 2: Scree plot for the eigenvalues of the correlation matrix (STANDARDIZED DATA)

The two retained principal components account for 71.29% of the total variance when extracted from the covariance matrix and 67.38% when extracted from the correlation matrix.
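The percentages above are the sum of the retained eigenvalues divided by the sum of all eigenvalues. A minimal sketch of that arithmetic follows; the function name `explained_share` and the eigenvalues are made up for illustration and are not the PROC PRINCOMP output for this data set.

```python
def explained_share(eigenvalues, k):
    """Fraction of total variance carried by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

# Made-up eigenvalues for five standardized variables (they sum to 5).
# These are NOT the eigenvalues of the food price data.
illustrative = [2.5, 1.0, 0.8, 0.4, 0.3]

share = explained_share(illustrative, 2)
print(f"{share:.2%}")  # 70.00%
```

With the actual eigenvalues from the SAS output, the same calculation should reproduce the 71.29% and 67.38% figures.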

(a) Based on the two retained principal components, the price index measures can be defined as follows.

For mean-corrected data (covariance matrix):

The first price index primarily represents the prices of bread and hamburgers, while the second price index primarily represents the prices of butter and apples.

For standardized data (correlation matrix):

The first price index primarily represents the prices of bread, hamburgers, and tomatoes, while the second price index primarily represents the price of apples.
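Each price index is a weighted sum of the item prices, and an index "primarily represents" the items that carry the largest weights. The sketch below shows the calculation for mean-corrected data. The means, weights, and city prices are hypothetical placeholders, not the estimates from this analysis, and `index_score` is an illustrative name.

```python
def index_score(prices, means, weights):
    """Weighted sum of mean-corrected prices: one principal component score."""
    return sum(w * (p - m) for p, m, w in zip(prices, means, weights))

# Hypothetical values for illustration only (not taken from the SAS output).
items = ["bread", "hamburger", "butter", "apples", "tomatoes"]
means = [25.0, 90.0, 60.0, 50.0, 45.0]      # assumed average prices
weights = [0.6, 0.6, 0.2, 0.1, 0.5]         # assumed weights; larger = more influence
city = [27.0, 95.0, 60.0, 48.0, 47.0]       # assumed prices in one city

print(index_score(city, means, weights))
```

For standardized data, each mean-corrected price would also be divided by that item's standard deviation before the weights are applied.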

(b) For mean-corrected data, Anchorage is the most expensive and San Diego is the least expensive city based on the first price index. Based on the second price index, Kansas City is the most expensive and Buffalo is the least expensive city.

For standardized data, Anchorage is the most expensive and Buffalo is the least expensive city based on the first price index. Also, Houston is the most expensive and Boston is the least expensive city based on the second price index.

As seen above, the most and least expensive cities differ between the mean-corrected and the standardized data. The choice between the two sets of price indices depends on whether all food items can be considered equally important. With mean-corrected data, items whose prices vary more across cities receive more weight in the indices, whereas with standardized data every item has unit variance and contributes equally. In the present case there is no reason to believe that the price indices should be affected by the variances in the prices of the different food items. Thus it is recommended that standardized data be used to compute the price indices.
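The effect of standardizing can be illustrated with a short sketch: after subtracting the mean and dividing by the standard deviation, every item has variance 1, so no item dominates simply because its price varies more. The prices below are hypothetical and the helper `standardize` is an illustrative name, not part of the SAS analysis.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical prices for two items across four cities.
bread = [24.0, 26.0, 25.0, 29.0]           # small spread
hamburger = [80.0, 110.0, 95.0, 135.0]     # large spread

for name, col in (("bread", bread), ("hamburger", hamburger)):
    z = standardize(col)
    # Print the sample variance before and after standardizing.
    print(name, round(stdev(col) ** 2, 2), round(stdev(z) ** 2, 2))
```

The raw variances differ widely, while both standardized variances come out as 1, which is why the correlation matrix treats the items as equally important.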

(c) The scores (based on standardized data) for the two retained principal components were used to plot the cities as shown in Fig 3.

Fig 3: The scores of the cities for the first two principal components

Four distinct clusters can be identified visually. The clusters differ from each other with respect to both price indices. Cluster 1 consists of two cities that score high on both price indices. Cluster 2 consists of cities that have a moderate score on the first price index and a moderate to low score on the second price index. Cluster 3 consists of cities that score moderate to low on the first price index and moderate to high on the second price index. Finally, cluster 4 consists of one city that scores low on both price indices.

The cities belonging to the four clusters may be identified from their principal component scores, given in Table 1.

Table 1: Rankings by the First Principal Component (STANDARDIZED DATA)

Obs  City              PRIN1      PRIN2

  1  Buffalo         -2.63510   -1.16866
  2  San Diego       -2.12285    0.67935
  3  Los Angeles     -1.54583    1.17324
  4  Milwaukee       -1.28384    0.54389
  5  St. Louis       -1.25282    0.02051
  6  Seattle         -1.00151    0.33749
  7  Cleveland       -0.73717    0.27483
  8  Minneapolis     -0.72789   -0.60318
  9  San Francisco   -0.71329    1.31967
 10  Houston         -0.46158    1.34059
 11  Kansas City     -0.41894   -0.39474
 12  Pittsburgh      -0.32009   -1.08421
 13  Detroit         -0.19803   -1.53076
 14  Washington       0.25131    0.73830
 15  Cincinnati       0.26537   -0.78737
 16  Atlanta          0.31672   -0.08010
 17  Baltimore        0.35844   -1.10212
 18  Dallas           0.38231    0.92354
 19  Chicago          0.50829    1.29299
 20  Boston           0.57396   -1.61813
 21  New York         1.28901   -0.91857
 22  Philadelphia     2.10142   -0.79552
 23  Honolulu         2.79600    0.91037
 24  Anchorage        4.57611    0.52860
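As a check on part (b), the most and least expensive cities for standardized data can be read directly from the scores in Table 1. The sketch below copies the table values and picks the extremes on each component; the helper name `extremes` is illustrative.

```python
# (city, PRIN1, PRIN2) copied from Table 1 (standardized data).
scores = [
    ("Buffalo", -2.63510, -1.16866), ("San Diego", -2.12285, 0.67935),
    ("Los Angeles", -1.54583, 1.17324), ("Milwaukee", -1.28384, 0.54389),
    ("St. Louis", -1.25282, 0.02051), ("Seattle", -1.00151, 0.33749),
    ("Cleveland", -0.73717, 0.27483), ("Minneapolis", -0.72789, -0.60318),
    ("San Francisco", -0.71329, 1.31967), ("Houston", -0.46158, 1.34059),
    ("Kansas City", -0.41894, -0.39474), ("Pittsburgh", -0.32009, -1.08421),
    ("Detroit", -0.19803, -1.53076), ("Washington", 0.25131, 0.73830),
    ("Cincinnati", 0.26537, -0.78737), ("Atlanta", 0.31672, -0.08010),
    ("Baltimore", 0.35844, -1.10212), ("Dallas", 0.38231, 0.92354),
    ("Chicago", 0.50829, 1.29299), ("Boston", 0.57396, -1.61813),
    ("New York", 1.28901, -0.91857), ("Philadelphia", 2.10142, -0.79552),
    ("Honolulu", 2.79600, 0.91037), ("Anchorage", 4.57611, 0.52860),
]

def extremes(rows, col):
    """Return (highest-scoring city, lowest-scoring city) on column col (1 or 2)."""
    high = max(rows, key=lambda r: r[col])
    low = min(rows, key=lambda r: r[col])
    return high[0], low[0]

print(extremes(scores, 1))  # ('Anchorage', 'Buffalo')
print(extremes(scores, 2))  # ('Houston', 'Boston')
```

Both results agree with the cities named in part (b) for standardized data.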