CLASS 18 ASSIGNMENT
Statistics in Business
Spring 2012
(Due at the beginning of Class 18)
This assignment looks at CEO compensation. A data set is provided giving annual compensation amounts for 1,579 US CEOs chosen because they head companies in the largest US industries (largest as measured by number of public companies). The data set is named “Class 18 Assignment Data CEO Compensation” and is linked to the course assignment page. The compensation amounts in the data set are for either 2007 or 2008. The data set contains 1,579 elements and the variables ID (1 through 1,579), Ticker, Industry, Company, CEO Name, Year (2007 or 2008), Compensation (in dollars), and S&P 500. This last variable is “Yes” if the company is in the S&P 500 and “No” if not.
We will pose a series of questions about the population of CEOs, where population is taken broadly to include the process that generated the sample data we happened to observe. In responding to these questions, state your hypotheses, give your test statistics, and state your conclusion.
1. In general, it was believed that 15% of the US population of public companies are in the S&P 500. What proportion of the sample companies are in the S&P 500? Since our data is for companies in large industries, we might expect the sample proportion to exceed 15%. Is the sample proportion significantly higher than 15%?
Of the 1,579 companies in the sample, 219 were in the SP500. We used a pivot-table to get that number.
H0: P = 0.15. Ha: P > 0.15.
The sample proportion of companies, 219/1579 = 0.139, is LOWER than 0.15. This means the data came out in the “wrong” or “unexpected” tail. So the conclusion is that we cannot reject H0 in favor of our Ha (because we got the Ha “wrong”). No further analysis is required.
For teaching purposes, we go ahead and show how to calculate the p-value if Ha had been P < 0.15. Using the binomial distribution, the p-value would be BINOMDIST(219,1579,.15,TRUE) = 0.110. Using the normal approximation to the binomial, the Z-statistic is (0.1387-0.15)/(0.15*0.85/1579)^.5 = -1.26, and NORM.S.DIST(-1.26,TRUE) = 0.104, an almost identical p-value. We do not reject H0 even in favor of the Ha consistent with the data. The sample proportion is not statistically significantly lower than 0.15 (and it is certainly not statistically significantly higher).
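The two lower-tail p-values above can also be checked outside Excel. The following is a minimal sketch in Python using only the standard library; the function names binom_cdf and normal_cdf are illustrative helpers, not part of the assignment, and the binomial sum is done in log space simply to avoid overflow with n = 1,579.

```python
import math

n, x, p0 = 1579, 219, 0.15  # sample size, S&P 500 count, hypothesized proportion


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing terms computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1 - p)
        )
        total += math.exp(log_term)
    return total


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_hat = x / n                                    # sample proportion
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # Z-statistic under H0
p_binom = binom_cdf(x, n, p0)                    # exact lower-tail p-value
p_normal = normal_cdf(z)                         # normal-approximation p-value

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}")
print(f"binomial p-value = {p_binom:.3f}, normal p-value = {p_normal:.3f}")
```

Both p-values are lower-tail probabilities, so they answer the teaching question (Ha: P < 0.15), not the original one-sided question with Ha: P > 0.15.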
2. Assuming our data is a random sample of some population, do the data show a significant relationship between industry and being in the S&P 500? In other words, do some industries have a significantly higher proportion of companies in the S&P 500 than others?
Since both variables are categorical, we can implement a chi-square independence test. H0: INDUSTRY and SP are independent. Ha: they are not.
The 15x2 contingency table of counts is:
Industry / S&P 500: No / S&P 500: Yes / Grand Total
Business Products / 129 / 25 / 154
Computer Services / 65 / 5 / 70
Computer Software / 75 / 13 / 88
Electronics / 84 / 12 / 96
Financial Services / 149 / 15 / 164
Financial Services Regional Banks / 128 / 13 / 141
Financial Services Securities / 56 / 15 / 71
Healthcare Products / 92 / 15 / 107
Insurance Property & Casualty / 64 / 10 / 74
Management Services / 68 / 4 / 72
Petroleum & Coal Extraction / 93 / 17 / 110
Pharmaceuticals / 132 / 15 / 147
Real Estate Investment Trusts / 112 / 14 / 126
Semiconductors / 76 / 15 / 91
Utilities Electric / 37 / 31 / 68
Grand Total / 1360 / 219 / 1579
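The expected counts given H0 and distances are as follows:
Expected given Independence
Industry / No / Yes
Business Products / 132.6 / 21.4
Computer Services / 60.3 / 9.7
Computer Software / 75.8 / 12.2
Electronics / 82.7 / 13.3
Financial Services / 141.3 / 22.7
Financial Services Regional Banks / 121.4 / 19.6
Financial Services Securities / 61.2 / 9.8
Healthcare Products / 92.2 / 14.8
Insurance Property & Casualty / 63.7 / 10.3
Management Services / 62.0 / 10.0
Petroleum & Coal Extraction / 94.7 / 15.3
Pharmaceuticals / 126.6 / 20.4
Real Estate Investment Trusts / 108.5 / 17.5
Semiconductors / 78.4 / 12.6
Utilities Electric / 58.6 / 9.4
Distances
Industry / No / Yes
Business Products / 0.1 / 0.6
Computer Services / 0.4 / 2.3
Computer Software / 0.0 / 0.1
Electronics / 0.0 / 0.1
Financial Services / 0.4 / 2.6
Financial Services Regional Banks / 0.4 / 2.2
Financial Services Securities / 0.4 / 2.7
Healthcare Products / 0.0 / 0.0
Insurance Property & Casualty / 0.0 / 0.0
Management Services / 0.6 / 3.6
Petroleum & Coal Extraction / 0.0 / 0.2
Pharmaceuticals / 0.2 / 1.4
Real Estate Investment Trusts / 0.1 / 0.7
Semiconductors / 0.1 / 0.4
Utilities Electric / 7.9 / 49.3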
Calculated Chi-squared statistic / 77.0
CHIDIST(77.0,14) = / 1.02326E-10
= CHISQ.TEST(O,E) / 1.02326E-10
The calculated chi-squared is 77.0 (most of which came from the large distance between the observed and expected numbers of electric utilities in the SP). The p-value is close to zero and we reject H0. There is a statistically significant relationship between INDUSTRY and SP. In particular, electric utility companies tend to be bigger than companies in other industries (and make the SP).
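As a cross-check on the spreadsheet work, the chi-squared statistic can be recomputed directly from the observed counts in the contingency table. The sketch below uses only the Python standard library. Because the standard library has no chi-squared distribution, the helper chi2_sf_even_df (a name introduced here for illustration) uses the closed-form upper-tail probability that holds only when the degrees of freedom are even, which is the case here (14).

```python
import math

# Observed counts (not in S&P 500, in S&P 500), in the table's row order.
observed = {
    "Business Products": (129, 25),
    "Computer Services": (65, 5),
    "Computer Software": (75, 13),
    "Electronics": (84, 12),
    "Financial Services": (149, 15),
    "Financial Services Regional Banks": (128, 13),
    "Financial Services Securities": (56, 15),
    "Healthcare Products": (92, 15),
    "Insurance Property & Casualty": (64, 10),
    "Management Services": (68, 4),
    "Petroleum & Coal Extraction": (93, 17),
    "Pharmaceuticals": (132, 15),
    "Real Estate Investment Trusts": (112, 14),
    "Semiconductors": (76, 15),
    "Utilities Electric": (37, 31),
}

rows = list(observed.values())
n = sum(no + yes for no, yes in rows)
col_totals = (sum(no for no, _ in rows), sum(yes for _, yes in rows))

# Sum (observed - expected)^2 / expected over all 30 cells.
chi2 = 0.0
for no, yes in rows:
    row_total = no + yes
    for obs, col_total in zip((no, yes), col_totals):
        expected = row_total * col_total / n  # expected count under independence
        chi2 += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (2 - 1)  # (rows - 1) * (columns - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; this closed form needs even df."""
    if df % 2 != 0:
        raise ValueError("closed form only valid for even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(chi2, df)
print(f"chi-squared = {chi2:.1f}, df = {df}, p-value = {p_value:.3g}")
```

For odd degrees of freedom a statistics library would be needed instead of the closed form used here.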
3. One might expect that CEOs of companies in the S&P 500 would be more highly compensated than CEOs of companies not in the S&P. Do the data support that expectation?
Since CEO compensation is numerically scaled and there are two groups/samples (SP and not SP), we will use a t-test: two samples assuming equal variances. H0: μ(sp) = μ(not sp), and Ha: μ(sp) > μ(not sp). This will be a one-tailed test, given that the question suggests that CEOs of SP companies get paid more.
Using a pivot table to get the compensation variable in two columns allowed us to use the DATA ANALYSIS tool “t-test: Two-sample assuming equal variances”. I made the SP group input variable 1 so that I expect the t-stat to come out positive.
The t-stat did come out positive given that the sample mean compensation was higher for SP CEOs ($10,210,498 compared to only $2,953,603). The calculated t-stat of 15.4 is statistically significant (p-value essentially zero). We soundly reject H0 in favor of Ha.
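The pooled (equal-variance) t-statistic that the DATA ANALYSIS tool reports can be sketched as follows. The raw compensation data is not reproduced in this solution, so the two small samples below are made-up figures in $ millions used only to show the mechanics; the function name pooled_t is also an illustrative choice.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Two-sample t-statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / std_err, df


# Hypothetical toy data (NOT the assignment data), in $ millions.
sp = [9.8, 12.1, 7.4, 11.3, 10.6]
not_sp = [2.1, 3.4, 2.9, 4.0, 2.5, 3.2]

t_stat, df = pooled_t(sp, not_sp)  # SP group first, so t should be positive
print(f"t = {t_stat:.2f}, df = {df}")
```

Putting the SP group first mirrors the choice made above: with Ha: μ(sp) > μ(not sp), a positive t-statistic is the one consistent with Ha.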
4. One might expect that CEO compensation varies across industries. Can we say, based on our data, that there are significant differences in average CEO compensation across industries?
Since CEO compensation is numerically scaled and there are more than two industries, we will use ANOVA single factor. H0 is that mean CEO compensation is equal for all industries. Ha is that the mean compensations are not all equal. Using a pivot table to get the compensation data into columns by industry, we use DATA ANALYSIS and the ANOVA tool.
Sample mean compensation amounts vary across the 15 industries. Below is a sorted list showing that FINANCIAL SERVICES SECURITIES CEOs had the highest sample mean, and FINANCIAL SERVICES REGIONAL BANKS the lowest. So if you plan on becoming a CEO someday, be certain to ask “exactly what kind of financial services are we talking about?”
Groups / Count / Sum / Average / Variance
Financial Services Securities / 71 / 646471012 / 9105226 / 3.75992E+14
Insurance Property & Casualty / 74 / 448208475 / 6056871 / 6.30571E+13
Utilities Electric / 68 / 403313456 / 5931080 / 3.01583E+13
Petroleum & Coal Extraction / 110 / 544501311 / 4950012 / 6.70321E+13
Computer Software / 88 / 422630878 / 4802624 / 9.96672E+13
Business Products / 154 / 702071947 / 4558909 / 2.00607E+13
Pharmaceuticals / 147 / 596633997 / 4058735 / 3.45832E+13
Healthcare Products / 107 / 346190816 / 3235428 / 2.46378E+13
Real Estate Investment Trusts / 126 / 398055783 / 3159173 / 1.22643E+13
Electronics / 96 / 289442836 / 3015030 / 1.62938E+13
Financial Services / 164 / 476086651 / 2902967 / 3.65357E+13
Semiconductors / 91 / 254076942 / 2792054 / 6.60385E+12
Management Services / 72 / 198959476 / 2763326 / 1.00154E+13
Computer Services / 70 / 185310339 / 2647291 / 8.99307E+12
Financial Services Regional Banks / 141 / 341045617 / 2418763 / 9.83007E+12
The fact that sample means vary across industries is not surprising. What we want to measure is whether the observed variation could have happened by chance. The calculated p-value of 2.34E-11 says no. We reject H0. The differences in sample mean compensation amounts across industries are decidedly statistically significant.
Remember, statistical significance results from a combination of large variation in sample means and large sample sizes. Here we have both.
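The single-factor ANOVA F-statistic can be rebuilt from the summary table alone, since it needs only each group's count, average, and variance. The sketch below does that with plain Python; it stops at the F-statistic and its degrees of freedom, because turning F into a p-value requires the F distribution (for example Excel's FDIST), which the standard library does not provide.

```python
# (count, average, variance) for each industry, copied from the summary table.
groups = [
    (71, 9105226, 3.75992e14),   # Financial Services Securities
    (74, 6056871, 6.30571e13),   # Insurance Property & Casualty
    (68, 5931080, 3.01583e13),   # Utilities Electric
    (110, 4950012, 6.70321e13),  # Petroleum & Coal Extraction
    (88, 4802624, 9.96672e13),   # Computer Software
    (154, 4558909, 2.00607e13),  # Business Products
    (147, 4058735, 3.45832e13),  # Pharmaceuticals
    (107, 3235428, 2.46378e13),  # Healthcare Products
    (126, 3159173, 1.22643e13),  # Real Estate Investment Trusts
    (96, 3015030, 1.62938e13),   # Electronics
    (164, 2902967, 3.65357e13),  # Financial Services
    (91, 2792054, 6.60385e12),   # Semiconductors
    (72, 2763326, 1.00154e13),   # Management Services
    (70, 2647291, 8.99307e12),   # Computer Services
    (141, 2418763, 9.83007e12),  # Financial Services Regional Banks
]

k = len(groups)                          # number of industries
total_n = sum(n for n, _, _ in groups)   # total number of CEOs
grand_mean = sum(n * m for n, m, _ in groups) / total_n

# Between-group variation: how far each industry mean sits from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
# Within-group variation: recovered from each group's sample variance.
ss_within = sum((n - 1) * v for n, _, v in groups)

df_between, df_within = k - 1, total_n - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

The numerator grows with the spread of the industry means weighted by group size, which is why large differences in sample means together with large samples produce a large F.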