MAR 5621: Advanced Managerial Statistics

Assignment #1

(due in class Thursday, March 25)

This assignment may be done either individually or in teams of 2 people. (Teams should turn in only one paper, with both members’ names on it.)

Note: When reporting numerical results from Excel output, please round off the excessive decimals (2 or 3 significant digits are usually more than enough).

Grading: Unless otherwise stated, all problems are worth 2 points per letter.

***********************************

Problem 1: Age and News Viewing

The News Division at Big Brother Media is interested in determining whether there is a relationship between an individual's age (in years) and the number of hours of TV news that person watches per week. A random sample of 50 individuals is interviewed, and each reports his or her age and the amount of TV news watched per week. The data are summarized as follows:

                      Age (years)    News Viewing (hours/week)
Mean                  36.0           8.0
Standard Deviation    10.5           3.5

Correlation coefficient r = 0.57

(a) Determine the equation of the regression line for predicting the amount of News Viewed, using Age as the predictor.

(b) How much of the variability in News Viewing is explained by factors other than Age? (That is, how much variability is not explained by Age?)

(c) Determine SSTotal, SSRegression, and SSResidual for this regression.

(d) (3 pts) Suppose we discover a new data point that we had forgotten to include in the analysis: Theodore is 18 years old and watches 14 hours of news per week. If we add Theodore to the data set, what will happen to each of the following quantities? For each quantity, determine whether it will increase, decrease, stay the same, or whether it is impossible to determine. (Hint: no formal calculations are needed, but drawing a sketch would be helpful.)

(i) The slope of the regression line

(ii) The y-intercept of the regression line

(iii) The correlation coefficient
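
For teams that would like to check their arithmetic for parts (a) through (c) outside Excel, the usual summary-statistic formulas for simple regression can be sketched in Python as follows. This is only an illustration, not a required method: the function name regression_from_summary is hypothetical, and the sketch assumes the reported standard deviations are sample standard deviations (n - 1 divisor).

```python
def regression_from_summary(n, mean_x, sd_x, mean_y, sd_y, r):
    """Simple-regression quantities from summary statistics alone.

    Assumes sd_x and sd_y are sample standard deviations (n - 1 divisor).
    """
    slope = r * sd_y / sd_x                 # b1 = r * (s_y / s_x)
    intercept = mean_y - slope * mean_x     # line passes through (mean_x, mean_y)
    r_squared = r ** 2                      # share of variability explained by X
    ss_total = (n - 1) * sd_y ** 2          # total variation in Y
    ss_regression = r_squared * ss_total    # variation explained by the line
    ss_residual = ss_total - ss_regression  # variation left unexplained
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "unexplained": 1 - r_squared,
        "ss_total": ss_total,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
    }


# Summary statistics as given in Problem 1 (X = Age, Y = News Viewing).
result = regression_from_summary(
    n=50, mean_x=36.0, sd_x=10.5, mean_y=8.0, sd_y=3.5, r=0.57
)
for name, value in result.items():
    print(f"{name}: {value:.3g}")
```

The sketch does not cover part (d); that part is meant to be reasoned out from a drawing of the scatterplot.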

Problem 2: Airlines

The Airline tab of the ASSIGN1.xls file contains data on the on-time performance of 10 major American airlines. On-time performance is defined as the percentage of flights that arrive on time. The first column of data is the on-time performance for one month (October 1995). The second column of data is the on-time performance for the following year (November 1995 – October 1996). Suppose we’re interested in how well we can forecast future on-time performance based on one month’s worth of on-time performance data.

(a) Identify the appropriate independent (X) and dependent (Y) variables for this analysis. Generate a scatterplot of Y plotted against X. Comment on the relationship shown in the scatterplot. Guess the correlation between X and Y by eyeballing the scatterplot, and then compare your guess to the actual value. Does this appear to be a strong or a weak relationship?

(b) From eyeballing the scatterplot, guess the standard deviation of X and the standard deviation of Y. Compare the two guesses to the actual values (you can use the Excel STDEV function). Is there a notable difference between the variability of X and the variability of Y? Why or why not?

(c) Determine the regression equation for predicting Y from X.

(d) Determine the predicted value and the residual for Delta Airlines. How well does the regression equation do in predicting Delta’s performance?

(e) Based on these data, what can we conclude about consistency in airline on-time performance?
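
For reference, the mechanics behind parts (c) and (d) (fitting the line, then computing a predicted value and a residual for each observation) can be sketched in Python. The numbers and carrier names below are made up for illustration and are NOT the values in the Airline tab; the helper name fit_line is likewise hypothetical. Excel's regression output should give the same kind of quantities.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical on-time percentages (illustration only, not the ASSIGN1.xls data).
carriers = ["Carrier A", "Carrier B", "Carrier C", "Carrier D", "Carrier E"]
one_month = [80.0, 75.0, 85.0, 70.0, 78.0]   # X: one month of performance
next_year = [79.0, 77.0, 81.0, 74.0, 78.0]   # Y: following-year performance

slope, intercept = fit_line(one_month, next_year)

# Predicted value = intercept + slope * X; residual = actual - predicted.
predicted = [intercept + slope * x for x in one_month]
residuals = [y - p for y, p in zip(next_year, predicted)]

for name, y, p, e in zip(carriers, next_year, predicted, residuals):
    print(f"{name}: actual {y:.1f}, predicted {p:.1f}, residual {e:+.1f}")
```

A positive residual means the carrier did better than the line predicted, and a negative residual means it did worse.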

Problem 3: Magazines

The Magazines tab of the ASSIGN1.xls file contains data on the advertising costs and audience characteristics of 55 magazines in spring 1996. The cost of advertising is quite different from one magazine to another. What characteristics of a magazine’s audience can account for differences in the cost of advertising?

(a) Identify the appropriate dependent variable (Y) and list the various possible independent variables in the data set. Before doing any analysis, which variables do you think are the most important?

(b) Generate a correlation matrix for these variables. Based on this correlation matrix, which variable appears to be the most effective in predicting Y?

(c) Generate and look at appropriate scatterplot(s) in order to check your answer to part (b). Does the relationship between Y and the best predictor appear to follow a straight line? Are there extreme or outlying observations that substantially distort the relationship?

(d) Regress Y on the best predictor identified in parts (b) and (c). Report the regression equation and interpret the meaning of the slope in simple non-jargony English.

(e) Based on this regression, determine the predicted cost of advertising, and the error of prediction, for People magazine.

(f) Generate a 95% confidence interval for the population regression slope. Interpret this interval in a sentence or two.

(g) How big is the scatter of the advertising costs around the regression line? Report and interpret (in simple non-jargony English!) a quantitative measure of the size of this scatter.
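
The quantities asked for in parts (f) and (g) appear directly in Excel's regression output, but the calculation behind them can be sketched in Python as follows. This is a rough illustration under stated assumptions: the data are made up (NOT the Magazines tab), the function name slope_interval is hypothetical, and the critical t value is taken from a t table rather than computed.

```python
import math


def slope_interval(x, y, t_crit):
    """Return (slope, standard error of the regression, CI for the slope).

    t_crit is the t-table value for the desired confidence level
    with n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standard error of the regression: typical size of a residual.
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))

    # Standard error of the slope, then slope +/- t * SE.
    se_slope = s_e / math.sqrt(sxx)
    return slope, s_e, (slope - t_crit * se_slope, slope + t_crit * se_slope)


# Hypothetical values for illustration only.
audience = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # X: audience size (millions)
cost = [12.0, 19.0, 33.0, 38.0, 52.0, 58.0]      # Y: ad cost ($ thousands)

T_CRIT_DF4 = 2.776  # t-table value for 95% confidence with n - 2 = 4 df
slope, s_e, (low, high) = slope_interval(audience, cost, T_CRIT_DF4)
print(f"slope {slope:.2f}, standard error of regression {s_e:.2f}")
print(f"95% CI for slope: {low:.2f} to {high:.2f}")
```

With 55 magazines there are 53 degrees of freedom, so the 95% critical t value is roughly 2.01 rather than the 2.776 used for this six-point example.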

Problem 4: Moneyball

Consider the following passage (and the surrounding paragraph) from Moneyball, bottom of p. 67:

“The statistics were not merely inadequate; they lied. And the lies they told led the people who ran major league baseball to misjudge their players, and mismanage their games. James later reduced his complaint to a sentence: fielding statistics made sense only as numbers, not as language. Language, not numbers, is what interested him.”

(a) What does this mean, and why is it relevant for a class on statistics? Discuss briefly in a short paragraph or two.