C22.0103
FINAL EXAM
Name:______

Write your answers to the first five questions on the attached sheets, in the spaces provided. Circle the choice which best answers questions 6-15. Do not write anything else on this page (besides your name and the circles). When you are finished, hand in the entire exam (both question sheets and answer sheets). Please do not remove any pages from the exam paper. There are 15 questions, each worth 5 points. Everyone receives 25 points for free. Good Luck!

1) WRITTEN        11) (A) (B) (C) (D) (E)
2) WRITTEN        12) (A) (B) (C) (D) (E)
3) WRITTEN        13) (A) (B) (C) (D) (E)
4) WRITTEN        14) (A) (B) (C) (D) (E)
5) WRITTEN        15) (A) (B) (C) (D) (E)

6) (A) (B) (C) (D) (E)
7) (A) (B) (C) (D) (E)
8) (A) (B) (C) (D) (E)
9) (A) (B) (C) (D) (E)
10) (A) (B) (C) (D) (E)

Answer for Question 1:

Answer for Question 2:

Answer for Question 3:

Answer for Question 4:

Answer for Question 5:


In Questions 1) - 5), we consider the response variable Hotel and Restaurant Employment for Costa Rica (in thousands of employees) for each year from 1995 to 2008, together with the following three explanatory variables: Tourists Arriving (in thousands), GDP (in millions of US Dollars), and Year.

1) Here is the linear regression output for the simple regression of Hotel and Restaurant Employment on Tourists Arriving.

Regression Analysis: Hotel and Restaurant Employment versus Tourists Arriving

The regression equation is

Hotel and Restaurant Employment = 26.2 + 0.0412 Tourists Arriving

Predictor              Coef   SE Coef     T      P
Constant             26.173     6.845  3.82  0.002
Tourists Arriving  0.041155  0.005095  8.08  0.000

S = 8.07098   R-Sq = 84.5%   R-Sq(adj) = 83.2%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  4249.5  4249.5  65.24  0.000
Residual Error  12   781.7    65.1
Total           13  5031.2

A) Based on this output, discuss the impact of an additional 2000 tourists arriving in Costa Rica in a given year on Hotel and Restaurant Employment. (2 points).

B) Test the null hypothesis that the true coefficient of Tourists Arriving in this model is .03. Use a two-tailed alternative hypothesis and a significance level of .05. (3 points).
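A minimal sketch of the arithmetic behind parts A and B, using the coefficient and standard error from the output above. Since Tourists Arriving is measured in thousands, 2000 extra tourists is an increase of 2 in the explanatory variable. The two-tailed critical value t(.025, 12) ≈ 2.179 for the n - 2 = 12 residual degrees of freedom is taken from a standard t-table (an assumption of this sketch, not part of the Minitab output).

```python
# Coefficient and standard error for Tourists Arriving, copied from the output above.
b1 = 0.041155
se_b1 = 0.005095

# Part A: Tourists Arriving is in thousands, so 2000 tourists = 2 units.
# The predicted change is in thousands of employees (about 0.0823 thousand, roughly 82 jobs).
extra_tourists_thousands = 2
change_in_employment = b1 * extra_tourists_thousands

# Part B: t-statistic for H0: beta1 = .03 versus Ha: beta1 != .03.
beta1_null = 0.03
t_stat = (b1 - beta1_null) / se_b1

# Two-tailed critical value for alpha = .05 with 12 residual df (from a t-table).
t_crit = 2.179
reject_h0 = abs(t_stat) > t_crit

print(f"change in employment: {change_in_employment:.4f} thousand")
print(f"t = {t_stat:.3f}, reject H0 at .05: {reject_h0}")
```

Here t ≈ 2.19 sits only just above 2.179, so the decision at the .05 level is a close call.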

2) Here is the fitted line plot for the simple regression in Question 1.

A) The data point furthest to the right corresponds to the year 2008 and has a leverage of 0.34 and a Cook's D of 0.86. Does this give us cause for concern as to the validity of the regression model? (2 points).

B) Is there anything about the fitted line plot, or the plot of residuals from this regression versus year (see below), that gives us cause for concern as to the validity of the regression model? (3 points).

3) Here is the regression output using all three explanatory variables.

Regression Analysis: Hotel and Restaurant Employment versus Tourists Arriving, GDP, Year

The regression equation is

Hotel and Restaurant Employment = - 10404 + 0.0162 Tourists Arriving - 0.197 GDP + 5.24 Year

Predictor              Coef   SE Coef      T      P
Constant             -10404      2538  -4.10  0.002
Tourists Arriving   0.01624   0.01952   0.83  0.425
GDP                 -0.1967    0.1258  -1.56  0.149
Year                  5.244     1.275   4.11  0.002

S = 5.11627   R-Sq = 94.8%   R-Sq(adj) = 93.2%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  4769.5  1589.8  60.74  0.000
Residual Error  10   261.8    26.2
Total           13  5031.2

A) Based on this output, is there evidence of a positive relationship between Tourists Arriving and Hotel and Restaurant Employment? (2 points).

B) Use the output above to compute the p-value in testing the null hypothesis that the true coefficient of GDP is zero versus the alternative hypothesis that the true coefficient is positive. (2 points).

C) Do the F-statistic and its associated p-value indicate that all variables should be included in the regression? (1 point).
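Minitab reports two-sided p-values, while parts A and B call for one-sided ones. The helper below (a hypothetical name, written for illustration) converts a two-sided p-value using the sign of the t-statistic, relying on the symmetry of the t distribution.

```python
def one_sided_p(two_sided_p: float, t_stat: float, alternative: str) -> float:
    """Convert a two-sided p-value into a one-sided one.

    alternative is "greater" (right-tailed) or "less" (left-tailed).
    If the estimate lies on the side of the alternative, the one-sided
    p-value is half the two-sided value; otherwise it is one minus that half.
    """
    half = two_sided_p / 2
    if alternative == "greater":
        return half if t_stat > 0 else 1 - half
    if alternative == "less":
        return half if t_stat < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")


# Part A: Tourists Arriving, T = 0.83, two-sided P = 0.425.
p_tourists = one_sided_p(0.425, 0.83, "greater")

# Part B: GDP, T = -1.56, two-sided P = 0.149, right-tailed alternative.
p_gdp = one_sided_p(0.149, -1.56, "greater")

print(p_tourists, p_gdp)
```

Because the GDP estimate is negative, the right-tailed p-value for part B is 1 - .149/2 = .9255 rather than .0745.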

4) Next, we omit Year from the regression. For the regression based on Tourists Arriving and GDP, the output is as follows.

Regression Analysis: Hotel and Restaurant Employment versus Tourists Arriving, GDP

The regression equation is

Hotel and Restaurant Employment = 32.2 + 0.0667 Tourists Arriving - 0.216 GDP

Predictor              Coef  SE Coef      T      P
Constant             32.232    8.744   3.69  0.004
Tourists Arriving   0.06667  0.02376   2.81  0.017
GDP                 -0.2161   0.1966  -1.10  0.295

S = 8.00209   R-Sq = 86.0%   R-Sq(adj) = 83.5%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  4326.8  2163.4  33.79  0.000
Residual Error  11   704.4    64.0
Total           13  5031.2

Is this model preferable to the full model in Question 3? Justify your answer. (5 points).
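One way to compare the two fits is through adjusted R², AICc, and the partial F-test for dropping Year. The sketch below recomputes these from the two ANOVA tables. It assumes the Hurvich-Tsai form AICc = n log(SSE/n) + n(n + p + 1)/(n - p - 3), with p the number of explanatory variables; the version in the course notes may differ by an additive constant, which cancels when comparing models fitted to the same data.

```python
import math

n = 14          # years 1995-2008
sst = 5031.2    # Total SS, identical in both ANOVA tables

def adj_r2(sse: float, p: int) -> float:
    """Adjusted R-squared for a model with p explanatory variables."""
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))

def aicc(sse: float, p: int) -> float:
    """AICc in the Hurvich-Tsai form (smaller is better)."""
    return n * math.log(sse / n) + n * (n + p + 1) / (n - p - 3)

full = {"sse": 261.8, "p": 3}      # Tourists Arriving, GDP, Year
reduced = {"sse": 704.4, "p": 2}   # Tourists Arriving, GDP

for name, m in [("full", full), ("reduced", reduced)]:
    print(name, round(adj_r2(m["sse"], m["p"]), 3), round(aicc(m["sse"], m["p"]), 1))

# Partial F-test for dropping Year (one restriction); it should match T^2 for Year.
partial_f = (reduced["sse"] - full["sse"]) / (full["sse"] / (n - full["p"] - 1))
print(round(partial_f, 2))
```

With these figures the full model has the higher adjusted R² and the smaller AICc, and the partial F of about 16.9 agrees with the square of Year's t-statistic (4.11² ≈ 16.9), as it must when a single variable is dropped.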

5) In the regression output in Question 4 above, note that the estimated coefficient for Tourists Arriving is closer to zero than the estimated coefficient for GDP (since |.06667| < |−.2161|). How, then, do you explain the fact that the t-statistic for Tourists Arriving is further from zero than the t-statistic for GDP? (5 points).
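Each t-statistic is the estimated coefficient divided by its standard error, so its size depends on the standard error as well as on the coefficient. A quick check with the numbers from Question 4:

```python
# (coefficient, standard error) pairs from the Question 4 output.
estimates = {
    "Tourists Arriving": (0.06667, 0.02376),
    "GDP": (-0.2161, 0.1966),
}

# t = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    coef, se = estimates[name]
    print(f"{name}: coef = {coef}, SE = {se}, t = {t:.2f}")
```

The GDP coefficient is larger in magnitude, but its standard error is about eight times as large (0.1966 vs 0.02376). Coefficient magnitudes also depend on units (thousands of tourists vs millions of dollars), so they cannot be compared directly.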

Questions 6-15 are general and do not pertain to the regression example above.

6) In a multiple regression context, suppose that we have three available explanatory variables, and that we run three regressions. The first regression uses variables 1 and 2 and produces an R² of .65. The second regression uses variables 2 and 3 and yields an R² of .70. The third regression uses variables 1 and 3 and produces an R² of .75. Which model is preferable, according to AICc?

A) The model with variables 1 and 2.

B) The model with variables 2 and 3.

C) The model with variables 1 and 3.

D) It cannot be determined from the available information.
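All three models use two explanatory variables and the same n observations, so the AICc penalty term is identical and only n log(SSE/n) differs. Since SSE = (1 - R²)·SST, a larger R² gives a smaller AICc. The sketch assumes the Hurvich-Tsai form of AICc and plugs in hypothetical values of n and SST (the question does not give them) to show that the ranking does not depend on them.

```python
import math

def aicc_from_r2(r2: float, n: int, sst: float, p: int) -> float:
    """AICc (Hurvich-Tsai form) for p explanatory variables, via SSE = (1 - R^2) * SST."""
    sse = (1 - r2) * sst
    return n * math.log(sse / n) + n * (n + p + 1) / (n - p - 3)

r2_by_model = {(1, 2): 0.65, (2, 3): 0.70, (1, 3): 0.75}

# Hypothetical (n, SST) pairs; all three models share them because they use the same data.
winners = []
for n, sst in [(30, 100.0), (50, 2500.0), (200, 7.5)]:
    scores = {vars_: aicc_from_r2(r2, n, sst, p=2) for vars_, r2 in r2_by_model.items()}
    winners.append(min(scores, key=scores.get))

print(winners)
```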

7) Consider a simple linear regression of y on x, where the y-values are not all the same. Suppose that the residuals all take the same value. Then:

A) R² must be 1. B) R² may be less than 1. C) It cannot be determined from the available information.
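With an intercept in the model, least-squares residuals always sum to zero. So if all n residuals share a common value c, then n·c = 0 forces c = 0, every point lies on the fitted line, and SSE = 0; since the y-values are not all equal, SST > 0 and R² = 1 - SSE/SST is well defined. The sketch checks the zero-sum property on made-up data (the data values and the helper name `fit_simple` are hypothetical).

```python
def fit_simple(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, residuals, r_squared)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, residuals, 1 - sse / sst

# Hypothetical noisy data: residuals are not constant, but they still sum to zero.
_, _, res_noisy, _ = fit_simple([1, 2, 3, 4, 5], [2.0, 4.5, 5.0, 8.5, 9.0])
print(sum(res_noisy))          # zero up to rounding error

# Hypothetical data lying exactly on a line: every residual is 0 and R^2 = 1.
_, _, res_exact, r2_exact = fit_simple([1, 2, 3, 4], [3.0, 5.0, 7.0, 9.0])
print(res_exact, r2_exact)
```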

8) Suppose that a simple linear regression model holds for a data set with n=20.

What is the probability that the sample mean of the (unobservable) errors is more than .2969 times the sample standard deviation of the errors?

A) .0918 B) .100 C) .1836 D) .200 E) None of the Above.
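Under the simple linear regression model the errors are i.i.d. normal with mean 0, so √n·ē/s has a t distribution with n - 1 = 19 degrees of freedom. The event ē > .2969·s is therefore the event T > .2969·√20 ≈ 1.328. The sketch evaluates this tail probability with Simpson's rule applied to the t density, using only the standard library; the function names are illustrative, not from any statistics package.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x: float, df: int, steps: int = 2000) -> float:
    """P(T > x) for x >= 0: 0.5 minus the integral of the density over [0, x] (Simpson's rule)."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n = 20
threshold = 0.2969 * math.sqrt(n)      # about 1.328
prob = t_upper_tail(threshold, n - 1)
print(round(threshold, 3), round(prob, 4))
```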

9) Suppose we are going to use a t-test to test a null hypothesis about the population mean versus a right-tailed alternative hypothesis. Assume that the null hypothesis is true and the population is normally distributed. What is the probability that the right-tailed p-value will be less than .01?

A) .005 B) .99 C) .995 D) .01 E) None of the Above
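When the null hypothesis is true and the test statistic has a continuous distribution, the p-value is uniformly distributed on (0, 1). The simulation below is a rough sanity check under stated assumptions: a hypothetical sample size of n = 10 (the question does not fix n), normal data with mean 0, and the t-table value t(.01, 9) ≈ 2.821, since a right-tailed p-value is below .01 exactly when T exceeds that critical value.

```python
import math
import random

def rejection_rate(n: int = 10, reps: int = 50_000, t_crit: float = 2.821, seed: int = 2024) -> float:
    """Fraction of simulated samples (H0 true: normal data with mean 0) whose t-statistic exceeds t_crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in xs) / (n - 1))
        t_stat = mean / (sd / math.sqrt(n))
        if t_stat > t_crit:            # equivalent to: right-tailed p-value < .01
            hits += 1
    return hits / reps

rate = rejection_rate()
print(rate)    # should be close to 0.01
```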

10) In a sample of size 10 from a normal population, the sample mean is 2 and the sample standard deviation is 3. Construct a 95% confidence interval for the population mean, μ. The interval is:

A) (−.146, 4.146) B) (.141, 3.859) C) (−3.88, 7.88) D) (−.114, 4.114) E) None of the Above.
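The interval is x̄ ± t(.025, n - 1)·s/√n. With n = 10, a standard t-table gives t(.025, 9) ≈ 2.262 (assumed here rather than computed):

```python
import math

n, xbar, s = 10, 2.0, 3.0
t_crit = 2.262                      # t(.025, 9) from a t-table
margin = t_crit * s / math.sqrt(n)
ci = (xbar - margin, xbar + margin)
print(tuple(round(v, 3) for v in ci))
```

Replacing 2.262 with the normal value 1.96 would give (.141, 3.859), which understates the uncertainty when σ is estimated from only 10 observations.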

11) We will look here at the results of a very large trial of an HIV vaccine. The trial was conducted on 16,400 people in Thailand, all of whom were HIV-negative at the start of the trial. Half of the people received a placebo, and half received the vaccine. Both groups were followed for three years afterwards. Of the 8,200 who received the vaccine, 51 developed HIV. Of the 8,200 who received the placebo, 74 developed HIV. Here are the results from Minitab's 2-proportions test.

Test and CI for Two Proportions

Sample   X     N  Sample p
1       51  8200  0.006220
2       74  8200  0.009024

Difference = p (1) - p (2)

Estimate for difference: -0.00280488

95% upper bound for difference: -0.000571046

Test for difference = 0 (vs < 0): Z = -2.07 P-Value = 0.019

If the vaccine were actually ineffective, what would be the probability of observing at least as big a reduction as seen here in the HIV rate for the vaccine compared to the placebo?

A) .0095 B) .019 C) .038 D) .981 E) None of the Above
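The Minitab figures can be reproduced with the unpooled standard error for a difference of two proportions and the standard normal distribution from Python's `statistics` module. A reduction at least as large as the one observed is the left tail of the one-sided test, so its probability under an ineffective vaccine is the one-sided p-value.

```python
import math
from statistics import NormalDist

x1, n1 = 51, 8200    # vaccine group
x2, n2 = 74, 8200    # placebo group
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Unpooled standard error of p1 - p2 (this reproduces Minitab's confidence bound above).
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = diff / se
p_value = NormalDist().cdf(z)                       # left-tailed: Ha is p1 - p2 < 0
upper_bound = diff + NormalDist().inv_cdf(0.95) * se

print(round(z, 2), round(p_value, 3), round(upper_bound, 6))
```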

12) In simple linear regression, if the right-tailed p-value for the coefficient of the explanatory variable is .5, then R² must be

A) .5 B) .25 C) 1 D) 0 E) None of the Above
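In simple regression the F-statistic equals t², and R² = F/(F + n - 2) = t²/(t² + n - 2). A right-tailed p-value of .5 places the t-statistic at the center of its symmetric distribution, so t = 0, which forces R² = 0 for any sample size. As a cross-check, the same formula recovers R-Sq = 84.5% from the Question 1 output (t ≈ 8.08, n = 14). The helper name is hypothetical.

```python
def r2_from_t(t_stat: float, n: int) -> float:
    """R-squared in simple linear regression from the slope t-statistic: t^2 / (t^2 + n - 2)."""
    return t_stat ** 2 / (t_stat ** 2 + n - 2)

# Right-tailed p-value of .5 <=> t = 0, for any sample size n > 2.
print([r2_from_t(0.0, n) for n in (5, 14, 100)])

# Cross-check with Question 1: coefficient / SE gives t, and R-Sq was 84.5%.
t_q1 = 0.041155 / 0.005095
print(round(r2_from_t(t_q1, 14), 3))
```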

13) Suppose that an automobile manufacturer has been notified by owners that a certain model has a sticky accelerator pedal. To investigate these claims, the company wants to perform its own laboratory tests, based on a random sample of n automobiles of the given model. If 1% of the automobiles in the population have the sticky accelerator pedal, what is the smallest value of n that the company should use for its sample size to guarantee a probability of at least 90% that at least one of the automobiles in the sample has a sticky accelerator pedal?

A) 10 B) 120 C) 230 D) 550 E) None of the Above.
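Treating each sampled car as independently having the defect with probability .01 (reasonable when the population is much larger than the sample), P(at least one defective) = 1 - .99ⁿ. The smallest n with 1 - .99ⁿ ≥ .90 can be found by direct search or from n ≥ ln(.10)/ln(.99) ≈ 229.1:

```python
import math

def smallest_sample_size(defect_rate: float = 0.01, target: float = 0.90) -> int:
    """Smallest n such that P(at least one defective among n) = 1 - (1 - rate)^n >= target."""
    n = 1
    while 1 - (1 - defect_rate) ** n < target:
        n += 1
    return n

n_min = smallest_sample_size()
closed_form = math.ceil(math.log(1 - 0.90) / math.log(1 - 0.01))
print(n_min, closed_form)
```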

14) Based on a sample of size 10 from a normal distribution, suppose you want to test the null hypothesis that the population mean is zero against a right-tailed alternative hypothesis. The sample mean is 1.0386 and the sample standard deviation is 1.452. Then the p-value is:

A) .0238 B) .05 C) .0119 D) .025 E) None of the Above
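The test statistic is t = x̄/(s/√n) with n - 1 = 9 degrees of freedom. Comparing it with the t-table value t(.025, 9) ≈ 2.262 (assumed from a standard table) shows where the right-tailed p-value lands:

```python
import math

n, xbar, s = 10, 1.0386, 1.452
t_stat = (xbar - 0.0) / (s / math.sqrt(n))   # H0: mu = 0
t_025_9 = 2.262                               # cuts off a right-tail area of .025 with 9 df
print(round(t_stat, 3), t_025_9)
```

Because the statistic essentially equals the value that cuts off a right-tail area of .025, the right-tailed p-value is approximately .025.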

15) Consider a game where a fair coin is tossed four times, independently. If all four tosses are heads, you win $10. Otherwise, you lose $1. If you are going to play this game once, what is your expected profit?

A) $4 B) 31.25 Cents C) −31.25 Cents D) −$4 E) None of the Above.
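The expected profit weights each outcome by its probability: four heads has probability (1/2)⁴ = 1/16, so E[profit] = 10·(1/16) - 1·(15/16). Exact fractions avoid rounding:

```python
from fractions import Fraction

p_win = Fraction(1, 2) ** 4                        # four independent heads
expected_profit = 10 * p_win + (-1) * (1 - p_win)  # win $10, otherwise lose $1
print(expected_profit, float(expected_profit) * 100, "cents")
```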