ST. JOHN'S UNIVERSITY
NEW YORK
College of Business Administration
DS 2334ProjectDr. Nagel
Perform a simple linear regression on your data (at least 20 points) using Excel or MINITAB. Prepare a report containing the following:
1.A description of the problem you're investigating. Describe fully the independent and dependent variables. Include your expectations on β1 (positive or negative) and why. Describe how the sample was taken and the population from which it was taken.
2.The data collected. Give the names of the people or the source of the data.
3.Use Excel or Minitab to plot the dependent vs. the independent variable. Show the regression equation from the computer output. Plot the regression line on the Excel or MINITAB plot showing clearly the two points that your line is passing through and how they were obtained.
4.Interpret b1 (and b0 if possible) including units, and r2 adjusted. The interpretations should refer to your specific project.
5.Test for significance of β1 at α = 0.05 and at α = 0.01using the t value from the computer output, and the appropriate values from the t table. State clearly the null and alternate hypotheses. Show thecomputer table with the t values. State the table and calculated values of t and your conclusion.
6.State and compare the p value from the computer printout with α = 0.05 and α = 0.01. What do these comparisonsimply?
7.State and compare the calculated and table values of F for α = 0.05 and for α = 0.01. What do these comparisonsimply?
- Manually calculate a 95% and a 99% confidence interval on β1 using the computer’s values for sb1 and b1 and the table values of t and interpret them. Show the calculations
9.State clearly how the confidence intervals in part 8 is consistent with your conclusions in parts 5, 6 and 7.
10.What other quantifiable independent variables [at least 2] might be included to improve the model and what effect [+ or -] do you think each of them would have on Y?
NOTE: All interpretations should relate to your specific independent and dependent variables and their units. Be sure to name the Y and X columns when entering the data. Do not use Y or X in any of your interpretations.
To use Excel 2003, go to Tools, Data Analysisand Regression. To use Excel 2007, go to Data,
Data Analysis, and Regression.
For an example see below
DS 2334
NOTE: THE SAMPLE BELOW ONLY USES α = 0.05. You must use both α = 0.05AND
α = 0.01
- For my project, I’m comparing the amount of goals vs. the amount of wins for twenty NHL teams. I have listed the teams based on the order of their placement for the 2008-2009 Season. The first 16 teams are from the Eastern Conference, and the last 4 teams listed are from the Western Conference. I believe that by measuring the amount of goals per team, I can determine the amount of wins that the team will have. I expect that ß1will be positive, because as the number of goals increases, the number of wins should also increase (positive correlation). Therefore, the number of wins is dependent on the number of goals.
- This information was obtained from statistics listings.
Team / Goals For (X) / Wins (Y)
Boston / 274 / 53
Washington / 272 / 50
New Jersey / 244 / 51
Pittsburgh / 264 / 45
Philadelphia / 264 / 44
Carolina / 239 / 45
NY Rangers / 210 / 43
Montreal / 249 / 41
Florida / 234 / 41
Buffalo / 250 / 41
Ottawa / 217 / 36
Toronto / 250 / 34
Atlanta / 257 / 35
Tampa Bay / 210 / 24
NY Islanders / 201 / 26
San Jose / 257 / 53
Detroit / 295 / 51
Vancouver / 246 / 45
Chicago / 264 / 46
Calgary / 254 / 46
- SEE CHART
ŷ = b0 +b1x
=-17.61497458 + 0.242839728 (0)
=-17.61497458 Indicated by lower arrow
ŷ = b0 +b1x
=-17.61497458 + 0.242839728 (350)
= 67.37893 Indicated by upper arrow
I have plotted the regression line to run through these two points on the chart. The x values are 0 and 350, and the y values are -17.61497458 and 67.37893, respectively.
Regression StatisticsMultiple R / 0.71081438
R Square / 0.505257082
Adjusted R Square / 0.477771365
Standard Error / 5.882560542
Observations / 20
Coefficients / Standard Error / t Stat / P-value / Lower 95% / Upper 95% / Upper 95.0%
Intercept / -17.61497458 / 14.08259866 / -1.250832677 / 0.227010745 / -47.20141643 / 11.97146728 / -643 / 11.97146728
Goals For / 0.242839728 / 0.056639195 / 4.287485501 / 0.000443227 / 0.123845195 / 0.361834261 / 195 / 0.361834261
From the regression data, we see that:
- b1= 0.242839728
This means that, for each additional goal, a team’s number of wins should increase by 0.242839728.
- b0 = -17.61497458
This is the expected number of wins (-17.61497458) if a team were to have 0 goals.
- r2 = 0.505257082
This is the percent of variation in wins that is explained by the number of goals.
- H0 : ß1 = 0
Ha: ß1≠ 0
t calc = 4.287485501
t table = 2.101 at α = 0.05
t calc > t table
REJECT H0at α = 0.05 State ttable at α = 0.01 as well and conclusion.
- p value = 0.000443227
α = 0.05 (0.025 double-sided)
p value < α
REJECT H0at α = 0.05. State conclusion at α = 0.01.
ANOVAdf / SS / MS / F / Significance F
Regression / 1 / 636.1186665 / 636.1186665 / 18.38253192 / 0.000443227
Error / 18 / 622.8813335 / 34.60451853
Total / 19 / 1259
f calculated = 18.38253192
f table = numerator – 1, denominator – 18, α = 0.05 = 4.41
f calc > f table
REJECT H0State Ftable at α = 0.01 as well and conclusion
8. b1 ± tSb1
0.242839728 ± (2.101)(0.056639195)
0.12384 < ß10.36184
REJECT H0
Include here a 99% confidence interval as well.
- My conclusion in part 8 is consistent with my conclusions in parts 5, 6, and 7, because there is no zero in the confidence interval. If the confidence interval began at a negative number, and then ended at a positive number, there would be a chance that H0 could be accepted because it would have to include zero. However, all numbers in the confidence interval are positive, therefore concluding that we should reject H0and conclude ß1 ≠ 0 and X has a significant effect on Y.
- Two other quantifiable variables that I could include to improve the model would be Goals Against (GA) and Payroll.
- The GA Statistics list the amount of goals that were scored on that particular team. I believe that the number of wins is dependent on the amount of goals that are scored on that team. If the number of goals scored on a team increases, then their number of wins should decrease, thereby showing a negative correlation.
- The Payroll statistics list the total amount of money any given team makes. The number of wins is definitely dependent on that team’s payroll, because better players are paid higher salaries. As the amount of payroll increases, then the number of a team’s wins will increase, thereby showing a positive correlation.