Correlation and Regression Analysis (Business/Economics Example)
The following table provides all the Inflation data from 1997 – 2007. The inflation rate is calculated from the Consumer Price Index which is complied by the Bureau of Labor Statistics and in based upon a 1982 Base of 100. Data showing rates to 1914 can be found at http://www.inflationdata.com/inflation.
n = 11 the number of years observed.
Year / 1997 / 1998 / 1999 / 2000 / 2001 / 2002 / 2003 / 2004 / 2005 / 2006 / 2007Rate % / 2.85 / 3.24 / 3.39 / 2.68 / 2.27 / 1.59 / 2.83 / 3.38 / 2.19 / 1.55 / 2.34
a) Is there a relationship between the measured variables?
b) Using excel, develop a scatter plot.
c) Find the regression line/line of best fit/least squares line that can be used to make predictions.
d) Graph the line found in part c to overlay the scatter plots.
Part a
Is there a relationship between the measured variables?
Step 1: Set up an excel sheet with the columns shown. There is also a sheet shown that includes to cell formulas to develop the table.
The x is redefined as the years since the base year 1997Historical Inflation 1997 - 2007 (http://www.inflationdata.com/inflation/)
Define x to be the years since the base year of 1997
Year / x / Rate %,
y / x^2 / y^2 / xy
1997 / 0 / 2.85 / 0 / 8.1225 / 0.00
1998 / 1 / 3.24 / 1 / 10.4976 / 3.24
1999 / 2 / 3.39 / 4 / 11.4921 / 6.78
2000 / 3 / 2.68 / 9 / 7.1824 / 8.04
2001 / 4 / 2.27 / 16 / 5.1529 / 9.08
2002 / 5 / 1.59 / 25 / 2.5281 / 7.95
2003 / 6 / 2.83 / 36 / 8.0089 / 16.98
2004 / 7 / 3.38 / 49 / 11.4244 / 23.66
2005 / 8 / 2.19 / 64 / 4.7961 / 17.52
2006 / 9 / 1.55 / 81 / 2.4025 / 13.95
2007 / 10 / 2.34 / 100 / 5.4756 / 23.40
/ 55 / 28.31 / 385 / 77.0831 / 130.60
Excel showing cell formulas
Year / x / Rate %,
y / x^2 / y^2 / xy
1997 / 0 / 2.85 / =B4*B4 / =C4*C4 / =B4*C4
1998 / 1 / 3.24 / =B5*B5 / =C5*C5 / =B5*C5
1999 / 2 / 3.39 / =B6*B6 / =C6*C6 / =B6*C6
2000 / 3 / 2.68 / =B7*B7 / =C7*C7 / =B7*C7
2001 / 4 / 2.27 / =B8*B8 / =C8*C8 / =B8*C8
2002 / 5 / 1.59 / =B9*B9 / =C9*C9 / =B9*C9
2003 / 6 / 2.83 / =B10*B10 / =C10*C10 / =B10*C10
2004 / 7 / 3.38 / =B11*B11 / =C11*C11 / =B11*C11
2005 / 8 / 2.19 / =B12*B12 / =C12*C12 / =B12*C12
2006 / 9 / 1.55 / =B13*B13 / =C13*C13 / =B13*C13
2007 / 10 / 2.34 / =B14*B14 / =C14*C14 / =B14*C14
/ =SUM(B4:B14) / =SUM(C4:C14) / =SUM(D4:D14) / =SUM(E4:E14) / =SUM(F4:F14)
Compute the correlation coefficient to determine the strength and direction of the relationship.
n = 11 x = 55 y = 28.31 x2 = 385 y2 = 77.0831 xy = 130.60
Ø A negative r = -0.5080 indicates that as x increases, y decreases.
Ø The strength on the correlation is “moderate”
Strength of CorrelationSize of r Interpretation
Note: These values could be positive and negative.
Only positive numbers are shown.
0.90 to 1.00 - very high
0.70 to 0.89 - high
0.50 to 0.69 - moderate
0.30 to 0.49 - low
0.00 to 0.29 - little, if any
Part b
Using excel, develop a scatter plot.
Hightlight columns for x and y. Then choose INSERT CHART, select XY scatterPlace the mouse arrow in the graph area, right click, then choose CHART OPTIONS to add titles and data labels on the points
Part c
Find the regression line/line of best fit/least squares line that can be used to make predictions.
Least-squares line ( is pronounced y –hat)
where a is the intercept and b is the slope.
Sample mean for x: Sample mean for y:
Slope
Intercept
The regression line is = -0.0995x + 3.0711
(note the equation produced by excel may vary slightly due to rounding)
Part d
Graph the line found in part c to overlay the scatter plots.
Place the mouse arrow on any point, right click and choose Add Trendline. Select Type LINEAR and OPTIONS to Display equation on chart and R-squared. When the line is displayed on the graph you can right click on the equation to format the font and decimal places.