Correlation and Linear Regression Using Excel

Correlations

Correlation is a measure of the strength of linear association between two variables. It ranges from -1 to +1 with values close to -1 or +1 indicating a strong linear relationship and values close to 0 indicating a weak relationship.

  1. Create the spreadsheet shown to display the percentages obtained by 5 students in Physics and Chemistry.
  1. Select Tools/Data Analysis/Correlation and click OK. (The Data Analysis Toolpak add-in must be installed.)
  2. In the InputRangebox enter B1:C6 and select the Labels in first row option.
  3. Choose the Output Range option and in the OutputRange box enter A8 and click OK.

The output will appear as shown. This gives the correlation between Physics and Chemistry as being 0.826902 and the correlation between Physics and itself as being 1 and Chemistry and itself as being 1. (The values will be 1 as any variable is in perfect linear relationship with itself.)

This sort of table output is useful as we can deal with correlations between more than two variables.

  1. Enter a third column of results with percentages for Biology and create a new table of correlations.

The table will give all the correlations between variables taken two at a time. These are called pairwise correlations and the table is called the correlation matrix.

Linear Regression

We now want to find the linear regression of Chemistry on Physics. Linear regression is the process of fitting the best possible straight line through a series of points.

  1. Select Tools/Data Analysis/Regression and click OK.
  2. Select the Chemistry column to be the Input Y Range and the Physics column to be the Input X Range.
  3. Make sure that the Labels box in the top part of the dialog box and the Line Fit Plots box in the lower part of the dialog box are both ticked, but that none of the other boxes are ticked.
  4. Choose the Output Range option and enter F1 in the OutputRange box. Click OK.

A large amount of summary output will be produced including a chart. Multiple R is the square root of R square, R square is the square of the correlation coefficient and Standard Error is the estimate of the standard deviation of the errors e in the model. The number of observations is the original number of data points.

  1. Scroll to the right to see the chart (which was produced because you selected Line Fit Plots). Select one of the Predicted Chemistry points by clicking on it.
  2. To change the individual points to a line:Right click and select Format Data Series. Make sure that the Patterns tab is selected and change the Line option to Automatic and the Marker option to None. Click OK.