Pearson's Correlation Coefficient, r

Correlation is a technique for investigating the relationship between two quantitative variables, for example Age and IQ, or Height and Weight.

You have already met Spearman’s correlation coefficient, which measures the strength of the correlation in the RANKING of two sets of data.

Pearson's correlation coefficient (r) is a measure of the strength of the linear association between the two variables, using the data values themselves rather than their ranks.

The first step in studying the relationship between two variables is to draw a scatter graph of the variables to check for linearity. The correlation coefficient should not be calculated if the relationship is not linear (i.e. not following a straight line).

Pearson's correlation coefficient measures how close the scatter of points is to a straight line. It is a number between -1 and 1, as shown in these diagrams:

Values of Pearson's correlation coefficient

Pearson's correlation coefficient (r) ranges from -1 to +1:

r = -1: data lie on a perfect straight line with a negative slope
r = 0: no linear relationship between the variables
r = +1: data lie on a perfect straight line with a positive slope
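
If you want to see where these values come from, r can also be worked out directly from the data. The sketch below uses the standard formula (the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations). The function name pearson_r and the data are made up for illustration; they are not part of the Excel method described in this handout.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data, not real measurements
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # points on a rising straight line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling straight line
print(r_up, r_down)
```

The first data set lies exactly on a rising line and the second exactly on a falling line, so they give the two extreme values of r described above.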

The easiest way to get a value of Pearson's correlation coefficient is to plot a scatter graph in Excel and add a line of best fit (add trendline). If you then right-click on the line and use ‘options’, you will be able to put the equation of the line of best fit and the ‘r squared’ value on your graph.

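You now need to take the square root of the ‘r squared’ value to get the size of Pearson’s correlation coefficient. The square root is always positive, so give r a minus sign if your line of best fit slopes downwards (negative gradient). Ignoring the sign, it is reasonable to describe a value of around 0.8 or more as strong correlation, around 0.6 as moderate, around 0.4 as weak, etc.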
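
The square root step can be sketched in a few lines of code. This is only an illustration: the function name r_from_r_squared and the example numbers are invented, and it assumes you have read the ‘r squared’ value and the gradient of the line off your graph. The sign of r is taken from the gradient, because a square root on its own can never be negative.

```python
import math

def r_from_r_squared(r_squared, gradient):
    """Recover r from an 'r squared' value, taking the sign from the gradient
    of the line of best fit."""
    if not 0 <= r_squared <= 1:
        raise ValueError("'r squared' must lie between 0 and 1")
    # copysign gives the square root the same sign as the gradient
    return math.copysign(math.sqrt(r_squared), gradient)

# Invented example values, as if read off an Excel trendline
r_falling = r_from_r_squared(0.64, -1.5)  # downward-sloping line, r is about -0.8
r_rising = r_from_r_squared(0.64, 2.0)    # upward-sloping line, r is about +0.8
print(r_falling, r_rising)
```

Both lines have the same ‘r squared’, so the points are equally close to a straight line in each case; only the direction of the relationship differs.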

Don’t forget to use your equation of the line of best fit to predict missing data items, and the value of Pearson’s correlation coefficient to indicate how reliable you think your prediction will be.
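
As a rough illustration of making a prediction, the sketch below finds a least-squares line of best fit (the usual method for a straight-line fit, and assumed here to match Excel's linear trendline) and uses it to estimate a missing value. The function name best_fit_line and the height and weight figures are made up for this example.

```python
def best_fit_line(xs, ys):
    """Gradient and intercept of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x  # the line passes through the mean point
    return gradient, intercept

# Made-up data: heights in cm and weights in kg
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 66, 72, 80]

gradient, intercept = best_fit_line(heights, weights)

# Predict the weight for a height of 175 cm, which lies inside the range of the data
predicted_weight = gradient * 175 + intercept
print(gradient, intercept, predicted_weight)
```

A prediction like this is most trustworthy when r is close to -1 or +1 and the value you are predicting for lies within the range of your data.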