Correlation & Measuring Correlation

Correlation refers to the ______ or ______ between two variables.

There are several characteristics we consider when describing the correlation between two variables:

  1. Direction
     1. Positive. For a generally ______ trend, we say that the correlation is ______. An increase in the independent variable means that the dependent variable generally ______.
     2. Negative. For a generally ______ trend, we say that the correlation is ______. An increase in the independent variable means that the dependent variable generally ______.
     3. No correlation. For randomly scattered points with no upward or downward ______, we say there is ______.
     4. Examples. Identify the direction of the following scatter plots as positive, negative, or no correlation:
  2. Linearity
     1. Linear. The points approximately form a ______.
     2. Non-linear. The points do ______ form a straight line.
     3. Examples. Identify the following scatter plots as linear or non-linear:
  3. Strength. Strength describes how closely the data follows a pattern or trend.
     1. ______ correlation
     2. ______ correlation
     3. ______ correlation
     4. Examples. Identify the strength of the following scatter plots as strong, moderate, or weak:

Example 1. The manager of a recreation park thought that the number of visitors to the park was dependent on the temperature. He kept a record of the temperature and the number of visitors over a two-week period. Plot these points on a scatter plot. Then comment on the type of correlation (strength & direction).

Temperature (°C) / 16 / 22 / 31 / 19 / 23 / 26 / 21 / 17 / 24 / 29 / 21 / 25 / 23 / 29
Number of visitors / 205 / 248 / 298 / 223 / 252 / 280 / 233 / 211 / 258 / 295 / 229 / 252 / 248 / 284

Type of correlation:

Example 2. A Math Studies student wanted to check whether there was a correlation between the predicted height of daisies and their actual height. Draw a scatter plot to illustrate the data. Then comment on the correlation.

Predicted height (cm) / 5.3 / 6.2 / 4.9 / 5.0 / 4.8 / 6.6 / 7.3 / 7.5 / 6.8 / 5.5 / 4.7 / 6.8 / 5.9 / 7.1
Actual height (cm) / 4.7 / 7.0 / 5.3 / 4.5 / 5.6 / 5.9 / 7.2 / 6.5 / 7.2 / 5.8 / 5.3 / 5.9 / 6.8 / 7.6

Type of correlation:

Outliers. Be sure to observe and investigate any outliers, or isolated points which do not follow the trend formed by the main body of data.

Correlation vs. causation. ______ between two variables does not necessarily mean that one variable ______ the other. ______! For two variables to have a causal relationship, one action must ______ another. For example, smoking can cause lung cancer. For two variables to be correlated just means that there is a ______ between the variables. For example, blue eyes are correlated with blonde hair.

If one action causes another, then they are also correlated. But if two things are correlated, it does not mean that one causes the other. For example, there could be a strong correlation between the predicted grades that teachers give and the actual grades that the students achieve. However, the achieved grades are not caused by the predicted grades. Only if a change in one variable ______ a change in the other variable can we say that there is a causal relationship. Where this is not apparent, a high correlation alone does not justify concluding that there is a causal relationship.

GDC. Enter the independent variable in L1 and the dependent variable in L2. Go to “2nd” “y=” (stat plot). Turn the stat plot on. Then select the scatter plot option. Make sure your window is appropriately sized.

Measuring correlation. We can try to determine correlation by observing a scatter plot and judging how clearly the points form a linear relationship. This method can be quite inaccurate, so to achieve a more precise measure of the strength of linear correlation between two variables, we can use Pearson’s correlation coefficient, r.

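For a set of n data given as ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,

Pearson’s correlation coefficient is

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively, and $\sum$ means the sum over all the data values.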
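
For anyone curious how the formula works step by step, here is a minimal Python sketch. It assumes the standard form of Pearson's formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), and uses the temperature and visitor data from Example 1. The function name `pearson_r` and the variable names are made up for illustration; they are not GDC commands.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / len(xs)  # mean of x
    y_bar = sum(ys) / len(ys)  # mean of y
    # Sum of the products of the deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of the squared deviations for x and for y
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    # Note: this divides by zero if every x (or every y) is the same value
    return s_xy / sqrt(s_xx * s_yy)


# Data from Example 1
temps = [16, 22, 31, 19, 23, 26, 21, 17, 24, 29, 21, 25, 23, 29]
visitors = [205, 248, 298, 223, 252, 280, 233, 211, 258, 295, 229, 252, 248, 284]

r = pearson_r(temps, visitors)
print(round(r, 3))
```

The printed value should agree with the r value the GDC reports for the same two lists.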

You do not have to learn this formula! Instead, we will make our GDC do the work for us.

-Turn diagnostics on! (“2nd” > “0”, then scroll down to “DiagnosticOn”)

-Enter the independent variable in L1 and the dependent variable in L2

-“Stat” > “Calc” > “LinReg (ax+b)”

-Enter L1 as the Xlist and L2 as the Ylist. “Calculate”

-Look for the “r” value

Interpreting r

  • The value of r will range from ____ to ____.
  • The sign of r indicates the ______ of the correlation.
  • A ______ r value indicates the variables are ______ correlated. An increase in one variable is generally accompanied by an ______ in the other.
  • A ______ r value indicates the variables are ______ correlated. An increase in one variable is generally accompanied by a ______ in the other.
  • The size of r indicates the strength of the correlation. Ignoring the sign, when r is between:
      • 0 and 0.1, there is ____ correlation
      • 0.1 and 0.5, there is a ______ correlation
      • 0.5 and 0.8, there is a ______ correlation
      • 0.8 and 1, there is a ______ correlation

(FYI – the textbook uses different breakpoints.)
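
The breakpoints above can be turned into a small checker. This is only a sketch: the function name `describe_correlation` is made up, the labels assume the blanks run from "no" through "weak" and "moderate" to "strong" as r gets larger in size, and it assumes each breakpoint (0.1, 0.5, 0.8) belongs to the stronger category, which the list above does not specify.

```python
def describe_correlation(r):
    """Describe r using this worksheet's breakpoints (illustrative helper)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength depends on the size of r, not its sign
    if size < 0.1:
        return "no correlation"
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:
        strength = "moderate"
    else:
        strength = "strong"
    # The sign of r gives the direction
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.862))
print(describe_correlation(-0.3))
```

Remember that the textbook's breakpoints differ, so the same r value may be described differently there.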

Example 3. The data given below for a first-division football league show the position of the team and the number of goals scored. Find the correlation coefficient, r, and comment on this value.

Position / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10 / 11 / 12 / 13 / 14 / 15 / 16 / 17 / 18 / 19 / 20
Goals / 75 / 68 / 60 / 49 / 59 / 50 / 55 / 46 / 57 / 49 / 48 / 39 / 44 / 56 / 54 / 37 / 42 / 37 / 40 / 27

Example 4. The heights and shoe sizes of the students at Nosredna High School are given in the table below. Find the correlation coefficient, r, and comment on this value.

Height (x cm) / 145 / 151 / 154 / 162 / 167 / 173 / 178 / 181 / 183 / 189 / 193 / 198
Shoe size / 35 / 36 / 38 / 37 / 38 / 39 / 41 / 43 / 42 / 45 / 44 / 46

Coefficient of determination, r²

To help describe the correlation between two variables, we can also calculate the coefficient of determination, r². This is simply the correlation coefficient, r, squared. (When you square r, you take away the ______ of the correlation.)

Given a set of bivariate data, we can find r² on our GDC the same way we found r. Alternatively, if r is already known, we can simply square this value to find r².

Interpretation of the coefficient of determination

If there is a causal relationship, then r² indicates the ______ to which ______ in the ______ variable explains ______ in the ______ variable.

For example, an investigation into many different kinds of oatmeal found that there is a strong positive correlation between the variables fat content and kilojoule content. It was found that r = 0.862 and r² = 0.743. An interpretation of this r² is:

______% of the variation in ______ (the dependent variable) of oatmeal can be explained by variation in the ______ (the independent variable) of oatmeal.

If 74.3% of the variation in kilojoule content of oatmeal can be explained by the fat content of oatmeal, then we can assume that the other 25.7% (100% − 74.3%) of the variation in kilojoule content of oatmeal can be explained by ______.
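
The arithmetic in the oatmeal example can be checked with a few lines of Python. This is just a sketch using the r value quoted above; the variable names are made up for illustration.

```python
r = 0.862  # correlation coefficient from the oatmeal example

r_squared = r ** 2               # coefficient of determination
explained = 100 * r_squared      # percentage of variation explained
unexplained = 100 - explained    # percentage left unexplained

print(f"r^2 = {r_squared:.3f}")             # r^2 = 0.743
print(f"explained: {explained:.1f}%")       # explained: 74.3%
print(f"unexplained: {unexplained:.1f}%")   # unexplained: 25.7%
```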

Example 5. At a father-son camp, the heights of the fathers and their sons were measured.

Father’s height (x cm) / 175 / 183 / 170 / 167 / 179 / 180 / 183 / 185 / 170 / 181 / 185
Son’s height (y cm) / 167 / 178 / 158 / 162 / 171 / 167 / 180 / 177 / 152 / 164 / 172

Calculate the coefficient of determination. Then interpret this value in context.