Chapter 4

The student will be able to:

1.  Use the scatter diagram and linear correlation coefficient to determine whether a linear relationship exists between two variables.

2.  Determine the regression line for bivariate data.

3.  Test hypotheses about correlation coefficients.

4.  Understand that correlated data may not have a causal relationship.

5.  Determine the best prediction relative to correlation.

Section 4.1 – Scatter Diagrams and Correlation

Objectives

  1. Draw and interpret scatter diagrams
  2. Describe the properties of the linear correlation coefficient
  3. Compute and interpret the linear correlation coefficient
  4. Determine whether a linear relation exists between two variables
  5. Explain the difference between correlation and causation

Objective 1 – Draw and interpret scatter diagrams

Univariate data – One variable

Bivariate data – Two variables

Credit Score / Interest Rate (%)
545 / 19
595 / 18
640 / 12
675 / 9
705 / 7
750 / 5

The response variable is the variable whose value can be explained by the value of the explanatory or predictor variable.

A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.

Correlation – There is a correlation between two variables when one of them is related to the other in some way.


Example

Temperature / Cricket Chirps
83 / 1025
72 / 960
88 / 1200
84 / 1100
80 / 900
76 / 860
70 / 880
93 / 1180

Enter the data above into L1 and L2 and draw a scatter plot

Looking at a scatter diagram can help you determine if the variables have a linear relationship.

(pg. 192)


Two variables that are linearly related are positively associated when above-average values of one variable are associated with above-average values of the other variable and below-average values of one variable are associated with below-average values of the other variable. That is, two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases.

As the explanatory variable goes up, the response variable also goes up, and it does so at a constant rate.

Two variables that are linearly related are negatively associated when above-average values of one variable are associated with below-average values of the other variable. That is, two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.

Looking at a scatter plot of the data can help you determine if the two variables are positively associated, negatively associated, or have no association. (pg. 194)

Objective 2 - Describe the Properties of the Linear Correlation Coefficient

The linear correlation coefficient measures the strength and direction of the linear relation between two quantitative variables. The Greek letter ρ (rho) represents the population correlation coefficient, and r represents the sample correlation coefficient.

$$r = \frac{\sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{n - 1}$$

(round to 3 decimal places)

where

$\bar{x}$ is the sample mean of the explanatory variable
$s_x$ is the sample standard deviation of the explanatory variable
$\bar{y}$ is the sample mean of the response variable
$s_y$ is the sample standard deviation of the response variable
$n$ is the number of individuals in the sample
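
The formula for r can also be checked by hand or in code. The following is a minimal sketch in Python (not part of the calculator workflow used in this course) that applies the z-score formula directly to the temperature and cricket chirp data from the earlier example. The function name `correlation` is an illustrative choice, not something from the book.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample linear correlation coefficient r via the z-score formula."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

temperature = [83, 72, 88, 84, 80, 76, 70, 93]
chirps = [1025, 960, 1200, 1100, 900, 860, 880, 1180]

r = correlation(temperature, chirps)
print(round(r, 3))  # 0.864
```

A value this close to 1 suggests a fairly strong positive linear relation, which agrees with the upward pattern in the scatter plot.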

Sample scatter plots with associated value for r

Objective 3 - Compute and Interpret the Linear Correlation Coefficient

By default, LinReg assumes the explanatory variable is in L1 and the response variable is in L2. If the explanatory and response variables are stored in lists other than L1 and L2, enter those lists after LinReg, for example

LinReg (ax+b) L3, L4

NOTE: the formula chart does not mention LinReg. Also, r can be found using LinRegTTest (see instructions below)

Example

Find r for the following set of data.

Temperature / Cricket Chirps
83 / 1025
72 / 960
88 / 1200
84 / 1100
80 / 900
76 / 860
70 / 880
93 / 1180


Objective 4 - Determine whether a linear relation exists between two variables

Method 1: P-Value approach

Since the formula chart specifically mentions LinRegTTest, we will prefer the P-value approach instead of the critical value approach.

Note this approach is not listed in the book.
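
For readers curious about where the P-value comes from, here is a rough sketch in Python. It assumes the usual test statistic t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, which is what a linear regression t-test reports for the null hypothesis of no linear relation. The tail area is approximated with Simpson's rule so that only the standard library is needed. The function names are illustrative, and the calculator remains the method used in class.

```python
from math import gamma, pi, sqrt

def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p_value(r, n, steps=2000):
    """Return (t, P-value) for H0: no linear relation, given r from n data pairs."""
    df = n - 2
    t = r * sqrt(df / (1 - r * r))  # test statistic (requires |r| < 1)
    upper = abs(t)
    # Simpson's rule (steps must be even) for the area under the density on [0, |t|].
    h = upper / steps
    area = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    # Each tail holds 0.5 minus that area; double it for a two-tailed test.
    return t, 2 * (0.5 - area)

alpha = 0.05
t_stat, p_value = two_tailed_p_value(r=0.864, n=8)
print(round(t_stat, 2), round(p_value, 4), p_value < alpha)
```

With r = 0.864 from 8 data pairs, the P-value comes out well below 0.05, so the null hypothesis of no linear relation would be rejected at that level.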

Method 2: Critical value approach

Following is the critical value approach which is the approach given in the book

Critical Values for Correlation Coefficient (Table II Appendix A from book)

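The first column of each pair is n − 2 (the degrees of freedom), where n is the number of data pairs. The critical values are for α = 0.05 (two-tailed).

n − 2 / Critical Value / n − 2 / Critical Value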
1 / 0.997 / 21 / 0.413
2 / 0.950 / 22 / 0.404
3 / 0.878 / 23 / 0.396
4 / 0.811 / 24 / 0.388
5 / 0.754 / 25 / 0.381
6 / 0.707 / 26 / 0.374
7 / 0.666 / 27 / 0.367
8 / 0.632 / 28 / 0.361
9 / 0.602 / 29 / 0.355
10 / 0.576 / 30 / 0.349
11 / 0.553 / 40 / 0.304
12 / 0.532 / 50 / 0.273
13 / 0.514 / 60 / 0.250
14 / 0.497 / 70 / 0.232
15 / 0.482 / 80 / 0.217
16 / 0.468 / 90 / 0.205
17 / 0.456 / 100 / 0.195
18 / 0.444
19 / 0.433
20 / 0.423

Example

Assume that 20 pairs of data result in a value of r = 0.855. Is there a linear relation between x and y?

Example

Assume that 10 pairs of data result in a value of r = 0.601. Is there a linear relation between x and y?
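
The decision rule for the critical value approach can be sketched in a few lines of Python. This sketch assumes α = 0.05 (two-tailed) and uses the standard relation between a critical t value and a critical r value, r = t / sqrt(t² + df) with df = n − 2. The two t values are copied from a standard t table, and the names `T_CRITICAL_05`, `critical_r`, and `linear_relation_exists` are illustrative only.

```python
from math import sqrt

# Two-tailed critical t values at alpha = 0.05 from a standard t table,
# keyed by degrees of freedom. Only the two rows needed here are listed.
T_CRITICAL_05 = {8: 2.306, 18: 2.101}

def critical_r(n):
    """Critical value for |r| at alpha = 0.05 with n data pairs (df = n - 2)."""
    df = n - 2
    t_crit = T_CRITICAL_05[df]
    return t_crit / sqrt(t_crit ** 2 + df)

def linear_relation_exists(r, n):
    """Decision rule: a linear relation exists if |r| exceeds the critical value."""
    return abs(r) > critical_r(n)

for r, n in [(0.855, 20), (0.601, 10)]:
    print(n, r, round(critical_r(n), 3), linear_relation_exists(r, n))
# 20 0.855 0.444 True
# 10 0.601 0.632 False
```

Under these assumptions, 20 pairs with r = 0.855 indicate a linear relation, while 10 pairs with r = 0.601 fall just short of the critical value.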

Example

Is there a linear relationship between temperature and cricket chirps? Use the P-value approach and α = 0.05.

Temperature / 83 / 72 / 88 / 84 / 80 / 76 / 70 / 93
Cricket Chirps / 1025 / 960 / 1200 / 1100 / 900 / 860 / 880 / 1180


Example

American Black Bear – The American black bear is one of eight bear species in the world. It is the smallest North American bear and the most common bear species on the planet. In 1969, Dr. Michael Pelton of the University of Tennessee initiated a long-term study of the population in the Great Smoky Mountains National Park. One aspect of the study was to develop a model that could be used to predict a bear’s weight. One variable thought to be related was the length of the bear. The following data represent the lengths and weights of 12 American black bears.

Total Length (cm) / Weight (kg)
139 / 110
138 / 60
139 / 90
120.5 / 60
149 / 85
141 / 100
141 / 95
150 / 85
166 / 155
151.5 / 140
129.5 / 105
150 / 110

Does a linear relationship exist between the weight of the bear and its length? Use the P-value approach and α = 0.05.

Objective 5 - Explain the difference between correlation and causation

Note: do not read “causal” as “casual”; they are not the same!

Causation can only be established through designed experiments, not observational studies.

A lurking variable is related to both the explanatory and response variables. Because of a lurking variable, two variables can be correlated without there being a causal relationship between them.

Causation

If there is a significant linear correlation between two variables, then one of five situations can be true.

·  There is a direct cause and effect relationship

·  There is a reverse cause and effect relationship

·  The relationship may be caused by a third variable

·  The relationship may be caused by complex interactions of several variables

·  The relationship may be coincidental

Common Errors

There are some common errors that are made when looking at correlation.

·  Avoid concluding causation. Just because there is a linear relationship doesn't mean that one thing caused the other. It could be any of the five situations above.

·  Avoid data based on rates or averages. Variation is suppressed when using a rate or an average. Remember the central limit theorem? The variance of the sample means is the variance of the population divided by the sample size. So, if you work with averages, the variances are smaller and you might find linear relationships that are significant when they would not be if the original data were used.

·  Watch out for linearity. All that we're testing here is the strength of a linear relationship. There are other kinds of relationships. In algebra, we talk about linear, quadratic, cubic, quartic, exponential, logarithmic, Gaussian (bell shaped), logistic, and power models. A scatter plot is a good way to look for patterns.

Section 4.2 – Least-Squares Regression

Objectives

  1. Find the least-squares regression line and use the line to make predictions
  2. Interpret the slope and the y-intercept of the least-squares regression line
  3. Compute the sum of squared residuals

Objective 1 - Find the least-squares regression line and use the line to make predictions

Once the linear correlation coefficient has indicated that a linear relationship exists between two variables, our next step is to find a linear equation that describes the relationship between the two variables.

The goal of this section is to find not just any linear equation, but the “best” linear equation that fits our data.

What does “best” mean?

We will define “best” in terms of residuals, or errors. A residual is the difference between an observed y-value ($y$) and predicted y-value ($\hat{y}$). The predicted y-value comes from the line we chose to represent the data.

From an example in the book

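Residual = observed y − predicted y = $y - \hat{y}$

Positive residuals indicate that a data point is above the line, i.e., the observed y is higher than predicted (above average for that x).

Negative residuals indicate that a data point is below the line, i.e., the observed y is lower than predicted (below average for that x).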

So the definition of “best” is to minimize the sum of the squared residuals

The line of best-fit or the least-squares regression line is the line that minimizes the sum of the squared residuals.

On the calculator it will be $y = ax + b$.

Relate this equation back to the slope-intercept form for a linear equation, $y = mx + b$, that you learned in Algebra, where m is the slope and b is the y-intercept.

x is called the predictor variable
y is called the response variable

The good news is that the calculator will do all of the work for us. Use either LinReg or LinRegTTest to get the a and b values to form the least-squares regression line. Calculator instructions for these functions were presented earlier.
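
To see roughly what the calculator does behind the scenes, here is a minimal Python sketch. It assumes the standard least-squares formulas (slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope·x̄) and checks that the resulting line has a smaller sum of squared residuals than two nearby lines. The function names are illustrative, and the data are the cricket chirp data used throughout this chapter.

```python
from statistics import mean

def least_squares(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

temperature = [83, 72, 88, 84, 80, 76, 70, 93]
chirps = [1025, 960, 1200, 1100, 900, 860, 880, 1180]

a, b = least_squares(temperature, chirps)
best = sum_squared_residuals(temperature, chirps, a, b)
# Nearby lines with a changed slope or intercept give a larger sum.
steeper = sum_squared_residuals(temperature, chirps, a + 1, b)
shifted = sum_squared_residuals(temperature, chirps, a, b + 10)

print(round(a, 3), round(b, 3))  # 14.801 -182.059
print(best < steeper, best < shifted)  # True True
```

The values of a and b should match what LinReg reports for the same lists, up to rounding.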

Example

Use your calculator to find the least-squares regression line for the following set of data:

Temperature / 83 / 72 / 88 / 84 / 80 / 76 / 70 / 93
Cricket Chirps / 1025 / 960 / 1200 / 1100 / 900 / 860 / 880 / 1180

Predict the number of cricket chirps when the temperature is 90 degrees, 0 degrees, and 105 degrees.

Example – American Black Bears

Total Length (cm) / Weight (kg)
139 / 110
138 / 60
139 / 90
120.5 / 60
149 / 85
141 / 100
141 / 95
150 / 85
166 / 155
151.5 / 140
129.5 / 105
150 / 110

Use your calculator to find the least-squares regression line.

Predict the weight of a bear if the length is 150 cm, 160 cm, and 200 cm.

Objective 2 - Interpret the slope and the y-intercept of the least-squares regression line

The y-intercept of any line is the point where the line intersects with the vertical axis. Find the y-intercept by letting x=0 in the equation and solving for y.

To interpret the y-intercept, first ask two questions:

  1. Is 0 a reasonable value for the explanatory variable?
  2. Do any observations near x = 0 exist in the data set?

If the answer is no to either question, do not interpret the y-intercept.

Do not use the regression model to make predictions outside the scope of the model. That is, do not use the regression model for values of the explanatory variable that are much smaller or larger than the observed data.
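
The scope rule can be illustrated with a short Python sketch. It assumes the rounded least-squares coefficients a ≈ 14.801 and b ≈ −182.059 obtained by fitting the cricket chirp data, and it uses a deliberately strict version of the rule: any temperature outside the observed range is flagged. The helper name `predict` is illustrative.

```python
temperature = [83, 72, 88, 84, 80, 76, 70, 93]

# Rounded least-squares coefficients for chirps = a * temperature + b,
# obtained by fitting the cricket data (an assumption of this sketch).
a, b = 14.801, -182.059

def predict(x, observed_xs, slope, intercept):
    """Return (prediction, in_scope); in_scope is False outside the observed x range."""
    in_scope = min(observed_xs) <= x <= max(observed_xs)
    return slope * x + intercept, in_scope

for temp in (90, 0, 105):
    chirp_estimate, in_scope = predict(temp, temperature, a, b)
    note = "" if in_scope else "  (outside observed temperatures; do not trust)"
    print(temp, round(chirp_estimate, 1), note)
```

At 0 degrees the line gives a negative number of chirps, which is impossible and shows why predictions outside the scope of the model should not be used.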

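The slope is the rate of change, on average: the amount by which the predicted response changes for each one-unit increase in the explanatory variable.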

Example – Cricket Chirps

Temperature / 83 / 72 / 88 / 84 / 80 / 76 / 70 / 93
Cricket Chirps / 1025 / 960 / 1200 / 1100 / 900 / 860 / 880 / 1180

Interpret the slope and y-intercept of the least-squares regression line found earlier.

Example – American Black Bears

Total Length (cm) / Weight (kg)
139 / 110
138 / 60
139 / 90
120.5 / 60
149 / 85
141 / 100
141 / 95
150 / 85
166 / 155
151.5 / 140
129.5 / 105
150 / 110

Interpret the slope and y-intercept of the least-squares regression line found earlier.
