EXPLORATION

Plot one point on the graph and then click Show Line. Why do you think a line is not graphed?

Clear the graph and plot two points that have whole-number coordinates.

  • On your own, find an equation for the line through these two points.
  • Click Show Line. Compare the equation for the line drawn to the equation that you calculated. Explain and resolve any differences.
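One way to check a hand calculation is a short script. The sketch below is only an illustration: the function name `line_through` and the sample points are made up here, and it assumes whole-number coordinates so that exact fractions can be used for the slope.

```python
from fractions import Fraction


def line_through(p, q):
    """Return (slope, intercept) of the line through points p and q.

    Returns None for a vertical line, which has no slope-intercept form.
    Assumes whole-number coordinates so the arithmetic stays exact.
    """
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None  # vertical line: its equation is x = x1
    slope = Fraction(y2 - y1, x2 - x1)  # rise over run
    intercept = y1 - slope * x1         # solve y1 = slope * x1 + b for b
    return slope, intercept


# Hypothetical example points; replace them with the points you plotted.
result = line_through((1, 2), (4, 8))
if result is None:
    print("The line is vertical.")
else:
    slope, intercept = result
    print(f"y = {slope}x + {intercept}")
```

If the applet reports decimals, a fraction such as 1/2 from this sketch corresponds to 0.5 on screen, which may account for an apparent difference.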

Clear the graph and plot three points. Think about a line that "fits" these three points as closely as possible.

  • Is it possible for a single straight line to contain all three of the points you plotted?
  • On a piece of paper, plot these same three points, and sketch a line that you think best fits the three points.
  • Click Show Line. Do you think that the line graphed fits the points well? How does it compare to the line you drew?

Clear the graph. Place several points on the graph that lie roughly in a straight line, then click Show Line. The line that appears is the regression line, which is sometimes known as the "line of best fit."

  • What is the r-value for the line?
  • Place just one additional point on the graph that lies far away from the line. What effect does this point have on the r-value? What effect does it have on the line of best fit?
  • Move several of the points. How do the r-value and the line change as points are moved?

The line that is drawn is called the "least-squares regression line." Basically, the least-squares regression line is the line that minimizes the sum of the squared "errors," where each error is the vertical distance between an actual point and the point on the line with the same horizontal coordinate. In this sense, it is the line that fits the points most closely. To get a better feel for the regression line, try the following tasks.

  • Plot four points so that the regression line is horizontal. Do this in several different ways. What do you notice about the regression line and the r-value?
  • Plot three points (not all on a straight line) so that the regression line is horizontal. What do you notice about the regression line and the r-value?
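The least-squares idea described above can be sketched in a few lines of code. This is a sketch under assumptions, not the applet's actual code (which is not shown here): it uses the standard formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x), and the function names and sample points are invented for illustration.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sxx: spread of the x-values; Sxy: how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(points, slope, intercept):
    """Add up the squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


# Hypothetical sample points; replace them with the points on your graph.
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
slope, intercept = least_squares_line(points)
print(f"regression line: y = {slope:.2f}x + {intercept:.2f}")
print(f"sum of squared errors: {sum_squared_errors(points, slope, intercept):.2f}")

# Nudging the line in any direction should make the total squared error larger.
print(sum_squared_errors(points, slope + 0.1, intercept))
```

Trying other slopes and intercepts in `sum_squared_errors` is a quick way to see that no other line gives a smaller total for the same points.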

Explore the Relationship Between Correlation and Linear Association

Use the interactive math applet below to help you answer these questions:

1. Compare the r-values for the following three situations.

  a. Create a scatterplot that you think shows a strong positive linear association between the two variables. Sketch a picture of this scatterplot. What is the r-value?
  b. Create a scatterplot that you think shows a strong negative linear association between the two variables. Sketch a picture of this scatterplot. What is the r-value?
  c. Create a scatterplot that you think shows no linear association between the two variables. Sketch a picture of this scatterplot. What is the r-value?

2. For each r-value below, create a scatterplot that has that exact r-value. Sketch a picture of that graph.

  a. r = 1
  b. r = -1
  c. r = 0

3. Plot several points that exhibit a strong positive linear trend, and then plot one outlier.

  a. Overall, is this scatterplot roughly linear?
  b. Is the r-value close to 1?
  c. What is the r-value?

4. In the lower left corner of the coordinate plane, plot 10 points that exhibit no trend (this is sometimes called a "cloud" of points). Then plot one point in the upper right corner.

  a. Overall, is this scatterplot linear?
  b. Is the r-value close to 1?
  c. What is the r-value?
  d. Does a high r-value necessarily mean that the data are generally linear?
  e. Does an r-value close to zero always mean that the data are not linear?

Journal: Explain what you have learned about r, the correlation coefficient.

SUMMARY

Pearson Correlation Coefficient

An important question that comes up in determining a curve to fit our data points is: How scattered can the points be and still have a shape that can be represented by a curve? The idea of correlation helps to measure this. The value r is Pearson's correlation coefficient. It is a measure of the linear association between the horizontal variable and the vertical variable. It gives information about how tightly packed the data points are about the regression line, and therefore about how well the regression line fits the data. Values of r range from -1 (perfect negative linear association) through 0 (no linear association) to +1 (perfect positive linear association); values near -1 or +1 indicate a strong linear association. But beware! The correlation coefficient, r, is sometimes misleading. You should always look at the scatterplot and combine that knowledge with the r-value in order to draw valid conclusions about the strength of the linear association.
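For readers who want to see how r can be computed, here is a minimal sketch using the standard definition r = Sxy / sqrt(Sxx * Syy). The function name `pearson_r` and the sample points are assumptions made for this illustration, and the applet may round or display its value differently.

```python
import math


def pearson_r(points):
    """Return Pearson's correlation coefficient r for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        # All points share the same x or the same y, so r is undefined.
        raise ValueError("r is undefined when x or y does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical points lying exactly on a rising line and on a falling line.
rising = [(1, 3), (2, 5), (3, 7), (4, 9)]
falling = [(1, 9), (2, 7), (3, 5), (4, 3)]
print(f"rising line:  r = {pearson_r(rising):.3f}")
print(f"falling line: r = {pearson_r(falling):.3f}")
```

Note that r is undefined when all the points share the same vertical coordinate, which the sketch reports as an error rather than a number.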

The moral is that the correlation coefficient, r, is a valuable tool for studying the linear association between two variables, but it does not fully explain the association (in fact, no statistic does).
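The two data sets below are a rough sketch of how r alone can mislead. Both are invented for illustration: a trendless "cloud" that gains one far-away point, and a set of points lying exactly on a curve. The helper `pearson_r` is a hypothetical name using the standard definition of r.

```python
import math


def pearson_r(points):
    """Return Pearson's r for a list of (x, y) points (standard definition)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / math.sqrt(sxx * syy)


# Ten points with no trend: every x from 1 to 5 paired with both y = 1 and y = 2.
cloud = [(x, y) for x in range(1, 6) for y in (1, 2)]
# One extra point far away in the upper right.
outlier = (20, 20)
# Seven points lying exactly on the curve y = x^2, symmetric about x = 0.
parabola = [(x, x * x) for x in range(-3, 4)]

print(f"cloud alone:        r = {pearson_r(cloud):.3f}")
print(f"cloud plus outlier: r = {pearson_r(cloud + [outlier]):.3f}")
print(f"parabola:           r = {pearson_r(parabola):.3f}")
```

In this sketch a single distant point pushes r for the cloud close to 1 even though the data are not generally linear, while the parabola has a very clear pattern that r does not detect because the pattern is not a line.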