Lecture 6 FRIDAY SEPTEMBER 12, 2008

SCATTERPLOTS AND CORRELATION, CHAPTER 4, PAGE 90

We are now looking at two QUANTITATIVE variables to determine if there is a relationship between them.

The RESPONSE variable measures the outcome of a study.

The EXPLANATORY variable explains or influences the changes in the response variable.

There could be more than one explanatory variable.

SCATTERPLOT: An X-Y graph used to display the relationship between the Response variable and the Explanatory variable.

Examples: Vehicle MPG vs weight

Heating Cost for a house vs outside temp

Air conditioning cost vs outside temp

The Response variable goes on the Y axis.

The Explanatory variable goes on the X axis.

Every point on the scatterplot represents one (x, y) pair of data.
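A minimal sketch of how the data pairs are formed, using the heating cost example. The numbers and the variable names (outside_temp_f, heating_cost) are made up for illustration only; they are not measured data.

```python
# Made-up illustrative values, not real measurements.
outside_temp_f = [20, 30, 40, 50, 60]      # Explanatory variable -> X axis
heating_cost = [210, 175, 140, 100, 65]    # Response variable    -> Y axis

# Each house-month contributes exactly one (x, y) pair, i.e. one plotted point.
points = list(zip(outside_temp_f, heating_cost))

for x, y in points:
    print(f"x = {x:>3} deg F   y = ${y}")
```

Each tuple in points is one dot on the scatterplot: the first number is read off the X axis and the second off the Y axis.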

INTERPRETING A SCATTERPLOT PAGE 94

1) Look at the overall pattern. Look for points that do not fit the pattern, i.e., striking deviations.

2) Look at the DIRECTION, + or -.

A + direction is a pattern that goes upward to the right, i.e., as X increases, Y increases.

A - direction is a pattern that goes downward to the right, i.e., as X increases, Y decreases.

3) Look at the FORM: straight line pattern or curved pattern.

Straight line patterns are called linear patterns.

4) Look at the STRENGTH. Strong: Points fit some kind of line, straight or curved, closely, i.e., with small deviations.

Weak: Points fit a line, straight or curved, loosely, i.e., with large deviations.

Strength is a subjective term, not clearly defined. Other words like Moderate might be used.

CATEGORICAL VARIABLES can be shown on a scatterplot by using symbols for the data points.

Example: Heating Gas vs Outside Temperature

One symbol is used for OLD furnace

Another symbol is used for NEW furnace.

Comparison between the old furnace and the new furnace is easy to see.

EXCEL CAN MAKE SCATTERPLOTS VERY EASILY

CORRELATION P 99

The strength of a LINEAR relationship is quantified by the value of r, the correlation.

The value of r can range from -1 to +1, including every value in between.

When r = -1 or +1, all the points lie exactly on a straight line.

When r is strictly between -1 and +1, the points do not fall exactly on a straight line.

The closer r gets to 0, the weaker the linear relationship and the larger the deviations from a straight line.

The closer r gets to -1 or +1, the stronger the linear relationship.
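A short sketch of these statements in Python. The function name correlation and the data lists are assumptions made for illustration. The function uses sums of deviations from the means, which is algebraically the same as the usual textbook definition of r.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
exact_up = [2 * v + 1 for v in x]       # points exactly on a rising line
exact_down = [10 - 3 * v for v in x]    # points exactly on a falling line
scattered = [2, 1, 4, 3, 6]             # rising trend with deviations

r_up = correlation(x, exact_up)
r_down = correlation(x, exact_down)
r_scattered = correlation(x, scattered)

print(f"exact rising line:  r = {r_up:.3f}")
print(f"exact falling line: r = {r_down:.3f}")
print(f"scattered points:   r = {r_scattered:.3f}")
```

The two exact lines give r = +1 and r = -1. The scattered points give a value strictly between 0 and +1, because they rise overall but do not sit exactly on a line.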

FORMULA FOR r

Your calculator will do this for you.
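For reference, the standard definition of the correlation for n data pairs is sketched below. The notation is an assumption, not taken from these notes: x-bar and y-bar are the means of the X and Y values, and s_x and s_y are their standard deviations.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor in the sum is a standardized value, so the units of X and Y cancel out of r.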

FACTS ABOUT r, correlation

1)  r measures only linear relationships. It does not describe curved relationships, no matter how strong they are.

2)  r does not care which variable is on the X axis and which is on the Y axis.

You can switch the X and Y variables and the correlation will be the same.

3)  r has no units. You can change the units of the X or the Y variables or both and the correlation will not change.

4)  The sign of r is always the same as the sign of the algebraic slope of the straight line fitted to the points.

5)  The correlation, r, is strongly affected by outliers, i.e., by a few extremely large or extremely small values.
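The five facts above can be checked numerically. This is a sketch only: the helper names (deviation_sums, correlation, slope) and all data values are made up for illustration, and slope uses the usual least-squares formula as an assumed meaning of "the line".

```python
from math import sqrt

def deviation_sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squared and cross deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def correlation(xs, ys):
    sxx, syy, sxy = deviation_sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    sxx, _, sxy = deviation_sums(xs, ys)
    return sxy / sxx

# Made-up illustrative data: vehicle weight (pounds) and fuel economy (MPG)
weight_lb = [2500, 3000, 3500, 4000, 4500]
mpg = [34, 30, 27, 22, 19]
r = correlation(weight_lb, mpg)

# Fact 1: a perfect curved pattern (y = x squared) can still give r = 0
x_sym = [-2, -1, 0, 1, 2]
r_curve = correlation(x_sym, [v ** 2 for v in x_sym])

# Fact 2: switching X and Y leaves r unchanged
r_swapped = correlation(mpg, weight_lb)

# Fact 3: changing units (pounds to kilograms) leaves r unchanged
weight_kg = [w * 0.4536 for w in weight_lb]
r_kg = correlation(weight_kg, mpg)

# Fact 4: r and the fitted slope have the same sign
b = slope(weight_lb, mpg)

# Fact 5: one extreme value can change r drastically
mpg_outlier = mpg[:-1] + [60]   # replace the last MPG value with an extreme one
r_outlier = correlation(weight_lb, mpg_outlier)

print(f"r = {r:.3f}, swapped = {r_swapped:.3f}, in kg = {r_kg:.3f}")
print(f"curved pattern: r = {r_curve:.3f}")
print(f"slope = {b:.4f}")
print(f"with one outlier: r = {r_outlier:.3f}")
```

With this data the original correlation is strongly negative, and a single extreme MPG value is enough to make it positive.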

SUMMARY of Chapter 4 is on P104