Chapter 5: Relationships Between Quantitative Variables

Three Tools Used to Describe, Picture, and Quantify the Relationship Between Two Quantitative Variables:

  1. Scatterplot: A Two-Dimensional Graph of Data Values
  2. Correlation: A statistic that measures the strength and direction of a linear relationship between two quantitative variables.
  3. Regression Equation: An equation that describes the average relationship between a quantitative response variable and an explanatory variable.

Section 5.1 Looking for Patterns with Scatterplots

*Plot the response variable (dependent variable, y-variable) on the y-axis.

*Plot the explanatory variable (x-variable) along the x-axis.

Positive Association: Two variables have a positive association when the values of one variable tend to increase as the values of the other variable increase.

Ex. Plastic (lb) and Household Size

Size / 2 / 3 / 3 / 6 / 4 / 2 / 1 / 5
Plastic / .27 / 1.41 / 2.19 / 2.83 / 2.19 / 1.81 / .85 / 3.05
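As a quick numeric check of the direction of this association, the correlation introduced later in Section 5.3 can be computed directly from these data. A Python sketch (the `correlation` helper and variable names are my own, anticipating the formula in Section 5.3):

```python
# Household size and plastic (lb) data from the example above.
size    = [2, 3, 3, 6, 4, 2, 1, 5]
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]

def correlation(x, y):
    """Pearson correlation r, computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r = correlation(size, plastic)
print(r)  # about 0.84: positive, matching the positive association
```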

Negative Association: Values of one variable tend to decrease as the values of the other variable increase.

Ex. Years Owned (Car) vs. Value

Years / 1 / 5 / 10 / 8 / 6 / 3 / 2
Value / 14,000 / 9,000 / 6,000 / 7,000 / 8,000 / 11,000 / 12,500

Curvilinear Patterns: When a curve can be used to describe the pattern of a scatterplot better than a line.

Ex. Height of a golf ball hit by Tiger Woods vs. Horizontal distance traveled (ft).

Distance / 50 / 100 / 150 / 200 / 250 / 300 / 350
Height / 47 / 81 / 125 / 123 / 75 / 43 / 6
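A sketch (assuming NumPy is available; the `sse` helper name is my own) showing that a quadratic curve fits these golf-ball data far better than any straight line:

```python
import numpy as np

# Golf-ball trajectory data from the example above.
distance = np.array([50, 100, 150, 200, 250, 300, 350], dtype=float)
height   = np.array([47, 81, 125, 123, 75, 43, 6], dtype=float)

def sse(coeffs):
    """Sum of squared prediction errors for a polynomial fit."""
    return float(np.sum((height - np.polyval(coeffs, distance)) ** 2))

sse_line  = sse(np.polyfit(distance, height, 1))   # best straight-line fit
sse_curve = sse(np.polyfit(distance, height, 2))   # best quadratic (curved) fit

print(sse_line, sse_curve)  # the curve's SSE is far smaller than the line's
```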

Indicating Groups Within Data on Scatterplots

*Should certain groups be analyzed separately?

Ex. Male and Female Height and Weight

Outliers: Unusual combinations of data values.

Section 5.2: Describing Linear Patterns with a Regression Line

Regression Analysis: The area of statistics used to examine the relationship between a quantitative response variable and one or more explanatory variables.

Regression Equation: Estimation of an equation that describes how, on average, the response variable is related to the explanatory variable. Used to make predictions.

Regression Line: A straight line which represents the best equation for describing the relationship between two variables.

*Estimates the average value of y at specified x-values.

Equation: ŷ = b0 + b1x or, equivalently, ŷ = ax + b

**ŷ is the predicted (estimated) value of y.

**b0 is the y-intercept (Only makes sense if x=0 is included in the range of observed x-values).

**b1 is the slope of the line describing the relationship. It represents the average change in y for each one-unit increase in x (an increase when b1 is positive, a decrease when it is negative).

**We can utilize the calculator to help us out.

Ex. What is the value of a car that is 7 years old?

*Use LinReg and Store Key.
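The least squares slope and intercept can also be computed by hand rather than on the calculator. A Python sketch (variable names are my own) for the car data above:

```python
# Car age and value data from the earlier example.
years = [1, 5, 10, 8, 6, 3, 2]
value = [14000, 9000, 6000, 7000, 8000, 11000, 12500]

n = len(years)
mx, my = sum(years) / n, sum(value) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(years, value)) / \
     sum((x - mx) ** 2 for x in years)   # slope
b0 = my - b1 * mx                        # intercept

predicted_7 = b0 + b1 * 7                # predicted value of a 7-year-old car
print(round(b1, 2), round(predicted_7, 2))  # negative slope; value near $7,877
```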

Statistical Relationships: Knowing the value of one variable lets us predict the average value of the other variable, but individual observations vary around the average pattern.

Deterministic Relationships: If we know the value of one variable we can exactly determine the value of the other variable.

Extrapolation: Using a regression equation to predict values outside the range of the observed data.

**Risky-No guarantee that the relationship holds for values outside of our range of observed values.

Ex. Regression equations can predict the winning 100 meter dash times for the 2008 and 2012 Olympics. Should this same line be used to make predictions for the 3412 Olympics?
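A sketch of why extrapolation is risky, reusing the car data from earlier: extending the fitted line to a 20-year-old car, far outside the observed range of 1 to 10 years, predicts a negative value.

```python
# Refit the car-value line from the earlier example, then extrapolate.
years = [1, 5, 10, 8, 6, 3, 2]
value = [14000, 9000, 6000, 7000, 8000, 11000, 12500]

n = len(years)
mx, my = sum(years) / n, sum(value) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(years, value)) / \
     sum((x - mx) ** 2 for x in years)
b0 = my - b1 * mx

# x = 20 is far outside the observed range (1 to 10 years).
extrapolated = b0 + b1 * 20
print(extrapolated)  # a negative "value" -- the linear pattern cannot hold
```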

Residual (Prediction Error): The difference between the observed y-value and the predicted value (ŷ). Calculated as (y - ŷ).

Ex. What is the residual for the amount of plastic used by a family of 5?

Observed Value for a Family of 5: 3.05 lbs

Predicted Value for a Family of 5: 2.66 lbs

Residual = (Observed - Predicted) = (3.05 - 2.66) = 0.39
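The predicted value 2.66 comes from the least squares line for the plastic data. A Python sketch reproducing it (variable names are my own):

```python
# Fit the plastic/household-size line, then compute the residual for a family of 5.
size    = [2, 3, 3, 6, 4, 2, 1, 5]
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]

n = len(size)
mx, my = sum(size) / n, sum(plastic) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(size, plastic)) / \
     sum((x - mx) ** 2 for x in size)
b0 = my - b1 * mx

predicted = b0 + b1 * 5        # about 2.66 lb
residual  = 3.05 - predicted   # observed minus predicted, about 0.39
print(round(predicted, 2), round(residual, 2))
```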

The Least Squares Criterion

The basis for estimating the equation of the regression line. (Least sum of the squares)

The Least Squares Line (Regression Line) is the line for which the sum of the squared differences between the observed and predicted values of y is smaller than for any other line.

SSE=Sum of Squared Prediction Errors
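A sketch illustrating the criterion on the plastic data: the least squares line's SSE is no larger than that of any competing line (three arbitrary alternatives of my own are checked here):

```python
# Compare the SSE of the least squares line with some competing lines.
size    = [2, 3, 3, 6, 4, 2, 1, 5]
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]

n = len(size)
mx, my = sum(size) / n, sum(plastic) / n
b1 = sum((x - mx) * (y - my) for x, y in zip(size, plastic)) / \
     sum((x - mx) ** 2 for x in size)
b0 = my - b1 * mx

def sse(intercept, slope):
    """Sum of squared prediction errors for the line y = intercept + slope*x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(size, plastic))

best = sse(b0, b1)
# Any other line has a larger SSE -- that is the least squares criterion.
for other in [sse(b0 + 0.2, b1), sse(b0, b1 + 0.1), sse(0.0, 0.5)]:
    assert best <= other
print(best)
```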

Board Examples

Section 5.3: Measuring Strength and Direction with Correlation

Correlation: A number that indicates the strength and the direction of a straight-line relationship.

*Strength is determined by the closeness of the points to a straight line (R values close to 1 or -1)

*Direction is determined by whether one variable generally increases or generally decreases when the other variable increases.

(Only describes linear relationships).

*Also known as the Pearson Correlation Coefficient or the Pearson Product-Moment Correlation*

*Formula: r = Σ(x - x̄)(y - ȳ) / [(n - 1) sx sy], where x̄ and ȳ are the sample means and sx and sy are the sample standard deviations.*

Interpreting the Correlation Coefficient:

*Always between -1 and 1

*R=1 or R=-1 indicates a perfect linear relationship: all of the data points fall on a single straight line.

*If R is (+) the two variables tend to increase together.

*If R is (-) then as one variable increases the other tends to decrease.

*If R=0 the best straight line through the data is exactly horizontal (knowing the value of x does not help predict the value of y).
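A sketch checking two of these properties with made-up data: points that fall exactly on an increasing line give R = 1, and points on a decreasing line give R = -1 (the `correlation` helper is my own, following the formula above):

```python
# Check the properties listed above with a hand-rolled correlation.
def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

xs = [1, 2, 3, 4, 5]
r_up   = correlation(xs, [2 * x + 1 for x in xs])   # perfect increasing line
r_down = correlation(xs, [5 - 3 * x for x in xs])   # perfect decreasing line
print(r_up, r_down)  # 1.0 and -1.0 (within rounding)
```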

What about R Squared?

R², the squared correlation, is also used to describe the strength of a relationship between two variables.

R² is always between 0 and 1.

R² describes the proportion of variation in y explained by x.

Ex. R² = .75 means that the explanatory variable explains 75% of the variation in the response variable.
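A sketch verifying, on the plastic data from earlier, that the squared correlation equals the proportion of variation explained, 1 - SSE/SST (variable names are my own):

```python
# Verify R^2 = r^2 = 1 - SSE/SST on the plastic data.
size    = [2, 3, 3, 6, 4, 2, 1, 5]
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]

n = len(size)
mx, my = sum(size) / n, sum(plastic) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(size, plastic))
sxx = sum((x - mx) ** 2 for x in size)
syy = sum((y - my) ** 2 for y in plastic)   # SST: total variation in y

r  = sxy / (sxx * syy) ** 0.5
b1 = sxy / sxx
b0 = my - b1 * mx

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(size, plastic))
r_squared = 1 - sse / syy   # proportion of variation explained by x
print(round(r ** 2, 4), round(r_squared, 4))  # the two agree
```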

Section 5.4: Why the Answers May Not Make Sense

**Allowing outliers to overly influence the Results

**Combining Groups Inappropriately

**Using correlation and a straight-line equation to describe curvilinear data.

Influential Observations: Outliers with extreme x values. (Have the most influence on correlation and regression)

Ex. A set of data was collected to determine a relationship between height and weight of members of a statistics class. One man in the class was 68 inches tall and weighed 350 lbs.

The equation of the least squares line without his measurement was:

Weight = 5.55(Height) - 202.3

The equation of the least squares line with his measurement was:

Weight = 4.87(Height) - 179.99

Using these lines to make predictions, the weight of a person who is 62 inches tall would be:

Without Outlier: 141.8 lbs

With Outlier: 121.95 lbs

**It makes a 20 lb difference in our predictions.
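A quick arithmetic check of the two predictions using the equations quoted above:

```python
# Predictions at Height = 62 from the two fitted lines quoted above.
without_outlier = 5.55 * 62 - 202.3    # line fit without the 350-lb man
with_outlier    = 4.87 * 62 - 179.99   # line fit including him

print(without_outlier, with_outlier, without_outlier - with_outlier)
# 141.8 vs 121.95 -- a difference of almost 20 lb from one observation
```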

Inappropriately Combining Groups: Combining two or more groups when the groups should be considered separately.

Ex. The scatterplot showing the relationship between height and fastest driving speed has a positive association and an r value > .75.

However, when the data are split into groups of men and women, each scatterplot shows almost no linear pattern, and the r value for each group is < .12.
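A sketch with made-up (hypothetical) height and speed values showing how this can happen: each group alone shows almost no correlation, but pooling the groups produces a strong one.

```python
# Hypothetical data: within each group the correlation is near 0,
# but pooling the groups produces a strong positive correlation.
def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

women_h, women_s = [62, 63, 64, 65], [85, 80, 90, 82]
men_h,   men_s   = [70, 71, 72, 73], [105, 100, 110, 102]

r_women    = correlation(women_h, women_s)                # near 0
r_men      = correlation(men_h, men_s)                    # near 0
r_combined = correlation(women_h + men_h, women_s + men_s)
print(r_women, r_men, r_combined)  # weak, weak, strong
```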

Curvilinear Data: Always graph your data before you calculate a regression line!!! Your predictions will be inaccurate if you assume a linear relationship for data with a curved pattern.

Section 5.5: Correlation Does not Prove Causation

Ex. There is a strong correlation between an individual's heating bill and illness (flu, cold, etc.)

Does this mean that the price of heat is making us sick?

Interpretations of an Observed Association:

  1. There is causation. The explanatory variable is indeed causing a change in the response variable.

Use a designed experiment, which rules out confounding variables, to establish this.

  2. There may be causation, but confounding factors contribute as well and make this causation difficult to prove.

Data from an observational study alone CANNOT be used to establish causation. (Too difficult to separate the effect of confounding variables from the effect of the explanatory variable).

  3. There is no causation. The association is explained by how the explanatory and response variables are both affected by other variables.

Ex. In many towns there is a very strong correlation between the number of churches and the number of bars. However, the number of churches does not cause the number of bars; both are affected by the population of the town.

  4. The response variable is causing a change in the explanatory variable.

Ex. There is a strong relationship between good health and church attendance. However, healthy people are more able to attend church, so good health increases the likelihood of church attendance.