Math 217    Name: ______

Lab 6: Scatterplots and Correlation in SPSS    Computer # ______

Due Date: ______

In this lab, we investigate relationships between pairs of quantitative variables. Such a relationship is usually graphed with a scatterplot. The correlation between the variables measures the strength and direction of their linear association.

In a scatterplot, the values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data set appears as a point in the plot; the point’s placement is determined by the values of both variables for that individual.

We can draw scatterplots in SPSS using either Graphs > Legacy Dialogs or Graphs > Chart Builder. The following directions use the Legacy Dialogs, but if you prefer Chart Builder, feel free to use it.

What to do:

  1. Start SPSS and open the SPSS data file you saved in Lab 4 (fast food data).
  2. The five quantitative variables in this file are fat, sodium, calories, caloric density, and serving size. Before you draw any graphs, use the Variable View tab to make sure all five quantitative variables have their Measure set to Scale. If you have to change any of them, save your data file afterward.
  3. Why are some items higher in calories than others? One reason might be the serving size.

Note 1: If we are trying to use one variable (like serving size) to explain the variation in another variable (like calories), the first is called the “explanatory” variable and the second is called the “response” variable. In SPSS, they are sometimes referred to as “independent” (= explanatory) and “dependent” (= response).

Note 2: The explanatory variable should be plotted on the x axis (horizontal), and the response variable should be plotted on the y axis (vertical).

  4. Draw a scatterplot of calories (response) vs. serving size (explanatory):

Use Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter > Define.

Click Reset to return all settings to their defaults.

Put serving size on the X axis and calories on the Y axis.

Use the Titles button to give your scatterplot the title “Calories vs. Serving Size for 60 Items.” Click Continue.

Click OK.

  5. In any graph of data, you should look for the overall pattern and for striking deviations from that pattern (outliers).

Some scatterplots seem unformed (I call them “blob” shaped).

Some scatterplots clearly suggest a line or a curve along which the points fit (more or less).

Some scatterplots are made up of two or more clusters of points, and each cluster may have its own individual shape.

  6. Look at the calories vs. serving size scatterplot. Describe the pattern you see (unformed? linear? curved? clusters? outliers?):
  7. Find the names, serving sizes, and calorie contents of any outliers you see in the scatterplot:

Item name / Serving size (g) / Calories

Specifically, what is different about these items which distinguishes them from all the other items in the file?

  8. Because these items are not “typical” fast food, we might wish to exclude them from our analysis. Back in Data View, use Data > Select Cases to temporarily remove these three items from consideration.
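If it helps to see what Data > Select Cases does, here is a minimal Python sketch of the same idea: keep the rows that satisfy a condition and set the others aside. The record fields (`item`, `restaurant`, `serving_g`, `calories`) and all values are hypothetical placeholders, not the fast food data file. The original list is left unchanged, much as SPSS's default "Filter out unselected cases" option keeps unselected cases in the file.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical record layout; the real data file's variables may differ.
    item: str
    restaurant: int
    serving_g: float
    calories: float


def select_cases(rows, keep):
    """Return a new list of the rows for which keep(row) is True.

    The input list is not modified, so unselected rows are still available.
    """
    return [row for row in rows if keep(row)]


# Made-up rows for illustration only (not the lab's data).
data = [
    Item("Item A", 1, 100, 250),
    Item("Item B", 2, 150, 350),
    Item("Item C", 2, 200, 420),
    Item("Item D", 1, 900, 300),
]

# Like excluding specific items by name.
excluded = {"Item D"}
typical = select_cases(data, lambda r: r.item not in excluded)

# Like selecting only the cases with restaurant = 2.
restaurant_2 = select_cases(data, lambda r: r.restaurant == 2)

print([r.item for r in typical])
print([r.item for r in restaurant_2])
```

The same function handles both kinds of selection used in this lab; only the condition changes.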
  9. Redraw the scatterplot. Give it the title “Calories vs. Serving Size, Outliers Excluded.”
  10. With the outliers excluded, let's agree to describe the association between calories and serving size as “roughly linear.” To show the line that best models the association between the variables, double-click the scatterplot to open the Chart Editor and choose Elements > Fit Line at Total. Close the Chart Editor. Notice that the line captures the general trend of the scatterplot.
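For the default Linear fit method, the fit line SPSS draws is the ordinary least-squares line: slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄. A minimal Python sketch of that calculation, using made-up serving sizes and calories rather than the lab data:

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared x-deviations and sum of cross-products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical serving sizes (g) and calories, for illustration only.
x = [100, 150, 200, 250, 300]
y = [240, 330, 470, 520, 640]
a, b = least_squares_line(x, y)
print(f"calories ~ {a:.1f} + {b:.3f} * serving size (g)")
```

The slope is measured in calories per gram of serving size.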
  11. To measure the strength and direction of the linear association (after excluding the outliers), choose Analyze > Correlate > Bivariate. Move calories and serving size into the Variables list, and click OK. To find the correlation (r) between calories and serving size in the Correlations table, look in the row for one variable and the column for the other variable. Record the Pearson Correlation: r = ______.
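The Pearson Correlation can also be computed by hand from the formula r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]. The sketch below implements that formula in plain Python on hypothetical data. Because the formula is symmetric in x and y, the Correlations table shows the same r whichever variable is in the row and whichever is in the column.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: the sum of cross-products of deviations divided by
    the square root of (sum of squared x-deviations * sum of squared y-deviations)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical serving sizes (g) and calories, for illustration only.
x = [100, 150, 200, 250, 300]
y = [240, 330, 470, 520, 640]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```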
  12. The sign of the correlation indicates the direction of a linear relationship:

If r > 0, the y-values and x-values tend to increase together; we say there is a positive association between the variables.

If r < 0, the y-values tend to decrease as the x-values increase; we say there is a negative association between the variables.

a) As serving size increases, does the calorie content tend to increase, or tend to decrease? ______

b) Is the correlation you found above a positive value, or a negative value? ______

c) For calories vs. serving size, what is the direction of the linear relationship (positive or negative)? ______
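Why does the sign of r match the direction? The numerator of r is the sum of the products (x − x̄)(y − ȳ). Points above and to the right of the point of averages (x̄, ȳ), or below and to the left of it, give positive products; points above-left or below-right give negative products. This sketch, on made-up data, counts the two kinds of points and reports the direction from the sign of the sum, which is always the same as the sign of r.

```python
def direction_of_association(x, y):
    """Classify direction from the sign of the sum of cross-products of
    deviations (same sign as Pearson's r). Also count the points that
    contribute positive and negative products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    positive = sum(1 for p in products if p > 0)  # upper-right or lower-left of (mean_x, mean_y)
    negative = sum(1 for p in products if p < 0)  # upper-left or lower-right of (mean_x, mean_y)
    total = sum(products)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "no linear association"
    return label, positive, negative


# Made-up data for illustration only.
print(direction_of_association([1, 2, 3, 4, 5], [3, 4, 4, 6, 8]))
print(direction_of_association([1, 2, 3, 4, 5], [9, 7, 7, 4, 2]))
```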

  13. The correlation also measures the strength of a linear relationship.

If r² is close to 1 (r is close to 1 or to -1), the linear relationship is very strong. The points tend to lie close to the linear model.

If r² is close to 0 (r is close to 0), the linear relationship is very weak.

If r² is close to 0.5 (r is close to ±0.7), the strength of the linear relationship is best described as “moderate.”

a) What is the value of r² for calories vs. serving size? ______

b) How would you describe the strength of the linear relationship between calories and serving size? ______
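Squaring r is a one-line calculation, but the sketch below also shows why r² alone cannot tell you the direction: r and −r give the same r². The cutoffs 0.25 and 0.75 are illustrative assumptions chosen to match the rough guide above (near 0 weak, near 0.5 moderate, near 1 strong); they are not official thresholds.

```python
def describe_strength(r, weak_below=0.25, strong_above=0.75):
    """Return (r_squared, label) for a correlation r.

    The default cutoffs are illustrative assumptions, not values from this lab.
    """
    if not -1 <= r <= 1:
        raise ValueError("a correlation must lie between -1 and 1")
    r_sq = r * r
    if r_sq < weak_below:
        return r_sq, "weak"
    if r_sq > strong_above:
        return r_sq, "strong"
    return r_sq, "moderate"


for r in (0.95, -0.95, 0.7, -0.7, 0.1):
    r_sq, label = describe_strength(r)
    print(f"r = {r:+.2f}   r^2 = {r_sq:.2f}   {label}")
```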

  14. Use Data > Select Cases to select all cases (thus reinstating the outliers).
  15. Like the mean and standard deviation, the correlation is not resistant; it can be strongly affected by a few outliers. Recalculate the correlation with the outliers included: Analyze > Correlate > Bivariate.

a) Now the correlation between the two variables is r = ______.

b) Double-click the original scatterplot (the one that includes all 60 items) and add a fit line at total.

c) With the outliers included, the R Sq Linear is r² = ______.

d) Is the linear association stronger now that the outliers are included, or is it weaker? ______
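To see how little it takes to change r, the sketch below computes r for five made-up points lying exactly on a line, then again after adding one made-up point far from that line. The data are hypothetical and say nothing about which way the fast food outliers move r; that is what you are measuring above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical points lying exactly on the line y = 2x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r_clean = pearson_r(x, y)

# Add one made-up point far from that line.
r_with_outlier = pearson_r(x + [10], y + [2])

print(f"without the extra point: r = {r_clean:.3f}")
print(f"with the extra point:    r = {r_with_outlier:.3f}")
```

With these numbers, the single added point changes r from exactly 1 to a value slightly below 0.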

  16. Move on to a second example. Use Data > Select Cases to select only the Taco Bell items (restaurant = 2).
  17. Create a scatterplot of calories vs. serving size for the Taco Bell items, and give it an appropriate title.
  18. What is the most obvious feature of this scatterplot?
  19. One point in the scatterplot is distinctly separated from the other points. What is the name of this item, and what is unusual about it?
  20. Use Data > Select Cases to unselect that one item. Create a scatterplot for the remaining Taco Bell items and give it an appropriate title.
  21. Use the Chart Editor to add a fit line at total.

a) What is R Sq Linear for this scatterplot? r² = ______

b) What is the Pearson Correlation? r = ______

c) How strong is the linear association between the variables? ______

d) What is the direction of the association? ______

e) Which of these Taco Bell items is farthest from the fit line (look for the greatest vertical separation)? ______
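“Greatest vertical separation” means the largest absolute residual |y − ŷ|, where ŷ = a + b·x is the value the least-squares fit line predicts at that x. A minimal sketch with hypothetical item names and values:

```python
def farthest_from_line(names, x, y):
    """Return (name, residual) for the point with the largest vertical
    distance |y - (a + b*x)| from the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual = observed y minus the y predicted by the fit line.
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    i = max(range(n), key=lambda k: abs(residuals[k]))
    return names[i], residuals[i]


# Hypothetical items and values, for illustration only.
names = ["Item A", "Item B", "Item C", "Item D", "Item E"]
x = [1, 2, 3, 4, 5]
y = [2, 4, 9, 8, 10]
print(farthest_from_line(names, x, y))
```

A positive residual means the item lies above the line (more calories than the line predicts); a negative residual means it lies below.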

------

Save your SPSS output to your folder; name it “Lab 6.”

Also save your data file as “Lab 6.”
