Dear sabrinaanddavid,

Great question. Here are the answers.

It is often desirable to use a linear function to model a given set of data. In this activity, you will work with several data sets, and for each, you will look for a suitable line that approximates the given data. You will also work with the line that is generally used as the best-fitting line, the least-squares regression line. You will learn how to use Excel to find this line and its equation.

For the following scatterplots a, b, and c, draw an appropriate line to fit the data by “eye-balling” the graph and judging what line comes closest to all points. In the space following each graph, indicate whether the slope of the line you drew is positive, negative, or zero. For graph d, explain why a line would not be a good fit. (Source: The Wall Street Journal Almanac 1999.)

Graph a: Draw the line that follows the overall trend of the points. The slope is positive. It goes up and to the right.

Graph b: The slope is negative. The line goes down and to the right.

Graph c: It appears to me that this line also goes up and to the right (positive slope).

Graph d: A line (i.e., a straight line) doesn’t fit this data well because there appears to be a non-linear (i.e., a curved) relationship between the two variables.

You’ll want to draw vertical line segments from each point to the line. There are 6 points above the line and 5 points below it.
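If it helps to see the "vertical distance" idea in numbers, here is a minimal sketch. The points and the line y = 2x + 1 are made up for illustration; they are not the data from your graph. Each vertical distance (a residual) is the observed y minus the y the line gives at the same x, so a positive value means the point is above the line.

```python
# Hypothetical line and points, chosen only to illustrate residuals.
# They are NOT the data from the activity.
SLOPE = 2.0
INTERCEPT = 1.0
points = [(1, 3.5), (2, 4.5), (3, 7.5), (4, 8.0), (5, 11.0)]

# Residual = observed y minus the y-value on the line at the same x.
residuals = [y - (SLOPE * x + INTERCEPT) for x, y in points]

# Positive residual: point is above the line. Negative: below the line.
above = sum(1 for r in residuals if r > 0)
below = sum(1 for r in residuals if r < 0)
on_line = sum(1 for r in residuals if r == 0)

print(residuals)
print("above:", above, "below:", below, "on the line:", on_line)
```

Counting the signs of the residuals is the same thing as counting points above and below the line by eye.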

You can tell from the scatterplot that the slope of the line will be positive by looking at the general trend of the points. Since the line is designed to be the best fit, it will follow the trend of the points. Since the mind is so good at detecting patterns, we can generally tell the sign of the slope just by looking.

Write the equation of your line and indicate what the variables x and y represent.

Y=-1.0322x+574.99

Y is the SAT verbal score.

X is the percent taking the test.

What is the slope of the line and what does it represent? Interpret the slope in the context of the data.

The slope is -1.0322. That is, for each additional percentage point of students taking the test, we expect the average verbal score to decrease by about 1.0322 points.

What is the y-intercept of the line and what does it represent? Interpret the y-intercept in the context of the data.

The y-intercept is 574.99. It represents the expected average verbal score when 0 percent of students take the test. This number does not have much meaning for this problem, since with no test takers there would be no scores to average.

Use the line you found to predict the average verbal SAT score for a state in which 60 percent of students take the exam.

We plug 60 into the equation for x and we get: y = -1.0322(60) + 574.99 = 513.058
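If you want to check this kind of prediction without Excel, here is a minimal sketch in Python. The slope and intercept are copied from the equation above; the function name `predict_verbal_score` is just a name I made up for illustration.

```python
# Coefficients copied from the fitted line y = -1.0322x + 574.99.
SLOPE = -1.0322
INTERCEPT = 574.99


def predict_verbal_score(percent_taking):
    """Predicted average verbal SAT score for a given percent of students tested."""
    return SLOPE * percent_taking + INTERCEPT


# Prediction for a state where 60 percent of students take the exam.
prediction = predict_verbal_score(60)
print(round(prediction, 3))
```

Calling the same function with 0 returns the y-intercept, which is a quick way to confirm the equation was entered correctly.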

Where does this value appear on the graph? Mark it on a copy of the graph.

You’re going to have to mark this point on the graph: draw a vertical line up from x = 60 until you hit the fitted line, then go horizontally to the left until you hit the y-axis. The value you reach on the y-axis is about 513.


Write the equation of the line and indicate what the variables x and y represent in the equation.

Y=-0.0135x+25.656

Is there a clear choice of explanatory variable and response variable in this data set? Why or why not?

There is not. A theater owner might be interested in predicting the number of tickets he’ll sell based on movie rental data. Similarly, a movie rental place might be interested in predicting sales based on theater attendance. Both data sets are equally easy to gather and both are of moderate interest. There is no clear choice.

Describe what your scatterplot and line show.

We see a slightly negative slope. The main item of interest is a huge outlier: one man who claims to have seen 200 movies in the theater. This cannot be right.

There is one clearly unusual data point in the data set: the male who estimated he saw 200 movies at a theater last year. Delete this case and look at the scatterplot for these adjusted data.

Write the equation of this adjusted line and describe how the least-squares line changed when the outlier was deleted.

Y=0.1582x + 23.05

The line changed a great deal. The 200-movie case was a high-leverage point, so deleting it dramatically changed the line. The slope changed from negative (-0.0135) to positive (0.1582).
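To see how a single high-leverage point can flip the sign of the slope, here is a minimal sketch. The numbers are made up for illustration and are not the survey data from the activity, and `least_squares` is a helper I wrote for this example using the standard least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x).

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data (NOT the activity's data set):
# x = movies seen at a theater, y = movies rented.
theater = [2, 5, 8, 10, 12, 15]
rentals = [10, 14, 18, 20, 26, 30]

# The same data plus one extreme case far to the right on the x-axis.
theater_with_outlier = theater + [200]
rentals_with_outlier = rentals + [5]

slope_with, intercept_with = least_squares(theater_with_outlier, rentals_with_outlier)
slope_without, intercept_without = least_squares(theater, rentals)

print("with outlier:    slope =", round(slope_with, 4))
print("without outlier: slope =", round(slope_without, 4))
```

With these made-up numbers the slope is negative when the extreme case is included and positive once it is removed, which is the same kind of change you saw in Excel.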