Relationships between 2 variables

(Y) Response variable: result or outcome of interest

(X) Explanatory variable: explains changes in the response variable

Describing Relationships

Direction: Positive, Negative

Form: Linear, Curved, Scattered

Strength: Strong, Moderate, Weak

Template sentence:

There is a (strength), (direction), (form) relationship between x and y.

Simple Linear Regression

4 things to be cautious of when performing simple linear regression:

1.)Extrapolation: using your equation to predict Y for X values outside the range of the data used to fit the equation.

2.)Lurking variables: an underlying variable that makes the relationship between X and Y look different from what it really is.

3.)Outliers: your equation can be strongly influenced by outliers.

4.)Claiming X causes Y: a linear relationship between X and Y does not show that X causes Y.

Equation used for Simple Linear Regression

$\hat{y} = mx + b$

or

$\hat{y} = b_0 + b_1 x$

Term / Definition
$X$ / Explanatory variable
$Y$ / Response variable
$m$ or $b_1$ / Slope of the fitted line
$b$ or $b_0$ / Y-intercept of the fitted line
$\hat{y}$ / Predicted value of Y
$R^2$ / Percentage of the variation in Y described by the fitted line
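The slope and intercept of the fitted line are chosen by the least-squares method. A minimal sketch of computing them by hand, assuming the standard formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$, where $S_{xy}$ is the sum of cross-products of deviations and $S_{xx}$ the sum of squared deviations of x (this notation is not from these notes). The data and the helper name `least_squares_fit` are hypothetical.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1*x,
    using b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope (m)
    b0 = y_bar - b1 * x_bar   # y-intercept (b)
    return b0, b1

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
y_hat = [b0 + b1 * xi for xi in x]  # predicted values of Y
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # prints "y-hat = 2.20 + 0.60x"
```

Read with the interpretation sentences that follow: each additional unit of X predicts Y to increase by 0.60, and at X = 0 the predicted Y is 2.20.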

Interpretation Sentences

Slope

For every additional 1 unit of X, Y is predicted to increase (or decrease) by m.

Y-intercept

When X = 0, the predicted value of Y is b.

Does the y-intercept always have to make sense? No. If X = 0 lies outside the range of the data, interpreting b is extrapolation.

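r: Correlation coefficient

Measures the strength and direction of the linear relationship between x and y; r is always between -1 and 1.

r = 1: perfect positive linear relationship (values near 1 indicate a strong positive linear relationship),

r = 0: no linear relationship,

r = -1: perfect negative linear relationship (values near -1 indicate a strong negative linear relationship).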

R-squared

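The percentage of the variation in Y that can be explained by the linear relationship with X. In simple linear regression, R-squared is the square of the correlation coefficient r.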

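If you are given $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, and $r$ (the sample means and standard deviations of x and y, and their correlation)

Finding slope

$b_1 = r \frac{s_y}{s_x}$

Finding y-intercept

$b_0 = \bar{y} - b_1 \bar{x}$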
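The two formulas translate directly into code. A minimal sketch; `line_from_summary` and the summary statistics passed to it are hypothetical.

```python
def line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Slope and intercept of the least-squares line from summary statistics:
    b1 = r * s_y / s_x, then b0 = y_bar - b1 * x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical summary statistics
b0, b1 = line_from_summary(x_bar=3.0, y_bar=4.0, s_x=2.0, s_y=1.5, r=0.8)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

Note that the slope must be found first, since the intercept formula uses it.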

Residuals

How do you find a residual?

residual = observed y - predicted y, that is, $e = y - \hat{y}$

What are the x- and y-axes of a residual plot?

x-axis: explanatory (independent) variable X

y-axis: residuals

What 3 things should your residual plot show if a linear model is a good fit?

1.)No pattern (random scatter)

2.)Roughly equal spread (constant variance) around 0

3.)About half of the residuals above 0 and half below 0

What else should you check about your residuals? Normality.

What do you use to check this? A normal quantile plot of the residuals.
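A normal quantile plot pairs each sorted residual with the value a standard normal distribution would be expected to have at that rank; if the points fall roughly on a straight line, normality is plausible. A sketch using the standard library's `statistics.NormalDist`, assuming plotting positions $(i - 0.5)/n$; software such as JMP may use a slightly different formula. `normal_quantile_points` is a hypothetical helper.

```python
from statistics import NormalDist

def normal_quantile_points(residuals):
    """Pair each sorted residual with the standard normal quantile at
    plotting position (i - 0.5) / n, for i = 1..n."""
    n = len(residuals)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, sorted(residuals)))

# Residuals from a hypothetical fit
for z, e in normal_quantile_points([-0.8, 0.6, 1.0, -0.6, -0.2]):
    print(f"normal quantile {z:+.3f}   residual {e:+.2f}")
```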

Regression Inference

5 Assumptions

1.)The relationship between X and Y is linear (check with a scatterplot of x and y or with the residual plot)

2.)No obvious lurking variables (for this course, you can generally assume this)

3.)Simple random sample (were the data taken from an SRS?)

4.)Constant variance of the residuals (check the residual plot)

5.)Residuals vary according to a normal distribution (check a normal quantile plot of the residuals)

Recall from Exam 2: similar tests can be run on the slope to determine whether there is a statistically significant linear relationship between x and y (that is, whether the slope differs from zero).

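T-statistic

$t = \frac{b_1}{se(b_1)}$ (for testing that the population slope equals 0)

Confidence interval for the slope

$b_1 \pm t^* \cdot se(b_1)$

The standard error $se(b_1)$ can be obtained from the JMP output.

$b_1$ is the estimated slope of the fitted line, and $t^*$ is the critical value from the t distribution with n - 2 degrees of freedom.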

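Hypothesis test for the slope ($\beta_1$ is the population slope)

$H_0: \beta_1 = 0$

$H_a: \beta_1 \neq 0$

or

$H_a: \beta_1 > 0$

or

$H_a: \beta_1 < 0$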

Degrees of freedom = n-2

Test the hypothesis using the p-value

Conclusion

If p-value > $\alpha$, we fail to reject $H_0$.

If p-value ≤ $\alpha$, we reject $H_0$ in favor of $H_a$.
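JMP gives the p-value directly. To show where a two-sided p-value comes from, this sketch integrates the Student t density numerically with Simpson's rule (a swapped-in numerical technique, not how JMP computes it) and then applies the decision rule p-value ≤ α. All helper names are hypothetical, and the t value is the one from a made-up slope test with 3 degrees of freedom.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom.
    lgamma is used so large df does not overflow."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * (integral of the density from 0 to |t|),
    with the integral approximated by Simpson's rule (steps must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 1 - 2 * (total * h / 3)

def conclusion(p_value, alpha=0.05):
    """Common decision rule: reject H0 when p-value <= alpha."""
    return "reject H0 in favor of Ha" if p_value <= alpha else "fail to reject H0"

p = two_sided_p_value(2.121, df=3)  # hypothetical slope test, n = 5
print(f"p-value = {p:.3f}: {conclusion(p)}")
```

For this t value the p-value comes out around 0.12, so at $\alpha = 0.05$ the conclusion is to fail to reject $H_0$.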