Relationships between 2 variables
(Y) Response variable: the result or outcome of interest
(X) Explanatory variable: explains changes in the response variable
Describing Relationships
Direction : Positive, Negative
Form: Linear, Curved, Scattered
Strength: Strong, Moderate, Weak
Sentence:
There is a (strength), (direction), (form) relationship between x and y.
Simple Linear Regression
4 things to be cautious of when performing simple linear regression on data.
1.) Extrapolation: using your equation to predict outside the range of the data used to fit it.
2.) Lurking variables: an underlying variable that makes the relationship look different than it really is.
3.) Outliers: the fitted equation can be strongly influenced by outliers.
4.) Claiming X causes Y: a linear relationship alone does not show that X causes Y.
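The extrapolation caution can be made concrete with a small check. The sketch below is only an illustration: the function name `predict`, the slope and intercept values, and the data range are all made up for this example, not taken from any course data set.

```python
def predict(x_new, intercept, slope, x_min, x_max):
    """Return (predicted Y, is_extrapolation) for a fitted straight line."""
    y_hat = intercept + slope * x_new
    # A prediction is an extrapolation when x_new lies outside the
    # range of X values that were used to fit the line.
    is_extrapolation = not (x_min <= x_new <= x_max)
    return y_hat, is_extrapolation


# Hypothetical fitted line and observed X range, for illustration only.
intercept, slope = 2.2, 0.6
x_min, x_max = 1, 5

inside = predict(3, intercept, slope, x_min, x_max)    # within the data range
outside = predict(20, intercept, slope, x_min, x_max)  # far outside the range
print(inside, outside)
```

The second prediction is still a number, but the flag marks it as one the data cannot support.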
Equation used for Simple Linear Regression
$\hat{y} = mx + b$ or $\hat{y} = b_0 + b_1 x$
Term / Definition
X / Explanatory variable
Y / Response variable
m or $b_1$ / Slope of the fitted line
b or $b_0$ / Y-intercept of the fitted line
$\hat{y}$ / Predicted value of Y
$R^2$ / Percentage of the variation in Y described by the fitted line
Interpretation Sentences
Slope
For every additional 1 unit of X, Y is predicted to increase/decrease by m units.
Y-intercept
When X = 0, we expect Y to equal b
Does the y-intercept always have to make sense? No. If X = 0 is outside the range of the data, interpreting the y-intercept is extrapolation.
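R: Correlation coefficient
Measures the strength and direction of the linear relationship between x and y.
1 = perfect positive linear relationship,
0 = no linear relationship,
-1 = perfect negative linear relationship.
Values close to 1 or -1 indicate a strong linear relationship.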
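To see how the correlation coefficient behaves at its extremes, here is a minimal sketch that computes it by hand with the usual sum-of-products formula. The helper name `correlation` and the small data sets are invented for illustration.

```python
import math


def correlation(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)                                 # moderate-to-strong positive
r_perfect = correlation(x, [2 * xi + 1 for xi in x])  # points exactly on a rising line
r_negative = correlation(x, [-xi for xi in x])        # points exactly on a falling line
print(r, r_perfect, r_negative)
```

Points that fall exactly on a line give 1 or -1; scatter around the line pulls the value toward 0.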
R-squared
($R^2 \times 100$)% of the variation in Y can be explained by its linear relationship with X.
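One way to see what R-squared measures is to compute it as 1 minus the ratio of leftover (residual) variation to total variation in Y. The sketch below assumes a least-squares fit; the helper `fit_line` and the data are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
y_bar = sum(y) / len(y)

# Variation left over after fitting the line (sum of squared residuals).
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
# Total variation in Y around its mean.
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst
print(f"{r_squared:.0%} of the variation in Y can be explained by X")
```

For this toy data R-squared comes out to about 0.6, so the interpretation sentence would read "about 60% of the variation in Y can be explained by X."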
If you are given $r$, $s_x$, $s_y$, $\bar{x}$, and $\bar{y}$
Finding slope
$b_1 = r \cdot \dfrac{s_y}{s_x}$
Finding y-intercept
$b_0 = \bar{y} - b_1 \bar{x}$
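A common textbook route from summary statistics to the fitted line is slope = r * s_y / s_x, then intercept = y-bar - slope * x-bar. The sketch below assumes that is the approach intended here; the function name `line_from_summary` and the summary values are hypothetical.

```python
def line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Intercept and slope of the least-squares line from summary statistics."""
    slope = r * s_y / s_x              # slope uses the correlation and both standard deviations
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope


# Made-up summary statistics (rounded), for illustration only.
intercept, slope = line_from_summary(
    x_bar=3.0, y_bar=4.0, s_x=1.5811, s_y=1.2247, r=0.7746
)
print(f"y-hat = {intercept:.2f} + {slope:.2f} x")
```

Find the slope first, since the intercept calculation depends on it.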
Residuals
How do you find a residual?
Residual = observed Y - predicted Y = $y - \hat{y}$
What is the x and y axis on your residual plot?
x-axis : explanatory variable (X)
y-axis : residuals
What 3 things should your residual plot show if a linear model is a good fit?
1.) No pattern (random scatter)
2.) Equal variance around 0
3.) About half of the points above 0 and half below 0
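The residual checks can be sketched numerically, taking a residual to be observed Y minus predicted Y (the usual definition, assumed here). The data and the fitted line are made up for illustration; the line shown is the least-squares fit for this toy data.

```python
# Made-up toy data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares line for the toy data above (assumed already fitted).
intercept, slope = 2.2, 0.6

# Residual = observed Y minus predicted Y, one per data point.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Count how many points sit above and below the zero line.
above = sum(1 for e in residuals if e > 0)
below = sum(1 for e in residuals if e < 0)

# Pairs to plot: X on the x-axis, residual on the y-axis.
residual_plot_points = list(zip(x, residuals))
print(residual_plot_points, above, below)
```

With only five points the split is 2 above and 3 below, which is about as even as it can be. The residuals of a least-squares line also sum to zero, up to rounding.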
You will also want to check what about your residuals? Normality
What do you use to check this? A normal quantile plot
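A normal quantile plot pairs each sorted residual with the value a standard normal distribution would put at the same position. The sketch below uses the plotting positions (i - 0.5) / n, which is one common convention and an assumption here; JMP may use a slightly different one. The helper name and residual values are made up.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Pair each sorted value with its theoretical standard-normal quantile."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), v)
        for i, v in enumerate(ordered, start=1)
    ]


# Made-up residuals, for illustration only.
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
points = normal_quantile_points(residuals)
for theoretical, observed in points:
    print(f"{theoretical:6.3f}  {observed:5.2f}")
```

If the residuals are roughly normal, these pairs fall close to a straight line when plotted.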
Regression Inference
5 Assumptions
1.) The relationship between X and Y is linear (check with a scatterplot of x and y or with the residual plot)
2.) No obvious lurking variables (you can generally assume this for this course)
3.) Simple random sample (were the data taken from an SRS?)
4.) Constant variance of the residuals (check with the residual plot)
5.) Residuals vary according to a normal distribution (check with a normal quantile plot of the residuals)
Recall from Exam 2… similar tests can be run on the slope to see whether there is a statistically significant linear relationship between x and y (that is, whether the slope differs from zero).
T-statistic
$t = \dfrac{b_1 - 0}{se(b_1)} = \dfrac{b_1}{se(b_1)}$
Confidence interval for the slope
$b_1 \pm t^* \cdot se(b_1)$
The standard error $se(b_1)$ can be obtained from the JMP output.
$b_1$ is the estimated value of the slope.
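The usual formulas for slope inference are t = b1 / se(b1) for the test statistic and b1 plus or minus t* times se(b1) for the confidence interval; the sketch below assumes those are the ones intended here. The function name `slope_inference` and all input values are hypothetical. In practice b1 and se(b1) would come from the JMP output and t* from a t-table with n - 2 degrees of freedom.

```python
def slope_inference(b1, se_b1, t_star):
    """Return the t-statistic for H0: slope = 0 and a confidence interval for the slope."""
    t_stat = (b1 - 0) / se_b1   # how many standard errors b1 is from 0
    margin = t_star * se_b1     # margin of error for the interval
    return t_stat, (b1 - margin, b1 + margin)


# Made-up values for illustration: n = 5 points, so df = n - 2 = 3,
# and t* = 3.182 is the 95% critical value for 3 degrees of freedom.
t_stat, ci = slope_inference(b1=0.6, se_b1=0.2828, t_star=3.182)
print(f"t = {t_stat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this made-up case the interval contains 0, so these five points alone would not show a statistically significant slope at the 5% level.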
Hypothesis test for the slope
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
or
$H_a: \beta_1 > 0$
or
$H_a: \beta_1 < 0$
Degrees of freedom = n-2
Test hypothesis with a p-value
Conclusion
If $p > \alpha$, we fail to reject $H_0$.
If $p \le \alpha$, we reject $H_0$ in favor of $H_a$.
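The conclusion step can be written as a small decision rule. The sketch below assumes the standard comparison of the p-value against a significance level alpha (0.05 is used as a default here only for illustration); the function name `conclude` is hypothetical.

```python
def conclude(p_value, alpha=0.05):
    """State the conclusion of a hypothesis test from its p-value."""
    if p_value <= alpha:
        # Small p-value: the data are unlikely if H0 (slope = 0) were true.
        return "reject H0 in favor of Ha"
    return "fail to reject H0"


# Made-up p-values, for illustration only.
print(conclude(0.01))  # small p-value
print(conclude(0.20))  # large p-value
```

"Fail to reject" is not the same as accepting the null hypothesis; it only means the data do not give enough evidence of a nonzero slope.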