Chapter 1 Simple Linear Regression

Example:

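Y: force (in newtons), X: mass (in kilograms), then the famous Newton’s second law, applied to an object under gravity near the Earth's surface (acceleration 9.8 m/s²), is

Y = 9.8 X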

Example:

Y: consumption of chocolate in pounds, X: population, then the relationship between the consumption of chocolate and the population of a country might be

Y = X + 3

Ideally, the equation describing the relationship between two variables X and Y in science would be fixed. A commonly used form is Y=f(X), where f is some function. Given the value of X, the exact value of Y can then be obtained from this equation. In the real world, however, it is almost impossible to recover a fixed equation like those in the previous examples from the data obtained, because there are many sources of unpredictable error in the data collection process. In addition, a fixed or exact equation might not describe the natural phenomenon accurately. It is therefore sensible to take random error into account.

This motivates the statistical model Y=f(X)+ε, where ε is a random variable referred to as the random error or random variation. The function f(X) describing the relationship between Y and X might be very complicated. In this course, we consider the simplest case, the linear equation. The simplest linear regression model is

Y = β₀ + β₁X + ε,

where β₀ and β₁ are unknown parameters. We will refer to the above model as the simple linear regression model. The word “regression” is associated with Sir Francis Galton (1822-1911), who studied the relationship between the heights of parents and the heights of their children. He found that very tall parents tended to have children shorter than themselves. This seemed to be a “reverse” tendency, so he used the word “reversion” to describe the relationship. We will discuss simple linear regression in the next section.

•  Y is often called the response, the dependent variable, or the output.

•  X is often called the predictor, the independent variable, the input, or the regressor.
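To make the difference between a fixed equation and a statistical model concrete, the following minimal Python sketch generates responses from a straight line with and without random error. Several things here are assumptions made only for illustration and are not part of the notes: the function name simulate_simple_linear, the choice of a normal distribution with mean zero for ε, the value 0.5 for its standard deviation, and the x values. The intercept 3 and slope 1 are borrowed from the chocolate example above.

```python
import random


def simulate_simple_linear(xs, beta0, beta1, sigma, seed=0):
    """Return one response per x from Y = beta0 + beta1 * x + eps.

    Assumption for illustration: eps is drawn from a normal distribution
    with mean 0 and standard deviation sigma. The notes only require eps
    to be some random variable.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical predictor values, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
beta0, beta1 = 3.0, 1.0  # the line Y = X + 3 from the chocolate example

# sigma = 0 removes the random error, so Y = f(X) exactly.
exact = simulate_simple_linear(xs, beta0, beta1, sigma=0.0)

# sigma > 0 scatters the observed responses around the line.
noisy = simulate_simple_linear(xs, beta0, beta1, sigma=0.5)

for x, y_exact, y_noisy in zip(xs, exact, noisy):
    print(f"x = {x:.1f}   f(x) = {y_exact:.2f}   observed y = {y_noisy:.2f}")
```

With sigma set to zero every response lies exactly on the line, which corresponds to the fixed equation Y=f(X). With a positive sigma the same X no longer determines Y exactly, which is the situation the model Y=f(X)+ε is meant to describe.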

Question: Why did Sir Francis Galton use the word “reversion”? Please explain based on the linear equation he found for the heights of the parents and the children.
