ESTIMATION:

Least squares method:

Statistical model:

Data available: $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$.

Statistical model for the data:

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n,$

where $\beta_0$ and $\beta_1$ are unknown parameters and $\epsilon_i$ is the random error.

Two assumptions are made about the random error:

1. $E(\epsilon_i) = 0$ for all $i = 1, \ldots, n$.

2. $Var(\epsilon_i) = \sigma^2$ for all $i = 1, \ldots, n$ (constant variance).

In the above model, we are interested in the values of the unknown parameters $\beta_0$ and $\beta_1$. If we can guess (estimate) the two parameters reasonably accurately, then we can obtain useful information about the relationship between the variables X and Y. A positive $\beta_1$ might imply a positive correlation between the two variables, while a negative $\beta_1$ might imply a negative correlation. If $\beta_1$ is close to 0, this might indicate that there is no relation between X and Y. In addition, we can predict the future value of Y given a new X based on the estimated values of $\beta_0$ and $\beta_1$.

The best known method for estimating the parameters $\beta_0$ and $\beta_1$ is the least squares method, proposed by Gauss in the eighteenth century. The least squares method finds the estimates of $\beta_0$ and $\beta_1$, denoted $b_0$ and $b_1$, which minimize the objective function

$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$

That is, for any values of $(\beta_0, \beta_1)$, $S(b_0, b_1) \le S(\beta_0, \beta_1)$. The following explains why the least squares method works from two points of view.

(a) Algebraic point of view:

Suppose $y_i = 3 + 5x_i + \epsilon_i$, $i = 1, \ldots, n$, is the true model. Also, we assume all the random errors are very small. Then, heuristically,

$y_i \approx 3 + 5x_i, \quad \text{and thus} \quad S(3, 5) = \sum_{i=1}^{n} (y_i - 3 - 5x_i)^2 = \sum_{i=1}^{n} \epsilon_i^2 \approx 0.$

However,

$S(\beta_0, \beta_1) \ge 0 \ \text{for any } (\beta_0, \beta_1), \quad \text{and} \quad S(b_0, b_1) \le S(3, 5) \approx 0.$

These 3 relations imply the objective function attains its minimum at parameter estimates (approximately) equal to their true counterparts. That is, under small random errors (which usually happen in practice), the parameter estimates $b_0$ and $b_1$ might be quite close to the true values of the parameters, 3 and 5, given above.
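As a rough numerical check of this argument (a sketch, not part of the original notes; the error standard deviation 0.01 and sample size 50 are assumed for illustration), data are simulated from the hypothetical true model $y_i = 3 + 5x_i + \epsilon_i$ and the least squares estimates land very close to 3 and 5:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true model y = 3 + 5x + error, with very small random errors
n = 50
x = rng.uniform(0, 10, size=n)
eps = rng.normal(0, 0.01, size=n)
y = 3 + 5 * x + eps

def S(b0, b1):
    """Objective function: sum of squared deviations from the line b0 + b1*x."""
    return np.sum((y - b0 - b1 * x) ** 2)

# Closed-form least squares estimates (the formulas are derived later in the notes)
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

print(b0_hat, b1_hat)              # should be very close to 3 and 5
print(S(b0_hat, b1_hat), S(3, 5))  # both near 0, with S(b0_hat, b1_hat) <= S(3, 5)
```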

(b) Geometric point of view:

Suppose there is no random error and $y_i = \beta_0 + \beta_1 x_i$, $i = 1, \ldots, n$, is the true model.

Suppose $\hat{y} = b_0 + b_1 x$ is the “estimated” model. Then,

$S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$

is the sum of squared distances between the predicted (estimated) response values $\hat{y}_i = b_0 + b_1 x_i$ and the data $y_i$. Intuitively, since the data are generated from the true model, sensible parameter estimates should result in a small sum of squared distances between the responses and the predicted responses. For example, when the “estimated” model is the true model, i.e., $(b_0, b_1) = (\beta_0, \beta_1)$, $S(b_0, b_1)$ is equal to 0. In general, $S(\beta_0, \beta_1)$ is the sum of squared distances between the data $y_i$'s and the predicted responses $\beta_0 + \beta_1 x_i$. The above interpretation also applies when the true model has small random errors.
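A tiny numerical illustration of this geometric interpretation (again a sketch, not from the notes), reusing the hypothetical true line $y = 3 + 5x$ from part (a) with no random error:

```python
import numpy as np

# No random error: the data lie exactly on a hypothetical true line y = 3 + 5x
x = np.arange(1, 11, dtype=float)
y = 3 + 5 * x

def S(b0, b1):
    """Sum of squared vertical distances between the data and the line b0 + b1*x."""
    return np.sum((y - b0 - b1 * x) ** 2)

print(S(3, 5))  # 0.0  -- the true line passes through every data point
print(S(2, 5))  # 10.0 -- any other line misses at least some points
```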

Now, we show the procedure to find $b_0$ and $b_1$. In calculus, the maximum or minimum of a function of two variables $f(x, y)$ can be found by first solving $\partial f/\partial x = 0$ and $\partial f/\partial y = 0$, and then checking whether the Hessian matrix (the matrix of second partial derivatives) is positive definite or negative definite. Therefore, we need to find the solutions of the normal equations:

$\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0$ and $\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0.$
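For completeness, here is a sketch of the Hessian check mentioned above (a step left implicit here):

$H = \begin{pmatrix} \frac{\partial^2 S}{\partial \beta_0^2} & \frac{\partial^2 S}{\partial \beta_0 \partial \beta_1} \\ \frac{\partial^2 S}{\partial \beta_1 \partial \beta_0} & \frac{\partial^2 S}{\partial \beta_1^2} \end{pmatrix} = 2\begin{pmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}.$

Its leading principal minors are $2n > 0$ and $\det H = 4\left( n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 \right) = 4n\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$ whenever the $x_i$'s are not all equal, so $H$ is positive definite and the solution of the normal equations is indeed a minimum.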

It is quite complicated to find the solutions of the above equations directly. However, it is much easier by letting $\alpha = \beta_0 + \beta_1 \bar{x}$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and thus

$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i - \alpha - \beta_1 (x_i - \bar{x}) \right]^2 = S^*(\alpha, \beta_1).$

Then, we find $a$ and $b_1$ minimizing $S^*(\alpha, \beta_1)$, and $b_0$ can be obtained by the equation $b_0 = a - b_1 \bar{x}$.

Since

$\frac{\partial S^*(\alpha, \beta_1)}{\partial \alpha} = -2\sum_{i=1}^{n} \left[ y_i - \alpha - \beta_1 (x_i - \bar{x}) \right] = -2\left( \sum_{i=1}^{n} y_i - n\alpha \right) = 0$ (because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$),

thus,

$a = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$

Also, since

$\frac{\partial S^*(\alpha, \beta_1)}{\partial \beta_1} = -2\sum_{i=1}^{n} (x_i - \bar{x})\left[ y_i - \alpha - \beta_1 (x_i - \bar{x}) \right] = -2\left[ \sum_{i=1}^{n} (x_i - \bar{x}) y_i - \beta_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 \right] = 0,$

$b_1 = \frac{S_{xy}}{S_{xx}},$

where

$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x}) y_i, \quad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2.$

Thus,

$b_0 = a - b_1 \bar{x} = \bar{y} - b_1 \bar{x}.$

The fitted regression equation (the fitted line) is

$\hat{y} = b_0 + b_1 x.$

The fitted value for the i'th observation is $\hat{y}_i = b_0 + b_1 x_i$, and the residual for the i'th observation is $e_i = y_i - \hat{y}_i$. The residuals $e_i$'s can reflect how close the fitted line is to the data.
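A minimal Python sketch of these formulas (the function name fit_simple_regression is illustrative, not from the notes), returning the estimates, fitted values, and residuals:

```python
import numpy as np

def fit_simple_regression(x, y):
    """Return (b0, b1, fitted, residuals) for the least squares line y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Sxx = np.sum((x - x.mean()) ** 2)              # S_xx
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # S_xy
    b1 = Sxy / Sxx
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x       # fitted values, y-hat_i
    residuals = y - fitted     # residuals, e_i
    return b0, b1, fitted, residuals
```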

Example:

$x_i$ / 0 / 2 / 2 / 5 / 5 / 9 / 9 / 9 / 9 / 10
$y_i$ / -2 / 0 / 2 / 1 / 3 / 1 / 0 / 0 / 1 / -1

Please fit the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.

[sol]:

$\bar{x} = \frac{60}{10} = 6, \quad \bar{y} = \frac{5}{10} = 0.5, \quad \sum_{i=1}^{10} x_i y_i = 32, \quad \sum_{i=1}^{10} x_i^2 = 482.$

Thus, $S_{xy} = \sum_{i=1}^{10} x_i y_i - n\bar{x}\bar{y} = 32 - 10 \times 6 \times 0.5 = 2$ and $S_{xx} = \sum_{i=1}^{10} x_i^2 - n\bar{x}^2 = 482 - 10 \times 6^2 = 122.$

$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{2}{122} \approx 0.0164$ and $b_0 = \bar{y} - b_1 \bar{x} = 0.5 - 0.0164 \times 6 \approx 0.4016$.

$\hat{y} = 0.4016 + 0.0164 x$ is the fitted regression equation.
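As a quick check of the arithmetic (an illustrative sketch, not part of the notes):

```python
import numpy as np

x = np.array([0, 2, 2, 5, 5, 9, 9, 9, 9, 10], dtype=float)
y = np.array([-2, 0, 2, 1, 3, 1, 0, 0, 1, -1], dtype=float)

Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # 2.0
Sxx = np.sum((x - x.mean()) ** 2)              # 122.0
b1 = Sxy / Sxx                                 # about 0.0164
b0 = y.mean() - b1 * x.mean()                  # about 0.4016
print(Sxy, Sxx, b1, b0)
```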

Note: for the model $y_i = \beta x_i + \epsilon_i$ (a regression line through the origin), the least squares estimate is $b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$.

[Derivation of $b$]:

The objective function is $S(\beta) = \sum_{i=1}^{n} (y_i - \beta x_i)^2$. Setting its derivative to zero,

$\frac{dS(\beta)}{d\beta} = -2\sum_{i=1}^{n} x_i (y_i - \beta x_i) = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i = \beta \sum_{i=1}^{n} x_i^2 \;\Rightarrow\; b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$

(The second derivative, $2\sum_{i=1}^{n} x_i^2 > 0$, confirms this is a minimum.)

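A one-line numerical check of this formula on the same example data (illustrative only; the notes do not include this computation), giving $b = 32/482 \approx 0.0664$:

```python
import numpy as np

x = np.array([0, 2, 2, 5, 5, 9, 9, 9, 9, 10], dtype=float)
y = np.array([-2, 0, 2, 1, 3, 1, 0, 0, 1, -1], dtype=float)

b = np.sum(x * y) / np.sum(x ** 2)  # 32 / 482, about 0.0664
print(b)
```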