Chaojun LI(李超君) 5101209122

Cost Function

This report examines the cost function of 145 electricity-generating companies in the United States.


As the quantity of electricity produced increases, total cost rises with it. Average cost, by contrast, first drops sharply as output grows and then tends to rise.

a. Regression model

1)  If we run a regression to estimate the relation between cost and quantity (KWH), we get the equation

cost = -0.741095 + 0.006431*KWH

2)  Diagnosis



The total (SST), explained (SSE) and residual (SSR) sums of squares are 56422.84, 51190.372 and 5232.468, respectively. The adjusted R-square is 0.907, which means this equation explains a large part of the variation in cost.
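As a check on the figures above, the R-square and adjusted R-square can be recomputed from the reported sums of squares. This is only a sketch: it assumes that all 145 companies enter the regression and that SSE denotes the explained and SSR the residual sum of squares, which is the reading under which the three numbers add up.

```python
# Reported sums of squares for cost = b0 + b1*KWH
sst = 56422.84    # total sum of squares
sse = 51190.372   # explained sum of squares (assumed meaning of "SSE")
ssr = 5232.468    # residual sum of squares (assumed meaning of "SSR")

n = 145           # number of companies (assumed to be the sample size)
k = 1             # regressors, not counting the intercept

# R-square: share of total variation explained by the regression
r2 = sse / sst

# Adjusted R-square: penalizes the residual variance for lost degrees of freedom
adj_r2 = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

print(f"R-square = {r2:.3f}, adjusted R-square = {adj_r2:.3f}")
```

Because there is only one regressor and 145 observations, the adjustment is small, so the two measures nearly coincide.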

If we look at the residuals, as the quantity of electricity produced (or the fitted value) increases, the residuals scatter more widely and tend to be negative. This suggests that the model violates the assumptions that the errors are independent of the regressor and homoscedastic, so the model can be improved.

The partial residual plot shows that the relation is not well described by a straight line, which means we can improve the model by adding the regressor KWH^2.
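The diagnostics described above can be sketched in code. The data below are synthetic and purely illustrative (they are not the 145-company sample, and the names `kwh`, `cost` and `ols` are hypothetical). The sketch fits the linear and the quadratic model by least squares, builds the partial residuals for KWH (residual plus the fitted KWH term), and compares residual sums of squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data, NOT the report's sample: cost curves upward in
# output and the noise grows with output (heteroscedasticity).
kwh = rng.uniform(100, 15000, size=145)
cost = 2.0 + 0.004 * kwh + 2e-7 * kwh**2 + rng.normal(0, 0.0005 * kwh)

def ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and residuals."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

beta_lin, resid_lin = ols(kwh, cost)
beta_quad, resid_quad = ols(np.column_stack([kwh, kwh**2]), cost)

# Partial residuals for KWH in the linear model: plotting these against KWH
# reveals curvature that the straight line misses.
partial_resid_kwh = resid_lin + beta_lin[1] * kwh

ssr_lin = float(resid_lin @ resid_lin)
ssr_quad = float(resid_quad @ resid_quad)
print(f"SSR linear = {ssr_lin:.1f}, SSR quadratic = {ssr_quad:.1f}")
```

Plotting `resid_lin` against `kwh` or against the fitted values gives the residual plot, and plotting `partial_resid_kwh` against `kwh` gives the partial residual plot.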

b.  A revised regression model

1)  After we add the regressor KWH^2, the regression model becomes

cost = 1.883860 + 0.003731*KWH + 2.4E-7*KWH^2
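The quadratic form also connects to the earlier remark that average cost first falls and then rises. Dividing the fitted equation by KWH gives AC = a/KWH + b + c*KWH, which reaches its minimum where KWH = sqrt(a/c). The sketch below evaluates this with the fitted coefficients. It treats the intercept as a fixed cost and reads the linear coefficient as 0.003731, and both are assumptions. The units of KWH are those of the data set, which the report does not state.

```python
import math

# Fitted coefficients of cost = a + b*KWH + c*KWH^2
a = 1.883860   # intercept, interpreted here as a fixed cost (assumption)
b = 0.003731   # linear coefficient (assumed reading of the reported value)
c = 2.4e-7     # quadratic coefficient

def average_cost(kwh):
    """Average cost implied by the fitted quadratic cost function."""
    return a / kwh + b + c * kwh

# d(AC)/d(KWH) = -a/KWH^2 + c = 0  =>  KWH* = sqrt(a/c)
kwh_star = math.sqrt(a / c)

print(f"Average cost is minimized near KWH = {kwh_star:.1f}")
print(f"Average cost there = {average_cost(kwh_star):.6f}")
```

Below this output level the fixed-cost term a/KWH dominates and average cost falls. Above it the term c*KWH dominates and average cost rises.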

2)  Diagnosis


The SST, SSE and SSR of the revised model are 56422.84, 52841.396 and 3581.444, respectively. The adjusted R-square is 0.935631, up from 0.907, so the revised model explains a larger part of the variation in cost.

As KWH and the fitted value increase, the residuals still scatter more widely, which violates the assumption of homoscedasticity. However, the residuals show less of a tendency to be negative, which means the model has improved.

In the partial residual plot, the relation looks closer to linear than in the previous model.

According to these diagnostics, the revised model is indeed better than the previous one.
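The comparison can also be made formal with an F-test for the single added regressor KWH^2, using only the residual sums of squares reported for the two models. This is a sketch that assumes both regressions use the same 145 observations. Because the residuals appear heteroscedastic, the usual F distribution is only an approximation here.

```python
# Residual sums of squares reported for the two cost models
ssr_restricted = 5232.468     # linear model: cost on KWH
ssr_unrestricted = 3581.444   # revised model: cost on KWH and KWH^2

n = 145   # observations (assumed identical in both regressions)
k = 2     # regressors in the unrestricted model, excluding the intercept
q = 1     # number of restrictions tested (coefficient on KWH^2 equals 0)

# F = [(SSR_r - SSR_u) / q] / [SSR_u / (n - k - 1)]
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k - 1))

print(f"F({q}, {n - k - 1}) = {f_stat:.2f}")
```

With one restriction and 142 residual degrees of freedom, the 5% critical value is roughly 4, so a statistic of this size points clearly towards keeping KWH^2.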

Engel’s Law

a.  Regression Model

1)  In the previous reports, we already ran the regression and obtained the equation

food/totcper = 1.091566 - 0.075737*logtotcinc + u.

This time, we may conjecture that the consumption of food depends on the average age of the adult household members. After running the regression, we get,

food/totcper = 1.060486 + 0.000963*age - 0.077019*logtotcinc + u

If the model is correct, then when the average age increases by about 10.38, holding other things constant, food/totcper increases by 0.01, that is, by one percentage point.
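The arithmetic behind this interpretation is short enough to spell out. The sketch below recovers the age change needed for a 0.01 rise in the food share. As an extra illustration it evaluates the effect of a 1% rise in income, under the assumption (not stated in the report) that logtotcinc is a natural logarithm.

```python
import math

beta_age = 0.000963      # coefficient on age
beta_loginc = -0.077019  # coefficient on logtotcinc

# Age change that raises food/totcper by 0.01 (one percentage point),
# holding logtotcinc fixed: delta_age = 0.01 / beta_age
delta_age = 0.01 / beta_age

# Effect on the food share of a 1% rise in income, ASSUMING a natural log:
# delta_share = beta_loginc * ln(1.01)
delta_share_income = beta_loginc * math.log(1.01)

print(f"Age change for +0.01 in food share: {delta_age:.2f}")
print(f"Food-share change for +1% income:   {delta_share_income:.6f}")
```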

2) Diagnosis

The SST, SSE and SSR are 187.7086, 16.0266 and 171.6820, respectively. The adjusted R-square is 0.085027.

The scatter plots of the residuals against the regressors and the fitted values give only a vague picture: as the fitted values and regressors increase, the residuals stay within a band of roughly constant width, which suggests that the regression does not violate the assumption of homoscedasticity.



If we look further at the partial residual plots against age and logtotcinc, the linear relation is not very clear. Thus we can revise the model by adding age^2 and logtotcinc^2.

b.  Revised regression model

1)  The equation of revised regression model is

food/totcper = -0.276925 - 0.011391*age + 0.000141*age^2 + 0.376869*logtotcinc - 0.031971*logtotcinc^2
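With squared terms in the model, the effect of each regressor is no longer constant: the marginal effect of x is b1 + 2*b2*x, and it changes sign at x = -b1/(2*b2). The sketch below computes these turning points from the fitted coefficients. The function names are illustrative, and whether the turning points fall inside the observed range of the data cannot be checked from the report.

```python
# Fitted coefficients of the revised Engel regression
b_age, b_age2 = -0.011391, 0.000141   # age and age^2
b_inc, b_inc2 = 0.376869, -0.031971   # logtotcinc and logtotcinc^2

def marginal_effect(b1, b2, x):
    """Derivative of b1*x + b2*x^2 with respect to x."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """Value of x at which the marginal effect is zero."""
    return -b1 / (2 * b2)

age_star = turning_point(b_age, b_age2)
loginc_star = turning_point(b_inc, b_inc2)

print(f"Food share stops falling with age near age = {age_star:.2f}")
print(f"Food share stops rising with logtotcinc near {loginc_star:.3f}")
```

Taken at face value, the food share falls with age up to the first turning point and rises after it. It rises with logtotcinc only below the second turning point, and above that level it declines with income, as Engel's law predicts.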

2)  Diagnosis



The SST, SSE and SSR are 187.7086, 22.076 and 165.6326, respectively. The adjusted R-square is 0.116940, up from 0.085027, so the revised model explains a larger part of the variation in the dependent variable.


The scatter plots of the residuals against the regressors and the fitted values still give a vague picture in which the residuals stay within a band.



The partial residual plots show a more linear relation than the previous ones, which means the revised model is an improvement.