University of Oslo

DEPARTMENT OF ECONOMICS

Exam: ECON4135 - Applied statistics and econometrics, fall 2004

Date of exam: Wednesday, December 1, 2004

Time for exam: 14:30 – 17:30

The problem set covers 6 pages

Resources allowed:

· All written and printed resources, as well as calculators, are allowed

Grades given: A (best), B, C, D, E and F, with E as the weakest passing grade.

Comments given in arial font

“Broken limits to life expectancy” by Oeppen and Vaupel (Science, Vol. 296, 10 May 2002) showed that many previous claims of upper limits to expected life length for a newborn have been broken, and also that expected life length has shown a remarkably linear development since 1840. We shall look at some of these data for females.

For each of the years 1840 – 2000, Oeppen and Vaupel looked at observed life expectancy in the best practicing country, called record life expectancy. The best practicing country is defined as the country with the highest life expectancy in the actual year. Life expectancy in a country, often denoted by $e_0$, is calculated from the observed age-specific mortality rates in the actual year, and is the expected life length for a newborn under the hypothesis that it is subject throughout its life to the mortality rates observed in its year of birth. Record life expectancy in a given year $t$ is denoted by $Y_t$.

Problems

1. Female life expectancy in the best practicing country is plotted against year in Figure 1. A linear regression model, $Y_t = \alpha + \beta t + u_t$, is fitted to the data by ordinary least squares, see Stata output in Exhibit 1. What is the interpretation of $\beta$? Is the intercept estimate directly meaningful? Give a 95% confidence interval for the gain in life expectancy in one calendar year, and also in 10 calendar years. What is the 99% confidence interval for the yearly gain in expected life length?

$\beta$ is the yearly gain in female life expectancy in the best practicing country in the model, and $\hat\beta = 0.243$ is its estimate from the 1840-2000 series. Since year is measured in years AD, year = 0 is far outside the observed data span, and extrapolation to the value at year 0, the intercept, is risky business. The estimated intercept is large and negative, which is nonsense for life expectancy. 95% CI for $\beta$: (0.238, 0.248); the 95% CI for $10\beta$ is 10 times that for $\beta$, (2.38, 2.48); 99% CI for $\beta$: (0.237, 0.249).
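The intervals quoted above can be reproduced from the slope estimate and its robust standard error in Exhibit 1. The sketch below is only an illustration: it assumes a standard normal critical value rather than the t distribution with 159 degrees of freedom that Stata uses, which makes no difference at three decimals here. The helper name `normal_ci` is made up for this example.

```python
from statistics import NormalDist

# Values copied from Exhibit 1 (row "year").
coef = 0.2429773
se = 0.0024385


def normal_ci(estimate, std_err, level):
    """Two-sided confidence interval using a standard normal critical value (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (estimate - z * std_err, estimate + z * std_err)


ci95 = normal_ci(coef, se, 0.95)                   # gain in one calendar year
ci95_decade = tuple(10 * bound for bound in ci95)  # gain in 10 calendar years
ci99 = normal_ci(coef, se, 0.99)

print([round(b, 3) for b in ci95])
print([round(b, 2) for b in ci95_decade])
print([round(b, 3) for b in ci99])
```

Scaling both endpoints by 10 is valid because the 10-year gain is a fixed multiple of the yearly gain, so its standard error scales by the same factor.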

2. From Figure 1 there was clearly more variation in the data in the first part of the period than in the remaining period, and there were outlying observations in the period 1916-1919. What could have caused these patterns? Another pattern is that record life expectancy is flat over several periods around 1900. Why could that be? The regression results in Exhibit 1 were calculated with robust standard errors. Why is it a good idea to calculate robust standard errors in this case?

Few countries gathered statistical data on death rates in the early part of the period, and those that did produced vital statistics more prone to error and variability than in the 20th century. 1916-1919: war and the Spanish flu. Flats over periods: around 1900, several countries, including Norway, published vital statistics only every fifth year. When the best practicing country is in this group, the series is flat between publication years.

The standard errors for regression coefficients are biased if computed with the classical method rather than the robust method when there is heteroscedasticity such as the observed.

3. For a given year $t$ from 1841 on, the first difference in record life expectancy is $D_t = Y_t - Y_{t-1}$. Figure 2 shows first differences in record life expectancy versus year, and Exhibit 2 gives summary information for this variable. Comment briefly on Figure 2 in view of Figure 1, and explain how the regression result relates to the mean in Exhibit 2.

Figure 2 shows more variability early in the period (due to more variation around the regression line), several flats at zero around 1900 (due to flats in Figure 1), and variation around a constant level slightly above zero (due to linearity in Figure 1). The mean 0.243 in Exhibit 2 estimates the yearly gain in record life expectancy, which is modelled as $\beta$. It agrees with the regression estimate of 0.243. The standard error obtained from Exhibit 2 is 0.087, which does not match the standard error in Exhibit 1 (0.0024). The discrepancy is due to the former being based on all differences having the same variance, which certainly is not the case, while the latter is calculated by the robust method, which does not rely on this assumption.
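As a quick check of the numbers in this answer, the standard error of the mean first difference follows from the standard deviation and the number of observations in Exhibit 2. The sketch also uses the fact that first differences telescope, so their sum is simply the total gain in record life expectancy between 1840 and 2000. Variable names are chosen for this illustration only.

```python
import math

# Values copied from Exhibit 2 (variable D).
n_obs = 160
mean_d = 0.2431875
sd_d = 1.096574

# Classical standard error of a mean: assumes uncorrelated differences with equal variance.
se_mean = sd_d / math.sqrt(n_obs)

# Robust standard error of the regression slope, copied from Exhibit 1.
se_slope = 0.0024385

# The differences telescope: their sum is the record in 2000 minus the record in 1840.
total_gain = mean_d * n_obs

print(round(se_mean, 3))             # standard error of the mean difference
print(round(se_mean / se_slope, 1))  # how many times larger than the slope's standard error
print(round(total_gain, 2))          # total gain in years over 1840-2000
```

The telescoping sum shows that the mean difference depends only on the first and last observations, whereas the regression slope uses the whole series.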

4. From 1946 on, $D_t$ appears to have a rather stable development. An autoregressive model of order 1 was estimated for this period. What could a rationale be for this model? The Stata result (in condensed form) is given in Exhibit 3, which means that the estimated model is $D_t - 0.257 = -0.135\,(D_{t-1} - 0.257) + \varepsilon_t$. What is the standard error for the autoregressive coefficient? Is there a significant first-order autoregression in the first differences in record life expectancy?

The standard error for the auto-regressive coefficient is 0.148. With a two-sided p-value of 0.36 (from the exhibit) when testing for no auto-correlation, the auto-regression coefficient is certainly not statistically different from zero. The first differences in record life expectancy might thus very well be uncorrelated.
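The test statistic and p-value behind this conclusion can be recomputed from the "ar L1" row of Exhibit 3. This is a minimal sketch that assumes the z statistic is referred to a standard normal distribution, as the exhibit's column headings suggest. The squared z statistic should also reproduce the Wald chi-square reported in the exhibit.

```python
from statistics import NormalDist

# Values copied from Exhibit 3 (row "ar L1").
ar_coef = -0.1353576
ar_se = 0.1480772

# z statistic for the null hypothesis of no first-order autocorrelation.
z = ar_coef / ar_se

# Two-sided p-value from the standard normal distribution.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

# With one restriction, the Wald chi-square statistic is the square of z.
wald_chi2 = z ** 2

print(round(z, 2), round(p_two_sided, 3), round(wald_chi2, 2))
```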

5. It is puzzling that record life expectancy has been growing nearly linearly over such a long period, and indeed seems to continue to grow at about the same pace. Figure 3 is taken from Oeppen and Vaupel (2002). In the first half of the period, only a few countries had life expectancy close to record life expectancy (or, in fact, adequate vital statistics). In more recent years, more and more countries, such as Chile, are getting their vital statistics in shape, and are catching up with the leading group. That the group of nations with nearly record life expectancy is growing in number is due to economic and other development in many nations. Discuss whether the continued growth in record life expectancy could partly be a statistical consequence of the fact that more and more countries belong to the group of leading nations regarding female life expectancy.

In a hypothetical situation with life conditions (underlying mortality) not changing in the group of best practicing countries, but with new countries joining this group, estimated record life expectancy will tend to grow simply because the record is the maximum of a larger and larger number of largely independent random variables. If, say, country $i$ in the group of size $n_t$ in year $t$ has observed life expectancy $Y_{it}$, and these are iid with cumulative distribution function $F$, the record $Y_t = \max_i Y_{it}$ has distribution $P(Y_t \le y) = F(y)^{n_t}$ due to independence. As $n_t$ increases, this certainly decreases for each $y$ such that $0 < F(y) < 1$. The distribution of record life expectancy is thus moving to the right from year to year. This formal argument was not required at the exam.
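The effect of group size on the distribution of the maximum can be illustrated numerically. The sketch below is purely hypothetical: it assumes each leading country's observed life expectancy is normal with mean 80 and standard deviation 1, and evaluates the probability that the record stays at or below 81 years for groups of different sizes. None of these numbers come from the data.

```python
from statistics import NormalDist

# Hypothetical common distribution F of observed life expectancy in each leading country.
F = NormalDist(mu=80.0, sigma=1.0)
y = 81.0  # hypothetical threshold, chosen so that 0 < F(y) < 1


def record_cdf(threshold, n_countries):
    """P(maximum of n iid draws <= threshold) = F(threshold) ** n."""
    return F.cdf(threshold) ** n_countries


group_sizes = (1, 5, 10, 20)
probs = [record_cdf(y, n) for n in group_sizes]

for n, p in zip(group_sizes, probs):
    print(n, round(p, 4))
```

The probability of the record staying below any fixed level shrinks as the group grows, even though the underlying distribution is unchanged.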

6. Scholars have made claims of upper limits to female life expectancy. These claims have been based on a variety of biological, demographic and other grounds. A claim of an upper limit, say 64.8 years, is that no country will ever have a female life expectancy above that limit. Oeppen and Vaupel (2002) identify 19 independent such claims or asserted ceilings on female life expectancy, see Figure 4. The first claim was made by Dublin in 1928, and the claim was that female life expectancy could not exceed 64.8 years. This was a failure even when it was made, since the record life expectancy exceeded the limit already in 1921 (New Zealand had 65.9). Of the 19 claims, 14 have come out as failures by 2002. For claim $i$, let $t_i$ be the year the claim was made, and let $F_i$ be the binary variable (coded 1 for failure) recording whether the claim has come out as a failure, i.e. has been beaten by record life expectancy by 2002. Exhibit 4 shows output from two logistic regressions, both with $F_i$ as the dependent variable. The first logistic regression had only $t_i$ as regressor, while in the second an attempt was made to introduce both year of claim $t_i$ and lapse time $x_i = 2002 - t_i$ as regressors. Interpret the two sets of results. In the second case $t_i$ was dropped by Stata. Why?

The two results agree since the linear predictors in the two logistic regressions are identical: with lapse time $x_i = 2002 - t_i$, $a + b\,t_i = (a + 2002\,b) - b\,x_i$. Here, $b$ is the regression estimate and $a$ is the intercept in the first regression of Exhibit 4. $t_i$ and $x_i$ are perfectly collinear, and Stata rightly refuses to include both terms in a linear logistic regression. Otherwise the regression coefficients would not have been identifiable.
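The agreement between the two fits can be checked directly from the coefficients in Exhibit 4. The sketch below assumes that lapse time is defined as 2002 minus the year of the claim, a definition the reported coefficients are consistent with. Apart from 1928 (Dublin's claim), the claim years used are hypothetical, and the function names are made up for this illustration.

```python
# Coefficients copied from Exhibit 4.
a_t, b_t = 1184.766, -0.5955806   # logit of failure on year of claim
a_x, b_x = -7.586783, 0.5955806   # logit of failure on lapse time


def lapse_time(claim_year, end_year=2002):
    """Assumed definition of lapse time: years from the claim to 2002."""
    return end_year - claim_year


def linear_predictors(claim_year):
    """Return the linear predictor from each of the two fitted models."""
    eta_t = a_t + b_t * claim_year
    eta_x = a_x + b_x * lapse_time(claim_year)
    return eta_t, eta_x


# 1928 is Dublin's claim; 1960 and 1990 are hypothetical claim years.
for claim_year in (1928, 1960, 1990):
    eta_t, eta_x = linear_predictors(claim_year)
    print(claim_year, round(eta_t, 2), round(eta_x, 2))

# The intercepts are linked through the same identity.
print(round(a_t + 2002 * b_t, 2), round(a_x, 2))
```

Small differences in the later decimals arise only because the intercept in the first table is printed with fewer digits.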

7. To what extent is life expectancy determined by economic variables like GDP per capita? Suppose you had data on life expectancy and GDP per capita for your own country, or say Norway. Would you think that a regression of life expectancy on GDP per capita would yield valid results regarding the posed question? Regard your hypothetical data as the outcome of a quasi-experiment, and discuss potential threats to internal and external validity.

The suggested regression will probably be subject to omitted variable bias, since both GDP and mortality are likely to depend on common variables. One might also have validity problems with the regression since GDP and life expectancy are likely to be mutually dependent, in the sense that they are determined endogenously. These are threats to internal validity. The dependency between the two variables need not be the same in different parts of the world. One might, for example, have egalitarian societies like the Norwegian, which had high life expectancy while being among the poorer European nations in the 19th century. Differences in the relationship between the two variables are threats to external validity: results from Norway might not be valid for U.A.R. etc.

Exhibit 1

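Regression with robust standard errors                 Number of obs =     161
                                                       F(  1,   159) = 9928.61
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9821
                                                       Root MSE      =  1.5325

------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   .2429773   .0024385    99.64   0.000     .2381613    .2477933
       _cons |  -401.4199   4.754271   -84.43   0.000    -410.8096   -392.0303
------------------------------------------------------------------------------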

Exhibit 2

    Variable |     Obs        Mean   Std. Dev.        Min        Max
-------------+------------------------------------------------------
           D |     160    .2431875   1.096574   -4.209999   5.060001

Exhibit 3

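Sample:  1946 to 2000                                  Number of obs =      55
                                                       Wald chi2(1)  =    0.84
                                                       Prob > chi2   =  0.3607

------------------------------------------------------------------------------
           D |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .2566701   .0598694     4.29   0.000     .1393282    .3740119
-------------+----------------------------------------------------------------
       ar L1 |  -.1353576   .1480772    -0.91   0.361    -.4255836    .1548684
------------------------------------------------------------------------------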

Exhibit 4

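Logit estimates                                        Number of obs =      19
                                                       LR chi2(1)    =   14.17
                                                       Prob > chi2   =  0.0002
Log likelihood = -3.865933                             Pseudo R2     =  0.6470

------------------------------------------------------------------------------
           F |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |  -.5955806   .4474039    -1.33   0.183    -1.472476    .2813149
       _cons |   1184.766   889.9979     1.33   0.183    -559.5983    2929.129
------------------------------------------------------------------------------

note: t dropped

Logit estimates                                        Number of obs =      19
                                                       LR chi2(1)    =   14.17
                                                       Prob > chi2   =  0.0002
Log likelihood = -3.865933                             Pseudo R2     =  0.6470

------------------------------------------------------------------------------
           F |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5955806   .4474039     1.33   0.183    -.2813149    1.472476
       _cons |  -7.586783   5.772359    -1.31   0.189     -18.9004    3.726832
------------------------------------------------------------------------------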

Figure 1. Female life expectancy (in years) in the best practicing country by calendar year. Source: Oeppen and Vaupel (2002).

Figure 2. First differences in record life expectancy versus year.

Figure 3. Female life expectancy in five countries compared with the trend in record life expectancy. Source: Oeppen and Vaupel (2002).

Figure 4. Record female life expectancy from 1840 to the present. The linear-regression trend is depicted by a bold black line and the extrapolated trend by a dashed gray line. The horizontal black lines show asserted ceilings on life expectancy, with a short vertical line indicating the year of publication. The three dashed red lines denote projections of female life expectancy in Japan published by the United Nations in 1986, 1999, and 2001. It is encouraging that the U.N. altered its projection so radically between 1999 and 2001. Source: Oeppen and Vaupel (2002).