STA 6127 – Homework 5 – Due April 20

Population Projections

Part 1

You are a demographer circa 1920 and have just been given the updated information from the 1920 U.S. census. You will fit the following models to describe the population growth of the U.S. since the first census (1790). First, convert population to millions by dividing by 1,000,000, and use decade (not year) as X. Use only the data up to 1920 (via Select Cases) to fit the models. The first three models can be fit in SPSS by selecting:

Analyze → Regression → Curve Fitting

Dependent Variable: Population (use in millions form)

Independent Variable: Decade

·  Model 1: Pop = a + b1 X + b2 X^2 (Quadratic)

·  Model 2: Pop = a + b1 X + b2 X^2 + b3 X^3 (Cubic)

·  Model 3: Pop = a b^X (Growth)

·  Model 4: Pop = a + b1 X + b2 X^2 + b3 log(X) (Pearl-Reed)
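For readers who want to cross-check the SPSS output, the four fits can be sketched in plain Python as ordinary least squares. This is only an illustration under stated assumptions: decade is coded 1, 2, 3, ... (so log(X) is defined), log means the natural log, and the Growth model is fit on the log scale as ln(Pop) = a + b X. The helper names (solve_linear, ols, fit_models) are hypothetical, and the data below are synthetic placeholders, not census values.

```python
import math

def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(predictors, y):
    """Least-squares coefficients [a, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve_linear(xtx, xty)

def fit_models(decade, pop):
    """Fit Models 1-4; decade must be positive so that log(X) is defined."""
    sq = [x ** 2 for x in decade]
    cub = [x ** 3 for x in decade]
    lg = [math.log(x) for x in decade]  # natural log: an assumption
    return {
        "quad": ols([decade, sq], pop),
        "cube": ols([decade, sq, cub], pop),
        # Growth model fit on the log scale: ln(Pop) = a + b*X (assumption)
        "exp": ols([decade], [math.log(p) for p in pop]),
        "pr": ols([decade, sq, lg], pop),
    }

# Synthetic placeholder data (NOT census values): an exact quadratic in decade.
decade = list(range(1, 15))  # assumed coding: 1 = 1790, ..., 14 = 1920
pop = [2.0 + 0.5 * x + 0.3 * x ** 2 for x in decade]
coefs = fit_models(decade, pop)
print([round(c, 4) for c in coefs["quad"]])
```

Because the placeholder data follow an exact quadratic, the quadratic fit should recover coefficients close to 2, 0.5, and 0.3, which is a quick sanity check before substituting the real census series.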

  1. Write out each of the models, in terms of their estimated regression coefficients.
  2. Obtain the fitted values for all years by first selecting All Cases, then computing 4 new variables (where the a’s and b’s are the estimates from above, and sqdec, cubdec, and logdec are the square, cube, and log of decade):

Transform → Compute:

Fit_quad = a + (b1*decade) + (b2*sqdec)

Fit_cube = a + (b1*decade) + (b2*sqdec) + (b3*cubdec)

Fit_exp = exp(a+(b*decade))

Fit_pr = a + (b1*decade) + (b2*sqdec) + (b3*logdec)

  3. Obtain the forecast errors for 1930-2000 for each model. First, select only the cases where year > 1920. Then compute 8 new variables (2 per model):

Transform → Compute:

afe_quad = abs(Pop-Fit_quad)

sfe_quad = afe_quad**2

(Repeat for Cube, Exp, and P-R)

  4. Which model provides the best forecasts in terms of mean absolute forecast error (MAE) and mean squared forecast error (MSE)? Again, use only the years 1930-2000. You can use DESCRIPTIVES to obtain the MAE and MSE for each model. (Note: you want the model with the minimum value.)
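Steps 2 through 4 above can also be sketched outside SPSS. The Python below mirrors the Compute formulas (Fit_quad, Fit_cube, Fit_exp, Fit_pr, then the absolute and squared forecast errors) and averages the errors over the holdout years. The function names (fitted, forecast_summary), the decade coding, the use of the natural log, and every number in the example are hypothetical placeholders, not census data or real estimates.

```python
import math

def fitted(model, coef, x):
    """Fitted population (millions) at decade x for one of the four models."""
    if model == "quad":
        a, b1, b2 = coef
        return a + b1 * x + b2 * x ** 2
    if model == "cube":
        a, b1, b2, b3 = coef
        return a + b1 * x + b2 * x ** 2 + b3 * x ** 3
    if model == "exp":
        a, b = coef
        return math.exp(a + b * x)
    if model == "pr":
        a, b1, b2, b3 = coef
        return a + b1 * x + b2 * x ** 2 + b3 * math.log(x)  # natural log assumed
    raise ValueError(f"unknown model: {model}")

def forecast_summary(years, decades, pop, coefs, last_fit_year=1920):
    """Return {model: (MAE, MSE)} over the cases with year > last_fit_year."""
    holdout = [(d, p) for y, d, p in zip(years, decades, pop) if y > last_fit_year]
    summary = {}
    for model, coef in coefs.items():
        afe = [abs(p - fitted(model, coef, d)) for d, p in holdout]  # afe_*
        sfe = [e ** 2 for e in afe]                                   # sfe_*
        summary[model] = (sum(afe) / len(afe), sum(sfe) / len(sfe))
    return summary

# Placeholder inputs (hypothetical numbers, not census data or real estimates).
years = [1910, 1920, 1930, 1940]
decades = [13, 14, 15, 16]
pop = [90.0, 105.0, 120.0, 135.0]
coefs = {"quad": [0.0, 1.0, 0.5], "exp": [0.0, 0.3]}

summary = forecast_summary(years, decades, pop, coefs)
best_mae = min(summary, key=lambda m: summary[m][0])
best_mse = min(summary, key=lambda m: summary[m][1])
print(summary, best_mae, best_mse)
```

MAE and MSE need not agree on a winner, since squaring penalizes a few large misses more heavily, which is why the sketch reports the two minima separately.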

Part 2

Download the Donner Party dataset, and fit a logistic regression model relating the probability of survival to age and gender among those who were 15 years of age or older.

1.  Obtain the fitted model.

2.  Test whether probability of survival is related to EITHER age or gender (α = 0.05 significance level).

3.  Obtain a 95% confidence interval for the odds ratio when age increases by 1 year, controlling for gender.

4.  Obtain a 95% confidence interval for the odds ratio (Females/Males), controlling for age.

5.  Test whether there is an interaction between age and gender.

6.  Obtain the fitted probabilities of survival for the following groups (based on the no-interaction model):

Gender    20 years old    50 years old
Male      ______          ______
Female    ______          ______
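Once the model has been fit, the calculations behind items 2 through 6 reduce to a few formulas, sketched here in Python. The sketch assumes gender is coded female = 1, male = 0, that the confidence intervals are Wald intervals (exponentiating the coefficient plus or minus z times its standard error), and that the tests are likelihood-ratio tests on the change in log-likelihood. The function names and all numeric values are hypothetical placeholders, not results from the Donner Party data.

```python
import math
from statistics import NormalDist

def fitted_prob(a, b_age, b_female, age, female):
    """P(survival) from the no-interaction logit model; female is 1 or 0."""
    eta = a + b_age * age + b_female * female
    return 1.0 / (1.0 + math.exp(-eta))

def odds_ratio_ci(b, se, level=0.95):
    """Wald interval for the odds ratio exp(b): returns (lower, estimate, upper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(b - z * se), math.exp(b), math.exp(b + z * se)

def lr_test(loglik_reduced, loglik_full, df):
    """Likelihood-ratio statistic and chi-square p-value (df = 1 or 2 only)."""
    g = max(0.0, -2.0 * (loglik_reduced - loglik_full))
    if df == 1:
        p = math.erfc(math.sqrt(g / 2.0))  # chi-square(1) upper tail
    elif df == 2:
        p = math.exp(-g / 2.0)             # chi-square(2) upper tail
    else:
        raise ValueError("closed form implemented for df = 1 or 2 only")
    return g, p

# Placeholder estimates (hypothetical, NOT the Donner Party results).
a, b_age, b_female = 1.0, -0.05, 1.0
se_age = 0.02

for age in (20, 50):
    for female in (0, 1):
        label = "Female" if female else "Male"
        print(age, label, round(fitted_prob(a, b_age, b_female, age, female), 3))

print(odds_ratio_ci(b_age, se_age))
```

In this setup, item 2 compares the intercept-only model with the age-and-gender model (df = 2), and item 5 compares the no-interaction model with the model that adds an age-by-gender term (df = 1).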