Economics 3111 Assignment 4: Rate of Return to Education
Due Date: November 28, 2014 (Friday –leave it at my office)
Part 1:Using Gretl to Estimate an Earnings Equation from Survey of Labour and Income
Dynamics Data for 2009
The Survey of Labour and Income Dynamics (SLID) is an annual survey that collects data on labour market outcomes and income. It is the main regular source of Canadian data on income distribution and poverty. The course website contains the codebook (PDF format) and the SLID data file for 2009 (the file is called: SLID_person_2009.dta). The data file is a STATA dataset -- like the Census file used in Assignment 2 –as you may recall Gretl can read this type of file using its "open" command.
The SLID codebook is quite large. To help you out here is a list of the variables you will use and the pages in the PDF where you will find the variable definitions:
ecage26 : age in reference year p.241 of PDF
ecsex99 : sex p.241 of PDF
marst26 : Marital status in the reference year 2009 p. 242
immst15 :Immigrantstatus p. 245
disabs26 :Disability status p. 246
studtf26: student status in 2009, p. 319
hleveg18 : The person’s highest level of educational attainment . p.323
wgsal42 : Wages and salaries p.313 (this is referred to below as “Wage and salary income” or “Earnings” in the rest of the assignment.
(a) Generate the following variables:
Age (note that age just equals the variable “ecage26”) and Age-squared (i.e. age x age): call the
variables “age” and “agesq”
A dummy variable indicating that the person is a woman (use “ecsex99”) – call the dummy “woman”.
A dummy variable equal to 1 if the person’s marital status was “Single (never married)” – use the
variable marst26) and call the dummy “single”.
A dummy variable that equals 1 if an immigrant (use the variable “immst15”), call the dummy
“immig”
A dummy variable that equals 1 if the person is disabled (use the variable “disabs26”) call the
dummy “disabled”
Report the means of these variables.
(b) Estimates for those whose highest level of education is “Graduated High School”.
Use the “smpl” command to restrict your sample to includeonly observations that satisfy
the following conditions:
(1) each person must be at least age 18 and not older than age 65 (use the "ecage26" variable);
(2) each person must have data on wages and salaries (use the “wgsal42” variable and restrict to the sample of people with: wgsal42>0 and <99999995);
(3) each person must not be a student (use the variable “studtf26”)
(4) each person must have Graduated High School as their highest level of educational attainment (use the variable "hleveg18")
(i) On the sample of people with "Graduated High school" as their highest level of education attainment use ordinary least squares (ols) to estimate a wage and salary income equation. “wgsal42” is the dependent variable. The explanatory variables are:
age, agesq, woman, immig, single and disabled
(you generated each of these back in part (a)). As in previous assignments, be sure to include a
constant in your regression. Write out the implied regression equation.
(ii) Further restrict your sample of High School Graduates to people age 22-65 (i.e. get rid of
those age 18-21 – keep all of your other restrictions in place). Calculate the average, median
and 95th percentile “wage and salary” income for this sample (you can use the Gretl command "summary wgsal42" to do so).
(c) Estimates for those whose highest level of education is “Non-university postsecondary certificate”
Start again with the full sample and restrict the sample to observations satisfying conditions:
(1) each person must be at least age 20 and not older than age 65 (use the "ecage26" variable);
(2) each person must have data on wages and salaries (use the “wgsal42” variable and restrict to the sample of people with: wgsal42>0 and <99999995);
(3) each person must not be a student (use the variable “studtf26”)
(4) each person must have “Non-university postsecondary certificate” as their highest level of educational attainment.
(i) Repeat step (b)(i) for the new sample.
(ii) Repeat step (b)(ii) on the new sample.
(d)Estimates for those whose highest level of education is “Bachelor’s degree”
Start again with the full sample and restrict it to observations satisfying conditions:
(1) each person must be at least age 22 and not older than age 65 (use the "ecage26" variable);
(2) each person must have data on wages and salaries (use the “wgsal42” variable and restrict to the sample of people with: wgsal42>0 and <99999995);
(3) each person must not be a student (use the variable “studtf26”)
(4) each person must have “Bachelor’s degree” as their highest level of educational attainment.
(i) Repeat step (b)(i) for the new sample.
(ii) Repeat step (b)(ii) on the new sample (notice that the age group is already 22-65 this time).
(e) Put the results for the average, median and 95th percentile “wage and salary”levels of 22-65 year-
olds obtained in parts (b)(ii), (c)(ii) and (d)(ii) in a table:
Mean Median95th percentile
Graduated High School
Non-university Post-secondary
Bachelor’s degree
How do these wage and salary figures compare across the three educational attainmentgroups?
(f) Report the three estimated regression equations (write out the underlying equations). Based on the
estimation results answer the following:
How do the signs of the coefficients on age and age-squared compare across the three sets of
results?
What is the effect of being a woman on earnings? How does the effect differ by educational attainment?
What is the effect of on earnings of being an immigrant?
What is the effect of being disabled?
Part 2: Using Excel to Calculate Rates of Return to Education
(a) You now want to generate the age-earnings profiles for your three educational groups. We will assume that the person of interest is a man, is not an immigrant, is not disabled and is not single (so all of the dummies in your regressions from Part 1 are 0). This means that you can predict his wage and salary earnings at any age using the intercept, age and age-squared coefficients:
Predicted earnings = a + b Age + c Age2
where'a' is the constant or intercept of your regression, 'b' is the estimated coefficient on age and 'c' is the estimated coefficient on age-squared. You need to make such a prediction for each age from the relevant starting age up to age 65. For High school graduates the age range is 18-65 (assume they start working at age 18); for those with a Non-university post-secondary certificate assume they start work at age 20 (so the age range is 20-65) and for those with a Bachelor's degree assume that they start work at age 22 (so the age range is 22-65).
Generating predictions for each age is very repetitive. A spreadsheet can allow you to do this fairly easily. Have a look at the spreadsheet on the course website it will get you started.
(1) Enter your regression estimates:
Go to the example spreadsheet on the course website (you can build your own from it). Open the example spreadsheet. Rows 3 to 8 consist of a table into which you can type in your coefficient estimates. Variable names (the Constant or intercept of the regression, Age and Age-squared) are given in Column A. Column C is for your “High school graduate” regression; Column E for your "Non-university postsecondary certificate" regression and Column G for your "Bachelor's degree" regression.
I have typed in made-up numbers for the “High School Graduate” group assuming the constant was
-20000, the coefficient on Age was 3000 and the coefficient on Age-squared was -35. Replace these with your actual estimates from Part 1 (b)(i). Next replace the questions marks in Columns C and G with the coefficient estimates from your other two samples (i.e. from Part 1 (c) and (d)).
(2) Creating the Age-Earnings Profile for High School Graduates:
You will now use your regression estimates to generate the age-earnings profile for High School Graduates. Go down the spreadsheet and look at rows 14-61. Each row of Column A contains an individual age (18 in cell A14 up to age 65 in cell A61). These are the values of Age that you need to generate predicted earnings using the equation: Predicted earnings = a + b Age + c Age2
Look at the formula in cell B14 (i.e. click on the cell and look at the expression in the formula window):
=$C$6+$C$7*A14+$C$8*A14*A14
this is your predicted earnings equation for a High School Graduate (C6, C7 and C8 are the cell addresses for your High School regression coefficients; the'$' symbols just tell Excel that if you copy this formula to another cell Excel should continue to use C6, C7 and C8); A14 is just Age=18 and A14*A14 is just age-squared (18 squared for this cell). The formulae in cells B15-B61 are obtained by copying the formula in cell B14 into cells B15-B61 (right click to copy a cell). Excel automatically updates any cell address for its new location unless a $ symbol is present.
So for example cell B15 contains the formula:
=$C$6+$C$7*A15+$C$8*A15*A15
which is the same as in B14 except that A14 has been automatically replaced by A15 (which is age=19) since the formula has been moved down one cell. Similarly in cell B61 you have:
=$C$6+$C$7*A61+$C$8*A61*A61
which again uses the same coefficients C6, C7 and C8 (frozen by using the $ symbols) but which has
updated the age variable to the value in cell A61 (age=65).
If you have already replaced my example values in cells C6, C7 and C8 with your actual OLS estimates column B from B14-B61 will now contain your estimate of the age-earnings profile for High School Graduates.
(3) Generate the Age-Earnings Profiles for the Non-University Post-secondary Certificate (NUC) and for
theBachelor's degrees (BACH) samples:
Now you need to generate age-earnings profiles for the other two educational attainment groups. Notice first that for age 18 and 19 for NUC in column Cand for ages 18, 19, 20 and 21 in column Dfor BACH
I have entered zeroes. This means that I am assuming that NUC’s are enrolled in their program for two years (when age 18 and 19) and earning nothing and that those with Bachelor’s degrees are enrolled and earning nothing between ages 18 and 21. Leave the 0s there for now. In cell C16 type in the formula for predicting wage and salary earnings for someone with a Non-university post-secondary certificate (the formula has the same format as for the High School graduates in cell B16 but instead uses coefficient addresses $E$6, $E$7 and $E$8 for the coefficient estimates along with the relevant age in cell A16). Once you have typedin the new formula copy it to cells C17 to C61. You will now have your profile for the NUC group.
Go to cell D18 and put in the formula for those with Bachelor's degrees (use coefficients in cells $G$6, $G$7 and $G$8 and note that the relevant age is in cell A18). Copy this formula to cells D19 to D61.
(4) Display the profiles:
You now have three estimated age-earnings profiles in columns B, C and D. Create a graph showing the three earnings profiles in the same diagram (i.e. much like Figure 9.1 in the text). It is probably easiest to just do this in Excel (highlight the cells A13-D61 i.e. the cells containing age and the three profiles. Then click “Insert” and then in the "Charts" section of the toolbar indicate that you want a line graph). Include the graph as part of your answer.
(5) Calculate the rates of return to education:
You now have age-earnings profiles for each of your three educational outcomes. To calculate the rate of return you need to make some assumptions about the direct costs of post-secondary education in each year the person is enrolled. To make things simple lets assume that direct costs are $9300 per year for a Bachelor's degree (Ontario average tuition and fees plus $2000 in other direct costs) and $5500 for a Non-university postsecondary certificate. In your spreadsheet replace the zeroes in cells C14 and C15 with $5500 and the zeroes in cells D14-D17 with $9300.
Assume that the educational investment decision is made at age 18. Column G from G14-G61 shows 'Age' minus 18 (the number of years since the investment decision: call this “t” below). Column I is the difference between predicted earnings less direct costs if NUC rather than HS i.e. it is the difference between columns C and B. Column J is the discount factor (1+i)twhere ‘i’ is a first guess of the value of the rate of return (the guess if recorded in cell R12 and is initially set equal to .05). Column K equals Column I divided by Column J and so is the present value (PV) of the difference in Column I:
(NUC Earnings-HS earnings at time t)
(1+i)t
Copy the formula in cell I15 into all the cells I16-I61. Also copy the formula in J15 into J16-J61 and that in K15 into K16-K61.
If you sum the entries in column K from K14-K61 you will have the net present value of having a Non-university postsecondary certificate rather than High School assuming that the rate of interest in cell R12 i.e. i=.05 (5%). Create this sum and report the result in cell K63, i.e. in cell K63 input the formula: =sum(K14:K61). From the class notes,you know that the (internal) rate of return on education is the value of ‘i’ for which the net present value in cell K63 is 0. You now want to find the value of ‘i’ for which this is true i.e. for which K63=0 (see below).
Option 1: Trial and error
You can just use trial and error (e.g. if NPV in cell K63>0 then increase the starting value of ‘i’ in cell R12 and see what happens to the NPV in K63 (keep increasing i until the value in K63 is near 0; ifafter changing 'i' you find NPV<0 in K63 then lower the estimate of i).
Option 2: Using Solver in Excel
An alternative is to use Excel’s “Solver” (you may need to install it first as it is a free Excel add-in that is not always automatically installed – see the Solver example on the course website). First click the “Data” tab then under “Analysis” click “Solver” (if Solver is not there you need to install it). After clicking Solver the “Solver” window will open. Specify that $K$63 is your target cell, on the "Equal to" line turn the “Value of” option on set the desired value at 0 (you want K63 to equal 0) . The specify that you want to achieve this by changing the value in cell R12. Then click "Solve" in the upper right corner of the Solver window. The result will be recorded in cell R12.
Once you have calculated the return to NUC vs. High school use the same procedure to calculate the rate of return to a Bachelor's degree vs. High School in columns M, N and O (the starting value for i is in R13 put the Net present value for this exercise in cell O63).
Report yourtwo estimated rates of return.Which is higher? Submit a copy of your completed spreadsheet as part of your answer.
(b) For the Bachelor's degree vs. High School. Redo the rate of return calculation assuming that direct costs are $14,300 rather than $9,300 per year (e.g. because of a $5000 rise in tuition). How much lower is the rate of return?
(c) Redo the calculation for Bachelor's degree vs. High Schoolin (a) assuming that the student earns $8000 per year while in school i.e. when ages 18-21. How much higher is the rate of return?
1