Economics 3111 Assignment 2: Labour Supply

Due: October 15, 2014

Part 1: LFS Supply Measures for Women**

The Labour Force Survey provides data on a number of variables relevant to labour supply. Have a look at the LFS codebook used in Assignment 1. The labour force status variable (LFSSTAT) can be used to identify labour force participants; later questions ask about hours worked in the week the survey was undertaken (AHRSMAIN and UHRSMAIN for the person’s main job, ATOTHRS and UTOTHRS for all jobs the person held). Answers to the hours question are used to classify people who are working as either full-time (30 hours/week or more) or part-time workers (less than 30 hours/week); see FTPTMAIN. Workers who work part-time in their main job are also asked why (WHYPTNEW). There are also questions on overtime hours (PAIDOT, UNPAIDOT) and on why someone who is not working left their last job (WHYLEFTN).

In the following question you will create some summary statistics on labour supply for women in three age groups: 15-24 year-olds, 25-49 year-olds and those aged 50-64 (use the LFS variable AGE_12 to identify these age groups). Use Gretl and the June 2014 LFS data file used in Assignment 1.

Due to the LFS sample design, the statistics created below all need to be weighted using the variable FWEIGHT (see Assignment 1). In Assignment 1 you created weighted employment, unemployment and labour force measures by multiplying the relevant dummy variable by FWEIGHT and then summing. In this question you will want to calculate some weighted averages. Fortunately, Gretl’s ‘summary’ command has an option that makes this easy. Say, for example, you want to obtain the weighted means of two variables called ‘A1’ and ‘A2’ and that the weights are in the variable FWEIGHT. The command:

summary A1 A2 --weight=FWEIGHT --simple

will give you the weighted means and standard deviations, as well as the minimum and maximum, of A1 and A2 (if you leave out the --simple option it will give you additional statistics on A1 and A2).
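To see what the weight option changes, here is a small Python sketch (not Gretl code) of the usual weighted-mean formula: the sum of FWEIGHT times the value, divided by the sum of FWEIGHT. The values of A1 and the weights are made up; this illustrates the formula and is not Gretl's internal implementation.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Made-up data for four respondents (not from the LFS file)
a1 = [1.0, 0.0, 1.0, 1.0]
fweight = [100.0, 300.0, 50.0, 50.0]

unweighted = sum(a1) / len(a1)          # every respondent counts equally
weighted = weighted_mean(a1, fweight)   # (100 + 50 + 50) / 500
print(unweighted, weighted)             # 0.75 0.4
```

The heavily weighted respondent with A1 = 0 pulls the weighted mean well below the unweighted one.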

(a) In Assignment 1 you calculated the labour force participation rate for the entire sample and for an assigned subsample. You could repeat what you did there on each of the three age-specific samples of women to obtain their labour force participation rates. You can get the same result more easily by defining a labour force dummy (use LFSSTAT) and then calculating its weighted average on each of the three samples using the ‘summary’ command with the weight option. Do this and record the three resulting labour force participation rates in the summary table described after part (e).
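A quick Python check on made-up data that the two routes in (a) agree. The ratio of weighted labour force to weighted population (the Assignment 1 approach) equals the weighted mean of the labour force dummy. The codes used here for employed (1-2) and unemployed (3-5) are an assumption to verify in the codebook. The assignment itself only fixes LFSSTAT=6 as "not in the labour force".

```python
# Made-up LFSSTAT codes and weights for five women in one age group.
# ASSUMPTION: 1-2 = employed, 3-5 = unemployed, 6 = not in the labour force.
lfsstat = [1, 4, 6, 2, 6]
fweight = [120.0, 80.0, 200.0, 100.0, 100.0]

employed = [1 if s in (1, 2) else 0 for s in lfsstat]
unemployed = [1 if s in (3, 4, 5) else 0 for s in lfsstat]
lf = [1 if s <= 5 else 0 for s in lfsstat]  # labour force dummy

# Assignment 1 style: weighted counts, then a ratio
weighted_emp = sum(d * w for d, w in zip(employed, fweight))
weighted_unemp = sum(d * w for d, w in zip(unemployed, fweight))
population = sum(fweight)
rate_from_counts = (weighted_emp + weighted_unemp) / population

# Part (a) style: weighted average of the labour force dummy
rate_from_mean = sum(d * w for d, w in zip(lf, fweight)) / sum(fweight)

print(rate_from_counts, rate_from_mean)  # 0.5 0.5
```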

For parts (b), (c) and (d) you again need to restrict the sample to women in each of the three age groups. In addition, the variables of concern are only defined for a subset of women (employed women in (b) and (c), female wage-earners in (d)), so you could use ‘smpl’ to restrict your sample further. However, because these variables are missing for women who are not employed or are not wage-earners, you can simply calculate the weighted averages using ‘summary’ on the sample of women in the relevant age group (remember to use the weight option!). Gretl’s summary command automatically omits observations with missing data, and so effectively restricts the sample for you.
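The claim that dropping missing values is equivalent to restricting the sample can be checked with a short Python sketch on made-up hours data. Here None stands in for Gretl's missing value, and the helper name is hypothetical.

```python
def weighted_mean_skip_missing(values, weights):
    """Weighted mean that drops observations whose value is missing (None)."""
    pairs = [(x, w) for x, w in zip(values, weights) if x is not None]
    return sum(x * w for x, w in pairs) / sum(w for _, w in pairs)


# Made-up usual hours; None marks a woman who is not employed
hours = [40.0, None, 20.0, None, 35.0]
fweight = [100.0, 150.0, 50.0, 200.0, 50.0]

# Option 1: all women in the age group, missing values dropped automatically
all_women = weighted_mean_skip_missing(hours, fweight)

# Option 2: restrict to employed women first (the smpl route), then average
keep = [x is not None for x in hours]
emp_hours = [x for x, k in zip(hours, keep) if k]
emp_weights = [w for w, k in zip(fweight, keep) if k]
restricted = sum(x * w for x, w in zip(emp_hours, emp_weights)) / sum(emp_weights)

print(all_women, restricted)  # 33.75 33.75
```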

(b) Have a look at the codebook entry on the variable FTPTMAIN. Use it to generate a dummy equal to 1 if the person is a part-time worker in their main job. Generate the share of employed women working part-time as the weighted average of your part-time dummy on the relevant age-specific sample. Record the share of employment that is part-time for each of the three age groups (include this in your table).
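One pitfall in (b), sketched in Python with made-up values: the part-time dummy must stay missing (not 0) for women with no main job. This matches the point in part (e)(ii) that a dummy built from a blank variable stays blank. Filling those blanks with 0 would turn the part-time share of employed women into the part-time share of all women. The FTPTMAIN codes used here (1 = full-time, 2 = part-time) are an assumption to check against the codebook.

```python
def weighted_mean_skip_missing(values, weights):
    """Weighted mean that drops observations whose value is missing (None)."""
    pairs = [(x, w) for x, w in zip(values, weights) if x is not None]
    return sum(x * w for x, w in pairs) / sum(w for _, w in pairs)


# Made-up FTPTMAIN values; None = no main job (woman not employed).
# ASSUMPTION: 1 = full-time, 2 = part-time.
ftptmain = [1, 2, None, 2, None, 1]
fweight = [100.0, 50.0, 200.0, 50.0, 100.0, 100.0]

# Dummy that stays missing where FTPTMAIN is missing
pt = [None if f is None else int(f == 2) for f in ftptmain]
share_of_employed = weighted_mean_skip_missing(pt, fweight)  # 100 / 300

# Mistake to avoid: treating missing as 0 divides by all women instead
pt_zeroed = [0 if d is None else d for d in pt]
share_of_all_women = weighted_mean_skip_missing(pt_zeroed, fweight)  # 100 / 600
```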

(c) On the sample of employed women also calculate the weighted average for ‘Usual total hours worked per week’ (UTOTHRS) for each of the three age subsamples. Record the results in your table.

(d) Not all of those employed are paid a wage or salary: some may be self-employed, while others could be unpaid workers. Those who are employed but have no wage data will have missing values for HRLYEARN. Calculate the weighted average of the hourly wage for each of the three age groups and record it in your table.

(e) Student status, marital status and the presence of children (AGYOWNKN) may affect labour supply.

(i) On the same samples as in (a) (all women in the relevant age group) calculate the share who are classified as students (SCHOOLN not equal to 1) and the share who are married or living common-law (use the variable MARSTAT – be sure to use the second version of MARSTAT in the codebook). In both cases define a dummy variable then take its weighted average.

(ii) Now use AGYOWNKN (Age of Youngest Own-child) to calculate the share of women in each of the three age groups whose youngest child is aged 12 or under. A problem arises here because AGYOWNKN is recorded as a blank if the person has no children present. Creating a “child aged 12 or under” dummy (call it ‘kidlt12’) the usual way will leave the dummy undefined (blank) for observations where AGYOWNKN is blank, whereas you want it to equal 0 for these observations. One way to fix this is to define ‘kidlt12’ the usual way and then convert the missing values to zero as follows:

genr kidlt12a=misszero(kidlt12)

Do this and calculate the weighted mean of kidlt12a.
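The Python sketch below runs on made-up data and uses a hypothetical helper that mimics misszero(), which replaces missing values with 0. It shows why the conversion matters. Without it, the mean is the share among women who have children present. With it, the mean is the share among all women in the age group, which is what (ii) asks for. For simplicity the sketch stores the youngest child's age in years. This is an assumption: the real AGYOWNKN is coded in the categories listed in the codebook.

```python
def weighted_mean_skip_missing(values, weights):
    """Weighted mean that drops observations whose value is missing (None)."""
    pairs = [(x, w) for x, w in zip(values, weights) if x is not None]
    return sum(x * w for x, w in pairs) / sum(w for _, w in pairs)


def misszero(values):
    """Hypothetical Python stand-in for Gretl's misszero(): missing -> 0."""
    return [0 if x is None else x for x in values]


# Made-up data; ASSUMPTION: ages in years rather than AGYOWNKN category codes
youngest_age = [3, None, 15, None, 10]  # None = blank (no own children present)
fweight = [100.0, 200.0, 100.0, 50.0, 50.0]

kidlt12 = [None if a is None else int(a <= 12) for a in youngest_age]
kidlt12a = misszero(kidlt12)

share_among_mothers = weighted_mean_skip_missing(kidlt12, fweight)  # 150 / 250
share_among_all = weighted_mean_skip_missing(kidlt12a, fweight)     # 150 / 500
```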

Summarize the results of (a)-(e) in a table laid out as follows:

                                        Age group
                                        15-24    25-49    50-64
Supply measures:
  Participation rate
  Share part-time
  Average hours
Other characteristics:
  Average wage
  Student share
  Married share
  Share with child aged 12 or under

(f) Based on the results in your table, write a note comparing the labour supply outcomes for the three age groups (participation rates, average hours, share working part-time). Do the data on student status, marital status, the share with a child aged 12 or under and the average wage suggest any possible reasons for the differences in labour supply patterns by age group in your table? Explain.

(g) Go back to the data. Restrict the sample to women not in the labour force (LFSSTAT=6) for whom there is data on the variable WHYLEFTN (values of WHYLEFTN between 0 and 13). This gives a sample of women who are not currently in the labour force but had been working in the previous year.

(i) Calculate the share of this sample giving each of the possible responses. Report the results in a table. (For this one we will forgo weighting, so you can use Gretl's "freq" command to obtain the shares -- the command is also discussed below in Part 2 (b).)

(ii) Values 0-8 of WHYLEFTN define “job leavers”, while values 9-13 apply to “job losers”. What share were job leavers in June 2014?

(iii) What were the three most popular reasons for leaving a job?

(iv) Think about the reasons for leaving the last job in terms of the labour supply model. The labour supply model says that people might leave jobs and become non-participants if their preferences (and so the value of time in work vs. non-work) change, if wages fall, or if non-labour income rises (assuming leisure is a normal good). Look at each response 0-8. Do any seem good candidates for preference changes? For wage falls? For non-labour income rises? Which ones? Explain.

(v) Now turn to the job losers. These people lost their last job and have since left the labour force. Is this consistent with the labour supply model? Why or why not?
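The unweighted shares in (g)(i)-(iii) amount to simple counting. Here is a Python sketch on made-up WHYLEFTN responses, with collections.Counter playing the role that freq plays in Gretl. The leaver/loser split uses the 0-8 and 9-13 ranges given in (ii).

```python
from collections import Counter

# Made-up WHYLEFTN responses for ten women (not real LFS data)
whyleftn = [0, 5, 5, 9, 11, 2, 5, 9, 0, 13]
n = len(whyleftn)

counts = Counter(whyleftn)
shares = {code: counts[code] / n for code in sorted(counts)}  # like freq output

leavers = [v for v in whyleftn if 0 <= v <= 8]  # job leavers
leaver_share = len(leavers) / n                 # 6 / 10

# Three most common reasons among job leavers only
top3_reasons = [code for code, _ in Counter(leavers).most_common(3)]
print(leaver_share, top3_reasons)  # 0.6 [5, 0, 2]
```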

**Be sure to include a copy of the relevant parts of your Gretl output files along with your answers to the Part 1 questions.

Part 2: A Labour force participation regression from 2006 Census data

On the course website you will find a data file that I have created from the 2006 Census public-use microdata file. It contains data on 113,093 Ontario women who were aged 18-69 in May 2006 (when the Census was taken). As in the LFS file used in Part 1, each row of the data file contains data on an individual person. Unlike the LFS, this sample was constructed to behave like a random sample; the practical consequence is that we will not need to weight the results in this question. The data file is called:

Census_2006_ont_women_18_69.dta

It was created using a statistics package called STATA. Gretl can read data in many formats, including STATA, so all you need is an “open” statement in your script file with the appropriate folder path and file name (if you are opening the file using the Gretl menu, you will need to "import" it as a STATA file).

The course website also contains a PDF version of the 2006 Census codebook, which describes how each variable is coded. Pages 6-8 list the variables in the order in which they appear in the codebook, and pages 10-90 give the actual codings of the variables. Note that although the names are the same, the codebook gives the variable names in uppercase, while in the data file they are all lowercase. For example, “LFACT” in the codebook is called “lfact” in the data file. Use lowercase names in your program.

The data file you will use is not quite complete. Some observations that were missing data on key variables have been dropped from the datafile. I have also dropped some variables that we will not need in this or other assignments in order to keep the size of the file down.

The Census variables needed for this question are (codebook pages are given):

lfact Labour force activity (p. 62)

agegrp Age group (p. 20)

marsth Marital status (historical version) (p. 21) -- use marsth, not marst!

pkid0_1 Presence of children aged 0-1 (p. 17)

pkid2_5 Presence of children aged 2-5 (p. 18)

pkid6_14 Presence of children aged 6-14 (p. 19)

attsch Attended school (p. 48)

hdgree Highest certificate, diploma or degree (p. 50)

totinc Total income (p. 83)

empin Employment income (p. 84)

(a) Write a Gretl script file that opens the dataset and generates the following variables:

Labour force participation dummy (lf): lf =1 if a labour force participant and 0 if not (use “lfact”)

Six age dummies (equal to 1 if in the relevant age group and 0 otherwise) for the age groups:

18-24, 25-34, 35-44, 45-54, 55-64, 65-69 (use “agegrp”)

(give them readily interpretable names like a1824, a2534 etc.)

Three marital status dummies (equal to 1 if in the group and 0 otherwise); use “marsth”:

Married or common-law (call it ‘married’)

Single (never married) (call it ‘single’)

Other (Widow-divorced-separated) (call it “wds”)

Two presence-of-children dummies (equal to 1 if children in the relevant age group are present, 0 otherwise):

Kids age 5 or under (use “pkid0_1” and “pkid2_5”) call it “k05”

Kids age 6-14 (use “pkid6_14”) call it “k614”

A student dummy (=1 if attended school, 0 otherwise), use “attsch” call it “student”.

Five education dummies (use “hdgree”): each dummy equals 1 if in the category 0 otherwise.

No qualification (call it nqual)

High school graduate (call it hs)

Post-secondary but below bachelor’s (trades, apprenticeship, college): hdgree categories 3-8 (call it tradcoll).

Bachelor’s degree (call it bach)

Degree above bachelors level (call it higher)

Non-labour income = Total income - Employment income (use “totinc” and “empin”); call it “nli”
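Here is a Python sketch of some of the variable construction in (a), using made-up records. The agegrp codes 1-6 in the dictionary are hypothetical placeholders, not the codes on p. 20 of the codebook. The final check confirms that each woman falls into exactly one age group, so the six dummies always sum to 1.

```python
# ASSUMPTION: hypothetical agegrp codes 1-6; the real codes are in the codebook.
HYPOTHETICAL_AGEGRP_CODES = {
    1: "a1824", 2: "a2534", 3: "a3544", 4: "a4554", 5: "a5564", 6: "a6569",
}

# Made-up records for five women
agegrp = [1, 3, 3, 6, 2]
totinc = [12000, 45000, 30000, 18000, 25000]
empin = [10000, 45000, 0, 0, 24000]

# One 0/1 dummy per age group
dummies = {
    name: [int(g == code) for g in agegrp]
    for code, name in HYPOTHETICAL_AGEGRP_CODES.items()
}

# Non-labour income = total income - employment income
nli = [t - e for t, e in zip(totinc, empin)]

# Sanity check: each woman is in exactly one age group
row_sums = [sum(d[i] for d in dummies.values()) for i in range(len(agegrp))]
print(nli)       # [2000, 0, 30000, 18000, 1000]
print(row_sums)  # [1, 1, 1, 1, 1]
```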

Use the summary command to report basic summary statistics for the variables just created (i.e. “summary” followed by the list of variable names you want information on, followed by “--simple”; the --simple option limits the output to basic summary statistics such as the mean, standard deviation, minimum and maximum). Report the results (you can copy them from the Gretl output).

(b) Gretl’s frequency command (freq) gives you the number of observations with each possible value of a variable and the percentage of observations that have each value. Include a command taking the frequency of “lfact”. Based on the output, answer the following:

- How many observations were Employed in the reference week? What share of the total did they represent?

- How many people in this file had never worked? What share were they of the total?

(c) Gretl’s cross-tabulation command can be used to calculate the number or share of observations with combinations of characteristics (a cross-table). For example, add the command:

xtab lf agegrp --column

to your script (where lf is your lf dummy). Notice that the row for lf=1 in the resulting table gives you the share of observations in a particular age group that have lf=1 (are in the labour force), i.e. the labour force participation rate for that age group. Based on your output, which age group has the highest and which has the lowest participation rate? What are the participation rates for these two groups?
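The column shares in the xtab output can be sketched in Python on made-up data. Within each age group (column), the share with lf=1 is that group's participation rate. Gretl reports these as percentages, while the sketch gives fractions. The age-group labels are illustrative.

```python
from collections import defaultdict

# Made-up data: age group label and lf dummy for seven women
agegrp = ["18-24", "18-24", "25-34", "25-34", "25-34", "65-69", "65-69"]
lf = [1, 0, 1, 1, 0, 0, 0]

totals = defaultdict(int)  # column totals: women in each age group
in_lf = defaultdict(int)   # cell counts with lf = 1
for g, d in zip(agegrp, lf):
    totals[g] += 1
    in_lf[g] += d

# Column share with lf = 1 = participation rate for that age group
rates = {g: in_lf[g] / totals[g] for g in totals}
highest = max(rates, key=rates.get)
lowest = min(rates, key=rates.get)
print(highest, lowest)  # 25-34 65-69
```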

(d) “ols” is Gretl’s ordinary least squares regression command. If “Y” was your dependent variable and “X1” and “X2” were your explanatory variables, the command would be:

ols Y const X1 X2

where “const” tells Gretl to include an intercept (constant term) in the equation.

Here the dependent variable is your labour force participant dummy “lf”. The explanatory variables are:

Age dummies (include only five of the six; leave out 18-24)*

Marital status dummies (include two of the three; leave out “single”)*

Both "presence of children" dummies

The student dummy

Education dummies (include 4 of the five – leave out no qualification)*

Non-labour income

*Recall from the first set of notes that when a set of dummy variables describes a common characteristic and the regression includes a constant, you must leave one dummy out of the regression and then interpret the coefficients on the remaining dummies as relative to the omitted group. For example, you are asked to leave out the 18-24 age dummy, so when looking at your output you can interpret the coefficients on the other age dummies as telling you how their outcome differs from that of 18-24 year-olds. This applies to age, education and marital status.
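The footnote can be illustrated with a small Python sketch on made-up data with three hypothetical age groups. The sketch uses its own normal-equations solver, not Gretl's ols. With a 0/1 dependent variable such as lf, OLS gives a linear probability model. If only the age dummies are in the regression, the intercept equals the mean of lf for the omitted 18-24 group. Each dummy coefficient then equals that group's mean minus the 18-24 mean. Adding all three dummies alongside the constant makes X'X singular, because the dummies sum to the constant column. In the full regression of part (d), the coefficients keep the "relative to the omitted group" reading but also hold the other regressors constant.

```python
def ols(y, X):
    """OLS coefficients from the normal equations (X'X) b = X'y, via Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-9:
            raise ValueError("X'X is singular: perfectly collinear regressors")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up data: lf dummy for ten women in three hypothetical age groups
groups = ["18-24"] * 4 + ["25-34"] * 4 + ["35-44"] * 2
lf = [1, 0, 0, 1, 1, 1, 1, 0, 1, 1]


def group_mean(label):
    """Share of women in the given age group with lf = 1."""
    vals = [y for y, g in zip(lf, groups) if g == label]
    return sum(vals) / len(vals)


# Constant plus dummies for every group except the omitted 18-24 group
X = [[1, int(g == "25-34"), int(g == "35-44")] for g in groups]
b = ols(lf, X)
diffs = [group_mean("18-24"),
         group_mean("25-34") - group_mean("18-24"),
         group_mean("35-44") - group_mean("18-24")]

# All three dummies plus the constant: the dummies sum to the constant column
X_trap = [[1, int(g == "18-24"), int(g == "25-34"), int(g == "35-44")] for g in groups]
try:
    ols(lf, X_trap)
    trap_detected = False
except ValueError:
    trap_detected = True

print([round(v, 6) for v in b], trap_detected)  # [0.5, 0.25, 0.5] True
```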