WHERE ARE WE?

Q.1. Because your manager knows you have taken Economics 101, she has asked you to evaluate an empirical research proposal prepared by an economist in another division of your company. The researcher plans to use, as the dependent variable, sales of your firm's product (laundry soap) in each of its 20 sales regions. Price differs across regions, so price is one logical explanatory variable. In addition, because women typically do more laundry than men, and wealthier people own a higher proportion of clothing that requires dry-cleaning, the researcher plans to use gender and household income to explain household demand for this product. What will be your main comment(s) about this proposal?

Q.2. If your dependent variable is a (0,1) dummy variable that indicates the category to which an observation belongs, ordinary least squares is still the best way to estimate the average effects of changes in the explanatory variables on category membership. True, False, Uncertain? Explain.
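
One quick way to see a limitation of OLS with a 0/1 dependent variable (the linear probability model) is to fit it to a small data set and look at the fitted values, which are supposed to be probabilities. The sketch below uses made-up data and a hypothetical helper `ols_simple`; it illustrates only this one issue and is not a full answer to the question.

```python
# Minimal sketch: fit a linear probability model (OLS on a 0/1 outcome)
# to a small, made-up data set and inspect the fitted values.

def ols_simple(x, y):
    """Return (intercept, slope) of a one-regressor OLS fit (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: category membership switches from 0 to 1 at x = 10.
x = list(range(20))
y = [1 if xi >= 10 else 0 for xi in x]

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]   # "predicted probabilities"

print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
print(f"smallest fitted value = {min(fitted):.4f}")
print(f"largest fitted value  = {max(fitted):.4f}")
```

With these data the intercept is about -0.214 and the slope about 0.075, so the fitted "probabilities" run from about -0.21 to about 1.21. The error variance of this model also depends on the regressors (it equals p(1 - p) for a 0/1 outcome), which is another point worth weighing.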

Q.3. Suppose you have supervised twenty different studies (for twenty different firms) of employee sick days. For each firm, you collected individual employee records on sick days taken per year (SICKi) and on each employee's average daily intake of vitamin C supplements (VITCi). For each firm, you have estimated a model of the following form:

$SICK_i = b_1 + b_2 \, VITC_i + \varepsilon_i$, where $i$ indexes individual employees.

In every one of these twenty models, the estimated coefficient b2 is negative and strongly statistically significant (different from zero), so the empirical evidence is extremely robust across studies. Are you ready to order a press release announcing that taking vitamin C supplements should become company policy for any firm that wishes to reduce losses from employee illness? Explain.
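
A simulation can show how twenty "robust" negative estimates might arise even when vitamin C has no causal effect at all. The sketch assumes a hypothetical unobserved variable, health-consciousness (`health`), that raises vitamin C intake and independently lowers sick days; the scales, sample sizes, and the helpers `ols_slope` and `simulate_firm` are invented for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a one-regressor OLS fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

def simulate_firm(rng, n_employees=200):
    """Simulate one firm under a HYPOTHETICAL data-generating process.

    health (unobserved health-consciousness) raises vitamin C intake and
    independently lowers sick days. Vitamin C itself has NO effect on sick days.
    """
    vitc, sick = [], []
    for _ in range(n_employees):
        health = rng.random()                            # uniform on [0, 1)
        vitc.append(500 * health + rng.gauss(0, 50))     # made-up mg/day scale
        sick.append(10 - 6 * health + rng.gauss(0, 1))   # no VITC term at all
    return vitc, sick

rng = random.Random(2024)
# Regress SICK on VITC separately for each of twenty simulated firms.
slopes = [ols_slope(*simulate_firm(rng)) for _ in range(20)]

print("negative slopes:", sum(s < 0 for s in slopes), "of", len(slopes))
```

Because every simulated firm shares the same confounding structure, each of the twenty slopes should come out negative (close to -0.01 here), although the data-generating process contains no vitamin C term. Replication across firms guards against sampling noise, not against a bias common to all the studies.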

Q.4. Suppose you use a classroom survey to collect data on average hours of sleep per night (SLEEPi) and age (AGEi), in order to model SLEEP as a function of AGE. Every respondent reports a value for SLEEPi, but 20% of your sample do not report their age. Suggest a model that will allow you to use all of the data to estimate both the effect of AGE on SLEEP, conditional on AGE being known, and the expected hours of SLEEP for the group that did not report their age.
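
One commonly used model for this situation is the missing-indicator (dummy-variable adjustment) approach: set AGE to 0 for non-reporters, add a dummy MISSINGi equal to 1 for them, and regress SLEEPi on a constant, the filled-in AGE, and MISSINGi. The sketch applies it to made-up survey data with small hypothetical helpers (`solve_linear`, `ols`) so that it runs with the standard library only; it is one possible answer, not the only one.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y (hypothetical helper)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)

# Made-up survey data: None marks a respondent who did not report AGE (2 of 10).
age   = [18, 19, 20, 21, 22, 25, 30, 35, None, None]
sleep = [8.0, 7.8, 7.5, 7.6, 7.2, 7.0, 6.8, 6.5, 7.1, 6.9]

# Regressors: constant, AGE with missing values set to 0, and a MISSING dummy.
rows = [[1.0, 0.0 if a is None else float(a), 1.0 if a is None else 0.0] for a in age]
b_const, b_age, b_missing = ols(rows, sleep)

print(f"effect of AGE (age reported):     {b_age:.4f}")
print(f"expected SLEEP, age not reported: {b_const + b_missing:.4f}")
```

In this specification the AGE coefficient equals the slope from OLS on the reporters alone, and the constant plus the MISSING coefficient equals mean SLEEP among non-reporters (7.0 with these numbers). In practice a library routine such as `numpy.linalg.lstsq` would replace the hand-written solver.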

Q.5.

a.) GUN CONTROL: Suppose your sample consists of households that have been victimized by robbery. The dependent variable takes a value of 1 if a household member is shot during the robbery and 0 otherwise. One of your explanatory variables is a dummy variable equal to 1 if a handgun is present in the house and 0 otherwise. You find that when a handgun is present in a household, an occupant is much more likely to be shot during a robbery than when no handgun is present. Therefore, to minimize injury and loss of life from robbery incidents, private ownership of handguns should be banned. Evaluate this policy proposal and the "evidence" on which it is premised. Briefly describe the nature of the true "experiment" that would allow an unambiguous determination of the effect of handgun presence on robbery shootings via a regression like this.

b.) LEGALIZATION OF MARIJUANA: Suppose you have a random sample of at-risk 18-year-olds. The dependent variable is the number of times each teenager has used heroin. Among the explanatory variables is a dummy variable that takes a value of 1 if the subject experimented with marijuana before age 13 and 0 otherwise. You find that the coefficient on this dummy variable is positive and strongly statistically significant. Therefore, we should not legalize marijuana use (which would make it much more accessible to pre-teens), since doing so would lead to widespread heroin use. Evaluate this policy proposal and the "evidence" on which it is premised. Briefly describe the nature of the true "experiment" that would allow an unambiguous determination of the effect of pre-teen marijuana use on subsequent heroin use via a regression like this.

Q.6. Assume your dependent variable takes a value of 1 if a high-school student is affiliated with a gang and 0 otherwise. Your explanatory variables include family income, GPA in school, dummy variables for whether the father and the mother are present in the household, eligibility for after-school programs, each parent's educational attainment, and so on. What estimation method would you probably choose to determine empirically the effect of after-school program eligibility on gang affiliation? How would you interpret the results? Are there any caveats you would add concerning this single-equation model?
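
A binary-choice model such as logit or probit is a common choice for an outcome like this. The sketch fits a logit by Newton-Raphson with a single regressor, a hypothetical after-school eligibility dummy, on made-up counts (6 of 40 eligible students and 15 of 60 ineligible students affiliated with a gang); the other controls listed in the question are omitted to keep it short, and the helpers `prob` and `logit_newton` are illustrative. It also shows why logit results are usually reported as marginal effects: the coefficient is on the log-odds scale, while the effect of a dummy on the probability is a difference in predicted probabilities.

```python
import math

def prob(z):
    """Logistic CDF: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_newton(x, y, iters=25):
    """Fit P(y = 1 | x) = prob(a + b * x) by Newton-Raphson on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = 0.0             # score (gradient of the log-likelihood)
        h_aa = h_ab = h_bb = 0.0    # information matrix (minus the Hessian)
        for xi, yi in zip(x, y):
            p = prob(a + b * xi)
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        # Newton step: inverse information matrix times the score (explicit 2x2 inverse).
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Made-up data: x = 1 if eligible for after-school programs, y = 1 if gang-affiliated.
x = [1] * 40 + [0] * 60
y = [1] * 6 + [0] * 34 + [1] * 15 + [0] * 45

a_hat, b_hat = logit_newton(x, y)
effect = prob(a_hat + b_hat) - prob(a_hat)   # marginal effect of the dummy on P(gang)

print(f"logit coefficient on eligibility:   {b_hat:.4f}")
print(f"change in P(gang) from eligibility: {effect:.4f}")
```

Here the eligibility coefficient is about -0.636 (that is, log(9/17)), and the implied change in the probability of gang affiliation is -0.10, the difference between the two group rates of 0.15 and 0.25. With a full set of regressors, a library such as statsmodels (its `Logit` model and the `get_margeff` method of the fitted results) would be the usual tool.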

Q.7. Suppose you are working with individual household survey data. If you lack household-level data for one of your explanatory variables, you might be able to use a group-level average or median as a proxy for it (e.g., median household income by 5-digit ZIP code instead of individual household income, for a nationwide sample). To the extent that the groups you use are relatively homogeneous, such proxies may be very useful in mitigating what would otherwise be omitted-variable bias. The same strategy is appropriate if you do not have any individual data for your desired dependent variable. True, False, Uncertain? Explain, suggesting the best alternative if you disagree.

Q.8. Multicollinearity among the regressors hinders clear inferences about the effects of changes in individual explanatory variables only in ordinary least squares models. It is not a concern in fundamentally nonlinear estimation methods such as probit or logit models. True, False, Uncertain? Explain.

Q.9. If you estimate a regression model and obtain a counterintuitive sign on a slope coefficient, what problem(s) do you initially suspect? Explain.

Q.10. How many questions, including this one, were you able to answer correctly with little or no difficulty?