Sociology 709, Spring 2008
Study guide for the midterm
Remember, if you fail the midterm the first time, I will let you take a makeup.
There will be no extra time on the exam, which will run during the normal class time, 12:30-1:45. I will curve the exam upward if the average score is low.
You should bring a calculator. You may not bring notes.
I will provide:
1) A table of the pdf and cdf of the normal distribution (or t-distribution, if relevant).
2) Any equations that you will need that are not listed below.
Things to memorize:
You may be asked to make calculations using these formulas, which you will need to memorize. If there are any other
calculations that involve formulas, I will give you the formulas.
Fixed effects:
The equation for a confidence interval and a t-test.
Calculate the probability of a normal variable with mean x and standard deviation y falling in the interval (a,b)
How do you estimate the variance of the error term? What does the error term “mean”?
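The normal-interval calculation above can be sketched in a few lines of Python (the function names and the N(0, 1) example are my own; on the exam you would use the provided table instead):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_in_interval(mean, sd, a, b):
    """P(a < X < b) for X ~ Normal(mean, sd): standardize each endpoint."""
    return normal_cdf((b - mean) / sd) - normal_cdf((a - mean) / sd)

# Example: X ~ N(0, 1); P(-1.96 < X < 1.96) is about 0.95.
print(round(prob_in_interval(0, 1, -1.96, 1.96), 3))
```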
Things to review:
Why should we worry about the shorthand use of “effect” to describe the result of a statistical model being taken too literally?
What is the difference between experimental and non-experimental data? Why is randomization key?
Why would you want to use a “double-blind” test of a treatment effect?
Is multiple regression as good as an experiment?
Let’s take Allison’s example of the effect of an SAT training class on SAT scores. How could we use OLS to find an answer?
What if someone says “you should have controlled for grades”?
…think of another possible thing to control for…and another…and another…and another
The standard error of B is given by SE(B) = sqrt( s^2 / sum of (xi - xbar)^2 ), where s^2 is the estimated variance of the error term.
Why does the equation for SE(B) make sense?
Be able to construct the x% confidence interval for B (e.g., 95%, or some other number) and test B at the y level of significance (e.g., .05, or some other number)
Explain in words what we are doing when we test the null hypothesis that B=0.
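The confidence-interval and test calculations above can be sketched in Python (function names and the example numbers are my own; the normal approximation stands in for the t-distribution, which is reasonable with large samples):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def confidence_interval(b, se, z=1.96):
    """x% confidence interval for B: B +/- z * SE(B).
    z = 1.96 gives roughly 95%; use another critical value for other levels."""
    return (b - z * se, b + z * se)

def two_sided_p(b, se):
    """p-value for H0: B = 0 -- the chance of an estimate at least this far
    from zero if the true coefficient were zero."""
    z = abs(b) / se
    return 2.0 * (1.0 - normal_cdf(z))

lo, hi = confidence_interval(10, 2)   # B = 10, SE = 2 (made-up numbers)
print(round(lo, 2), round(hi, 2))     # roughly 6.08 and 13.92
```

If the interval excludes 0, the two-sided p-value is below the corresponding significance level, and vice versa.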
(I will give you this formula if I ask questions about it)
this is similar to the equation for simple regression, except that the variance is inflated by 1/(1 - Rj^2), where Rj^2 is the R-squared from regressing xj on the other independent variables.
Why does it make sense that the variance is inflated by 1/(1 - Rj^2)? What happens if the independent variables are strongly correlated with each other?
Given this formula for the coefficients in multiple regression, B = (X'X)^(-1) X'y:
Explain in words how you would calculate the coefficients. When you come to technical terms involving matrix algebra (e.g., transpose, inverse matrix), briefly explain what those terms mean.
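The matrix formula can be computed directly in a few lines of numpy (the toy data are my own; the first column of ones plays the role of the constant):

```python
import numpy as np

# Toy data: y = 1 + 2*x exactly, so OLS should recover (1, 2).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # column of ones = the constant term
y = np.array([1.0, 3.0, 5.0, 7.0])

# B = (X'X)^(-1) X'y : transpose, multiply, invert, multiply again.
B = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(B)
```

Each step names one of the matrix-algebra terms in the question: X.T is the transpose, np.linalg.inv is the inverse matrix, and @ is matrix multiplication.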
If we have a normal variable X with mean 0 and standard deviation 2 (i.e., X ~ N(0, 2)), what is the probability that |X| (the absolute value, or "magnitude," of X) will be greater than 1?
If X~N(0,5) what is the probability that |X|>10?
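Both of these are two-tail calculations, which can be checked in Python (helper names are my own; on the exam you would read the tail area from the table):

```python
from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_abs_greater(sd, c):
    """P(|X| > c) for X ~ Normal(0, sd): two equal tails, each 1 - CDF(c/sd)."""
    return 2.0 * (1.0 - normal_cdf(c / sd))

print(round(prob_abs_greater(2, 1), 3))   # P(|X| > 1) with sd 2: about 0.617
print(round(prob_abs_greater(5, 10), 3))  # P(|X| > 10) with sd 5: about 0.046
```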
In a regression of income on education with 2000 cases, the estimated coefficient, B, is 4 with standard error 3.
The null hypothesis, H0, is that the actual coefficient is 0.
If H0 were true, what is the chance that |B| >4? In other words, what is the chance of observing a coefficient at least as big in magnitude as the one we estimated?
In a regression of income on education with 2000 cases, the estimated coefficient, B, is 6 with standard error 3.
Test H0: B = 0
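Both income-regression questions above reduce to the same two-sided tail calculation; a Python sketch (my own function name; with 2000 cases the t-distribution is essentially normal):

```python
from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_sided_p(b, se):
    """Chance of an estimate at least as big in magnitude as b,
    if the true coefficient were 0."""
    return 2.0 * (1.0 - normal_cdf(abs(b) / se))

# B = 4, SE = 3: z is about 1.33, p is about 0.18 -- cannot reject H0 at .05.
print(round(two_sided_p(4, 3), 2))
# B = 6, SE = 3: z = 2, p is about 0.046 -- reject H0 at .05.
print(round(two_sided_p(6, 3), 3))
```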
Describe in words where, in simple regression, the standard error of B comes from.
. xi: reg happy i.type*i.size sugar
i.type _Itype_1-4 (naturally coded; _Itype_1 omitted)
i.size _Isize_0-1 (naturally coded; _Isize_0 omitted)
i.type*i.size _ItypXsiz_#_# (coded as above)
      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  8,  1991) = 1042.28
       Model |  852077.437     8   106509.68           Prob > F      =  0.0000
    Residual |  203457.573  1991  102.188635           R-squared     =  0.8072
-------------+------------------------------           Adj R-squared =  0.8065
       Total |  1055535.01  1999  528.031521           Root MSE      =  10.109

------------------------------------------------------------------------------
       happy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Itype_2 |   9.937232   .8944368    11.11   0.000     8.183102    11.69136
    _Itype_3 |   19.45203   .9320253    20.87   0.000     17.62419    21.27988
    _Itype_4 |   28.51886   .9173906    31.09   0.000     26.71971      30.318
    _Isize_1 |   9.359119   .9046301    10.35   0.000     7.584998    11.13324
_ItypXsi~2_1 |   6.397666   1.281947     4.99   0.000     3.883569    8.911764
_ItypXsi~3_1 |   30.98568   1.281288    24.18   0.000     28.47287    33.49848
_ItypXsi~4_1 |   12.75011    1.27919     9.97   0.000     10.24142     15.2588
       sugar |   .2162254   .0077933    27.75   0.000     .2009415    .2315092
       _cons |   9.835219   .7751546    12.69   0.000      8.31502    11.35542
------------------------------------------------------------------------------
Type: 1 "red" 2 "green" 3 "purple" 4 "blue"
Size: 0 = small, 1= big
Sugar: 0 to 100, continuous variable
* Q: What does the coefficient on _Isize_1 indicate?
* Q: What does the coefficient _ItypXsi~3_1 indicate?
* Q: What is the effect of size for blue jellybeans using these regression results?
For the type*sugar interaction model, what is the effect of sugar for green jellybeans?
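For the type*size questions, the arithmetic can be checked by hand; a small Python sketch (coefficients copied from the output above, helper names my own):

```python
# Coefficients from the type*size regression output above.
size_main = 9.359119          # _Isize_1: big vs. small for the omitted type (red)
interactions = {
    2: 6.397666,              # _ItypXsiz_2_1 (green)
    3: 30.98568,              # _ItypXsiz_3_1 (purple)
    4: 12.75011,              # _ItypXsiz_4_1 (blue)
}

def size_effect(type_code):
    """Effect of big vs. small for a given jellybean type:
    the _Isize_1 main effect plus that type's interaction term."""
    return size_main + interactions.get(type_code, 0.0)

print(round(size_effect(4), 2))   # blue: 9.36 + 12.75 = 22.11
```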
Let’s see what the regression results look like:
                      (1)          (2)          (3)
                    happy        happy        happy
type==2            22.789       21.649        0.502
                 (3.709)**    (3.713)**      (0.635)
type==3            35.012       32.732        0.298
                 (3.709)**    (3.755)**      (0.649)
type==4            57.948       54.024        0.671
                 (3.709)**    (3.863)**      (0.687)
size==1                         13.258        9.581
                              (3.764)**    (0.639)**
sugar                                         1.994
                                           (0.008)**
Constant          130.511      121.178        0.712
                 (2.623)**    (3.723)**      (0.784)
Observations         2000         2000         2000
R-squared            0.11         0.12         0.97
Standard errors in parentheses
* significant at 5%; ** significant at 1%
*Q: What happens when we go from model 1 to model 3? Why?
What does Graph 1 indicate? Why?
. * Model 4
. xi: reg wage_re i.re*i.sex [w=weight]
i.re _Ire_1-4 (naturally coded; _Ire_1 omitted)
i.sex _Isex_1-2 (naturally coded; _Isex_1 omitted)
i.re*i.sex _IreXsex_#_# (coded as above)
(analytic weights assumed)
(sum of wgt is 2.2412e+08)
      Source |       SS       df       MS              Number of obs =  108288
-------------+------------------------------           F(  7,108280) =  992.39
       Model |  922855.374      7  131836.482          Prob > F      =  0.0000
    Residual |  14384670.8 108280  132.846979          R-squared     =  0.0603
-------------+------------------------------           Adj R-squared =  0.0602
       Total |  15307526.2 108287  141.360701          Root MSE      =  11.526

------------------------------------------------------------------------------
     wage_re |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      _Ire_2 |  -5.150555   .1787742   -28.81   0.000     -5.50095   -4.800161
      _Ire_3 |    2.32965   .2482191     9.39   0.000     1.843144    2.816156
      _Ire_4 |  -7.976564   .1542988   -51.70   0.000    -8.278987   -7.674141
     _Isex_2 |  -4.229605   .0798763   -52.95   0.000    -4.386161   -4.073048
_IreXsex_2_2 |   2.601923   .2412023    10.79   0.000      2.12917    3.074676
_IreXsex_3_2 |   .4760114   .3585717     1.33   0.184    -.2267841    1.178807
_IreXsex_4_2 |   2.988753   .2490093    12.00   0.000     2.500699    3.476808
       _cons |   20.58432    .055977   367.73   0.000     20.47461    20.69403
------------------------------------------------------------------------------
What do we learn from the following models? (On the exam I will give you the Stata output.)
(re is the categorical variable for race and ethnicity)
Model 1:
xi: reg wage i.sex
Model 2:
xi: reg wage i.sex i.re
Model 4:
xi: reg wage i.sex*i.re
In model 4, what is the gender gap among Asian Americans? Is it different from the gender gap among whites? How would we test whether the gender gap among Asians was equal to 0?
What is the predicted hourly wage for Hispanic females?
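The arithmetic for model 4 follows the same pattern as the jellybean example. A Python sketch (coefficients copied from the model 4 output above; the output does not say which re code is which group, so the codes below are just placeholders, not claims about the coding):

```python
# Coefficients from the model 4 (wage_re on i.re*i.sex) output above.
cons      = 20.58432
sex2      = -4.229605                        # _Isex_2: female gap for re == 1 (omitted group)
re_main   = {2: -5.150555, 3: 2.32965, 4: -7.976564}
re_by_sex = {2: 2.601923, 3: 0.4760114, 4: 2.988753}

def gender_gap(re_code):
    """Female-minus-male wage gap for a given re group:
    _Isex_2 plus that group's _IreXsex_#_2 interaction."""
    return sex2 + re_by_sex.get(re_code, 0.0)

def predicted_wage(re_code, female):
    """Predicted wage: constant plus every dummy that equals 1."""
    wage = cons + re_main.get(re_code, 0.0)
    if female:
        wage += gender_gap(re_code)
    return wage

print(round(gender_gap(3), 2))            # e.g. for re == 3: -4.23 + 0.48 = -3.75
print(round(predicted_wage(4, True), 2))  # e.g. females with re == 4
```

In Stata, testing whether a group's gender gap equals zero would be a test of the sum _Isex_2 + _IreXsex_#_2 (for instance with lincom).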
Lecture 4: analysis of variance. What does the r-squared mean?
Lecture 7: the Gauss-Markov theorem, the central limit theorem, and the large-sample properties of the OLS model. Why are they important?
Lecture 8: What is a fixed effects model? When might it be useful? Explain using an example and an equation, e.g. the effect of education on wages, or family fixed effects and teenage pregnancy. [I want you to be able to write out the equation at the beginning of lecture 9]