SC706: Longitudinal Data Analysis
Instructor: Natasha Sarkisian
Class notes: Categorical Dependent Variables
So far we’ve only dealt with continuous dependent variables, but panel data analysis is often interested in examining categorical variables as well. Your dependent variable can be dichotomous (0/1), categorical with multiple unordered categories, ordinal, or count variable. In such cases, linear models are inappropriate as there are no restrictions on the predicted values of outcome, the random effect (i.e. level 1 residual) cannot be normally distributed, and cannot have homogenous variance (the variance depends on the predicted value). Like in non-longitudinal models, analysis for such variables is accomplished by specifying a link function that transforms the dependent variable so that the predicted values are constrained to be within a specific interval. Specifically, we use logit models for dichotomous variables, multinomial logit for categorical with unordered categories, ordered logit for ordinal variables, and Poisson models for count variables.
Two-wave Datasets
Variable type / Lagged DV / Difference score / First difference / Cross-laggedContinuous / reg / reg / reg / sem
Dichotomous / logit / mlogit (change categories: 00, 11, 01, 10) / mlogit (change categories: 0 0, 11, 01, 10) / biprobit, gsem (logit, probit options)
Ordinal / ologit / reg / reg / bioprobit, gsem (ologit. oprobit options)
Nominal / mlogit / mlogit (1, 2, 3 = 11, 1 2, 13, 21, 22, 23, 31, 32, 33) / mlogit (1, 2, 3 = 11, 1 2, 13, 21, 22, 23, 31, 32, 33) / gsem (mlogit option)
Count / poisson, nbreg, zip, zinb / reg / reg / gsem (poisson, nbreg options)
Everything besides cross-lagged model is estimated using regular regression commands. But for mlogit, you need to create transition categories. For example:
. use hrs_hours.dta, clear
. tab r1poorhealth r2poorhealth
Poor | Poor health
health | 0 1 | Total
------+------+------
0 | 4,445 420 | 4,865
1 | 314 785 | 1,099
------+------+------
Total | 4,759 1,205 | 5,964
. egen health=group(r1poorhealth r2poorhealth)
(627 missing values generated)
. tab health
group(r1poo |
rhealth |
r2poorhealt |
h) | Freq. Percent Cum.
------+------
1 | 4,445 74.53 74.53
2 | 420 7.04 81.57
3 | 314 5.26 86.84
4 | 785 13.16 100.00
------+------
Total | 5,964 100.00
. mlogit health r1workhours80 r1married r1totalpar r1siblog h1childlg
Multinomial logistic regression Number of obs = 5558
LR chi2(15) = 723.19
Prob > chi2 = 0.0000
Log likelihood = -4209.2727 Pseudo R2 = 0.0791
------
health | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
1 | (base outcome)
------+------
2 |
r1workhours80 | -.0130358 .0024295 -5.37 0.000 -.0177977 -.008274
r1married | -.5597468 .1395123 -4.01 0.000 -.8331859 -.2863078
r1totalpar | -.1197857 .0726706 -1.65 0.099 -.2622176 .0226461
r1siblog | .4067526 .0921614 4.41 0.000 .2261197 .5873856
h1childlg | .284537 .1008141 2.82 0.005 .0869451 .4821289
_cons | -2.343312 .2260522 -10.37 0.000 -2.786366 -1.900258
------+------
3 |
r1workhours80 | -.0180743 .0027839 -6.49 0.000 -.0235306 -.0126179
r1married | -.6254326 .1553243 -4.03 0.000 -.9298627 -.3210026
r1totalpar | -.2012291 .0870588 -2.31 0.021 -.3718613 -.0305969
r1siblog | .3246466 .104158 3.12 0.002 .1205006 .5287925
h1childlg | .1905984 .1145384 1.66 0.096 -.0338927 .4150895
_cons | -2.079726 .2496431 -8.33 0.000 -2.569017 -1.590435
------+------
4 |
r1workhours80 | -.0421077 .0020845 -20.20 0.000 -.0461933 -.0380222
r1married | -1.023912 .105026 -9.75 0.000 -1.229759 -.8180651
r1totalpar | -.1763883 .0614712 -2.87 0.004 -.2968697 -.0559069
r1siblog | .5245884 .0728385 7.20 0.000 .3818276 .6673492
h1childlg | .2207622 .0785831 2.81 0.005 .0667421 .3747823
_cons | -.7924487 .1702211 -4.66 0.000 -1.126076 -.4588215
------
For cross-lagged model with dichotomous dependent variables:
. biprobit ( r2poorhealth r1poorhealth r1married r1workhours80 r1totalpar) ( r2marriedr1married r1poorhealth r1workhours80 r1totalpar)
Seemingly unrelated bivariate probit Number of obs = 5880
Wald chi2(8) = 4194.46
Log likelihood = -2963.1838 Prob > chi2 = 0.0000
------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
r2poorhealth |
r1poorhealth | 1.807378 .0496701 36.39 0.000 1.710026 1.90473
r1married | -.1549207 .0539901 -2.87 0.004 -.2607394 -.049102
r1workhours80 | -.0091585 .0010034 -9.13 0.000 -.0111251 -.0071919
r1totalpar | -.0330402 .030248 -1.09 0.275 -.0923252 .0262447
_cons | -.906089 .0672771 -13.47 0.000 -1.03795 -.7742284
------+------
r2married |
r1married | 3.291951 .0673929 48.85 0.000 3.159864 3.424039
r1poorhealth | -.1818643 .077882 -2.34 0.020 -.3345102 -.0292183
r1workhours80 | .0011923 .0014246 0.84 0.403 -.0015999 .0039844
r1totalpar | .0707012 .0413782 1.71 0.088 -.0103986 .151801
_cons | -1.559129 .0907986 -17.17 0.000 -1.737091 -1.381167
------+------
/athrho | -.0035238 .0515823 -0.07 0.946 -.1046233 .0975757
------+------
rho | -.0035238 .0515817 -.1042432 .0972672
------
Likelihood-ratio test of rho=0: chi2(1) = .004666 Prob > chi2 = 0.9455
In this case, rho is the correlation between the residuals of two regressions (non-significant here), and athrho is the Fisher's Z transformation (the arc-hyperbolic tangent) of that correlation.
. test [r2poorhealth]r1married=[r2married]r1poorhealth
( 1) [r2poorhealth]r1married - [r2married]r1poorhealth = 0
chi2( 1) = 0.08
Prob > chi2 = 0.7762
Another way, using gsem – but note that correlated residuals don’t work when it is estimated with logit etc.:
. gsem ( r2poorhealth <- r1poorhealth r1married r1workhours80 r1totalpar) ( r2married <- r1married r1poorhealth r1workhours80 r1totalpar), logit
Generalized structural equation model Number of obs = 5882
Log likelihood = -2962.1503
------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
r2poorhealth <- |
r1poorhealth | 3.06475 .0869457 35.25 0.000 2.894339 3.23516
r1married | -.2836505 .1005597 -2.82 0.005 -.4807439 -.0865571
r1workhours80 | -.0172547 .001859 -9.28 0.000 -.0208983 -.0136111
r1totalpar | -.0554403 .0576515 -0.96 0.336 -.1684352 .0575547
_cons | -1.52599 .1248862 -12.22 0.000 -1.770762 -1.281217
------+------
r2married <- |
r1married | 5.922521 .1436862 41.22 0.000 5.640901 6.204141
r1poorhealth | -.3999421 .1703288 -2.35 0.019 -.7337804 -.0661037
r1workhours80 | .0030176 .0031457 0.96 0.337 -.0031479 .0091831
r1totalpar | .1628322 .0946266 1.72 0.085 -.0226325 .3482968
_cons | -2.795544 .1991486 -14.04 0.000 -3.185868 -2.40522
------
. test [r2poorhealth]r1married=[r2married]r1poorhealth
( 1) [r2poorhealth]r1married - [r2married]r1poorhealth = 0
chi2( 1) = 0.35
Prob > chi2 = 0.5566
Here’s another example of a cross-lagged model in gsem but now using Poisson models:
. gsem ( r2workhours <- r1allparhelptw r1poorhealth r1married r1workhours80 r1totalpar) ( r2allparhelptw <- r1allparhelptw r1married r1poorhealth r1workhours80 r1totalpar), family(poisson) link(log)
note: r2allparhelptw has noncount values;
you are responsible for the family(poisson) interpretation
Generalized structural equation model Number of obs = 5803
Log likelihood = -69816.414
------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
r2workhours80 <- |
r1allparhelptw | -.0108981 .0010395 -10.48 0.000 -.0129354 -.0088608
r1poorhealth | -.3278578 .0085658 -38.28 0.000 -.3446465 -.311069
r1married | .0185784 .0067215 2.76 0.006 .0054044 .0317523
r1workhours80 | .0283919 .0001318 215.49 0.000 .0281337 .0286501
r1totalpar | .0469421 .0031718 14.80 0.000 .0407255 .0531587
_cons | 2.218248 .0092682 239.34 0.000 2.200083 2.236413
------+------
r2allparhelptw <- |
r1allparhelptw | .0898139 .0020298 44.25 0.000 .0858355 .0937923
r1married | -.407909 .0278715 -14.64 0.000 -.4625362 -.3532818
r1poorhealth | -.1778003 .0326566 -5.44 0.000 -.2418061 -.1137945
r1workhours80 | -.0018228 .0005492 -3.32 0.001 -.0028991 -.0007465
r1totalpar | .1536927 .0152773 10.06 0.000 .1237499 .1836356
_cons | .2722775 .036161 7.53 0.000 .2014032 .3431518
------
Now let’s try ordered logit; need to generate two ordinal variables to use:
. recode r1workhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(r1workhours4)
(4675 differences between r1workhours80 and r1workhours4)
. recode r2workhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(r2workhours4)
(3935 differences between r2workhours80 and r2workhours4)
. recode r1allparhelptw (0=0) (1/5=1) (5.001/20=2), gen( r1allparhelptw3)
(441 differences between r1allparhelptw and r1allparhelptw3)
. recode r2allparhelptw (0=0) (1/5=1) (5.001/20=2), gen( r2allparhelptw3)
(1179 differences between r2allparhelptw and r2allparhelptw3)
. net search bioprobit
Install from: bioprobit from http://fmwww.bc.edu/RePEc/bocode/b
. bioprobit (r2allparhelptw3 r1workhours4 r1married r1poorhealth r1totalpar) (r2workhours4 r1allparhelptw3 r1married r1poorhealth r1totalpar)
group(r2wor |
khours4) | Freq. Percent Cum.
------+------
1 | 1,914 33.93 33.93
2 | 603 10.69 44.62
3 | 2,432 43.11 87.73
4 | 692 12.27 100.00
------+------
Total | 5,641 100.00
Seemingly unrelated bivariate ordered probit regression
Number of obs = 5641
Wald chi2(4) = 52.69
Log likelihood = -12884.726 Prob > chi2 = 0.0000
------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
r2allparhe~3 |
r1workhours4 | -.007293 .0228304 -0.32 0.749 -.0520398 .0374538
r1married | -.205262 .0432182 -4.75 0.000 -.2899682 -.1205559
r1poorhealth | -.1397515 .0480594 -2.91 0.004 -.2339463 -.0455568
r1totalpar | .1228002 .0221615 5.54 0.000 .0793644 .166236
------+------
r2workhours4 |
r1allparhe~3 | -.1803361 .0368463 -4.89 0.000 -.2525535 -.1081186
r1married | .0423012 .0389766 1.09 0.278 -.0340916 .118694
r1poorhealth | -.8130412 .0412363 -19.72 0.000 -.8938628 -.7322195
r1totalpar | .1512924 .0196722 7.69 0.000 .1127356 .1898492
------+------
athrho |
_cons | -.026925 .025833 -1.04 0.297 -.0775568 .0237068
------+------
/cut11 | .5303018 .0591844 .4143026 .6463011
/cut12 | .5308175 .0591851 .4148169 .6468181
/cut13 | .5328816 .0591878 .4168757 .6488876
/cut14 | .5349478 .0591905 .4189365 .6509591
/cut15 | .5370161 .0591932 .4209995 .6530327
/cut16 | .5432348 .0592011 .4272027 .6592669
/cut17 | .5458319 .0592044 .4297934 .6618703
/cut18 | .5468716 .0592057 .4308306 .6629127
/cut19 | .5473917 .0592063 .4313494 .663434
/cut110 | .5635893 .0592264 .4475077 .6796709
/cut111 | .5699004 .0592341 .4538037 .6859971
/cut112 | .5762354 .0592418 .4601235 .6923473
/cut113 | .5767644 .0592425 .4606512 .6928776
/cut114 | .6050495 .0592784 .4888661 .721233
/cut115 | .6093619 .0592839 .4931676 .7255562
/cut116 | .6267269 .0593053 .5104907 .7429631
/cut117 | .6272727 .0593059 .5110352 .7435101
/cut118 | .6278186 .0593066 .5115799 .7440573
/cut119 | .6283647 .0593072 .5121247 .7446047
/cut120 | .6415286 .0593231 .5252574 .7577997
/cut121 | .6453889 .0593277 .5291087 .7616692
/cut122 | .6459412 .0593284 .5296597 .7622227
/cut123 | .6659519 .0593511 .5496259 .7822778
/cut124 | .6665114 .0593517 .5501843 .7828386
/cut125 | .6698733 .0593552 .5535392 .7862074
/cut126 | .6704343 .0593558 .554099 .7867696
/cut127 | .6726802 .0593581 .5563404 .78902
/cut128 | .6732422 .0593587 .5569013 .7895831
/cut129 | .8039208 .0594996 .6873036 .9205379
/cut130 | .8063982 .0595025 .6897755 .9230209
/cut131 | 1.45921 .0612874 1.339089 1.579332
/cut21 | -.3239231 .0452237 -.4125599 -.2352863
/cut22 | -.025728 .0450239 -.1139733 .0625172
/cut23 | 1.334024 .0477328 1.24047 1.427579
------+------
rho | -.0269185 .0258143 -.0774017 .0237023
------
LR test of indep. eqns. : chi2(1) = 1.09 Prob > chi2 = 0.2973
Multiwave Panel Data
Types of models and commands:
Dependent Variable Type / Fixed effects (FE) / Random effects (RE) / Mixed effects (ME)Continuous / xtreg, fe / xtreg, re / mixed
Dichotomous / xtlogit, fe / xtlogit, re / melogit
Ordinal / Not recommended / meologit / meologit
Nominal / Not recommended / Gsem, mlogit option / Outside Stata: HLM 7
Count / xtpoisson, fe
xtnbreg, fe / xtpoisson, re
xtnbreg, re / mepoisson, menbreg
Dichotomous DV
We’ll use poorhealth variable in the hrs_hours_reshaped.dta file as the outcome for this example. To specify that the dependent variable is binary, we use xtlogit command, first specifying FE and then RE model (Stata also offers an xtprobit command if you prefer probit models):
. use hrs_hours_reshaped.dta, clear
. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw, fe
note: multiple positive outcomes within groups encountered.
note: 4409 groups (19766 obs) dropped because of all positive or
all negative outcomes.
Conditional fixed-effects logistic regression Number of obs = 10780
Group variable: hhidpn Number of groups = 1837
Obs per group: min = 2
avg = 5.9
max = 9
LR chi2(6) = 480.51
Log likelihood = -3898.61 Prob > chi2 = 0.0000
------
rpoorhealth | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
rworkhours80 | -.0248479 .0015259 -16.28 0.000 -.0278387 -.0218571
rmarried | -.0069076 .129063 -0.05 0.957 -.2598664 .2460511
rtotalpar | -.2935712 .0407944 -7.20 0.000 -.3735267 -.2136156
rsiblog | -.3552165 .16598 -2.14 0.032 -.6805313 -.0299018
hchildlg | .4488363 .1593514 2.82 0.005 .1365133 .7611594
rallparhel~w | .010376 .0064248 1.61 0.106 -.0022164 .0229685
------
Note that those who experienced no change in health were omitted from this analysis. Also note that FE model doesn’t distinguish between two different stable health conditions (i.e., 0 0 and 1 1) and considers 0 1 and 1 0 as producing the same size effects in opposite direction.
Random effects:
. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority, re
Random-effects logistic regression Number of obs = 30541
Group variable: hhidpn Number of groups = 6243
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 4.9
max = 9
Integration method: mvaghermite Integration points = 12
Wald chi2(10) = 1550.47
Log likelihood = -11019.627 Prob > chi2 = 0.0000
------
rpoorhealth | Coef. Std. Err. z P>|z| [95% Conf. Interval]
------+------
rworkhours80 | -.0353336 .0013598 -25.99 0.000 -.0379987 -.0326685
rmarried | -.5402026 .0898874 -6.01 0.000 -.7163786 -.3640265
rtotalpar | -.1603994 .0359967 -4.46 0.000 -.2309517 -.0898472
rsiblog | -.0966548 .0698343 -1.38 0.166 -.2335276 .040218
hchildlg | .2094428 .0730742 2.87 0.004 .06622 .3526656
rallparhelptw | .0021523 .0059423 0.36 0.717 -.0094943 .0137989
female | -.6397422 .0902022 -7.09 0.000 -.8165353 -.4629491
raedyrs | -.3337791 .0157692 -21.17 0.000 -.3646861 -.3028721
age | -.0363513 .0144833 -2.51 0.012 -.064738 -.0079646
minority | 1.122804 .1038392 10.81 0.000 .9192827 1.326325
_cons | 4.836503 .866618 5.58 0.000 3.137963 6.535043
------+------
/lnsig2u | 1.921837 .046 1.831679 2.011995
------+------
sigma_u | 2.614096 .0601242 2.498872 2.734634
rho | .6750224 .0100909 .6549413 .6944799
------
Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000
You can evaluate the assumptions of RE model using Hausman test the same way you did for regular RE models; however, do not specify “sigmamore” option in hausman command.
Interpreting coefficients
The interpretation of coefficients is very similar to the interpretation of the results of logistic regression. Interpreting coefficients themselves allows us to discuss the direction and significance of effects, but not their size. To talk about the size, we use odds ratios. Odds are ratios of two probabilities – probability of a positive outcome and a probability of a negative outcome (e.g. probability of voting divided by a probability of not voting). But since probabilities vary depending on values of X, such a ratio varies as well. What remains constant is the ratio of such odds – e.g. odds of poor health for a female divided by odds of poor health for a mlae will be the same number regardless of the values of other variables on the model.
Odds ratios are exponentiated logistic regression coefficients. They are sometimes called factor coefficients, because they are multiplicative coefficients. Odds ratios are equal to 1 if there is no effect, smaller than 1 if the effect is negative and larger than 1 if it is positive. To get them in xtlogit command, use OR option:
. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority, re or
Random-effects logistic regression Number of obs = 30541
Group variable: hhidpn Number of groups = 6243
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 4.9
max = 9
Wald chi2(10) = 1550.47
Log likelihood = -11019.627 Prob > chi2 = 0.0000
------
rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]
------+------
rworkhours80 | .9652833 .0013126 -25.99 0.000 .9627142 .9678593
rmarried | .5826302 .0523711 -6.01 0.000 .4885182 .6948728
rtotalpar | .8518035 .0306621 -4.46 0.000 .7937778 .9140709
rsiblog | .9078693 .0634004 -1.38 0.166 .7917358 1.041038
hchildlg | 1.232991 .0900998 2.87 0.004 1.068462 1.422855
rallparhel~w | 1.002155 .0059551 0.36 0.717 .9905506 1.013895
female | .5274284 .0475752 -7.09 0.000 .4419602 .6294247
raedyrs | .716212 .0112941 -21.17 0.000 .6944146 .7386936
age | .9643015 .0139663 -2.51 0.012 .937313 .992067
minority | 3.073459 .3191454 10.81 0.000 2.507491 3.767172
------+------
/lnsig2u | 1.921837 .046 1.831679 2.011995
------+------
sigma_u | 2.614096 .0601242 2.498872 2.734634
rho | .6750224 .0100909 .6549413 .6944799
------
Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000
So for example, the odds ratio for .53 for females indicates that the odds of reporting poor health for males are about half those for males –or we can say 47% lower. To get percent change, we subtract 1 from the odds ratio, and then multiply the result by 100.
Beware: if you would like to know what the increase would be per, say, 10 units increase in the independent variable – e.g. 10 years of age or education, you cannot simply multiple the odds ratio by 10! The coefficient, in fact, would be odds ratio to the power of 10. Or alternatively, you could take the regular logit coefficient, multiply it by 10 and then exponentiate it.
In addition, since odds ratios are multiplicative coefficients, when you want to interpret, for example, an interaction term, you would have to multiply rather than add the odds ratio numbers. Alternatively, you can add the numbers presented in the coefficient column and then exponentiate the result.
In addition to using odds ratios, we can use predicted probabilities (P) to interpret our results. We can get them by calculating predicted logits (L) and then recalculating them into probabilities. Since L=log(odds)=log(P/(1-P)), thenP=eL/(1+eL).
Luckily, we do not have to do that by hand – we can use predict command, adjust command, or margins comand. Predict command allows us to estimate predicted probabilities for the actual observations in the data. The options can be used with predict, margins, or adjust after xtlogit:
xb calculates the linear prediction. This is the default for the random-effects model.
stdp calculates the standard error of the linear prediction.
pc1 calculates the predicted probability of a positive outcome conditional on one positive outcome within group. This is the default for the fixed-effects model.
pu0 calculates the probability of a positive outcome, assuming that the fixed or random effect for that observation's panel is zero. This may not be similar to the proportion of observed outcomes in the group.
We can also use adjust or margins commands to calculate predicted probabilities for some hypothetical, strategically selected cases and to construct graphs:
. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)
. marginsplot, x(rworkhours80) noci plotop(msymbol(i)) scheme(s1mono)
Variables that uniquely identify margins: rworkhours80 rmarried
Interactions
Note that interactions as a method to compare two or more groups can be problematic in logit models because the coefficients are scaled according to the differences in residual dispersion. So it is not as appropriate to rely on the significance test of the interaction term to establish whether some process differs by group. The best approach to establish whether the two groups do differ is to examine differences in predicted probabilities. You would have to decide which values to assign to the rest of the variables in your model; the default is to use the mean.
. xtlogit rpoorhealth c.rworkhours80##i.rmarried rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority, re or
Random-effects logistic regression Number of obs = 30541
Group variable: hhidpn Number of groups = 6243
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 4.9
max = 9
Integration method: mvaghermite Integration points = 12
Wald chi2(11) = 1561.99
Log likelihood = -11012.155 Prob > chi2 = 0.0000
------
rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]
------+------
rworkhours80 | .9556948 .0028358 -15.27 0.000 .9501529 .9612689
1.rmarried | .4733899 .0497145 -7.12 0.000 .3853251 .5815815
|
rmarried#c.rworkhours80 |
1 | 1.012447 .0032671 3.83 0.000 1.006064 1.018871
|
rtotalpar | .8485704 .0305472 -4.56 0.000 .7907624 .9106044
rsiblog | .908552 .0634898 -1.37 0.170 .7922599 1.041914
hchildlg | 1.231178 .0900109 2.84 0.004 1.066817 1.420862
rallparhelptw | 1.00155 .0059641 0.26 0.795 .9899281 1.013307
female | .5349173 .0482853 -6.93 0.000 .4481788 .6384427
raedyrs | .7170131 .0113045 -21.10 0.000 .6951955 .7395154
age | .9638097 .0139562 -2.55 0.011 .9368406 .9915553
minority | 3.054056 .3171545 10.75 0.000 2.491623 3.743447
_cons | 150.9996 131.0543 5.78 0.000 27.55549 827.453
------+------
/lnsig2u | 1.920336 .045987 1.830203 2.010468
------+------
sigma_u | 2.612135 .0600621 2.497028 2.732547
rho | .6746929 .0100933 .6546077 .6941559
------
Likelihood-ratio test of rho=0: chibar2(01) = 5472.85 Prob >= chibar2 = 0.000
. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)
. marginsplot, x(rworkhours80) noci plotop(msymbol(i)) scheme(s1mono)
Variables that uniquely identify margins: rworkhours80 rmarried
You can also create confidence intervals; for example:
. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)
. marginsplot, x(rworkhours80) plotop(msymbol(i)) scheme(s1mono) recastci(rarea) ciopts(fintensity(5))
We can also more explicitly graph the difference between the two groups – let’s try at a few levels of other variables:
. qui margins, dydx(rmarried) at(rworkhours80=(0(10)80) female=(0/1) minority=(0/1)) predict(pu0)
. marginsplot, x(rworkhours80) plotop(msymbol(i)) scheme(s1mono) noci
Variables that uniquely identify margins: rworkhours80 female minority
Variables that uniquely identify margins: rworkhours80 rmarried
For more detail on doing these graphical comparisons, see Scott Long’s article at:
http://www.indiana.edu/~jslsoc/files_research/groupdif/groupwithprobabilities/groups-with-prob-2009-06-25.pdf
Variance components
Note that the variance component does not contain an estimate of level 1 variance. That is because in logistic regression models, it is not possible to estimate both the coefficients and the error variance; therefore, in all logistic regression models, the error variance is always fixed to the same number which is 2/3 = 3.29. That rule also applies to multilevel models, but only to their level 1 residuals. Knowing thismeans that we can calculate the intraclass correlation coefficient or the proportion of variance explained. For both, we can follow the procedures described on pp.224-227 of the Snijders and Bosker chapter on dichotomous outcomes. For instance, the ICC would be calculated as
where tau zero squared is variance of individual-level residuals U. And the proportion of variance explained can be calculated as
Note that in addition to the variance of individual-level residuals U denoted 0 and level 1 variance 2R = 3.29, we need to know the variance of fitted values 2F. That refers to the variance of linear predictions, which are the values that results if we multiply our coefficients by our variable values and add up these products. That is, we are talking about the predicted values of logits. To obtain the variance of fitted values, we can use predict with xb option:
. qui xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority, re or
Random-effects logistic regression Number of obs = 30541
Group variable: hhidpn Number of groups = 6243
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 4.9
max = 9
Wald chi2(10) = 1550.47
Log likelihood = -11019.627 Prob > chi2 = 0.0000
------
rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]
------+------
rworkhours80 | .9652833 .0013126 -25.99 0.000 .9627142 .9678593
rmarried | .5826302 .0523711 -6.01 0.000 .4885182 .6948728
rtotalpar | .8518035 .0306621 -4.46 0.000 .7937778 .9140709
rsiblog | .9078693 .0634004 -1.38 0.166 .7917358 1.041038
hchildlg | 1.232991 .0900998 2.87 0.004 1.068462 1.422855
rallparhel~w | 1.002155 .0059551 0.36 0.717 .9905506 1.013895
female | .5274284 .0475752 -7.09 0.000 .4419602 .6294247
raedyrs | .716212 .0112941 -21.17 0.000 .6944146 .7386936
age | .9643015 .0139663 -2.51 0.012 .937313 .992067
minority | 3.073459 .3191454 10.81 0.000 2.507491 3.767172
------+------
/lnsig2u | 1.921837 .046 1.831679 2.011995
------+------
sigma_u | 2.614096 .0601242 2.498872 2.734634
rho | .6750224 .0100909 .6549413 .6944799
------
Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000
. predict xb, xb
(24127 missing values generated)
. sum xb if e(sample)
Variable | Obs Mean Std. Dev. Min Max
------+------
xb | 30541 -2.724561 1.63337 -7.265207 3.923255
Rho:
. di 2.614096^2/(2.614096^2+c(pi)^2/3)
.67502231
R-squared:
. di 1.633347^2/(1.633347^2+2.614096^2+3.29)
.20856505
Note that such R squared values are pseudo-R squared and are typically lower than values we are used to with OLS because 2R is a fixed number.
Obtaining residuals after xtlogit is not possible with predict command, as we saw above. We can do it, however, by reestimating RE model using mixed effects logit syntax:
. melogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelpt
> w female raedyrs age minority || hhidpn: , or
Mixed-effects logistic regression Number of obs = 30541
Group variable: hhidpn Number of groups = 6243
Obs per group: min = 1
avg = 4.9
max = 9
Integration method: mvaghermite Integration points = 7
Wald chi2(10) = 1557.22
Log likelihood = -11008.605 Prob > chi2 = 0.0000
------
rpoorhealth | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
------+------
rworkhours80 | .965261 .0013117 -26.02 0.000 .9626936 .9678352
rmarried | .5875815 .052345 -5.97 0.000 .4934447 .6996773
rtotalpar | .847142 .0305427 -4.60 0.000 .7893455 .9091704
rsiblog | .9013827 .0618536 -1.51 0.130 .7879507 1.031144
hchildlg | 1.238295 .0890782 2.97 0.003 1.075455 1.425793
rallparhelptw | 1.002406 .0059541 0.40 0.686 .990804 1.014144
female | .5267198 .0463214 -7.29 0.000 .4433251 .6258021
raedyrs | .7136284 .0111674 -21.56 0.000 .6920729 .7358553
age | .9635443 .0135893 -2.63 0.008 .9372744 .9905504
minority | 3.124398 .3184031 11.18 0.000 2.558713 3.815146
_cons | 132.5952 112.0297 5.78 0.000 25.31322 694.5572
------+------
hhidpn |
var(_cons)| 7.442719 .3733355 6.745813 8.211621
------
LR test vs. logistic regression: chibar2(01) = 5498.85 Prob>=chibar2 = 0.0000
After that we can use predict command to get a range of residuals, like for xtmixed. For example, you can get option reffects to get random effects (both random intercepts and random slopes, if you decide to introduce them). You can also use xb option to get fitted values based on coefficients only (random effects not included), and you can get three types of overall residuals:
pearson calculates Pearson residuals. Pearson residuals large inabsolute value may indicate a lack of fit. By default, residualsinclude both the fixed portion and the random portion of the model.The fixedonly option modifies the calculation to include the fixed portion only.
deviance calculates deviance residuals. Deviance residuals are recommended by McCullagh and Nelder (1989) as having the best properties for examining the goodness of fit of a GLM. They are approximately normally distributed if the model is correctlyspecified. They may be plotted against the fitted values or against a covariate to inspect the model's fit. By default, residualsinclude both the fixed portion and the random portion of the model. The fixedonly option modifies the calculation to include the fixed portion only.
anscombe calculates Anscombe residuals, residuals that are designed to closely follow a normal distribution. By default, residuals includeboth the fixed portion and the random portion of the model. The fixedonly option modifies the calculation to include the fixed portion only.
Ordered Logit
Much of what we discussed applies to ordered logit models. To better understand interpretation of coefficients in ordered logit, you should review my SC704 notes for that topic.
. recode rworkhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(rworkhours4)
(23290 differences between rworkhours80 and rworkhours4)
. meologit rworkhours4 rpoorhealth rmarried rtotalpar rsiblog hchildlg rallparhelpt
> w female raedyrs age minority || hhidpn: , or
Mixed-effects ologit regression Number of obs = 30541
Group variable: hhidpn Number of groups = 6243
Obs per group: min = 1
avg = 4.9
max = 9
Integration method: mvaghermite Integration points = 7
Wald chi2(10) = 3191.62
Log likelihood = -29982.693 Prob > chi2 = 0.0000
------
rworkhours4 | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
------+------
rpoorhealth | .2844605 .0135639 -26.36 0.000 .2590801 .3123272
rmarried | .7912272 .0501324 -3.70 0.000 .6988256 .8958465
rtotalpar | 2.424358 .0562904 38.14 0.000 2.316504 2.537234
rsiblog | 1.100873 .0558983 1.89 0.058 .9965895 1.216069
hchildlg | .8502 .045471 -3.03 0.002 .7655905 .9441601
rallparhelptw | .9483347 .0037667 -13.36 0.000 .9409809 .955746
female | .2667214 .0174542 -20.19 0.000 .2346148 .3032217
raedyrs | 1.113577 .0126085 9.50 0.000 1.089138 1.138566
age | .869043 .0091889 -13.27 0.000 .8512184 .8872409
minority | 1.017714 .0796891 0.22 0.823 .8729212 1.186524
------+------
/cut1 | -7.122403 .6293412 -11.32 0.000 -8.355889 -5.888917
/cut2 | -6.146649 .6289257 -9.77 0.000 -7.37932 -4.913977
/cut3 | -2.693722 .6279935 -4.29 0.000 -3.924567 -1.462878
------+------
hhidpn |
var(_cons)| 4.819499 .1531122 4.528557 5.129133
------
LR test vs. ologit regression: chibar2(01) = 8739.46 Prob>=chibar2 = 0.0000
Briefly, the odds ratios for ordered logit are cumulative odds of belonging to a certain category or lower versus belonging to one of the higher categories. For example, if our dependent variable is the level of agreement with some statement and the categories are agree=3, not sure=2, and disagree=1, and if the odds ratio for gender as a predictor of that agreement is 2.00, we can say that the odds of disagreeing rather than agreeing or being not sure are 2 times higher for women than for men. Similarly, the odds of disagreeing or being not sure are also twice as high for women than for men.
What this means is that ologit assumes that these two odds ratios are essentially the same and thus uses the average. That is called the parallel slopes assumption. So we are assuming these two odds ratios are the same – if they differ significantly, the assumption is violated.
Stata does not provide diagnostic tools for panel data for testing the parallel slopes assumption, so in order to obtain a rough test, you might want to run your model without taking panel nature of the data into account(using regular ologit command) and test that assumption that way, even though such a test will be approximate.
Another way to do so would be to create the corresponding dichotomies and estimate models separately for each dichotomy using xtlogit – we can then examine whether odds ratios indeed look similar in such models.