Class Notes: Categorical Dependent Variables

SC706: Longitudinal Data Analysis

Instructor: Natasha Sarkisian

Class notes: Categorical Dependent Variables

So far we’ve only dealt with continuous dependent variables, but panel data analysis is often interested in examining categorical variables as well. Your dependent variable can be dichotomous (0/1), categorical with multiple unordered categories, ordinal, or count variable. In such cases, linear models are inappropriate as there are no restrictions on the predicted values of outcome, the random effect (i.e. level 1 residual) cannot be normally distributed, and cannot have homogenous variance (the variance depends on the predicted value). Like in non-longitudinal models, analysis for such variables is accomplished by specifying a link function that transforms the dependent variable so that the predicted values are constrained to be within a specific interval. Specifically, we use logit models for dichotomous variables, multinomial logit for categorical with unordered categories, ordered logit for ordinal variables, and Poisson models for count variables.

Two-wave Datasets

Variable type / Lagged DV / Difference score / First difference / Cross-lagged
Continuous / reg / reg / reg / sem
Dichotomous / logit / mlogit (change categories: 00, 11, 01, 10) / mlogit (change categories: 0 0, 11, 01, 10) / biprobit, gsem (logit, probit options)
Ordinal / ologit / reg / reg / bioprobit, gsem (ologit. oprobit options)
Nominal / mlogit / mlogit (1, 2, 3 = 11, 1 2, 13, 21, 22, 23, 31, 32, 33) / mlogit (1, 2, 3 = 11, 1 2, 13, 21, 22, 23, 31, 32, 33) / gsem (mlogit option)
Count / poisson, nbreg, zip, zinb / reg / reg / gsem (poisson, nbreg options)

Everything besides cross-lagged model is estimated using regular regression commands. But for mlogit, you need to create transition categories. For example:

. use hrs_hours.dta, clear

. tab r1poorhealth r2poorhealth

Poor | Poor health

health | 0 1 | Total

------+------+------

0 | 4,445 420 | 4,865

1 | 314 785 | 1,099

------+------+------

Total | 4,759 1,205 | 5,964

. egen health=group(r1poorhealth r2poorhealth)

(627 missing values generated)

. tab health

group(r1poo |

rhealth |

r2poorhealt |

h) | Freq. Percent Cum.

------+------

1 | 4,445 74.53 74.53

2 | 420 7.04 81.57

3 | 314 5.26 86.84

4 | 785 13.16 100.00

------+------

Total | 5,964 100.00

. mlogit health r1workhours80 r1married r1totalpar r1siblog h1childlg

Multinomial logistic regression Number of obs = 5558

LR chi2(15) = 723.19

Prob > chi2 = 0.0000

Log likelihood = -4209.2727 Pseudo R2 = 0.0791

------

health | Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

1 | (base outcome)

------+------

2 |

r1workhours80 | -.0130358 .0024295 -5.37 0.000 -.0177977 -.008274

r1married | -.5597468 .1395123 -4.01 0.000 -.8331859 -.2863078

r1totalpar | -.1197857 .0726706 -1.65 0.099 -.2622176 .0226461

r1siblog | .4067526 .0921614 4.41 0.000 .2261197 .5873856

h1childlg | .284537 .1008141 2.82 0.005 .0869451 .4821289

_cons | -2.343312 .2260522 -10.37 0.000 -2.786366 -1.900258

------+------

3 |

r1workhours80 | -.0180743 .0027839 -6.49 0.000 -.0235306 -.0126179

r1married | -.6254326 .1553243 -4.03 0.000 -.9298627 -.3210026

r1totalpar | -.2012291 .0870588 -2.31 0.021 -.3718613 -.0305969

r1siblog | .3246466 .104158 3.12 0.002 .1205006 .5287925

h1childlg | .1905984 .1145384 1.66 0.096 -.0338927 .4150895

_cons | -2.079726 .2496431 -8.33 0.000 -2.569017 -1.590435

------+------

4 |

r1workhours80 | -.0421077 .0020845 -20.20 0.000 -.0461933 -.0380222

r1married | -1.023912 .105026 -9.75 0.000 -1.229759 -.8180651

r1totalpar | -.1763883 .0614712 -2.87 0.004 -.2968697 -.0559069

r1siblog | .5245884 .0728385 7.20 0.000 .3818276 .6673492

h1childlg | .2207622 .0785831 2.81 0.005 .0667421 .3747823

_cons | -.7924487 .1702211 -4.66 0.000 -1.126076 -.4588215

------

For cross-lagged model with dichotomous dependent variables:

. biprobit ( r2poorhealth r1poorhealth r1married r1workhours80 r1totalpar) ( r2marriedr1married r1poorhealth r1workhours80 r1totalpar)

Seemingly unrelated bivariate probit Number of obs = 5880

Wald chi2(8) = 4194.46

Log likelihood = -2963.1838 Prob > chi2 = 0.0000

------

| Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

r2poorhealth |

r1poorhealth | 1.807378 .0496701 36.39 0.000 1.710026 1.90473

r1married | -.1549207 .0539901 -2.87 0.004 -.2607394 -.049102

r1workhours80 | -.0091585 .0010034 -9.13 0.000 -.0111251 -.0071919

r1totalpar | -.0330402 .030248 -1.09 0.275 -.0923252 .0262447

_cons | -.906089 .0672771 -13.47 0.000 -1.03795 -.7742284

------+------

r2married |

r1married | 3.291951 .0673929 48.85 0.000 3.159864 3.424039

r1poorhealth | -.1818643 .077882 -2.34 0.020 -.3345102 -.0292183

r1workhours80 | .0011923 .0014246 0.84 0.403 -.0015999 .0039844

r1totalpar | .0707012 .0413782 1.71 0.088 -.0103986 .151801

_cons | -1.559129 .0907986 -17.17 0.000 -1.737091 -1.381167

------+------

/athrho | -.0035238 .0515823 -0.07 0.946 -.1046233 .0975757

------+------

rho | -.0035238 .0515817 -.1042432 .0972672

------

Likelihood-ratio test of rho=0: chi2(1) = .004666 Prob > chi2 = 0.9455

In this case, rho is the correlation between the residuals of two regressions (non-significant here), and athrho is the Fisher's Z transformation (the arc-hyperbolic tangent) of that correlation.

. test [r2poorhealth]r1married=[r2married]r1poorhealth

( 1) [r2poorhealth]r1married - [r2married]r1poorhealth = 0

chi2( 1) = 0.08

Prob > chi2 = 0.7762

Another way, using gsem – but note that correlated residuals don’t work when it is estimated with logit etc.:

. gsem ( r2poorhealth <- r1poorhealth r1married r1workhours80 r1totalpar) ( r2married <- r1married r1poorhealth r1workhours80 r1totalpar), logit

Generalized structural equation model Number of obs = 5882

Log likelihood = -2962.1503

------

| Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

r2poorhealth <- |

r1poorhealth | 3.06475 .0869457 35.25 0.000 2.894339 3.23516

r1married | -.2836505 .1005597 -2.82 0.005 -.4807439 -.0865571

r1workhours80 | -.0172547 .001859 -9.28 0.000 -.0208983 -.0136111

r1totalpar | -.0554403 .0576515 -0.96 0.336 -.1684352 .0575547

_cons | -1.52599 .1248862 -12.22 0.000 -1.770762 -1.281217

------+------

r2married <- |

r1married | 5.922521 .1436862 41.22 0.000 5.640901 6.204141

r1poorhealth | -.3999421 .1703288 -2.35 0.019 -.7337804 -.0661037

r1workhours80 | .0030176 .0031457 0.96 0.337 -.0031479 .0091831

r1totalpar | .1628322 .0946266 1.72 0.085 -.0226325 .3482968

_cons | -2.795544 .1991486 -14.04 0.000 -3.185868 -2.40522

------

. test [r2poorhealth]r1married=[r2married]r1poorhealth

( 1) [r2poorhealth]r1married - [r2married]r1poorhealth = 0

chi2( 1) = 0.35

Prob > chi2 = 0.5566

Here’s another example of a cross-lagged model in gsem but now using Poisson models:

. gsem ( r2workhours <- r1allparhelptw r1poorhealth r1married r1workhours80 r1totalpar) ( r2allparhelptw <- r1allparhelptw r1married r1poorhealth r1workhours80 r1totalpar), family(poisson) link(log)

note: r2allparhelptw has noncount values;

you are responsible for the family(poisson) interpretation

Generalized structural equation model Number of obs = 5803

Log likelihood = -69816.414

------

| Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

r2workhours80 <- |

r1allparhelptw | -.0108981 .0010395 -10.48 0.000 -.0129354 -.0088608

r1poorhealth | -.3278578 .0085658 -38.28 0.000 -.3446465 -.311069

r1married | .0185784 .0067215 2.76 0.006 .0054044 .0317523

r1workhours80 | .0283919 .0001318 215.49 0.000 .0281337 .0286501

r1totalpar | .0469421 .0031718 14.80 0.000 .0407255 .0531587

_cons | 2.218248 .0092682 239.34 0.000 2.200083 2.236413

------+------

r2allparhelptw <- |

r1allparhelptw | .0898139 .0020298 44.25 0.000 .0858355 .0937923

r1married | -.407909 .0278715 -14.64 0.000 -.4625362 -.3532818

r1poorhealth | -.1778003 .0326566 -5.44 0.000 -.2418061 -.1137945

r1workhours80 | -.0018228 .0005492 -3.32 0.001 -.0028991 -.0007465

r1totalpar | .1536927 .0152773 10.06 0.000 .1237499 .1836356

_cons | .2722775 .036161 7.53 0.000 .2014032 .3431518

------

Now let’s try ordered logit; need to generate two ordinal variables to use:

. recode r1workhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(r1workhours4)

(4675 differences between r1workhours80 and r1workhours4)

. recode r2workhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(r2workhours4)

(3935 differences between r2workhours80 and r2workhours4)

. recode r1allparhelptw (0=0) (1/5=1) (5.001/20=2), gen( r1allparhelptw3)

(441 differences between r1allparhelptw and r1allparhelptw3)

. recode r2allparhelptw (0=0) (1/5=1) (5.001/20=2), gen( r2allparhelptw3)

(1179 differences between r2allparhelptw and r2allparhelptw3)

. net search bioprobit

Install from: bioprobit from http://fmwww.bc.edu/RePEc/bocode/b

. bioprobit (r2allparhelptw3 r1workhours4 r1married r1poorhealth r1totalpar) (r2workhours4 r1allparhelptw3 r1married r1poorhealth r1totalpar)

group(r2wor |

khours4) | Freq. Percent Cum.

------+------

1 | 1,914 33.93 33.93

2 | 603 10.69 44.62

3 | 2,432 43.11 87.73

4 | 692 12.27 100.00

------+------

Total | 5,641 100.00

Seemingly unrelated bivariate ordered probit regression

Number of obs = 5641

Wald chi2(4) = 52.69

Log likelihood = -12884.726 Prob > chi2 = 0.0000

------

| Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

r2allparhe~3 |

r1workhours4 | -.007293 .0228304 -0.32 0.749 -.0520398 .0374538

r1married | -.205262 .0432182 -4.75 0.000 -.2899682 -.1205559

r1poorhealth | -.1397515 .0480594 -2.91 0.004 -.2339463 -.0455568

r1totalpar | .1228002 .0221615 5.54 0.000 .0793644 .166236

------+------

r2workhours4 |

r1allparhe~3 | -.1803361 .0368463 -4.89 0.000 -.2525535 -.1081186

r1married | .0423012 .0389766 1.09 0.278 -.0340916 .118694

r1poorhealth | -.8130412 .0412363 -19.72 0.000 -.8938628 -.7322195

r1totalpar | .1512924 .0196722 7.69 0.000 .1127356 .1898492

------+------

athrho |

_cons | -.026925 .025833 -1.04 0.297 -.0775568 .0237068

------+------

/cut11 | .5303018 .0591844 .4143026 .6463011

/cut12 | .5308175 .0591851 .4148169 .6468181

/cut13 | .5328816 .0591878 .4168757 .6488876

/cut14 | .5349478 .0591905 .4189365 .6509591

/cut15 | .5370161 .0591932 .4209995 .6530327

/cut16 | .5432348 .0592011 .4272027 .6592669

/cut17 | .5458319 .0592044 .4297934 .6618703

/cut18 | .5468716 .0592057 .4308306 .6629127

/cut19 | .5473917 .0592063 .4313494 .663434

/cut110 | .5635893 .0592264 .4475077 .6796709

/cut111 | .5699004 .0592341 .4538037 .6859971

/cut112 | .5762354 .0592418 .4601235 .6923473

/cut113 | .5767644 .0592425 .4606512 .6928776

/cut114 | .6050495 .0592784 .4888661 .721233

/cut115 | .6093619 .0592839 .4931676 .7255562

/cut116 | .6267269 .0593053 .5104907 .7429631

/cut117 | .6272727 .0593059 .5110352 .7435101

/cut118 | .6278186 .0593066 .5115799 .7440573

/cut119 | .6283647 .0593072 .5121247 .7446047

/cut120 | .6415286 .0593231 .5252574 .7577997

/cut121 | .6453889 .0593277 .5291087 .7616692

/cut122 | .6459412 .0593284 .5296597 .7622227

/cut123 | .6659519 .0593511 .5496259 .7822778

/cut124 | .6665114 .0593517 .5501843 .7828386

/cut125 | .6698733 .0593552 .5535392 .7862074

/cut126 | .6704343 .0593558 .554099 .7867696

/cut127 | .6726802 .0593581 .5563404 .78902

/cut128 | .6732422 .0593587 .5569013 .7895831

/cut129 | .8039208 .0594996 .6873036 .9205379

/cut130 | .8063982 .0595025 .6897755 .9230209

/cut131 | 1.45921 .0612874 1.339089 1.579332

/cut21 | -.3239231 .0452237 -.4125599 -.2352863

/cut22 | -.025728 .0450239 -.1139733 .0625172

/cut23 | 1.334024 .0477328 1.24047 1.427579

------+------

rho | -.0269185 .0258143 -.0774017 .0237023

------

LR test of indep. eqns. : chi2(1) = 1.09 Prob > chi2 = 0.2973

Multiwave Panel Data

Types of models and commands:

Dependent Variable Type / Fixed effects (FE) / Random effects (RE) / Mixed effects (ME)
Continuous / xtreg, fe / xtreg, re / mixed
Dichotomous / xtlogit, fe / xtlogit, re / melogit
Ordinal / Not recommended / meologit / meologit
Nominal / Not recommended / Gsem, mlogit option / Outside Stata: HLM 7
Count / xtpoisson, fe
xtnbreg, fe / xtpoisson, re
xtnbreg, re / mepoisson, menbreg

Dichotomous DV

We’ll use poorhealth variable in the hrs_hours_reshaped.dta file as the outcome for this example. To specify that the dependent variable is binary, we use xtlogit command, first specifying FE and then RE model (Stata also offers an xtprobit command if you prefer probit models):

. use hrs_hours_reshaped.dta, clear

. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw, fe

note: multiple positive outcomes within groups encountered.

note: 4409 groups (19766 obs) dropped because of all positive or

all negative outcomes.

Conditional fixed-effects logistic regression Number of obs = 10780

Group variable: hhidpn Number of groups = 1837

Obs per group: min = 2

avg = 5.9

max = 9

LR chi2(6) = 480.51

Log likelihood = -3898.61 Prob > chi2 = 0.0000

------

rpoorhealth | Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

rworkhours80 | -.0248479 .0015259 -16.28 0.000 -.0278387 -.0218571

rmarried | -.0069076 .129063 -0.05 0.957 -.2598664 .2460511

rtotalpar | -.2935712 .0407944 -7.20 0.000 -.3735267 -.2136156

rsiblog | -.3552165 .16598 -2.14 0.032 -.6805313 -.0299018

hchildlg | .4488363 .1593514 2.82 0.005 .1365133 .7611594

rallparhel~w | .010376 .0064248 1.61 0.106 -.0022164 .0229685

------

Note that those who experienced no change in health were omitted from this analysis. Also note that FE model doesn’t distinguish between two different stable health conditions (i.e., 0  0 and 1  1) and considers 0  1 and 1  0 as producing the same size effects in opposite direction.

Random effects:

. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority, re

Random-effects logistic regression Number of obs = 30541

Group variable: hhidpn Number of groups = 6243

Random effects u_i ~ Gaussian Obs per group: min = 1

avg = 4.9

max = 9

Integration method: mvaghermite Integration points = 12

Wald chi2(10) = 1550.47

Log likelihood = -11019.627 Prob > chi2 = 0.0000

------

rpoorhealth | Coef. Std. Err. z P>|z| [95% Conf. Interval]

------+------

rworkhours80 | -.0353336 .0013598 -25.99 0.000 -.0379987 -.0326685

rmarried | -.5402026 .0898874 -6.01 0.000 -.7163786 -.3640265

rtotalpar | -.1603994 .0359967 -4.46 0.000 -.2309517 -.0898472

rsiblog | -.0966548 .0698343 -1.38 0.166 -.2335276 .040218

hchildlg | .2094428 .0730742 2.87 0.004 .06622 .3526656

rallparhelptw | .0021523 .0059423 0.36 0.717 -.0094943 .0137989

female | -.6397422 .0902022 -7.09 0.000 -.8165353 -.4629491

raedyrs | -.3337791 .0157692 -21.17 0.000 -.3646861 -.3028721

age | -.0363513 .0144833 -2.51 0.012 -.064738 -.0079646

minority | 1.122804 .1038392 10.81 0.000 .9192827 1.326325

_cons | 4.836503 .866618 5.58 0.000 3.137963 6.535043

------+------

/lnsig2u | 1.921837 .046 1.831679 2.011995

------+------

sigma_u | 2.614096 .0601242 2.498872 2.734634

rho | .6750224 .0100909 .6549413 .6944799

------

Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000

You can evaluate the assumptions of RE model using Hausman test the same way you did for regular RE models; however, do not specify “sigmamore” option in hausman command.

Interpreting coefficients

The interpretation of coefficients is very similar to the interpretation of the results of logistic regression. Interpreting coefficients themselves allows us to discuss the direction and significance of effects, but not their size. To talk about the size, we use odds ratios. Odds are ratios of two probabilities – probability of a positive outcome and a probability of a negative outcome (e.g. probability of voting divided by a probability of not voting). But since probabilities vary depending on values of X, such a ratio varies as well. What remains constant is the ratio of such odds – e.g. odds of poor health for a female divided by odds of poor health for a mlae will be the same number regardless of the values of other variables on the model.

Odds ratios are exponentiated logistic regression coefficients. They are sometimes called factor coefficients, because they are multiplicative coefficients. Odds ratios are equal to 1 if there is no effect, smaller than 1 if the effect is negative and larger than 1 if it is positive. To get them in xtlogit command, use OR option:

. xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority, re or

Random-effects logistic regression Number of obs = 30541

Group variable: hhidpn Number of groups = 6243

Random effects u_i ~ Gaussian Obs per group: min = 1

avg = 4.9

max = 9

Wald chi2(10) = 1550.47

Log likelihood = -11019.627 Prob > chi2 = 0.0000

------

rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]

------+------

rworkhours80 | .9652833 .0013126 -25.99 0.000 .9627142 .9678593

rmarried | .5826302 .0523711 -6.01 0.000 .4885182 .6948728

rtotalpar | .8518035 .0306621 -4.46 0.000 .7937778 .9140709

rsiblog | .9078693 .0634004 -1.38 0.166 .7917358 1.041038

hchildlg | 1.232991 .0900998 2.87 0.004 1.068462 1.422855

rallparhel~w | 1.002155 .0059551 0.36 0.717 .9905506 1.013895

female | .5274284 .0475752 -7.09 0.000 .4419602 .6294247

raedyrs | .716212 .0112941 -21.17 0.000 .6944146 .7386936

age | .9643015 .0139663 -2.51 0.012 .937313 .992067

minority | 3.073459 .3191454 10.81 0.000 2.507491 3.767172

------+------

/lnsig2u | 1.921837 .046 1.831679 2.011995

------+------

sigma_u | 2.614096 .0601242 2.498872 2.734634

rho | .6750224 .0100909 .6549413 .6944799

------

Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000

So for example, the odds ratio for .53 for females indicates that the odds of reporting poor health for males are about half those for males –or we can say 47% lower. To get percent change, we subtract 1 from the odds ratio, and then multiply the result by 100.

Beware: if you would like to know what the increase would be per, say, 10 units increase in the independent variable – e.g. 10 years of age or education, you cannot simply multiple the odds ratio by 10! The coefficient, in fact, would be odds ratio to the power of 10. Or alternatively, you could take the regular logit coefficient, multiply it by 10 and then exponentiate it.

In addition, since odds ratios are multiplicative coefficients, when you want to interpret, for example, an interaction term, you would have to multiply rather than add the odds ratio numbers. Alternatively, you can add the numbers presented in the coefficient column and then exponentiate the result.

In addition to using odds ratios, we can use predicted probabilities (P) to interpret our results. We can get them by calculating predicted logits (L) and then recalculating them into probabilities. Since L=log(odds)=log(P/(1-P)), thenP=eL/(1+eL).

Luckily, we do not have to do that by hand – we can use predict command, adjust command, or margins comand. Predict command allows us to estimate predicted probabilities for the actual observations in the data. The options can be used with predict, margins, or adjust after xtlogit:

xb calculates the linear prediction. This is the default for the random-effects model.

stdp calculates the standard error of the linear prediction.

pc1 calculates the predicted probability of a positive outcome conditional on one positive outcome within group. This is the default for the fixed-effects model.

pu0 calculates the probability of a positive outcome, assuming that the fixed or random effect for that observation's panel is zero. This may not be similar to the proportion of observed outcomes in the group.

We can also use adjust or margins commands to calculate predicted probabilities for some hypothetical, strategically selected cases and to construct graphs:

. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)

. marginsplot, x(rworkhours80) noci plotop(msymbol(i)) scheme(s1mono)

Variables that uniquely identify margins: rworkhours80 rmarried

Interactions

Note that interactions as a method to compare two or more groups can be problematic in logit models because the coefficients are scaled according to the differences in residual dispersion. So it is not as appropriate to rely on the significance test of the interaction term to establish whether some process differs by group. The best approach to establish whether the two groups do differ is to examine differences in predicted probabilities. You would have to decide which values to assign to the rest of the variables in your model; the default is to use the mean.

. xtlogit rpoorhealth c.rworkhours80##i.rmarried rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority, re or

Random-effects logistic regression Number of obs = 30541

Group variable: hhidpn Number of groups = 6243

Random effects u_i ~ Gaussian Obs per group: min = 1

avg = 4.9

max = 9

Integration method: mvaghermite Integration points = 12

Wald chi2(11) = 1561.99

Log likelihood = -11012.155 Prob > chi2 = 0.0000

------

rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]

------+------

rworkhours80 | .9556948 .0028358 -15.27 0.000 .9501529 .9612689

1.rmarried | .4733899 .0497145 -7.12 0.000 .3853251 .5815815

rmarried#c.rworkhours80 |

1 | 1.012447 .0032671 3.83 0.000 1.006064 1.018871

rtotalpar | .8485704 .0305472 -4.56 0.000 .7907624 .9106044

rsiblog | .908552 .0634898 -1.37 0.170 .7922599 1.041914

hchildlg | 1.231178 .0900109 2.84 0.004 1.066817 1.420862

rallparhelptw | 1.00155 .0059641 0.26 0.795 .9899281 1.013307

female | .5349173 .0482853 -6.93 0.000 .4481788 .6384427

raedyrs | .7170131 .0113045 -21.10 0.000 .6951955 .7395154

age | .9638097 .0139562 -2.55 0.011 .9368406 .9915553

minority | 3.054056 .3171545 10.75 0.000 2.491623 3.743447

_cons | 150.9996 131.0543 5.78 0.000 27.55549 827.453

------+------

/lnsig2u | 1.920336 .045987 1.830203 2.010468

------+------

sigma_u | 2.612135 .0600621 2.497028 2.732547

rho | .6746929 .0100933 .6546077 .6941559

------

Likelihood-ratio test of rho=0: chibar2(01) = 5472.85 Prob >= chibar2 = 0.000

. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)

. marginsplot, x(rworkhours80) noci plotop(msymbol(i)) scheme(s1mono)

Variables that uniquely identify margins: rworkhours80 rmarried

You can also create confidence intervals; for example:

. qui margins, at(rworkhours80=(0(10)80) rmarried=(0/1)) predict(pu0)

. marginsplot, x(rworkhours80) plotop(msymbol(i)) scheme(s1mono) recastci(rarea) ciopts(fintensity(5))

We can also more explicitly graph the difference between the two groups – let’s try at a few levels of other variables:

. qui margins, dydx(rmarried) at(rworkhours80=(0(10)80) female=(0/1) minority=(0/1)) predict(pu0)

. marginsplot, x(rworkhours80) plotop(msymbol(i)) scheme(s1mono) noci

Variables that uniquely identify margins: rworkhours80 female minority

Variables that uniquely identify margins: rworkhours80 rmarried

For more detail on doing these graphical comparisons, see Scott Long’s article at:

http://www.indiana.edu/~jslsoc/files_research/groupdif/groupwithprobabilities/groups-with-prob-2009-06-25.pdf

Variance components

Note that the variance component does not contain an estimate of level 1 variance. That is because in logistic regression models, it is not possible to estimate both the coefficients and the error variance; therefore, in all logistic regression models, the error variance is always fixed to the same number which is 2/3 = 3.29. That rule also applies to multilevel models, but only to their level 1 residuals. Knowing thismeans that we can calculate the intraclass correlation coefficient or the proportion of variance explained. For both, we can follow the procedures described on pp.224-227 of the Snijders and Bosker chapter on dichotomous outcomes. For instance, the ICC would be calculated as

where tau zero squared is variance of individual-level residuals U. And the proportion of variance explained can be calculated as

Note that in addition to the variance of individual-level residuals U denoted 0 and level 1 variance 2R = 3.29, we need to know the variance of fitted values 2F. That refers to the variance of linear predictions, which are the values that results if we multiply our coefficients by our variable values and add up these products. That is, we are talking about the predicted values of logits. To obtain the variance of fitted values, we can use predict with xb option:

. qui xtlogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelptw female raedyrs age minority, re or

Random-effects logistic regression Number of obs = 30541

Group variable: hhidpn Number of groups = 6243

Random effects u_i ~ Gaussian Obs per group: min = 1

avg = 4.9

max = 9

Wald chi2(10) = 1550.47

Log likelihood = -11019.627 Prob > chi2 = 0.0000

------

rpoorhealth | OR Std. Err. z P>|z| [95% Conf. Interval]

------+------

rworkhours80 | .9652833 .0013126 -25.99 0.000 .9627142 .9678593

rmarried | .5826302 .0523711 -6.01 0.000 .4885182 .6948728

rtotalpar | .8518035 .0306621 -4.46 0.000 .7937778 .9140709

rsiblog | .9078693 .0634004 -1.38 0.166 .7917358 1.041038

hchildlg | 1.232991 .0900998 2.87 0.004 1.068462 1.422855

rallparhel~w | 1.002155 .0059551 0.36 0.717 .9905506 1.013895

female | .5274284 .0475752 -7.09 0.000 .4419602 .6294247

raedyrs | .716212 .0112941 -21.17 0.000 .6944146 .7386936

age | .9643015 .0139663 -2.51 0.012 .937313 .992067

minority | 3.073459 .3191454 10.81 0.000 2.507491 3.767172

------+------

/lnsig2u | 1.921837 .046 1.831679 2.011995

------+------

sigma_u | 2.614096 .0601242 2.498872 2.734634

rho | .6750224 .0100909 .6549413 .6944799

------

Likelihood-ratio test of rho=0: chibar2(01) = 5476.80 Prob >= chibar2 = 0.000

. predict xb, xb

(24127 missing values generated)

. sum xb if e(sample)

Variable | Obs Mean Std. Dev. Min Max

------+------

xb | 30541 -2.724561 1.63337 -7.265207 3.923255

Rho:

. di 2.614096^2/(2.614096^2+c(pi)^2/3)

.67502231

R-squared:

. di 1.633347^2/(1.633347^2+2.614096^2+3.29)

.20856505

Note that such R squared values are pseudo-R squared and are typically lower than values we are used to with OLS because 2R is a fixed number.

Obtaining residuals after xtlogit is not possible with predict command, as we saw above. We can do it, however, by reestimating RE model using mixed effects logit syntax:

. melogit rpoorhealth rworkhours80 rmarried rtotalpar rsiblog hchildlg rallparhelpt

> w female raedyrs age minority || hhidpn: , or

Mixed-effects logistic regression Number of obs = 30541

Group variable: hhidpn Number of groups = 6243

Obs per group: min = 1

avg = 4.9

max = 9

Integration method: mvaghermite Integration points = 7

Wald chi2(10) = 1557.22

Log likelihood = -11008.605 Prob > chi2 = 0.0000

------

rpoorhealth | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

------+------

rworkhours80 | .965261 .0013117 -26.02 0.000 .9626936 .9678352

rmarried | .5875815 .052345 -5.97 0.000 .4934447 .6996773

rtotalpar | .847142 .0305427 -4.60 0.000 .7893455 .9091704

rsiblog | .9013827 .0618536 -1.51 0.130 .7879507 1.031144

hchildlg | 1.238295 .0890782 2.97 0.003 1.075455 1.425793

rallparhelptw | 1.002406 .0059541 0.40 0.686 .990804 1.014144

female | .5267198 .0463214 -7.29 0.000 .4433251 .6258021

raedyrs | .7136284 .0111674 -21.56 0.000 .6920729 .7358553

age | .9635443 .0135893 -2.63 0.008 .9372744 .9905504

minority | 3.124398 .3184031 11.18 0.000 2.558713 3.815146

_cons | 132.5952 112.0297 5.78 0.000 25.31322 694.5572

------+------

hhidpn |

var(_cons)| 7.442719 .3733355 6.745813 8.211621

------

LR test vs. logistic regression: chibar2(01) = 5498.85 Prob>=chibar2 = 0.0000

After that we can use predict command to get a range of residuals, like for xtmixed. For example, you can get option reffects to get random effects (both random intercepts and random slopes, if you decide to introduce them). You can also use xb option to get fitted values based on coefficients only (random effects not included), and you can get three types of overall residuals:

pearson calculates Pearson residuals. Pearson residuals large inabsolute value may indicate a lack of fit. By default, residualsinclude both the fixed portion and the random portion of the model.The fixedonly option modifies the calculation to include the fixed portion only.

deviance calculates deviance residuals. Deviance residuals are recommended by McCullagh and Nelder (1989) as having the best properties for examining the goodness of fit of a GLM. They are approximately normally distributed if the model is correctlyspecified. They may be plotted against the fitted values or against a covariate to inspect the model's fit. By default, residualsinclude both the fixed portion and the random portion of the model. The fixedonly option modifies the calculation to include the fixed portion only.

anscombe calculates Anscombe residuals, residuals that are designed to closely follow a normal distribution. By default, residuals includeboth the fixed portion and the random portion of the model. The fixedonly option modifies the calculation to include the fixed portion only.

Ordered Logit

Much of what we discussed applies to ordered logit models. To better understand interpretation of coefficients in ordered logit, you should review my SC704 notes for that topic.

. recode rworkhours80 (0=0) (1/30=1) (31/50=2) (51/80=3), gen(rworkhours4)

(23290 differences between rworkhours80 and rworkhours4)

. meologit rworkhours4 rpoorhealth rmarried rtotalpar rsiblog hchildlg rallparhelpt

> w female raedyrs age minority || hhidpn: , or

Mixed-effects ologit regression Number of obs = 30541

Group variable: hhidpn Number of groups = 6243

Obs per group: min = 1

avg = 4.9

max = 9

Integration method: mvaghermite Integration points = 7

Wald chi2(10) = 3191.62

Log likelihood = -29982.693 Prob > chi2 = 0.0000

------

rworkhours4 | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

------+------

rpoorhealth | .2844605 .0135639 -26.36 0.000 .2590801 .3123272

rmarried | .7912272 .0501324 -3.70 0.000 .6988256 .8958465

rtotalpar | 2.424358 .0562904 38.14 0.000 2.316504 2.537234

rsiblog | 1.100873 .0558983 1.89 0.058 .9965895 1.216069

hchildlg | .8502 .045471 -3.03 0.002 .7655905 .9441601

rallparhelptw | .9483347 .0037667 -13.36 0.000 .9409809 .955746

female | .2667214 .0174542 -20.19 0.000 .2346148 .3032217

raedyrs | 1.113577 .0126085 9.50 0.000 1.089138 1.138566

age | .869043 .0091889 -13.27 0.000 .8512184 .8872409

minority | 1.017714 .0796891 0.22 0.823 .8729212 1.186524

------+------

/cut1 | -7.122403 .6293412 -11.32 0.000 -8.355889 -5.888917

/cut2 | -6.146649 .6289257 -9.77 0.000 -7.37932 -4.913977

/cut3 | -2.693722 .6279935 -4.29 0.000 -3.924567 -1.462878

------+------

hhidpn |

var(_cons)| 4.819499 .1531122 4.528557 5.129133

------

LR test vs. ologit regression: chibar2(01) = 8739.46 Prob>=chibar2 = 0.0000

Briefly, the odds ratios for ordered logit are cumulative odds of belonging to a certain category or lower versus belonging to one of the higher categories. For example, if our dependent variable is the level of agreement with some statement and the categories are agree=3, not sure=2, and disagree=1, and if the odds ratio for gender as a predictor of that agreement is 2.00, we can say that the odds of disagreeing rather than agreeing or being not sure are 2 times higher for women than for men. Similarly, the odds of disagreeing or being not sure are also twice as high for women than for men.

What this means is that ologit assumes that these two odds ratios are essentially the same and thus uses the average. That is called the parallel slopes assumption. So we are assuming these two odds ratios are the same – if they differ significantly, the assumption is violated.

Stata does not provide diagnostic tools for panel data for testing the parallel slopes assumption, so in order to obtain a rough test, you might want to run your model without taking panel nature of the data into account(using regular ologit command) and test that assumption that way, even though such a test will be approximate.

Another way to do so would be to create the corresponding dichotomies and estimate models separately for each dichotomy using xtlogit – we can then examine whether odds ratios indeed look similar in such models.