REGRESSION ANALYSIS WITH THE ORDERED MULTINOMIAL 1

LOGISTIC MODEL

Regression Analysis with the Ordered Multinomial Logistic Model

Braden Hoelzle

Southern Methodist University

______

Paper presented at the annual meeting of the Southwest Educational Research Association, San Antonio, TX, February 2-4, 2011.

Regression Analysis with the Ordered Multinomial Logistic Model

While logistic regression is a commonly utilized subset of regression when modeling a dichotomous dependent variable, very few researchers make use of the ordered multinomial logistic model. The ordered multinomial logistic model can be understood as an extension of the logistic regression model that is useful when modeling ordinal dependent variables with three or more categories (eg. yes, maybe, no; first, second, third;Likert scale, etc). The rarity of its use in educational research stands in sharp contrast to the abundance of ordinal data in educational studies. In fact Johnson and Albert argue that “ordinal data are the most frequently encountered type of data in the social sciences” (1999, p. 126). If ordinal data types occur so frequently in social science research and if the ordered multinomial model is rarely used, how are researchers modeling ordinal data? In many cases they are treating ordinal data like interval scale data and modeling it using standard regression techniques. However, ordinal dependent variables often violate the assumptions of regression and produce unreliable results. This paper will give examples of ordinal data, examine briefly the debate around modeling ordinal data using linear regression, and demonstrate how to model ordinal data with the ordered multinomial logistic model using the open-source statistical program R. In addition, the appendix of this paper will include a script file containing all the R code necessary for running two example regressions using the ordered logistic model.

Ordinal Data in Educational Research

Were Johnson and Albert (1999) exaggerating when they said that ordinal data is the most frequently found data type in social science research? Probably not. Table 1 contains a few examples of ordinal data frequently found in education research. However, it should be noted that the ordered multinomial logistic model is only appropriate when the dependent variable follows an ordinal scale. Ordinal data can serve as predictor variables in linear regression without increasing the likelihood of violating the assumptions of the model.

Table 1

Examples of Ordinal Data Found in Educational Research

always, frequently, sometimes, rarely, never / low, medium, high
no high school diploma, high school diploma, some college, bachelor’s degree, master’s degree, doctoral degree / 0-10K per year, 10-20K per year, 20-30K per year, 30 – 60K per year, > 60K per year
strongly agree, agree, neither agree nor disagree, disagree, strongly disagree / basic math, regular math, pre-AP math, AP math
yes, maybe, no / first, second, third, fourth
free school lunch, reduced school lunch, full price lunch / A, B, C, D, F
not proficient, proficient, commended / tier 1, tier 2, tier 3

Modeling Ordinal Data Using Linear Regression

A common practice among social science researchers is to treat ordinal data as intervally-scaled data (Jamieson, 2004). For example, a person might conduct a survey using a Likert scale that attempts to measure a person’s level of agreement with the 2010 health care reform bill. Often the Likert scale is coded from one to five and modeled using linear regression as if it consisted of five equally distanced integers. However, if the Likert scale were intervally scaled one could argue that the difference in the level of agreement between someone who markedStrongly Disagree (1) and someone who marked Neither Agree nor Disagree (3) is the same as the distance between someone who marked Disagree (2) and someone who marked Agree (4). While mathematically each of the differences equal 2, it would be inappropriate to say that the difference in the level of agreement within pair one is the same as that within pair two. Therefore, treating ordinal data as intervally scaled creates problems for interpreting the results and often violates the assumptions necessary for regression.

One of the underlying assumptions of linear regression is that the data is distributed around a single linear line defined as with a normally distributed error term which has a mean of zero and constant variance (McKelveyZavoina, 1975). However, often an ordinal dependent variable violates this assumption of linearity and constant error variance.When this assumption is not met the results of the regression cannot be trusted and according to Lu should be “accepted with a grain of salt” (1999, p. 264). However, the decision between linear regression and ordered multinomial regression is not always black and white. According to Gelman and Hill (2007) unequal error variance is in many cases a minor issue. In addition, they indicate that when an ordinal dependent variable has a large number of categories that can be considered equally spaced simple linear regression is an optional alternative. While significant debate surrounds the decision of when ordinal data can be modeled using parametric measures designed for interval data (CarifioPerla, 2007; Jamieson, 2004; Knapp, 1990; Labovitz, 1970; Lu, 1999) this paper is primarily concerned with showing how to model ordinal data using the ordered multinomial logistic regression model. It is this author’s opinion that when modeling between three and five ordered categories the ordered multinomial logistic model is highly preferable. Further, when setting up a research study in which you intend to use standard parametric statistics on ordinal data it is usually preferable to use more rather than fewer categories – ie. ten instead of five (Knapp, 1990).

Modeling Ordinal Data with the Ordered Multinomial Logistic Model

As mentioned earlier, the ordered multinomial logistic model is an extension of logistic regression when the data falls into more than two ordered categories. Like logistic regression, the ordered logistic model requires that the continuously-scaled linear coefficients derived from regression must be transformed using the inverse logit function onto the range (0, 1). The inverse logit function can be written

This transformation enables the calculation of probabilities that in logistic regression can be interpreted as the probability that a person does or doesn’t have a particular trait. However, in the ordered multinomial logistic modelthis transformation enables the calculation of the probability that a person belongs in a given category. In order to conceptualize this concept one needs some knowledge of cumulative distribution functions (see figure 1). A cumulative distribution function (CDF)can take values in the range [0,1] that are equivocal to the probability that the random variable X takes a value less than or equal to x. For example, in figure 1 the probability that X is less than or equal to x1 (P0) is 45 percent. And by definition,

Figure 1

Cumulative Distribution Function

theprobability that X is greater than x1 is equal to 55 percent. However to determine the probability that X falls somewhere between x1 and x2 (P1) we would need to subtract the probability that X ≤ x1 from the probability that X ≤ x2 (85 - 45 = 40 percent). Therefore, for each cut point (x1, x2, and x3) the CDF is equal to the probability that X isless than or equal to that cut point. To crystallize these concepts we will apply the ordered multinomial logistic model to a datasetusing the R statistical package “bayespolr” in the “arm” library.

Example Ordered Multinomial Logistic Model

The hypothetical dataset for this analysis can be downloaded from the UCLA Academic Technology Services( n.d.). Before running the analysis the “arm” and “psych” libraries must be loaded and the dataset read into R.

grad<-read.csv("

This dataset contains four variables defined in table 2. In this ordered regression model apply will serve as the ordinal outcome variable and will be modeled by the variable gpa.

Table 2

Legend for Variables

Variable Names / Code / Definition
apply / 0 = unlikely, 1 = somewhat likely, 2 = very likely / college juniors reported likelihood of applying to graduate school
pared / 0 = no, 1 = yes / indicating whether at least one parent has a graduate degree
public / 0 = private, 1 = public / indicating whether the undergraduate institution is public or private
gpa / Real number from 0 – 4 / college gpa

The reliability of an ordered multinomial regression model is contingent upon the assumptions that there is sufficient data that falls into each category of the dependent variable. This assumption can be checked using crosstabs to identify any cells with none or very few elements. It appears that this data met our assumptions for the ordered regression model.

xtabs(~ pared + apply)

apply

pared 0 1 2

0 200 110 27

1 20 30 13

xtabs(~ public + apply)

apply

public 0 1 2

0 189 124 30

1 31 16 10

Notice in the following code that for the ordered regression model the dependent variable must be defined as ordinal in the script.

summary(m1 <- bayespolr(as.ordered(apply)~gpa,data=mydata))

Call:

bayespolr(formula = as.ordered(apply) ~ gpa, data = mydata)

Coefficients:

Value Std. Error t value

gpa 0.7109826 0.247078 2.877563

Intercepts:

Value Std. Error t value

0|1 2.3308 0.7502 3.1068

1|2 4.3508 0.7744 5.6183

Residual Deviance: 737.6921

AIC: 743.6921

The model produces a coefficient for the predictive variable gpa, and two cut points (0|1 and 1|2) representing the division between the categories 0 and 1 and 1 and 2 respectively. Hence, at cut point 0|1 the CDF would produce the probability that our dependent variable (apply) is less than or equal to the dividing line between 0 and 1 (ie. apply = 0). However, these coefficients are not interpretable untilthey are mapped onto the range (0,1) using the inverse logit function which can be expanded to read

Rather than calculating by hand the probabilities of a person falling into each of the three categories of the outcome variable apply the function can be coded into R using the following script.

(beta <- coef(m1))

(tau <- m1$zeta)

x<- 3

logit.prob <- function(eta){exp(eta)/(1+exp(eta))}

(p0<- logit.prob(tau[1] - x * beta))

(p1<- logit.prob(tau[2] - x*beta)-logit.prob(tau[1] - x * beta))

(p2<- 1 - logit.prob(tau[2] - x * beta))

Using the function entitled logit.probthe script transforms the linear coefficients such that p0 equals the probability that apply = 0 given a gpa of 3. The gpa was set to 3 since the mean of gpa for this sample was 2.99. Using these formulas we derive that for students with a gpa of 3, about 55 percent indicated that they were unlikely to apply to graduate school, 35 indicated that they were somewhat likely to apply, and 10 percent indicated that they were very likely to apply. Notice, that p1 is calculated by subtracting the probability that apply is less than cut point 0|1 ( tau[1] ) from the probability that apply is less than cut point (1|2) ( tau[2] ). Also, p2 is calculated by subtracting the probability that apply is less than cut point 1|2 from 1, since the sum of all the subset probabilities (p0, p1, p2) must equal 1.

By allowing gpa to vary and calculating a probability matrix for each of the three categories of apply we can determine the likelihood of applying to graduate school given various gpa’s. Further, we can add other predictor variables to the model to determine the likelihood of applying to graduate school given a range of predictor variable values. Figure 2 graphically displays the different probabilities of applying to graduate school for students who had parents who attended graduate school and those who did not across a range of undergraduate gpa’s.

Figure 2

Probability of Applying to Grad School

From this graph we can draw two primary conclusions. First, students with at least one parent who attended graduate school are more likely to apply to graduate school than students whose parent(s) did not, across the entire range of potential gpa scores. Second, the likelihood of applying for graduate school increases as GPA increases for all students.

Conclusion

The ordered multinomial logistic model offers an important alternative to standard linear regression when modeling ordinal data. While significant debate still surrounds whether ordinal data can be modeled using standard parametric regression techniques, in general, ordinal data with few categories that do not approximate aninterval scale should be modeled with non-parametric models such as the ordered multinomial logistic regression model. For a comparison of the various models available for modeling ordinal data consultFullerton (2009) or Winshipand Mare (1984). To replicate using R the analyses conducted for this article and to explore a second example see Appendix A. The ordered multinomial model is an under-utilized regression model that offers a non-parametric approach to modeling ordinal data.

References

Carifio, J. & Perla, R. J. (2007).Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. Journal of Social Sciences, 3 (3), 106-116.

Fullerton, A. S. (2009). A conceptual framework for ordered logistic regression models. Sociological Methods & Research, 38(2), 306–347. doi: 10.1177/0049124109346162

Gelman, A. & Hill, J. (2007).Data analysis using regression and multilevel/hierarchical models. New York: Cambridge University Press.

Jamieson, S. (2004).Likert scales: How to (ab)use them. Medical Education, 38, 1212-1218.

Johnson, V. E. & Albert, J. H. (1999). Statistics for the social sciences and public policy: Ordinal data modeling. New York: Springer.

Knapp, T. R. (1990). Treating ordinal scales as interval scales: An attempt to resolve the controversy. Nursing Research, 39 (2), 121-123.

Labovitz,S. (1970). The assignment of numbers to rank order categories. American Sociological Review, 35 (3), 515-524.

Lu, M. (1999). Determinants of residential satisfaction: Ordered logit vs. regression models. Growth and Change, 30, 264-287.

McKelvey, R. D.Zavoina, W.(1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4, 104-120.

Quinn, K. (n.d.). NES 1996 [Data file and Code Book] Retrieved from

UCLA Academic Technology Services ( n.d.).[Hypothetical Data file] Retrieved from

Winship, C. & Mare, R. D. (1984). Regression models with ordinal variables. American Sociological Review, 49, 512-525.

Appendix A

###### Script for Running the Ordered Multinomial Logist Regression

#### Load Libraries

library(arm)

library(psych)

#### Load Data

grad <- read.csv("

##### Examine Data

str(grad)

summary(grad$gpa)

table(grad$apply)

table(grad$pared)

table(grad$public)

xtabs(~ grad$pared + grad$apply)

xtabs(~ grad$public + grad$apply)

##### Run ordered multinomial logistic regression

summary(m1 <- bayespolr(as.ordered(apply)~gpa,data=grad))

##### In order to interpret it we need to transform the data into a probability.

str(m1)

(beta <- coef(m1))

(tau <- m1$zeta)

x<- 3

##### Note: mean = 2.999

logit.prob <- function(eta){exp(eta)/(1+exp(eta))}

(p1 <- logit.prob(tau[1] - x * beta))

(p2<- logit.prob(tau[2] - x * beta) - logit.prob(tau[1] - x * beta))

(p3<- 1 - logit.prob(tau[2] - x * beta))

p1+p2+p3

##### Adding multiple predictors

summary(m2 <- bayespolr(as.ordered(apply)~gpa + pared + public ,data=grad))

#### Transforming the Coefficients to probabilities

(beta <- coef(m2))

(tau <- m2$zeta)

(x<- cbind(0:4, 0 , .15))

(x2<-cbind(0:4, 1 , .15))

logit.prob <- function(eta){exp(eta)/(1+exp(eta))}

(p1 <- logit.prob(tau[1] - x %*% beta))

(p2<- logit.prob(tau[2] - x %*% beta) - logit.prob(tau[1] - x %*% beta))

(p3<- 1 - logit.prob(tau[2] - x %*% beta))

(p4 <- logit.prob(tau[1] - x2 %*% beta))

(p5<- logit.prob(tau[2] - x2 %*% beta) - logit.prob(tau[1] - x2 %*% beta))

(p6<- 1 - logit.prob(tau[2] - x2 %*% beta))

#### Plotting our probability curves

Undergrad.GPA <-0:4

plot(Undergrad.GPA, p1, type="l", col=1, ylim=c(0,1))

lines(0:4, p2, col=2)

lines(0:4, p3, col=3)

lines(0:4, p4, col=1, lty = 2)

lines(0:4, p5, col=2, lty = 2)

lines(0:4, p6, col=3, lty = 2)

legend(1.5, 1, legend=c("P(unlikely)", "P(somewhat likely)",

"P(very likely)", "Line Type when Pared = 0",

"Line Type when Pared = 1"), col=c(1:3,1,1), lty=c(1,1,1,1,2))

##### Run model using Linear Regression

summary(m1.2<-lm(apply~gpa, data=grad))

#### Plot regression

plot(apply~gpa, data=grad)

abline(m1.2, col = "red")

#### Check Assumptions

par(mfrow = c(2,2))

plot(m1.2)

#### Will someone w/ a 3.0 gpa likely apply to gradschool?

(y.hat<-(-.2201 + (.2568 * 3)))

#### .5503 is their predicted apply score which means they are somewhere

#### between unlikely and somewhat likely. This is somewhat difficult to interpret

#### and it completely violates the assumptions for regression.

#### Practice Problem ****Quinn, K. (n.d.). *****

##### Description of Data (nes96)

### For more details see "NES 1996 [Data file and Code Book] ##Retrieved from

## /quinn/classes/536/data"

#ClinLR = Ordinal variable from 1-7 indicating ones view of Bill Clinton’s

#political leanings, where 1 = extremely liberal, 2 = liberal, 3 = slightly

#liberal, 4 = moderate, 5= slightly conservative, 6 = conservative, 6 =

#extremely conservative.

#PID = Ordinal variable from 0-6 indicating ones own political identification,

#where 0 = Strong Democrat and 6 = Strong Republican

#educ = Ordinal variable from 1-7 indicating ones own level of education,

#where 1 = 8 grades or less and no diploma, 2 = 9-11 grades, no further

#schooling, 3 = High school diploma or equivalency test, 4 = More than 12

#years of schooling, no higher degree, 5 = Junior or community college level

#degree (AA degrees), 6 = BA level degrees; 17+ years, no postgraduate degree,

#7 = Advanced degree

# load the data

nes96 <- read.table(" header=TRUE)

# check to see if the data were read in properly

head(nes96)

describe(nes96)

table(nes96$ClinLR)

table(nes96$PID)

table(nes96$educ)

xtabs(~ nes96$ClinLR + nes96$PID)

xtabs(~ nes96$ClinLR + nes96$educ)

#### Note: we do violate some of our assumptions, but we will run the model

#### anyway. We would probably not want to use these results for anything

#### except an practice example.

summary(m3 <- bayespolr(as.ordered(ClinLR)~PID+educ,data=nes96))

(beta <- coef(m3))

(tau <- m3$zeta)

(X <- cbind(0:6,4.57))

#### use this function to transform into probabilities

logit.prob <- function(eta){exp(eta)/(1+exp(eta))}

(p1 <- logit.prob(tau[1] - X %*% beta))