
Logistic Regression, Testing for Interaction

Example, Influenza Shots

A local health clinic sent fliers to its clients to encourage everyone, but especially older persons at high risk of complications, to get a flu shot for protection against an expected flu epidemic. In a pilot follow-up study, 159 clients were randomly selected and asked whether they actually received a flu shot. A client who received a flu shot was coded Y = 1; and a client who did not receive a flu shot was coded Y = 0. In addition, data were collected on their age (X1) and their health awareness. The latter data were combined into a health awareness index (X2), for which higher values indicate greater awareness. Also included in the data were client gender (X3), with males coded X3 = 1 and females coded X3 = 0.

It is suspected that there may be some interactions between predictor variables; e.g., perhaps the relationship between health awareness and the response variable is moderated by gender, so that the effect of health awareness differs for males and females. Hence, we want to test for interaction effects. To do this, we will estimate two logistic regression models: one with interactions included and another without.

Assuming that we have already looked at the univariate relationships between Y and each explanatory variable, we will proceed to look for possible interaction effects.

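The reduced model (with only the three explanatory variables) is

$$Y_i = \pi_i + \varepsilon_i, \qquad \ln\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3},$$

where the i subscript denotes the ith observation in the data set, $\pi_i = P(Y_i = 1)$ is the probability that the ith client received a flu shot, and $\varepsilon_i$ is a random error term associated with the ith observation.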

The full model (with interaction terms included) is

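$$Y_i = \pi_i + \varepsilon_i, \qquad \ln\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i1}X_{i2} + \beta_5 X_{i1}X_{i3} + \beta_6 X_{i2}X_{i3}.$$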

We want to test whether there are interaction effects present.

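Step 1: $H_0: \beta_4 = \beta_5 = \beta_6 = 0$ (the coefficients of the three interaction terms $X_1X_2$, $X_1X_3$, and $X_2X_3$ are all zero) vs. $H_A$: Not all of $\beta_4, \beta_5, \beta_6$ are 0.

Step 2: We have $n = 159$, $\alpha = 0.05$.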

Step 3: The test statistic is the Likelihood Ratio chi-square statistic $G^2 = -2\,[\ln L_R - \ln L_F]$, where $\ln L_F$ is the maximum of the log-likelihood function for the model with the interaction terms as well as the predictors, and $\ln L_R$ is the maximum of the log-likelihood function for the model with just the predictor variables. Under the null hypothesis, the statistic has approximately a chi-square distribution with d.f. = 3, the number of interaction terms being tested.

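Step 4: We will reject the null hypothesis if $G^2 > \chi^2_{0.95}(3) = 7.815$.

Step 5: From the output, we find $G^2 = 105.093 - 104.994 = 0.099$, the difference between the -2 Log L values of the model without interactions and the model with interactions.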

Step 6: We fail to reject the null hypothesis at the 0.05 level of significance. We do not have sufficient evidence to conclude that the interaction terms need to be included in the model.
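The arithmetic in Steps 3 through 6 can be cross-checked outside SAS. What follows is a minimal Python sketch, not part of the original SAS program. The helper name `chi2_sf_3df` is hypothetical; it uses the closed-form upper-tail probability of a chi-square variable with 3 degrees of freedom. The two -2 Log L values are copied from the SAS output, and 7.815 is the tabled 95th percentile for 3 d.f.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 d.f.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

neg2ll_reduced = 105.093   # -2 Log L for the model with x1, x2, x3
neg2ll_full = 104.994      # -2 Log L for the model with interaction terms added

g2 = neg2ll_reduced - neg2ll_full   # likelihood ratio statistic
critical_value = 7.815              # tabled chi-square value, alpha = 0.05, 3 d.f.
p_value = chi2_sf_3df(g2)
reject = g2 > critical_value

print(f"G2 = {g2:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

The p-value is far above 0.05, which agrees with the decision in Step 6.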

If we had rejected the null hypothesis, then we could have used a follow-up procedure, such as stepwise logistic regression, to find which explanatory variables and which interaction terms would need to be included in the model. If we find that a particular interaction term is significant, then we would also want to include both of the explanatory variables involved in that interaction in our final model.

(Note: In this case, if we were to perform stepwise regression, we would find that only two of the predictors, Age and Health Awareness, would need to be included in our final model.)

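The estimated model is therefore (from the output of the third PROC LOGISTIC, the one that includes only Age and Health Awareness):

$$\hat{\pi} = \frac{\exp(-1.4578 + 0.0779X_1 - 0.0955X_2)}{1 + \exp(-1.4578 + 0.0779X_1 - 0.0955X_2)},$$ or

(1) $$\ln\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right) = -1.4578 + 0.0779X_1 - 0.0955X_2.$$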
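To illustrate how the fitted model turns predictor values into an estimated probability, here is a small Python sketch. The coefficients (-1.4578, 0.0779, -0.0955) are copied from the SAS output for the model with x1 and x2 only; the function name `predicted_probability` and the example client (age 65, health awareness index 50) are assumptions made purely for illustration and do not come from the data set.

```python
import math

# Coefficients copied from the SAS output for the model with x1 and x2 only.
b0, b1, b2 = -1.4578, 0.0779, -0.0955

def predicted_probability(age, awareness):
    """Estimated P(flu shot) from the fitted logit (illustrative helper)."""
    logit = b0 + b1 * age + b2 * awareness
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical client: 65 years old with a health awareness index of 50.
p = predicted_probability(65, 50)
print(f"Estimated probability of a flu shot: {p:.3f}")
```

Because the Age coefficient is positive and the Health Awareness coefficient is negative, the estimated probability rises with age and falls as the index increases.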

We also want to check the assumption that the logit is linear in each of the (continuous) predictor variables. There are several ways to do this. One way is a rather tedious, graphical approach, involving grouped data. A simpler approach is to test whether we need to include nonlinear terms in the model. To do this, we will use the final model above as our reduced model, and add quadratic terms in each of the two explanatory variables, so that the full model is

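$$\ln\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2.$$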

We want to test whether these quadratic terms are needed. The last PROC LOGISTIC in the SAS program below estimates the model with the quadratic terms included.

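Step 1: $H_0: \beta_{11} = \beta_{22} = 0$ (the coefficients of the two quadratic terms $X_1^2$ and $X_2^2$ are both zero) vs. $H_A$: Not both of $\beta_{11}, \beta_{22}$ are 0.

Step 2: We have $n = 159$, $\alpha = 0.05$.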

Step 3: The test statistic is the Likelihood Ratio chi-square statistic $G^2 = -2\,[\ln L_R - \ln L_F]$, where $\ln L_F$ is the maximum of the log-likelihood function for the model with the quadratic terms as well as the predictors, and $\ln L_R$ is the maximum of the log-likelihood function for the model with just the predictor variables. Under the null hypothesis, the statistic has approximately a chi-square distribution with d.f. = 2, the number of quadratic terms being tested.

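Step 4: We will reject the null hypothesis if $G^2 > \chi^2_{0.95}(2) = 5.991$.

Step 5: From the output, we find $G^2 = 105.795 - 104.706 = 1.089$, the difference between the -2 Log L values of the model without quadratic terms and the model with quadratic terms.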

Step 6: We fail to reject the null hypothesis at the 0.05 level of significance. We do not have sufficient evidence to conclude that the quadratic terms need to be included in the model.
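With 2 degrees of freedom the chi-square upper-tail probability has the simple form exp(-x/2), so both the critical value and the p-value for this test can be verified directly. The Python sketch below is an illustrative cross-check, not part of the original SAS program; the -2 Log L values are copied from the SAS output.

```python
import math

alpha = 0.05
neg2ll_reduced = 105.795   # -2 Log L for the model with x1 and x2
neg2ll_full = 104.706      # -2 Log L for the model with x1sq and x2sq added

g2 = neg2ll_reduced - neg2ll_full        # likelihood ratio statistic
# For 2 d.f., P(X > x) = exp(-x/2), so the critical value solves exp(-x/2) = alpha.
critical_value = -2 * math.log(alpha)
p_value = math.exp(-g2 / 2)
reject = g2 > critical_value

print(f"G2 = {g2:.3f}, critical value = {critical_value:.3f}, "
      f"p-value = {p_value:.3f}, reject H0: {reject}")
```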

Our final model therefore is given by Equation (1) above.

The estimate of the regression slope for Age is $b_1 = 0.0779$, with a standard error of $s\{b_1\} = 0.0297$. Thus, a 95% confidence interval estimate for the slope is $0.0779 \pm 1.96(0.0297)$, or $(0.0197, 0.1361)$.

Now, since Age is a continuous variable, it is not very interesting to consider the odds ratio for a unit increase in Age. Instead, we will calculate a 95% confidence interval estimate of the odds ratio for an increase in Age of 5 years. The point estimate of the odds ratio is $\exp(5 \times 0.0779) = 1.476$, and exponentiating five times each (unrounded) limit of the slope interval gives a 95% confidence interval estimate of $(1.1034, 1.9750)$. We are 95% confident that the odds of having had a flu shot are multiplied by a factor of between 1.1034 and 1.9750 for each 5-year increase in Age, with Health Awareness held fixed, for this population.
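The confidence interval calculations for the Age slope and the 5-year odds ratio can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic only; the estimate and standard error are copied from the SAS output for x1 in the model with Age and Health Awareness, and 1.96 is the usual normal quantile for a 95% Wald interval.

```python
import math

b1 = 0.0779    # estimated slope for Age (x1), from the SAS output
se = 0.0297    # standard error of the slope, from the SAS output
z = 1.96       # standard normal quantile for a 95% interval
c = 5          # size of the Age increase, in years

# Wald interval for the slope on the logit scale.
slope_lower = b1 - z * se
slope_upper = b1 + z * se

# Odds ratio for a c-year increase: exponentiate c times the slope and its limits.
or_point = math.exp(c * b1)
or_lower = math.exp(c * slope_lower)
or_upper = math.exp(c * slope_upper)

print(f"Slope CI: ({slope_lower:.4f}, {slope_upper:.4f})")
print(f"5-year odds ratio: {or_point:.4f}, CI: ({or_lower:.4f}, {or_upper:.4f})")
```

Multiplying the slope limits by 5 before exponentiating is what converts the per-year interval into an interval for a 5-year change.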

The SAS program for conducting data analysis is given below, followed by the output.

SAS Program

/* Display formats for the response and for gender */
proc format;
   value difmt  0 = "No "
                1 = "Yes";
   value sexfmt 0 = "Female"
                1 = "Male ";
run;

data flushot;
   input y x1 x2 x3;
   /* Interaction terms */
   x1x2 = x1*x2;
   x1x3 = x1*x3;
   x2x3 = x2*x3;
   /* Quadratic terms for the linearity check */
   x1sq = x1**2;
   x2sq = x2**2;
   label y    = "Flu Shot?"
         x1   = "Age in Years"
         x2   = "Health Awareness Index"
         x3   = "Gender"
         x1x2 = "Interaction of Age with Health Awareness"
         x1x3 = "Interaction of Age with Gender"
         x2x3 = "Interaction of Health Awareness with Gender"
         x1sq = "Square of Age in Years"
         x2sq = "Square of Health Awareness";
   format y difmt. x3 sexfmt.;
cards;
The data set is listed in the appendix.
;

/* Model 1: main effects only (reduced model for the interaction test) */
proc logistic;
   model y (order=formatted event='Yes') = x1 x2 x3;
   title "Multiple Logistic Regression of Flu Shot";
   title2 "Against Age, Health Awareness, and Gender";
run;

/* Model 2: main effects plus all two-way interactions (full model) */
proc logistic;
   model y (order=formatted event='Yes') = x1 x2 x3 x1x2 x1x3 x2x3;
   title "Multiple Logistic Regression of Flu Shot";
   title2 "Against Age and Health Awareness";
   title3 "With Interaction Terms Included";
run;

/* Model 3: Age and Health Awareness only (final model) */
proc logistic;
   model y (order=formatted event='Yes') = x1 x2;
   title "Multiple Logistic Regression of Flu Shot";
   title2 "Against Age and Health Awareness";
   title3;
run;

/* Model 4: final model plus quadratic terms (linearity check) */
proc logistic;
   model y (order=formatted event='Yes') = x1 x2 x1sq x2sq;
   title "Multiple Logistic Regression of Flu Shot";
   title2 "Against Age and Health Awareness";
   title3 "With Quadratic Terms Included";
run;

Output of SAS Program

Multiple Logistic Regression of Flu Shot
Against Age, Health Awareness, and Gender

The LOGISTIC Procedure

Model Information

Data Set                       WORK.FLUSHOT
Response Variable              y    Flu Shot?
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    159
Number of Observations Used    159

Response Profile

Ordered Value    y      Total Frequency
            1    No                 135
            2    Yes                 24

Probability modeled is y='Yes'.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC                 136.941                     113.093
SC                  140.010                     125.369
-2 Log L            134.941                     105.093

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       29.8476     3        <.0001
Score                  27.0173     3        <.0001
Wald                   19.9803     3        0.0002

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1     -1.1772            2.9824             0.1558        0.6930
x1            1      0.0728            0.0304             5.7401        0.0166
x2            1     -0.0990            0.0335             8.7419        0.0031
x3            1      0.4339            0.5218             0.6917        0.4056

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
x1                 1.076         1.013         1.141
x2                 0.906         0.848         0.967
x3                 1.543         0.555         4.291

Association of Predicted Probabilities and Observed Responses

Percent Concordant    82.1    Somers' D    0.644
Percent Discordant    17.7    Gamma        0.645
Percent Tied           0.2    Tau-a        0.166
Pairs                 3240    c            0.822

Multiple Logistic Regression of Flu Shot
Against Age and Health Awareness
With Interaction Terms Included

The LOGISTIC Procedure

Model Information

Data Set                       WORK.FLUSHOT
Response Variable              y    Flu Shot?
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    159
Number of Observations Used    159

Response Profile

Ordered Value    y      Total Frequency
            1    No                 135
            2    Yes                 24

Probability modeled is y='Yes'.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC                 136.941                     118.994
SC                  140.010                     140.476
-2 Log L            134.941                     104.994

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       29.9472     6        <.0001
Score                  32.4819     6        <.0001
Wald                   19.4560     6        0.0035

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1      2.6164           13.5486             0.0373        0.8469
x1            1      0.0138            0.2007             0.0047        0.9452
x2            1     -0.1650            0.2490             0.4394        0.5074
x3            1      0.3907            6.0739             0.0041        0.9487
x1x2          1     0.00103           0.00371             0.0773        0.7809
x1x3          1     0.00588            0.0615             0.0091        0.9238
x2x3          1    -0.00634            0.0679             0.0087        0.9255

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
x1                 1.014         0.684         1.503
x2                 0.848         0.520         1.381
x3                 1.478        <0.001      >999.999
x1x2               1.001         0.994         1.008
x1x3               1.006         0.892         1.135
x2x3               0.994         0.870         1.135

Association of Predicted Probabilities and Observed Responses

Percent Concordant    82.5    Somers' D    0.653
Percent Discordant    17.2    Gamma        0.656
Percent Tied           0.4    Tau-a        0.168
Pairs                 3240    c            0.827

Multiple Logistic Regression of Flu Shot
Against Age and Health Awareness

The LOGISTIC Procedure

Model Information

Data Set                       WORK.FLUSHOT
Response Variable              y    Flu Shot?
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    159
Number of Observations Used    159

Response Profile

Ordered Value    y      Total Frequency
            1    No                 135
            2    Yes                 24

Probability modeled is y='Yes'.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC                 136.941                     111.795
SC                  140.010                     121.002
-2 Log L            134.941                     105.795

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       29.1454     2        <.0001
Score                  26.7071     2        <.0001
Wald                   19.8291     2        <.0001

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1     -1.4578            2.9153             0.2500        0.6170
x1            1      0.0779            0.0297             6.8761        0.0087
x2            1     -0.0955            0.0324             8.6786        0.0032

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
x1                 1.081         1.020         1.146
x2                 0.909         0.853         0.969

Association of Predicted Probabilities and Observed Responses

Percent Concordant    80.7    Somers' D    0.618
Percent Discordant    18.9    Gamma        0.620
Percent Tied           0.4    Tau-a        0.159
Pairs                 3240    c            0.809

Multiple Logistic Regression of Flu Shot
Against Age and Health Awareness
With Quadratic Terms Included

The LOGISTIC Procedure

Model Information

Data Set                       WORK.FLUSHOT
Response Variable              y    Flu Shot?
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    159
Number of Observations Used    159

Response Profile

Ordered Value    y      Total Frequency
            1    No                 135
            2    Yes                 24

Probability modeled is y='Yes'.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
AIC                 136.941                     114.706
SC                  140.010                     130.050
-2 Log L            134.941                     104.706

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       30.2348     4        <.0001
Score                  34.2112     4        <.0001
Wald                   19.4995     4        0.0006

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1      0.2193           14.2594             0.0002        0.9877
x1            1      0.2296            0.4052             0.3210        0.5710
x2            1     -0.3518            0.2638             1.7780        0.1824
x1sq          1    -0.00112           0.00303             0.1363        0.7120
x2sq          1     0.00238           0.00236             1.0171        0.3132

Odds Ratio Estimates

Effect    Point Estimate    95% Wald Confidence Limits
x1                 1.258         0.569         2.784
x2                 0.703         0.419         1.180
x1sq               0.999         0.993         1.005
x2sq               1.002         0.998         1.007

Association of Predicted Probabilities and Observed Responses

Percent Concordant    81.2    Somers' D    0.629
Percent Discordant    18.3    Gamma        0.632
Percent Tied           0.4    Tau-a        0.162
Pairs                 3240    c            0.815

Appendix: Flu Shot Data

0 59 52 0

0 61 55 1

1 82 51 0

0 51 70 0

0 53 70 0

0 62 49 1

0 51 69 1

0 70 54 1

0 71 65 1

0 55 58 1

0 58 48 0

0 53 58 1

0 72 65 0

0 56 68 0

0 56 83 0

0 81 68 0

0 62 44 0

0 49 70 0

0 56 69 1

0 50 74 0

0 53 57 0

0 56 64 1

0 56 67 1

0 50 83 1

0 52 48 1

0 52 81 0

0 67 53 1

0 51 61 0

0 70 51 0

0 64 51 0

0 61 65 1

0 53 51 0

0 77 54 1

0 73 64 1

0 67 69 0

0 50 71 0

1 80 38 0

1 75 51 0

0 65 54 1

0 60 59 1

1 68 57 1

0 61 63 0

1 62 48 0

0 53 58 0

0 72 56 0

0 54 59 0

1 59 75 0

0 61 48 0

0 50 79 1

0 48 66 0

0 52 57 1

0 54 68 0

0 62 48 0

0 71 60 0

1 65 63 0

0 49 61 0

0 58 57 0

0 62 69 0

0 69 38 1

1 56 50 1

1 76 45 1

0 51 72 0

0 64 51 0

0 57 62 1

0 51 81 0

0 81 55 1

0 50 77 0

0 64 65 1

0 64 53 1

1 59 49 1

0 53 65 0

0 63 58 0

0 59 60 1

1 70 57 1

0 72 37 0

0 68 49 0

0 75 55 1

0 57 60 0

0 67 57 1

0 59 56 1

0 55 58 0

1 75 64 1

0 66 51 0

0 67 59 0

0 59 61 0

0 78 49 1

0 59 49 0

0 68 55 0

1 59 61 1

0 68 50 1

1 78 47 1

0 55 73 1

1 71 45 1

0 51 45 0

0 65 59 0

0 54 61 1

0 79 52 0

0 64 50 0

1 82 46 1

0 64 67 0

0 70 56 1

1 59 50 0

0 59 56 1

0 63 61 1

0 48 74 0

0 61 78 0

0 51 68 0

0 48 71 0

1 71 58 1

0 51 57 0

0 57 51 1

0 49 74 0

0 67 56 1

0 73 57 0

0 73 65 0

0 56 47 0

0 48 69 1

0 50 71 0

0 50 76 1

0 66 60 1

0 53 75 1

0 50 65 1

1 51 42 0

0 68 66 1

1 72 49 1

0 51 58 1

0 62 61 1

0 60 55 0

0 67 60 1

0 70 54 1

0 55 63 1

0 66 56 0

0 65 59 1

0 84 52 1

0 58 63 0

1 68 57 1

0 51 59 1

0 67 53 1

0 52 67 0

0 68 62 0

0 76 63 1

0 54 62 1

0 50 52 1

0 63 58 0

0 77 49 1

0 60 65 1

0 51 55 0

0 51 60 1

0 66 51 1

0 52 67 0

0 66 64 1

0 56 55 1

0 49 58 0

0 67 66 0

0 57 64 1

0 56 66 0

1 76 22 1

1 68 32 0

1 73 56 1