Logistic Regression Using SPSS - Practical Session

The data for this session comes from a recently completed RCT, the PRESSURE trial (Nixon et al, BMJ (2006), 332 (7555), p1413). This was a large randomised trial comparing two alternating mattress surfaces which are designed to reduce areas of high pressure on hospitalised patients. The primary endpoint was the development of a new pressure ulcer of grade 2 or above.

Description of the data:

Trialno = patient identification number

Alloc = randomised mattress (1=overlay, 2=replacement)

Newpu60 = primary outcome (1=yes, 0=no)

Typead = type of admission (1= acute, 2 = elective)

Typesp = type of specialty (1=vascular, 2= orthopaedic, 3 = elderly)

Psgrade = existing pressure ulcer on any site at baseline (1=yes, 2=no)

Age = age at baseline in years

Diabetes = (1=yes, 2=no)

Bascore = Braden activity score (1=bedfast, 2=chairfast, 3=walks occasionally, 4=walks frequently)

Bmbscore = Braden mobility score (1=completely immobile, 2=very limited, 3=slightly limited, 4=no limitations)

Part 1

Start by assessing the effect of mattress type on the probability of developing a new pressure ulcer. Use SPSS to create a 2 by 2 table of Alloc and Newpu60, and use a chi-squared test to assess whether there is an association between them. What is the null hypothesis, and what do you conclude?

(Hint: use Crosstabs under Analyze > Descriptive Statistics, and tick Chi-square under Statistics.)
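To see what the chi-squared test is doing, it can help to compute it by hand once. The Python sketch below calculates the Pearson chi-squared statistic (without continuity correction) for a 2 by 2 table. The counts are invented for illustration and are not the PRESSURE trial data; the function name is also just an assumption of this sketch.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    table[i][j] is the count in row i, column j.
    Returns (statistic, p_value) on 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if row and column variables are independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, NOT the trial data.
# Rows: Alloc (1=overlay, 2=replacement); columns: Newpu60 (yes, no).
example = [[20, 80],
           [30, 70]]
stat, p = chi_squared_2x2(example)
print(f"chi-squared = {stat:.3f}, p = {p:.3f}")
```

The null hypothesis being tested is that the proportion developing a new ulcer is the same in both mattress groups.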

Part 2

The randomisation was stratified by some factors that were considered to be prognostic factors for the development of an ulcer. This was to ensure balance in the numbers of patients allocated to each mattress within each of these strata. The stratification factors were: type of admission, type of specialty and existing pressure ulcer at baseline.

Because the randomisation was stratified, we need to adjust for these factors in the analysis.

Perform a univariate logistic regression including Alloc, and compare the result with the result from Part 1. Now include Typead, Typesp and Psgrade as well and report your findings. Which factors have a significant effect on ulcer development? Does your conclusion about the mattress change? What are the odds ratios and 95% confidence intervals for all the terms in the model?

(Hint: these are all categorical predictors, so don’t forget to declare them as categorical. To get the 95% CI for the odds ratios, tick the box for “CI for exp(B)” under Options.)
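For a model with a single two-level predictor, the odds ratio from logistic regression equals the cross-product ratio of the 2 by 2 table, so the Part 1 comparison can be checked by hand. The Python sketch below computes that odds ratio with a Wald 95% confidence interval on the log scale. The counts are invented (not the trial data), the function name is illustrative, and which group SPSS treats as the reference depends on the categorical coding you select.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a, b = events / non-events in the comparison group
    c, d = events / non-events in the reference group
    """
    log_or = math.log((a * d) / (b * c))
    # Standard error of log(OR) (Woolf's formula)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper

# Hypothetical counts, NOT the trial data:
# replacement mattress 30 ulcers / 70 none, overlay 20 ulcers / 80 none.
odds_ratio, lower, upper = odds_ratio_with_ci(30, 70, 20, 80)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The same back-transformation, exp(B ± 1.96 × SE), is how a confidence interval for any coefficient B in the adjusted model is turned into a confidence interval for its odds ratio. An interval that contains 1 corresponds to no evidence of an effect at the 5% level.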

Part 3

We have established that there is no evidence of a difference between the mattress groups. Now we are going to conduct some exploratory modelling to identify which patient factors are predictive of pressure ulcer development, look at measures of model fit, and the predicted probabilities from our final model.

Consider all the patient characteristics listed previously. We will not be looking at Alloc in this model.

Do you think any of these factors will be highly correlated? Before we start, produce a correlation matrix using Spearman correlation (as some of the variables are categorical) and assess whether any of the correlations are larger than 0.5. Think about the classification of these terms from the description earlier and whether it makes sense to include both in the same model.
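Spearman correlation is the Pearson correlation of the ranks, with tied values sharing the average of their ranks. The Python sketch below shows that calculation for two ordinal scores. The eight pairs of values are invented for illustration, and the function names are assumptions of this sketch.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores for eight patients, NOT the trial data
bascore = [1, 2, 2, 3, 3, 4, 4, 1]
bmbscore = [1, 2, 3, 3, 4, 4, 3, 2]
rho = spearman(bascore, bmbscore)
print(f"Spearman rho = {rho:.2f}")
```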

Perform a backwards selection by fitting all terms in the model (excluding any that you have decided are highly correlated with another factor) and then removing the least significant term (the one with the highest p-value) one at a time. Which variables are in the final model? What are their p-values, odds ratios and corresponding 95% CIs?

Once you have arrived at the final model we will assess its fit and predictive ability. Rerun the model, but this time click on Options and select the Hosmer and Lemeshow test and classification plots. Under Save, request some diagnostic measures (as per last week's session on linear regression) by ticking all the predicted values and influence boxes and selecting standardised residuals.
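The logic of the backwards selection loop can be sketched in Python as follows. This is only an outline of the procedure, not a model-fitting routine: `fit_p_values` is a hypothetical stand-in for refitting the logistic regression in SPSS and reading off one p-value per term, and the 0.05 stopping threshold, the term names and the p-values are all assumptions made for illustration.

```python
def backward_elimination(candidates, fit_p_values, threshold=0.05):
    """Drop the least significant term one at a time.

    fit_p_values(terms) must refit the model on `terms` and return
    {term: p_value}. The loop stops when every remaining p-value is
    below `threshold` (an assumed stopping rule).
    """
    terms = list(candidates)
    removed = []
    while terms:
        p_values = fit_p_values(terms)  # refit with the current terms
        worst = max(terms, key=lambda t: p_values[t])
        if p_values[worst] < threshold:
            break
        terms.remove(worst)
        removed.append(worst)
    return terms, removed

def made_up_fit(terms):
    """Stand-in for a real refit: returns fixed, invented p-values."""
    invented = {"x1": 0.001, "x2": 0.40, "x3": 0.03, "x4": 0.15}
    return {t: invented[t] for t in terms}

kept, dropped = backward_elimination(["x1", "x2", "x3", "x4"], made_up_fit)
print("kept:", kept, "dropped in order:", dropped)
```

In a real analysis the p-values of the remaining terms change each time a term is removed, which is why the model has to be refitted at every step.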

The Hosmer and Lemeshow test is a test of the goodness of fit of the model; the null hypothesis is that the model fits the data (the observed and predicted numbers of events agree). What is the p-value and what does this tell us?
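The idea behind the test is to sort patients by predicted probability, split them into groups, and compare observed with expected numbers of events in each group. The Python sketch below follows that recipe with ten equal-sized groups and simulated data. The data are generated at random (not the trial data), the function names are illustrative, and the exact grouping SPSS uses may differ from this sketch, for example when predicted probabilities are tied.

```python
import math
import random

def chi2_sf_even_df(x, df):
    """P(chi-squared > x); this closed form is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(outcomes, probabilities, groups=10):
    """Hosmer-Lemeshow statistic, its degrees of freedom and p-value."""
    pairs = sorted(zip(probabilities, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        observed = sum(y for _, y in chunk)   # observed events in group
        expected = sum(p for p, _ in chunk)   # expected events in group
        size = len(chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)

# Simulated example: outcomes are drawn from the stated probabilities,
# so the "model" is well calibrated by construction.
rng = random.Random(0)
probs = [rng.uniform(0.02, 0.4) for _ in range(500)]
events = [1 if rng.random() < p else 0 for p in probs]
hl_stat, hl_df, hl_p = hosmer_lemeshow(events, probs)
print(f"HL statistic = {hl_stat:.2f} on {hl_df} df, p = {hl_p:.3f}")
```

A large statistic (small p-value) means observed and expected counts disagree, which is evidence against the null hypothesis of adequate fit.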

Have a look at the classification table and plot of predicted group membership. This is based on the predicted probabilities of developing a new ulcer for each patient and their actual outcome. Do you think this model is useful for predicting if a patient will develop a new ulcer?
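A classification table cross-tabulates the actual outcome against the outcome predicted by comparing each patient's predicted probability with a cut-off (0.5 by default in SPSS). The Python sketch below builds such a table for ten invented patients (not the trial data) to show how a model can look accurate overall while identifying none of the patients who develop an ulcer. The function name and the alternative cut-off of 0.25 are assumptions of this sketch.

```python
def classification_table(outcomes, probabilities, cutoff=0.5):
    """Counts and summary rates from predicted probabilities at a cut-off."""
    tp = fp = tn = fn = 0
    for y, p in zip(outcomes, probabilities):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Rates are None when their denominator is zero
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
    }

# Hypothetical patients: 2 of 10 develop an ulcer, and no predicted
# probability reaches 0.5.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted_prob = [0.35, 0.20, 0.30, 0.10, 0.05, 0.15, 0.08, 0.12, 0.06, 0.09]

at_half = classification_table(actual, predicted_prob)
at_quarter = classification_table(actual, predicted_prob, cutoff=0.25)
print("cut-off 0.50:", at_half)
print("cut-off 0.25:", at_quarter)
```

When the outcome is uncommon, the overall percentage correct is dominated by the patients without the event, so it is worth looking at the two rows of the table separately.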

Have a look at a plot of residuals against predicted values. Is it useful?

Plot Cook's statistics and leverage, each against Trialno (on the x axis). What do you conclude?