Assessment of the benefits of Discrete Conditional Survival Models in modelling ambulance response times

Karen J. Cairns, Adele H. Marshall

Centre for Statistical Science and Operational Research (CenSSOR),

Queen’s University Belfast.

Keywords: Time-to-event data; Distribution-fitting; Data-mining; Ambulance response.

Acknowledgements: KJC is supported through an Engineering & Physical Sciences Research Council (EPSRC) RCUK Academic Fellowship.

Corresponding Author:

Dr Karen Joanne Cairns

Centre for Statistical Science and Operational Research (CenSSOR)

Sir David Bates Building, Room 01.008

Queen’s University Belfast

University Road

Belfast BT7 1NN

Email:

Telephone: +44 (0)28 9097 6058

Fax: +44 (0)28 9097 6061

Abstract: Many of the challenges faced in health care delivery can be informed through building models. In particular, Discrete Conditional Survival (DCS) models, recently under development, can provide policymakers with a flexible tool to assess time-to-event data. The DCS model can represent the survival curve using various underlying distribution types, and can cluster or group observations (based on other covariate information) external to the distribution fits. The flexibility of the model comes from the choice of data-mining techniques available for ascertaining the different subsets, and from the choice of distribution types available for modelling these informed subsets. This paper presents an illustrated example of the Discrete Conditional Survival model being deployed to represent ambulance response times by a fully parameterised model. This model is contrasted with a parametric accelerated failure-time model, illustrating the strength and usefulness of Discrete Conditional Survival models.

1 Introduction

Policy makers and health care providers must determine how to provide the most effective health care to citizens using the limited resources available to them. They need effective methods for planning, prioritisation, and decision making, as well as effective methods for management and improvement of health care systems (Brandeau et al 2004). Operational Research (OR) techniques can inform these processes with a wide range of health OR illustrated in the literature (Davies and Bensley 2005, Brailsford and Harper 2007, Baker et al 2008, Royston 2009). Methodologies considered vary with the problems being addressed and range from ‘soft’ OR techniques to more quantitative approaches such as mathematical modelling, simulation, queuing theory and system dynamics.

Emergency response issues are amongst the problems being addressed, with much of the earlier research focusing on location planning issues, particularly in urban areas (Simpson and Hancock 2009). Part of this research area, originating in the research of Kolesar and Blum (1973), has examined the relationship between emergency response times and distance (the ‘square-root law’). The derivation of such relationships has informed and aided understanding, and has formed the cornerstone of further analytic models (Green and Kolesar 2004, Budge et al 2010). Indeed, Erkut et al (2008) indicate that it is generally more useful to know the entire response time distribution than to consider only specific quantiles of it, which is what many performance measures correspond to. For example, the percentage of urgent emergency incidents reached within 8 minutes is a performance measure used within the National Health Service in the United Kingdom (Department of Health 2009).

This paper presents the family of Discrete Conditional Survival (DCS) models recently under development. This toolkit of models aims to aid understanding of time-to-event data by modelling the entire time-to-event distribution through a fully parameterised model, and should be flexible enough to model many forms of such data within health care. In particular, this paper presents the application of the DCS model to one emergency response example: modelling ambulance response times.

The paper first presents some background information on the data being analysed. An outline of Discrete Conditional Survival models is then presented, together with information on the choice of techniques deployed from the toolkit for this particular example. In order to assess the usefulness of this model, a comparison has then been made with results from one of the most common regression-based techniques used to fully parameterise survival data, namely the parametric accelerated failure-time model (Kalbfleisch and Prentice 1980).

2 Data Set of Ambulance Response Times

Ambulance response time data has been examined from a region of the United Kingdom (Northern Ireland), where the Northern Ireland Ambulance Service (NIAS) is responsible for providing emergency medical response. NIAS currently responds to over 115,000 emergency calls in a year, through a fleet of over 300 ambulances, operating from 52 ambulance stations and sub-stations. It serves a population of over 1.7 million, with an operational area of 14,000 square kilometres (Northern Ireland Ambulance Service, 2009a).

The data set considered contains dispatch event details (for example, the date and time of an event, its geographical position, and the perceived severity categorisation of the emergency call) together with response time information (e.g. response time(s), type of emergency vehicle(s) responding) for all emergency calls in Northern Ireland in the year 2003.

The illustrated example in this paper considers the sub-set of all emergency response activations where an ambulance response to the incident was achieved (i.e. it excludes cancellations, where the ambulance never reached the scene) and uses only the best response time where multiple vehicles were dispatched. In 2003, there were 75,774 such responses to emergency incidents across the entire Northern Ireland region. Removing records where either the response time was not collected or the geographical location information is incomplete reduces the number of observations to 73,190 (96.6%).

For the purposes of assessing how well the DCS and parametric accelerated failure-time models perform, the data was separated into training (50%) and test sets. It was ensured that training and test sets contained observations across the Northern Ireland region, at locations both close and far away from ambulance stations.
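One way such a split could be arranged is to stratify before sampling, so that each region and distance group contributes to both sets. The sketch below is illustrative only: the record layout, the `stratum` function, and the 3-mile near/far cut-off are assumptions, not details of the procedure used for the NIAS data.

```python
import random
from collections import defaultdict

def stratified_split(records, key, train_fraction=0.5, seed=0):
    """Split records into training and test lists, stratum by stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    train, test = [], []
    for _, group in sorted(strata.items()):
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical records: (LGD code, distance to closest station in miles).
records = [(lgd, dist) for lgd in (1, 2, 3) for dist in (0.5, 1.0, 6.0, 9.0)] * 5

def stratum(rec):
    """Group by LGD and by a near/far flag (the 3-mile cut-off is illustrative)."""
    lgd, dist = rec
    return (lgd, "near" if dist < 3.0 else "far")

train, test = stratified_split(records, stratum)
```

Splitting within each stratum, rather than over the pooled data, guarantees that no combination of district and distance group is absent from either set.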

3 Discrete Conditional Survival (DCS) Model

Discrete Conditional Survival (DCS) models are a family of models capable of representing a skewed survival distribution as a Process Component preceded by a Conditional Component: a set of related variables that determines the clustering or grouping of entities (or observations) into distinct classes (the discrete classes). The models possess the following characteristics:

· The Conditional Component comprises a structure that captures the nature of the data by representing the various inter-relationships between variables, and thus can categorise observations into a number of discrete classes.

· The Process Component represents the skewed survival distribution of each discrete class by an appropriate distribution form.

Figure 1 illustrates the general form of the DCS model comprising these two components. This figure illustrates that many kinds of data-mining techniques could represent the Conditional Component, with the illustrated example in this paper utilising multinomial logistic regression. The figure also highlights that a number of survival distribution forms can be considered for the Process Component, with the DCS model incorporating the assessment of the most appropriate fit.

This model expands previous research which had led to the development of the Conditional Phase-type (C-Ph) model, which describes duration until an event occurs in terms of a process consisting of a sequence of latent phases (the Process Component) which are conditioned on a set of inter-related variables represented by a Bayesian network (the Conditional Component) (Marshall and McClean 2003). Previous research fitted the C-Ph model, a special type of DCS model, by considering the model structure as a single entity for which the likelihood value was calculated. This led to very cumbersome and difficult calculations of the likelihood value for every possible Conditional Component structure along with every possible survival distribution fit. To ease this process, the flexible nature of the DCS model allows the two components of the model to be fitted separately, with the results combined in an overall likelihood. Doing this requires the separate inspection of each combination of component variables and how they relate to survival, and ultimately reduces the complexity of model fitting.

3.1 Conditional Component

The Conditional Component of the DCS model categorises observations into a number of discrete classes, with the aim that the survival of entities differs between discrete classes (and so the resulting survival distributions of the discrete classes will be distinguishable). To achieve this, various data-mining techniques (see Figure 1) can be used to consider the influence of covariates on survival or, as in previous research, on a correlated intermediate variable (Marshall and Burns 2007).

3.1.1 Multinomial Logistic Regression

In this illustrated example multinomial logistic regression is used with the aim to accurately predict the most probable response time-band for each emergency incident (through the consideration of the influence of other covariates). In multinomial logistic regression, a special case of the discrete choice model introduced by McFadden (1974), the probability a response, Y, belongs to the ith of k+1 classes satisfies the following relationship:

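$$\log\left(\frac{\Pr(Y = i \mid \mathbf{x})}{\Pr(Y = k+1 \mid \mathbf{x})}\right) = \alpha_i + \boldsymbol{\beta}_i^{\top}\mathbf{x}, \qquad i = 1, \ldots, k \qquad (1)$$

where Y is the discrete response of an entity (or observation) taking one of k+1 possible values (discrete classes), $\mathbf{x}$ is the vector of explanatory variables for the entity, $\alpha_1, \ldots, \alpha_k$ are the k intercept parameters, and $\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_k$ are k vectors of parameters.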
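To make the relationship concrete, the following minimal sketch computes the class probabilities implied by a generalised logit model in which the last class (k+1) acts as the reference, so that each of the other k classes has its own intercept and parameter vector. The function name and all parameter values are hypothetical and chosen purely for illustration; they are not estimates from the ambulance data.

```python
import math

def class_probabilities(x, intercepts, coefficients):
    """Return P(Y = i | x) for i = 1..k+1 under a generalised logit model.

    The last class (k+1) is the reference, so its linear predictor is 0.
    """
    etas = [a + sum(b_j * x_j for b_j, x_j in zip(b, x))
            for a, b in zip(intercepts, coefficients)]
    etas.append(0.0)                      # reference class
    m = max(etas)                         # subtract the max for numerical stability
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters for k = 2 (three classes) and two covariates.
alphas = [1.0, 0.5]
betas = [[-0.8, 0.1], [-0.3, 0.05]]
probs = class_probabilities([2.0, 4.0], alphas, betas)

# The predicted class is the most probable one (classes are numbered from 1).
predicted_class = max(range(len(probs)), key=probs.__getitem__) + 1
```

The log of the ratio of any class probability to the reference class probability recovers that class's linear predictor, which is the defining property of this model form.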

The fitting of multinomial logistic regression models is possible in a number of software packages. This work was performed in SAS (version 9.2), using the PROC LOGISTIC procedure.

3.1.2 Application to Ambulance Response Time Data

To fit such a model the continuous ambulance response time variable had to be converted into a discrete response, Y. The choice of discrete response was directed by the target and performance measures of NIAS. In recent years performance has been monitored and targets set by considering the proportion of incidents responded to within 8 minutes and within 18 minutes (Northern Ireland Ambulance Service, 2004 and Northern Ireland Ambulance Service, 2009b). However, rather than limit the response, Y, to just three response time-bands, [0, 8), [8, 18) and [18, ∞), the following five response time-bands (in minutes) were considered: [0, 5.5); [5.5, 8); [8, 11.5); [11.5, 18); and [18, ∞). The reason for further subdividing was to enhance the quality of the resulting multinomial logistic regression model (bearing in mind the large volume of data available).
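The conversion from a continuous response time to one of the five time-bands can be sketched as follows. Numbering the bands 1 to 5 matches the band numbers used later in the text; the function and constant names are illustrative assumptions.

```python
from bisect import bisect_right

# Upper edges (in minutes) of bands 1-4; band 5 is [18, infinity).
BAND_EDGES = [5.5, 8.0, 11.5, 18.0]

def response_time_band(minutes):
    """Map a response time in minutes to a band number from 1 to 5.

    Bands are closed on the left and open on the right, e.g. band 2 is [5.5, 8).
    """
    if minutes < 0:
        raise ValueError("response time must be non-negative")
    return bisect_right(BAND_EDGES, minutes) + 1

bands = [response_time_band(t) for t in (3.2, 5.5, 7.9, 8.0, 17.99, 25.0)]
```

Because the 8-minute and 18-minute targets are band edges, bands 1 and 2 together correspond to meeting the 8-minute target, and bands 1 to 4 together to meeting the 18-minute target.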

Within the ambulance response time data set there are a number of covariates available that could be incorporated into the vector of explanatory variables $\mathbf{x}$. These covariates may provide information on the geographical location of an incident, temporal information on when an incident occurred, or information relating to the response deployment. The functional form utilised for these covariates within the model may improve the goodness of fit. Table 1 provides details of the different covariates that have been considered for inclusion in different multinomial logistic regression model fits. Note that for some pairs of the categorical covariates listed (e.g. u and d, h and g), the covariates correspond to different levels of sub-grouping of the same categorical variable, and thus only one of the pair is potentially selected in any given set of explanatory variables $\mathbf{x}$. Similarly, for highly correlated variables (e.g. the geographical location information r2 and s2) only one is potentially selected in any given set of explanatory variables $\mathbf{x}$. Over 100 different sets of covariates have been considered for inclusion in different multinomial logistic regression model fits. All of these sets included either r1 or s1, a measure of the distance between the incident and the closest ambulance station, given its strong influence on response time (Kolesar and Blum 1973). Some of these sets also considered the effects of interaction between different covariates, e.g. the interaction between u and r1. Fits were also performed using backward elimination and forward selection techniques.
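The selection rules described above (one distance measure always present, and at most one covariate from each mutually exclusive pair) can be illustrated with a small enumeration. This sketch uses only the covariate names mentioned in the text, treats r1 and s1 as alternatives (an assumption), and omits interaction terms and the remaining covariates of Table 1, so it does not reproduce the actual list of over 100 sets.

```python
from itertools import product

# Distance to the closest station: assumed here to be exactly one of r1 or s1.
DISTANCE = ["r1", "s1"]

# Each group lists mutually exclusive alternatives; None means "omit the group".
OPTIONAL_GROUPS = [
    [None, "u", "d"],    # different sub-groupings of one categorical variable
    [None, "h", "g"],    # different sub-groupings of one categorical variable
    [None, "r2", "s2"],  # highly correlated location measures
]

def candidate_sets():
    """Yield every covariate set that respects the exclusion rules."""
    for dist in DISTANCE:
        for choice in product(*OPTIONAL_GROUPS):
            yield [dist] + [c for c in choice if c is not None]

sets = list(candidate_sets())
```

Each candidate set would then be fitted and scored, with the exclusion rules keeping redundant or collinear covariates out of the same model.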

The optimal choice of model (and the explanatory variables to be retained) was selected using Schwarz’s Bayesian Criterion (SBC) (Schwarz 1978), calculated as follows:

$$\mathrm{SBC}_i = -2\ln(L_i) + p_i \ln(n) \qquad (2)$$

where $p_i$ is the number of parameters to be estimated for the model with a set of explanatory variables i, n is the number of observations, and $L_i$ is the maximised value of the likelihood function (in this case based on a generalised logit model with a set of explanatory variables i). The model with the set of explanatory variables corresponding to the lowest SBC value was selected. The SBC tends to penalise overly complex models more heavily than Akaike’s Information Criterion (AIC) (Akaike 1974) and is useful for finding the simplest model that still represents the data adequately.
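As a worked illustration of this selection rule, the sketch below scores three candidate models by SBC and, for contrast, by AIC. The log-likelihood values, parameter counts, and sample size are invented for the example and are not results from the ambulance data; the parameter counts simply assume four additional parameters per covariate, as would arise with five response time-bands.

```python
import math

def sbc(log_likelihood, n_params, n_obs):
    """Schwarz's Bayesian Criterion: -2 ln(L) + p ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: -2 ln(L) + 2p."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical fits: model name -> (maximised log-likelihood, parameter count).
fits = {
    "r1": (-1520.0, 8),
    "r1 + u": (-1505.0, 12),
    "r1 + u + r1*u": (-1500.0, 16),
}
n_obs = 1000

sbc_scores = {name: sbc(ll, p, n_obs) for name, (ll, p) in fits.items()}
aic_scores = {name: aic(ll, p) for name, (ll, p) in fits.items()}

best_by_sbc = min(sbc_scores, key=sbc_scores.get)
best_by_aic = min(aic_scores, key=aic_scores.get)
```

With these invented numbers the two criteria disagree: SBC prefers the model without the interaction term, while AIC prefers the largest model, because the per-parameter penalty ln(n) exceeds 2 whenever n is at least 8.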

The multinomial logistic regression model has been fitted to the training data from each of the 26 Local Government Districts (LGDs), with optimal model fits determined in each case using SBC. Examination of these optimal models suggests that the explanatory variables (influencing the prediction of response time-band) vary across the 26 LGDs. For 4 of the 26 LGDs only the radial distance r1 is suggested by the optimal model to influence the prediction of the response time-band. In other LGDs however a number of covariates are found to influence the prediction of the response time-band.

Ards Local Government District

For example, in the case of Ards LGD (LGD=2), in the east of Northern Ireland, the optimal model is found to include interaction terms between the radial distance r1 and the urban/rural indicator variable, u, such that:

(3)

This model suggests that within this LGD changes in the predicted response time-band may occur at shorter radial distances in urban areas than in rural areas. The model also suggests that the predicted response time-band is influenced by proximity to the second closest ambulance station.

A few illustrative situations for the model of this LGD are considered in Figure 2. For example, consider an incident occurring in an urban region where there are two ambulance stations relatively close (r1≤r2=2.4 miles). In this case the response is likely to be quick, with the predicted response time-band being either band 1 or 2 (i.e. predicted below the 8-minute target), except for values of r1>2.14 miles where the response time-band changes to band 4 (i.e. predicted below the 18-minute target). The second sub-plot of the figure illustrates the prediction for an incident occurring in an urban region where the second closest ambulance station is a large distance away (r2=7.4 miles). Here the predicted response is below the 8-minute target provided r1<1.5 miles (a smaller distance than when there are two ambulance stations relatively close); otherwise it is predicted below the 18-minute target.