Monitoring Statistical Indicators Using R/Shiny: A Case Study on Public Transport Safety in Switzerland

Tobias Schoch [1]

Keywords: R/shiny, interactive visualization and evaluation, public transport, event data, statistical indicators

1. Introduction

The goal of this paper is to present the R software tool [1], designed to support indicator-based safety monitoring, which has been adopted by the Swiss safety authority for public transportation. This case study may serve as a showcase project

for the analysis and monitoring of statistical indicators in Official Statistics using R;
for the implementation and operation of an R-based, interactive monitoring system that meets the requirements of a corporate IT environment.

The paper is organized as follows. Section 2 sheds light on safety performance monitoring in the EU; in Section 3, we discuss the system of indicators and the data in use. Section 4 is devoted to the implementation of the tool. Finally, Section 5 summarizes the key messages.

2. Public Transport Safety in the European Union

The European Union Agency for Railways (ERA; prior to June 2016, European Railway Agency) strives to develop, promote and monitor a common EU approach to safety management and governance across stakeholders. Among the Agency’s core activities is the monitoring of safety performance in order to provide safety intelligence and information on risks to EU and national policy-making bodies (and to the general public). In a concerted activity, Member States (and associated countries) report event data on significant and serious railway accidents (resulting in the death of at least one person or five seriously injured persons or extensive damage to rolling stock, the infrastructure or the environment). Information about less serious accidents and other hazardous incidents is currently not systematically collected at the EU level. On the basis of these data, the Agency evaluates and monitors a set of 23 indicators (Common Safety Indicators) and has published the results on a biennial basis since 2006.

3. Safety Monitoring of Public Transportation in Switzerland

In Switzerland, the Swiss Federal Office of Transport (FOT) is the National Safety Authority in charge of monitoring (and regulation of) the safety performance concerning:

rail transport,
local transport (bus, tramways),
cable car (funicular, aerial lift, cableways),
ship transport / boats (on rivers and lakes).

As an associated country of the EU, Switzerland is affiliated with ERA and reports data on significant and serious railway accidents to the Agency. The safety monitoring effort undertaken by FOT is extensive and multi-layered, and is an important part of risk management (including audits, inspections, etc.). The aspect of safety monitoring referred to in this article, that is, the indicator-based safety monitoring, is carried out by FOT in addition to audits and other monitoring approaches.

3.1. Data and Indicators

The data on accidents and severe incidents in Switzerland are reported to FOT by public and private transport operators in all four sectors of public transportation (see above). FOT processes and validates the reported data and compiles the Event Database on an annual basis; data have been available since 2009 (for the period 2000-2008, only a reduced set of data exists).

Each event in the Event Database is assigned a set of classifiers / codes according to the event’s characteristics in terms of the dimensions / questions “who”, “what”, “why” and “where”. Based on these classifiers, the events are then assigned a set of BASE-level safety indicators (within each of the four “w”-dimensions), which serves as the foundation of the hierarchical system of indicators. Any set of indicators ranked higher in the overall hierarchy is composed of a nonoverlapping set of subordinate indicators. In total, the system of safety indicators is organized along the following three levels of abstraction:

TOP-level safety indicators (5 indicators; highest abstraction level; serves policy-making bodies; e.g., “TOP 5: Fatalities and weighted serious injuries”)
FOT-level safety indicators (55 indicators; e.g., “EA11: Train collisions”)
BASE-level safety indicators (229 highly specific indicators; e.g., “EA114: Collisions train with road vehicle”)
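The roll-up from one level to the next can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the event records are invented, and the rule that a FOT-level code is the first four characters of the BASE-level code is inferred from the single pair “EA114” / “EA11” above; the actual mapping used in the Event Database may differ.

```r
# Hypothetical event records: one row per event with its BASE-level code.
# Only "EA114" appears in the paper; all other codes and years are invented.
events <- data.frame(
  year      = c(2014, 2014, 2014, 2015, 2015, 2015),
  base_code = c("EA114", "EA114", "EA112", "EA114", "EA121", "EA112"),
  stringsAsFactors = FALSE
)
events$n <- 1  # each row counts as one event

# Assumption: the FOT-level code is the first four characters of the
# BASE-level code (inferred from "EA114" -> "EA11" only).
events$fot_code <- substr(events$base_code, 1, 4)

# Because subordinate indicators do not overlap, the count of a higher-level
# indicator is simply the sum of the counts of its children.
base_counts <- aggregate(n ~ year + base_code, data = events, FUN = sum)
fot_counts  <- aggregate(n ~ year + fot_code,  data = events, FUN = sum)

print(base_counts)
print(fot_counts)
```

Since every event carries exactly one BASE-level code per dimension in this sketch, the totals at both levels agree with the number of events.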

Along with the indicators and the four “w”-dimensions, the Event Database stores further event-specific attributes, such as data on the severity of personal injuries and damage to material as well as additional contextual information.

3.2. Monitoring Objective and Evaluation

The main objective of the indicator-based monitoring is to evaluate the safety performance in terms of the criterion: “maintain safety at least at its current level”. Therefore, an intertemporal evaluation of the safety performance is carried out; i.e., the current safety performance is related to the level of safety from a previous period (safety target). Safety performance and target are expressed in terms of the indicators and the following dimensions of analysis:

1. frequency / occurrence of events (e.g. number of accidents per month),
2. damage to persons (e.g. fatality weighted injuries),
3. damage to material (in monetary terms).
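For the second dimension, a weighted injury measure can be computed as sketched below. The weight of 0.1 per serious injury follows the convention of the EU Common Safety Methods (one serious injury counted as 0.1 fatalities); the weights actually used by FOT are not stated here, so the default is an assumption, and the function name and figures are hypothetical.

```r
# Fatalities and weighted serious injuries (FWSI).
# Assumption: one serious injury counts as 0.1 fatalities (EU convention);
# the weighting applied by FOT may differ.
fwsi <- function(fatalities, serious_injuries, weight = 0.1) {
  fatalities + weight * serious_injuries
}

# Hypothetical yearly totals (invented numbers)
yearly <- data.frame(
  year             = c(2013, 2014, 2015),
  fatalities       = c(2, 0, 1),
  serious_injuries = c(5, 10, 0)
)
yearly$fwsi <- fwsi(yearly$fatalities, yearly$serious_injuries)
print(yearly)
```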

Besides the overall evaluation of whether the current safety levels meet the targets, the software tool in use also facilitates subgroup comparisons (e.g., passenger and freight transport). Moreover, the tool provides the means to compare individual transport carriers or companies with each other, as an alternative to intertemporal comparisons. In these cases, the event data are normalized (e.g., by train-kilometre, route length, freight volume, etc.) prior to analysis in order to account for the companies’ characteristics.
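The normalization step amounts to dividing event counts by an exposure measure. A minimal sketch, with invented operators and figures and train-kilometres as the assumed exposure variable:

```r
# Hypothetical operator-level data; all names and figures are invented.
ops <- data.frame(
  operator = c("A", "B", "C"),
  events   = c(12, 30, 4),
  train_km = c(1.5e6, 6.0e6, 0.4e6)
)

# Events per million train-kilometres, so that large and small
# operators become comparable.
ops$rate_per_mio_km <- ops$events / (ops$train_km / 1e6)

# Operators ordered from highest to lowest normalized rate
print(ops[order(-ops$rate_per_mio_km), ])
```

In this example the operator with the most events (B) has the lowest normalized rate, which is exactly the effect the normalization is meant to reveal.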

3.3. Statistical Methods in Use

The set of statistical methods implemented in the monitoring tool has been chosen to meet the requirements of the evaluation of the indicators (notably, for intertemporal analysis) and includes:

overlap method of the confidence intervals [2] for the arithmetic mean, where the underlying distributional assumptions can be chosen by the analyst (among the models: Poisson, negative binomial, exponential, gamma, Weibull; alternatively, an approximate ABC bootstrap method is available),
Wilcoxon-Mann-Whitney test on homogeneity of the distributions of the safety performance and target (nonparametric, rank-based statistic and thus not heavily influenced by extreme events),
trend estimation and evaluation (Mann-Kendall test statistic or a model with a deterministic trend),
moving averages for sub-annual analysis (approximate ABC bootstrap for statistical inference).
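Several of the methods listed above are available in base R. The following sketch uses invented yearly and monthly counts and only functions from the stats package. It is not the tool's implementation: the interval comparison shown is the naive overlap check, whereas the tool uses a modified overlap method, and the ABC bootstrap is omitted.

```r
# Hypothetical yearly event counts for one indicator (invented numbers).
target  <- c(14, 17, 12, 15, 16)  # reference period (safety target)
current <- c(11, 13, 10, 12)      # evaluation period

# Exact Poisson confidence interval for the mean number of events per year:
# the total count is treated as observed over length(x) years.
poisson_mean_ci <- function(x, conf.level = 0.95) {
  ci <- poisson.test(sum(x), T = length(x), conf.level = conf.level)$conf.int
  c(mean = mean(x), lower = ci[1], upper = ci[2])
}
ci_target  <- poisson_mean_ci(target)
ci_current <- poisson_mean_ci(current)

# Naive overlap check, for illustration only (not the modified method).
overlap <- ci_current[["lower"]] <= ci_target[["upper"]] &&
  ci_target[["lower"]] <= ci_current[["upper"]]

# Wilcoxon-Mann-Whitney test on homogeneity of the two distributions
# (normal approximation, because the data contain ties).
wmw <- wilcox.test(current, target, exact = FALSE)

# Trend evaluation: Kendall's tau between the counts and time,
# which corresponds to a Mann-Kendall-type trend test.
counts <- c(target, current)
trend  <- cor.test(counts, seq_along(counts), method = "kendall", exact = FALSE)

# Centered 3-month moving average for sub-annual data (invented counts).
monthly <- c(2, 0, 1, 3, 1, 0, 2, 1, 1, 0, 2, 1)
ma3 <- stats::filter(monthly, rep(1 / 3, 3), sides = 2)

print(rbind(target = ci_target, current = ci_current))
print(c(overlap = overlap, wmw_p = wmw$p.value, kendall_tau = unname(trend$estimate)))
```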

The choice of methods is left to the analyst in charge. The tool provides graphical diagnostic measures the analyst can consult when choosing an appropriate statistical method.
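One simple diagnostic of this kind, given here as a hypothetical example rather than as the tool's actual diagnostic, is to compare the observed count distribution with a fitted Poisson model and to inspect the variance-to-mean ratio. All data below are invented.

```r
# Index of dispersion (variance-to-mean ratio): close to 1 under a Poisson
# model; values clearly above 1 point to overdispersion, in which case a
# negative binomial model may be more appropriate.
dispersion_index <- function(x) var(x) / mean(x)

counts <- c(2, 0, 1, 3, 1, 0, 2, 1, 1, 0, 2, 1)  # invented monthly counts
di <- dispersion_index(counts)

# Observed relative frequencies versus fitted Poisson probabilities
support <- 0:max(counts)
obs <- as.numeric(table(factor(counts, levels = support))) / length(counts)
fit <- dpois(support, lambda = mean(counts))

# Graphical comparison (only drawn in an interactive session)
if (interactive()) {
  barplot(rbind(observed = obs, poisson = fit), beside = TRUE,
          names.arg = support, legend.text = TRUE,
          xlab = "Events per month", ylab = "Relative frequency")
}
print(round(di, 3))
```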

Figure 1. Schematic display of the interaction between R and a web browser via shiny

4. Implementation in R/Shiny

The monitoring tool is implemented in the R statistical software and uses the web application framework shiny [2]; see Fig. 1. The tool is operated using a web-based graphical user interface and does not require any knowledge of R. The data management builds on the functionality of the R package data.table [3]. To be functional, the tool only requires (i) a local installation of R and (ii) a web browser (Mozilla Firefox, Internet Explorer, etc.).
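The basic pattern of such an application, a browser-based input panel that drives a computation in R, can be sketched as follows. This is a minimal illustration, not the FOT tool: the data, the helper `count_events()` and the wrapper `make_app()` are hypothetical, and base R data frames are used instead of data.table to keep the example self-contained. Running the app requires the shiny package.

```r
# Invented event data: one row per event with year and FOT-level indicator.
events <- data.frame(
  year      = c(2012, 2012, 2013, 2014, 2014, 2014, 2015),
  indicator = c("EA11", "EA12", "EA11", "EA11", "EA11", "EA12", "EA11"),
  stringsAsFactors = FALSE
)

# Pure helper: yearly event counts for one indicator within a time window.
# It lives outside the shiny code so it can be checked without a browser.
count_events <- function(events, indicator, from, to) {
  sel <- events$indicator == indicator & events$year >= from & events$year <= to
  as.data.frame(table(year = factor(events$year[sel], levels = from:to)),
                responseName = "n", stringsAsFactors = FALSE)
}

# Builds the app: the left panel selects indicator and time window,
# the main panel shows the resulting time series plot.
make_app <- function(events) {
  ui <- shiny::fluidPage(
    shiny::sidebarLayout(
      shiny::sidebarPanel(
        shiny::selectInput("indicator", "Indicator",
                           choices = sort(unique(events$indicator))),
        shiny::sliderInput("window", "Time window",
                           min = min(events$year), max = max(events$year),
                           value = range(events$year), step = 1, sep = "")
      ),
      shiny::mainPanel(shiny::plotOutput("ts"))
    )
  )
  server <- function(input, output, session) {
    output$ts <- shiny::renderPlot({
      d <- count_events(events, input$indicator, input$window[1], input$window[2])
      plot(as.integer(d$year), d$n, type = "b",
           xlab = "Year", ylab = "Number of events")
    })
  }
  shiny::shinyApp(ui, server)
}

# Start the app from an interactive R session; it opens in the web browser.
if (interactive()) shiny::runApp(make_app(events))
```

Keeping the computation in a plain function and the interface in `make_app()` is one possible design; it makes the statistical part testable independently of the user interface.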

Figure 2. Monitoring tool with feature “risk overview” (left panel: data and indicator selection, specification of the time window etc.; main panel: “risk overview” showing some key figures for a set of indicators)

In the current implementation, the user can choose and modify the selected indicators, time windows, statistical methods, etc. in an interactive manner; see left-hand side panel in Fig. 2. The main features of the tool are organized as tabs (see Fig. 2) and encompass the following:

visualization and evaluation features:
  time series plots of the indicators (and evaluation; see Fig. 3)
  trend evaluation
  moving average plots (monthly / sub-annual data)
  causal analysis (contingency tables and histograms)
  risk overview
utility features:
  distributional analysis (diagnostic measures to check whether the imposed distributional assumptions are met)
  tabular display of an event’s characteristics (in-depth analysis of singular events)
  automatic generation of an evaluation report (output format: Microsoft Excel™)

Figure 3. Time series plot and statistical evaluation whether the actual safety performance meets the safety target (here, occurrence of events; confidence intervals on the right-hand side).

5. Key Messages

The scope of the presented tool is not limited to safety-related monitoring applications; the tool, or at least its basic structure, could be applied in the context of virtually any indicator-based monitoring project.

The R/shiny monitoring tool can be operated without knowledge of R. Thus, the tool is open to a wide range of potential users / analysts.

Shiny apps exploit R’s rich and extensively tested set of statistical methods (this is not necessarily the case for, e.g., Java-based implementations).

The R/shiny configuration is “mature” and meets the demands in terms of reliability (also in a corporate IT environment).

References

[1] R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL

[2] W. Chang, J. Cheng, JJ Allaire, Y. Xie and J. McPherson (2016). shiny: Web Application Framework for R. R package version 0.14.2.

[3] M. Dowle and A. Srinivasan (2016). data.table: Extension of `data.frame`. R package version 1.10.0.

[1] ECOPLAN AG – Research in Economics and Policy Consultancy, Monbijoustrasse 14, CH-3011 Bern, Switzerland,

The software tool presented in this article has been developed on behalf of the Swiss Federal Office of Transport (FOT). The information and views set out in this article are those of the author and do not necessarily reflect the official opinion of FOT. Responsibility for the information and views expressed therein lies entirely with the author.

[2] We use a modified method of the confidence interval overlap, not the naïve method; see e.g. Schenker / Gentleman (2001): On Judging the Significance of Differences by Examining the Overlap Between Confidence Intervals, The American Statistician 55, pp. 182-86.