Report of CfH Evaluation Programme workshop on evaluation of large-scale information systems

Prof Jeremy Wyatt, IDH Warwick, 16-7-12

The workshop was held on 7-2-12 and attracted 26 invited participants and speakers from the US, Scotland, the Netherlands and England to discuss the methods used to evaluate NHS systems during the five-year, £5M CFHEP set up by Professor Richard Lilford and Professor Jeremy Wyatt in 2005/6. This was by any standards a highly productive small research programme, having resulted in 54 publications so far, including one in the Lancet and seven in the BMJ from the 8 commissioned projects. See the Appendix for the workshop programme.

Topics covered in the workshop included:

  • The range of approaches applied [see later] and necessary to give a rounded picture of the use and impact of large scale systems, from very positivist techniques like RCTs to more constructivist approaches like ethnography, narratology and case studies
  • The importance of the context in which a study is performed [eg. clinical setting, type of users, type / version of system, type of problems studied], and the risks of interpreting study results while ignoring that context.
  • Issues around investigator independence and integrity, and the politics of large-scale systems and remote care systems.
  • The role of professional societies such as EFMI / IMIA, with a database of 1500 evaluation studies and generic guidelines on the conduct (GEP-HI) and reporting (STARE-HI) of studies, and the importance of reminding funders and others about the value of rigorous studies, eg. the Declaration of Innsbruck and Bellagio.

The value of evaluation studies:

  1. To learn as we go along [formative]
  2. To ensure our systems are safe and effective, and solve more problems than they create
  3. To inform decisions made by others [summative]
  4. To publish, add to the evidence base
  5. To account for money spent (ie. cover our backs)
  6. To persuade stakeholders: health professionals, politicians, organisations, patients…
  7. To support reflective practice by HI professionals

Potential barriers to carrying out informative evaluation studies include:

  • “Fear of the clear”, ie. reluctance to commission evaluation studies whose results might not favour the technology under test, linked to the political and economic consequences of study results
  • A lack of training in the full range of evaluation methods
  • A lack of funding for evaluation studies
  • The challenging complexity of perspectives and technologies
  • Mismatch in the timescales of evaluation and system implementation; several projects had experienced delays in system delivery or scaling that prevented the start of studies until months later.
  • The lack of validated instruments to support many studies
  • The fact that IT constantly evolves, and that installations nearly always differ from site to site
  • A mismatch between the academic criterion for proof (p < 0.05) and managerial standards of proof (perhaps around p < 0.5?)

It was unclear where we could turn to address many of these barriers, but an agreement between the DH Informatics Division and the NIHR to explore them would be a promising start.

A wide range of evaluation methods and techniques were used, including:

  • Multi-channel video recording, with encoding of the video using ObsWin to segment the consultation. [NB. Pringle showed in a 1990 BMJ paper that videoing had no observable effect on the consultation]
  • Video prompted recall to help patients & clinicians recall their feelings about the consultation at each stage
  • Use of the Roter Interaction Analysis system to classify the nature and content of spoken dialogue
  • Using a tabular layout to allow the researcher to interpret across multiple different streams of observation made using different methods (eg. spoken narrative; GP action; GP prompted recall; patient recall)
  • Studying the macro, meso and micro levels using a range of observational and other methods to explore the lived experience of systems users and others
  • Internally and externally controlled before-after studies to strengthen simple before-after designs; a minimal worked sketch of the externally controlled analysis appears after this list [Wyatt J et al. When and how to evaluate clinical information systems? International Journal of Medical Informatics 2003; 69: 251-259]
  • Capturing reasons for failure using a variety of methods; all studies need to anticipate that the result might be negative, and we need to learn why
  • Actor-network theory and other socio-technical systems analysis methods
  • Action Research [but see Lilford R et al. Action research: a way of researching or a way of managing? J Health Serv Res Policy.2003; 8: 100-104]
  • Simply consulting people on the ground for their views and predictions about the system, which were often remarkably accurate
  • Taking a local, lifetime view and documenting the journey of local implementation, analogous to documenting a child’s birth and growth to adulthood.
  • Multiple case study approach, treating each site / system combination as a distinct case
  • Realist Evaluation (Pawson et al)
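
The externally controlled before-after design mentioned in the list above reduces, in its simplest form, to a difference-in-differences calculation: the change observed at the implementation site is compared with the change at an external control site over the same period, so that secular trends are netted out. A minimal sketch in Python is given below; the function, figures and outcome measure are hypothetical and purely illustrative, not drawn from any of the commissioned studies.

    def did_estimate(pre_intervention, post_intervention, pre_control, post_control):
        """Difference-in-differences: the before-after change at the intervention
        site minus the before-after change at the external control site."""
        return (post_intervention - pre_intervention) - (post_control - pre_control)

    # Hypothetical example: mean prescribing errors per 100 orders
    effect = did_estimate(pre_intervention=5.2, post_intervention=3.1,
                          pre_control=5.0, post_control=4.6)
    print(f"{effect:.1f}")  # -1.7: about 1.7 fewer errors per 100 orders than the external trend predicts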

Ways to improve the number and quality of studies carried out could include:

  • Encouraging more journals to adopt the published reporting guidelines
  • Ensuring that evaluation is included as a core module in all master's courses in the field.
  • Carrying out regular audits of the quality of studies published in major conferences and journals, such as the EFMI pilot study using STARE-HI derived criteria
  • Augmenting current databases of studies with exemplars of both good and bad practice
  • Systematic reviews to identify and promote the best quality studies of certain system types or use in certain care groups, eg. the 2008 systematic review of eHealth systems in less developed settings [Fraser et al. Health Affairs]
  • Identifying key generic questions about eHealth, eg. when is it essential, for whom does it bring benefit, who does not benefit, and which kinds of system are less effective?
  • Training policy makers and system purchasers to demand good quality evidence from studies before adopting a new technology
  • A taxonomy of system types, to enable decision makers (and those carrying out systematic reviews) to identify studies more accurately
  • A register of all RCTs in the area – though registration is probably already mandatory
  • Exploring the literature around complex adaptive systems and lessons from this for health informatics

Nick Barber provided an entertaining and thoughtful interlude on the theme of building a house on shifting sands. During two evaluation projects, the following were in frequent / constant flux:

  • Implementation dates and sites
  • Software functions and versions
  • The number of pilot sites
  • The amount of notice given to evaluators
  • The work practices of key partners in the evaluation
  • The priorities of the audiences for the study results

This meant that trying to adhere to a fixed protocol for an RCT became increasingly untenable. The underlying assumption of RCTs, that the intervention does not change, was challenged [but see “When and how to assess fast-changing technologies: a comparative study of medical applications of four generic technologies”. HTA ref: 93/42/01].

Richard Lilford described a series of new methods for evaluating the effectiveness, and particularly the cost-effectiveness, of service-level (cf. clinical-level) interventions. Following on from earlier speakers, he identified health information systems as a diffuse intervention acting at a higher level than eg. drugs, with multiple effects mediated through several pathways, though sometimes a single measure such as improved HCP knowledge or leadership could be the main agent of change. He distinguished generic service processes (eg. introduction of PDSA cycles) from targeted processes (eg. reminders) [Brown & Lilford BMJ 2008 / QSHC 2008]. Such technologies need to be developed incrementally, with more complex assessment models (eg. accept / reject / improve). Various metrics and techniques could be used to inform this development and stage-gate evaluation process, such as:

  • The minimum necessary effectiveness, based on the cost and an assumed cost-effectiveness threshold for NHS adoption
  • The headroom, using a limiting case analysis (a back-of-envelope sketch of these first two metrics follows this list)
  • A back-of-envelope Value of Information analysis, to inform the decision about whether a study to evaluate the technology could ever be affordable [Lilford et al. BMJ 2010;341:c4413]
  • Option value analysis, based on the economics of futures and hedge funds.
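
To illustrate the first two metrics, a rough back-of-envelope sketch in Python is given below; both reduce to simple arithmetic around an assumed willingness-to-pay threshold. The £20k per QALY threshold echoes the figure discussed below, but the patient-level costs and QALY gains are assumptions for illustration only, not figures presented at the workshop.

    THRESHOLD = 20_000  # assumed NHS willingness to pay, in £ per QALY

    def minimum_necessary_effectiveness(cost_per_patient):
        """Smallest QALY gain per patient at which a technology of this cost
        could be cost-effective at the assumed threshold."""
        return cost_per_patient / THRESHOLD

    def headroom(limiting_case_qaly_gain, downstream_cost_savings=0.0):
        """Maximum justifiable cost per patient, given a limiting-case estimate of
        the QALY gain plus any cash savings the technology might release."""
        return limiting_case_qaly_gain * THRESHOLD + downstream_cost_savings

    print(f"{minimum_necessary_effectiveness(150):.4f}")  # 0.0075 QALYs needed for a £150-per-patient system
    print(f"{headroom(0.01, 30):.2f}")                    # £230.00 per patient of headroom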

A further insight was that it was challenging to retain the threshold of £20k per QALY for more diffuse interventions such as ICT, and that often it would need to be significantly lower.

The meeting concluded with an informal discussion at which some of the problem owners shared their insights, including:

  • There is a keen interest in recording and sharing the lessons learned from NPfIT / CfH, both on the technology and implementation issues and on how to evaluate such systems.
  • There is a continuing need to research barriers to the uptake and spread of evidence about health information systems, and how to overcome them to ensure that good practice is widely disseminated across the NHS.
  • With the changing NHS and the fragmentation and reduction in scale of its central functions, there is concern that there will be insufficient resources to support these activities or any future national evaluation programme.

Conclusions

Despite some differences in approach documented above, there is a large and active community of well-informed academics keen to engage in the independent evaluation of health information systems. They bring expertise in study design, novel data collection and analysis methods and can contribute to establishing the track record of NHS and commercial information systems, which will help with adoption and spread across the health system.

Appendix: Workshop programme

Evaluating large scale health information systems: problems and solutions

Tuesday 7th February 2012 09:30 – 16:30

Centre for Professional Development, Medical School, University of Birmingham

09:30 – 10:00: Registration Tea and Coffee

Session 1: Setting the scene

10:00 – 10:30: Welcome and Introduction – “What is summative evaluation, and why is it needed?” - Professor Jeremy Wyatt

10:30 – 11:00: Whole System Demonstrator Programme – Dr Jane Hendy, Imperial College London

11:00 – 11:30: European Federation for Medical Informatics (EFMI) evaluation activities – Professor Jan Talmon, Maastricht University

11:30 – 11:45: Tea/Coffee

Session 2: The problems of evaluation

11:45 – 12:15: A systematic review of eHealth evaluation studies in resource poor settings - Professor Hamish Fraser, Harvard School of Public Health / Partners in Health

12:15 – 12:45: The quality of primary studies in systematic reviews – Dr. Claudia Pagliari, University of Edinburgh

12:45 – 13:30: Lunch

Session 3: Possible solutions

13:30 – 14:00: The pros and cons of “agile” evaluation methods - Professor Jeremy Wyatt, University of Warwick

14:00 – 14:30: ‘Building a house on shifting sand’ - Professor Nick Barber, UCL School of Pharmacy

14:30 – 15:00: Innovative evaluation methods used in the INTERACT-IT study - Dr. Hilary Pinnock, University of Edinburgh

15:00 – 15:30: Health economics of service delivery interventions - Professor Richard Lilford, University of Birmingham

15:30 – 16:00: Panel discussion and feedback chaired by Professor Jeremy Wyatt, University of Warwick

16:00 – 16:10: Conclusions / recommendations - Professor Richard Lilford, University of Birmingham

16:10: Tea/Coffee and Close
