Guidelines for Good Evaluation Practices in Health Informatics

Version: 0.16 November 30, 2008

Coordinators: Pirkko Nykänen and Jytte Brender

Core team:

Pirkko Nykänen, Jytte Brender, Elske Ammenwerth, Jan Talmon, Nicolette de Keizer, Michael Rigby

Contributors to this document (alphabetic order):

Jos Aarts, Elske Ammenwerth, Marie-Catherine Beuscart-Zéphir, Jytte Brender, Nicolette de Keizer, Pirkko Nykänen, Jan Talmon, Vivian Vimarlund, Christa Wessel, *** (add/delete as appropriate)

THIS IS WORK IN PROGRESS

Any comments welcome to or .

The most recent version of GEP-HI can always be found at

Information on further work on GEP-HI is provided through the mailing list of the EFMI WG on Assessment of Health Information Systems (free membership to the mailing list through ).

This version of GEP-HI includes all materials considered relevant for the good evaluation practice guidelines. We will also produce another paper that presents the motivation and supporting material for the development of these guidelines.

GEP-HI: Guidelines for Good Evaluation Practices in Health Informatics – The Product of an Iterative Expert Study

Abstract

Objective: Development of guidelines for good practices to plan and perform evaluation studies in health informatics.

Methods: An initial list of issues to be addressed in evaluation studies was drafted based on the experience of key players, as editors and frequent reviewers of journals in health informatics, and on the evaluation literature in health informatics, taking into account other types of guidelines within health informatics and medical research. This list was discussed in several rounds by an increasing number of experts in health informatics evaluation during conferences and by e-mail. Moreover, at a fairly early point it was put up for comments on the web.

Results: A set of 60 issues relevant for the planning and implementation of an evaluation study in the health informatics domain has been identified. These issues cover all phases of an evaluation study: study exploration, first study design, operationalisation of methods, detailed study design, and implementation of the evaluation study. Issues of risk management and project controlling as well as reporting and publication of the evaluation study are also addressed.

Conclusion: A comprehensive list of issues relevant for properly planning and performing health informatics evaluations has been developed as a guideline for Good Evaluation Practices in Health Informatics (GEP-HI). As discussed in section 5 of this paper, application of these guidelines shall increase the general validity and generalisability of evaluation studies, since a number of omissions, pitfalls and dangers are avoided. The likelihood that such studies fit within the scope of meta-analyses of health informatics interventions will therefore also increase, and hence these guidelines are an important step towards the vision of evidence-based health informatics.

Keywords:

Health informatics, Guidelines, Publishing Standards, Research Design, Evaluation

1 Introduction

Development and use of health informatics applications in health care offer tremendous opportunities to improve health care, its delivery and outcome. However, there are also hazards and problems related to the use of IT in health care: “Bad health informatics can kill – is evaluation the answer?” [Ammenwerth & Shaw 2005]. Evaluation is the means to assess the quality, value, effects and impacts of information technology (IT) in the health care environment. More generally, evaluation is defined as the “act of measuring or exploring properties of a health informatics application (in planning, development, implementation, or operation), the result of which informs a decision to be made concerning that system in a specific context” [Ammenwerth et al. 2004, p. 480].

The working conference HIS-EVAL funded by the European Science Foundation (ESF) has been instrumental in the identification of the state of affairs with respect to evaluation of health informatics applications and in defining necessary steps to further evaluations in health informatics. The results of this workshop were reported earlier in [Ammenwerth et al. 2004]. An important result described in this workshop report is the Declaration of Innsbruck. The Declaration summarizes the importance of evaluation as: “Health information systems are intended to improve the functioning of health professionals and organizations in managing health and delivering health care. Given the significance of this type of intervention, and the intended beneficial effect on patients and professionals, it is morally imperative to ensure that the optimum results are achieved and any unanticipated outcomes identified. The necessary process is evaluation and this should be considered an essential adjunct to design and implementation of health information systems” [Ammenwerth et al., 2004, p. 487].

The Declaration also presents recommendations to improve the current status of evaluation and one important recommendation is to develop guidelines for Good Evaluation Practices in Health Informatics (abbreviated ‘GEP-HI’). The importance of developing an evidence base of good evaluation practices is also emphasized in [Talmon 2006].

One of the problems in evaluation as identified in the Declaration of Innsbruck is of a methodological nature. The problem in short is: “Case studies on evaluation are often not sufficiently grounded in theory, and established evaluation methods are frequently poorly applied. Evaluators are often insufficiently trained to select methods from various disciplines and to apply and combine these adequately. The proper design of evaluation studies, the selection of a framework to be applied and of methods to be used is difficult.” [Ammenwerth et al. 2004, p. 484]. It is this deficiency that the present guidelines attempt to address.

This summary of the state of affairs was later confirmed by a review of pitfalls and perils in evaluation studies, see [Brender 2006, pp. 243-323]. Brender identified example cases in the literature on evaluation of IT-based systems and applications in healthcare for most of the identified perils and pitfalls, while publications on good studies that avoided them were difficult to find. The experience is that evaluation of IT-based solutions is both a discipline and a profession in itself, and as such it can be compared with health technology assessment (HTA) of health care methods, techniques and tools. HTA, however, is based on a pre-defined formal framework and usually has a summative nature, using general (often quantitative) evaluation methods and approaches, while evaluation in general can be both constructive and summative and use a variety of methodologies and methods. A review of the current scene with regard to health informatics evaluation, commissioned for the International Medical Informatics Association Yearbook, showed the importance of this discipline, the challenges, and the currently very limited response, both in recognising integrated methodologies and in endorsing the importance of robust studies [Rigby, 2006].

Reflective deliberations at the HIS-EVAL workshop led to the development of two guidelines: STARE-HI for reporting of evaluation studies and GEP-HI for good practices in planning and execution of evaluation studies. The STARE-HI guidelines for reporting of evaluation studies have been finalized and published [Talmon et al., 2009].

Since evaluation of IT-based solutions is a difficult and important task, it is worth gathering guidelines that enable the majority of evaluators to avoid the major pitfalls. There is no single global approach or methodology that is valid for all evaluation studies in any context. On the contrary, for any single context there are a number of approaches and methods that can be applied. Therefore, guidelines are needed that give advice on how to design and carry out evaluation studies, and what issues to consider in the various study phases. Examples and literature on evaluation studies form the background for these GEP-HI guidelines, see e.g. [Coolican 1999; Schalock 2001; Quinn Patton 2002; Kaplan and Shaw 2004; Ammenwerth and de Keizer 2005; Vimarlund, 2005; Davidson 2005; Fink 2005; Friedman & Wyatt 2006; Brender 2006; Owen 2007; Westbrook et al., 2007; Hyppönen et al., 2007; Yusof et al., 2008; Talmon et al., 2008].

Also noteworthy here is the differentiation between science and pragmatism, as expressed in [Brender 2006, p. 321]: “Firstly, for practical or other reasons it may not be feasible to design the perfect assessment study for a given case; therefore it is preferable to design an assessment study in compliance with the pre-conditions. Secondly, even if one or more biases may be present, their actual influence may not be significant enough to wreck the global picture of the IT-based solution. Thirdly, the efforts invested in an assessment study have to be balanced against the information gained and the intended effect of the conclusion; if no consequence is to be taken, then why assess at all? Finally, even if it is unfeasible to accomplish the perfect assessment study both in principle and in practice, the resources to be invested have to be balanced against the information gained.”

2 Objective in guidelines development

The objective of this guidelines development is to identify grounding principles and issues for good evaluation practices for health informatics applications. In this respect we see health information systems as technical constructs as well as organisational information systems covering all relevant phenomena within the organizational environment, from behavioural aspects to organizational structure, and from users and use processes to developers and development processes [Ives et al., 1980]. From these principles it follows that evaluation of health information systems must also cover all of these aspects, from the organizational level to users and use processes, as well as development aspects.

These guidelines are designed for health care professionals, health informatics professionals, decision makers, and other health IT stakeholders who plan to carry out, participate in, or use the results of formal evaluation studies, but who are not necessarily evaluation experts themselves. Our target audience also covers health IT professionals working on the development and implementation of health information systems in healthcare, and the guidelines can also be utilised by users of health informatics applications. However, since these guidelines are not a methodological cookbook on how to carry out evaluation studies, we assume that the target audience has the required scientific maturity as well as some evaluation or project management expertise. The essential message is that application of these guidelines requires a professional approach: an evaluator should be able to assess which parts of the guidelines are applicable and where the context requires deviation.

These guidelines aim to be general and practical and shall provide evaluators, users and health professionals with a set of structured, comprehensive and understandable rules for good evaluation practices. These rules are grounded in a stringency corresponding to scientific principles and inspired by best evaluation practice as implicitly described in the literature; over time they will enter into a cycle of feedback from the practical experience of evaluators worldwide, leading to revised versions. The guidelines list criteria and aspects for how to design evaluation studies, how to select methodologies, how to conduct studies in both quantitative and qualitative terms, and how to define evaluation criteria at specific phases of a health informatics application’s design, development, adoption, implementation and installation. The guidelines cover the issues which need to be considered in each evaluation phase and in designing and managing the evaluation study.

Important issues that have been considered in the development of the GEP-HI guidelines are:

  • Application range: Types of evaluation studies that these guidelines are relevant for,
  • Applicability: The potential barriers to guidelines application, including organizational barriers and cost issues,
  • Stakeholder involvement: The intended users of these guidelines, and their needs as regards comprehensibility, practicality, and understandability,
  • Guidelines validity and completeness: The general validity and completeness of these guidelines, and their conformance to the general criteria of scientific rigor as well as their cultural dependency,
  • Clarity and presentation: The comprehensiveness of the guidelines presentation and the ability of the guidelines to serve as an instrument for the intended users.

These issues will be reflected on in section 5.1.

This GEP-HI work is a shared activity of EFMI’s Working Group EVAL and IMIA’s Working Group on Technology Assessment and Quality Development.

3 Method

The method used to develop GEP-HI was a consensus-making process in the community of health informatics evaluation experts. The starting point of the guideline development was the existing knowledge and experience – that is, existing literature and published materials on evaluation studies, methodologies, evaluation experiences, guidelines development, codes of ethics and good implementation practices. In particular the following recent review material, encyclopaedias and textbooks may provide an overview: [Talmon et al., 1999; Ammenwerth et al., 2004; Kaplan and Shaw 2004; Friedman & Wyatt 2006; Brender 2006; Brender et al., 2006; Westbrook et al., 2007; Yusof et al., 2008a; Talmon et al., 2008].

An initial list of issues addressed in reports on evaluation studies was drafted based on the literature as well as on the experience of the core team as editors and frequent reviewers of evaluation studies in health informatics journals, taking into account other types of guidelines within health informatics and medical research.

The different versions of the guidelines were drafted and distributed for critical review and modification to a community of selected evaluation experts, the core team. Key persons, six in total, from the ESF HIS-EVAL Workshop and from the EFMI and IMIA working groups dealing with technology assessment and evaluation of health informatics applications constituted this core team, the authors of this contribution.

At regular intervals, the guidelines were presented and/or submitted for discussion among an increasing number of evaluation experts, for instance during the MIE (Medical Informatics Europe) and MEDINFO (World Congress on Medical Informatics) conferences and via a dedicated e-mail list (through EFMI’s WG EVAL website: ), while calling for feedback. The very first ideas of the guidelines were presented and discussed in workshops during the MEDINFO2004, MIE2005 and MIE2006 congresses. A first full draft of the guidelines was presented at MEDINFO 2007 and elaborated at a workshop at MIE2008. After critical review and modifications by the core team, the guidelines were again made available on the web to a larger community of health informatics experts for external review and comments.

The entire mailing list collected under EFMI’s EVAL working group includes a comprehensive list of people with expertise in evaluation (xx in total as per November 2008), ranging from experts to practitioners, and from universities to health organizations, the software industry, and more. They were all invited to participate in the consensus-making process, which iterated between core team editing and open forum feedback.

4 GEP-HI guidelines for evaluation

These guidelines present aspects and activities to take into account in the design of an evaluation study at the various phases of health informatics application design, development, implementation and installation, including the management of an evaluation study.

Methods to be used at each phase and links between specific information needs and such methods will only exceptionally be mentioned in the guidelines. It is up to the reader to identify which method is applicable in the situation by means of the latest edition of handbooks like [Brender 2006], textbooks like [Coolican 1999; Schalock 2001; Quinn Patton 2002; Davidson 2005; Fink 2005; Friedman & Wyatt 2006; Owen 2007], and other central literature like [Kaplan and Shaw 2004; Ammenwerth and de Keizer 2005; Vimarlund, 2005; Westbrook et al., 2007; Yusof et al., 2008b], as well as websites collecting evaluation studies like or specific reporting guidelines like

The guidelines are divided into parts corresponding to the study phases, see the flowchart in Figure 1. ‘Phase’ is used here in the sense defined by ISO as “a segment of work” [ISO 9000-3]. The theoretical background for the evaluation study phases is analogous to the general approach in information systems development (ISD) models, for instance the lifecycle model, which emphasises the iterative nature of the process and its division into phases in order to manage and control the process. Phases are further divided into tasks that, from a planning perspective, are coherent and meaningful components of the given phase.

The phases in the GEP-HI guidelines are:

  • Study exploration focuses on the starting question of the evaluation study, see 4.1,
  • First study design focuses on the preliminary design of the evaluation study, see 4.2,
  • Operationalisation of methods focuses on making the design and evaluation methods concrete and compliant with the organizational setting and the information need, while taking into account the known pitfalls and perils, see 4.3, with two sub-sections on:
    • Methodological approach, see 4.3.1,
    • Methodical aspects, see 4.3.2,
  • Detailed study plan and project plan focuses on providing plans, prescriptions and procedures detailed to the level necessary for the specific study, see 4.4,
  • Evaluation study implementation focuses on activities related to the actual accomplishment of the designed evaluation study, see 4.5, with two sub-sections on:
    • Project controlling and risk management focuses on the good project management practices specifically for an evaluation study, see 4.5.1,
    • Reports and publications focuses on how to report evaluation studies in terms of the STARE-HI guidelines, see 4.6.

Figure 1: Flowchart of an evaluation study.

The flowchart must not be mistaken for a waterfall model. There may certainly be iterations back to one or more tasks of a previous phase in case problems or omissions show up during the current phase, and hence there may be feedback loops between the different phases. Moreover, the activities within a given phase do not necessarily follow each other in a linear or sequential fashion, so there may also be iterations internally within a given phase.

Note that phases three and four (Operationalisation of Methods and Detailed Project Planning, respectively) are activities suitable for plugging in the relevant methods, singular or plural, that one finds applicable for the study’s specific purpose. It is not the purpose of the present guidelines to make specific recommendations as regards evaluation methods or project management methods.

The flowchart phases and their related issues are described one by one in more detail in the following sections, according to the overview in Table 1. The fact that a specific item is not mentioned for a specific phase does not mean that it can be ignored in that phase. In principle, each item is reiterated in all phases, but some items are more dominant than others in a given phase, and those are the ones shown. For instance, the topic ‘the rationale for the study’ is the foundation for each and every decision as regards who, what and how to consider, and hence it may need elaboration or refinement where more detail turns out to be needed. Another example is the concept of stakeholders (interest groups). In an explorative phase one needs to know who the stakeholders are but not necessarily more; during the first study design – and/or later – one needs more information about the stakeholders in order to take their issues into account (who will benefit, or potentially the opposite, and how), so an elaborated stakeholder analysis may be needed, or perhaps even a social network analysis, depending on the study objective and the anticipated usage of methods. We have, however, not included this principle in every detail in every phase for the sake of clarity and brevity.