Supplementary Appendix. Instrument Development and Validation

This Appendix summarizes the adaptation and validation of the instruments used to assess the quality and quantity of integration of patient preference and treatment effectiveness evidence into clinical practice guidelines (CPGs).

Methods

Adapted AGREE Instrument

The AGREE Instrument is an internationally developed, frequently used and validated method for assessing the quality of CPGs [1]. It consists of twenty-three items scored on a four-point Likert scale and organized into six domains. We selected four relevant items from the domain of “Rigour of Development” and adapted them to apply specifically to patient preferences. These four items thus formed an adapted “quality of patient preference integration” domain that we scored as per the AGREE methods (sum of the scores / total possible score). The same four items were also adapted to address treatment effectiveness specifically; both adaptations are described in Table 1. The AGREE instrument does contain a single preference-relevant item under the domain of “Stakeholder Involvement” that asks how well “the patients’ views and preferences have been sought.” The item is intended to survey broadly whether the CPG development process included patient representation, or information from patient interviews and literature reviews of patient experiences. This holistic, global judgment of patient preference integration was assessed separately.
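The scoring rule quoted above (sum of the scores / total possible score) can be sketched as follows. This is a minimal illustration, assuming a single rater scores each of the four adapted items from 1 to 4; the function name and the example scores are hypothetical and are not taken from the study.

```python
def adapted_agree_domain_score(item_scores, max_per_item=4):
    """Return the domain score as sum of item scores / total possible score."""
    if not item_scores:
        raise ValueError("at least one item score is required")
    if any(s < 1 or s > max_per_item for s in item_scores):
        raise ValueError("each item must be scored between 1 and max_per_item")
    return sum(item_scores) / (max_per_item * len(item_scores))


# Hypothetical scores for the four adapted items of one CPG (assumption).
example_score = adapted_agree_domain_score([3, 2, 4, 1])
print(f"{example_score:.3f}")  # prints 0.625
```

Under this rule the lowest attainable score is 0.25 (every item scored 1) rather than 0, because the minimum item score is not subtracted.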

Adapted Shaneyfelt et al. Instrument

We also adapted an instrument developed by Shaneyfelt et al. [2]. This group constructed a twenty-five-item survey meant to assess overall CPG quality in three domains. This instrument has also been validated, though not as extensively as the AGREE instrument, and has been widely cited [3-5]. We selected seven relevant questions, answered in a yes/no format, from the domain “Frequency of adherence to methodological standards on evidence identification and summary” and adapted them to apply specifically to patient preferences and treatment effectiveness (Table 2). We scored this adapted seven-item domain as the percentage of items that received a “yes” response.
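The percentage score for the adapted seven-item domain can be sketched as below. This is a minimal illustration, assuming each item is recorded as a boolean (True for “yes”); the function name and the example answers are hypothetical.

```python
def percent_yes(responses):
    """Percentage of items answered 'yes' (True) out of all items."""
    if not responses:
        raise ValueError("at least one response is required")
    return 100.0 * sum(1 for r in responses if r) / len(responses)


# Hypothetical yes/no answers for the seven adapted items of one CPG (assumption).
example_responses = [True, True, False, True, False, False, True]
example_percent = percent_yes(example_responses)
print(f"{example_percent:.1f}%")  # prints 57.1%
```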

Word and reference count rules

In the absence of a gold standard or widely accepted definition, we defined text on the following topics as relevant to patient preferences: discussion on i) the quality of life for health outcomes; ii) the methods by which patients choose therapies/make decisions; and iii) the patient expectation of/satisfaction with health care. For effectiveness, we used “discussions on the effectiveness of an intervention for achieving specific outcomes” as a guiding definition. The rules used to quantify the amount of text and to identify references discussing patient preferences and treatment effectiveness are summarized in Table 3. Text included in tables and figure captions was considered. For the denominator of the total word count of the CPGs, we included all text except for references. Words in tables and figures that may have duplicated the main text were counted twice. Citations that were both effectiveness- and preference-related were counted in the numerators for both the effectiveness and preference percentages.
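The percentage calculations described above can be sketched as follows. This is a minimal illustration, assuming the coding of words and references has already been done by hand; the `Reference` class, the function names and all counts are hypothetical. It shows only the arithmetic: the word-count denominator excludes the reference list, and a citation coded for both topics adds to both numerators.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """One cited reference, flagged by the topics it was coded as addressing."""
    cites_preferences: bool
    cites_effectiveness: bool


def percent_of_total(part, total):
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def reference_percentages(references):
    """Return (preference %, effectiveness %) of all references.

    A reference flagged for both topics adds to both numerators.
    """
    total = len(references)
    n_pref = sum(1 for r in references if r.cites_preferences)
    n_eff = sum(1 for r in references if r.cites_effectiveness)
    return percent_of_total(n_pref, total), percent_of_total(n_eff, total)


# Hypothetical CPG: 6000 words excluding the reference list,
# 150 of them coded as preference text (assumed numbers).
word_pct = percent_of_total(150, 6000)

# Hypothetical reference list; the second citation counts toward both topics.
refs = [Reference(True, False), Reference(True, True),
        Reference(False, True), Reference(False, False)]
pref_pct, eff_pct = reference_percentages(refs)
print(word_pct, pref_pct, eff_pct)
```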

Pilot testing

A protocol for auditing the CPGs was developed by the investigators (CAKYC, MDK, and GN). Preliminary versions of our instruments were tested on ten CPGs not included in the final analysis. During testing, the investigators identified items that were ambiguous or allowed too broad an interpretation. Items were clarified with rewording and examples when necessary. The auditors, one a clinical medicine fellow (CAKYC) and the other a student (IC), were trained on six other CPGs, and concordance of their evaluations was sought prior to reviewing the final dataset.

Validation of appraisal techniques and data analysis

The inter-rater reliability of the reviewers’ independent use of the adapted AGREE and Shaneyfelt et al. instruments was assessed using the intraclass correlation coefficient. Internal consistency of these two instruments was assessed using Cronbach’s alpha. For construct validity, we hypothesized a priori that all four outcome measures (the two quality instruments, word count and reference count) would be strongly correlated with one another, with coefficients of 0.5 or higher. For criterion validity, in the absence of a gold standard, we took the same approach applied in the validation of the original AGREE instrument by using the reviewers’ subjective assessments as a criterion standard [1]. Reviewers scored the effort put into incorporating patient preference/treatment effectiveness (scale of 1 to 5), and then how well they felt patient preference/treatment effectiveness evidence was ultimately incorporated irrespective of effort (scale of 1 to 5), for an overall score of 2 to 10. This score was then correlated with the four outcome measurements. To reduce the potential for bias, this analysis compared one reviewer’s independent scoring on the four outcome measures with the other reviewer’s independent overall judgment, and vice versa; the two analyses gave very similar results and thus, for simplicity, we present only one set of correlations.
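For readers unfamiliar with the internal consistency statistic named above, the standard formula for Cronbach’s alpha can be sketched as below. This is a minimal illustration and not the authors’ analysis code: it assumes one row per CPG and one column per item, uses sample variances (n - 1 denominator), and all scores shown are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows (one per CPG) of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(row totals))
    """
    if len(rows) < 2:
        raise ValueError("at least two rows are required")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("each row needs the same number (>= 2) of items")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("row totals have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical four-item scores (1 to 4) for five CPGs (assumed data).
example_rows = [
    [3, 3, 4, 2],
    [1, 2, 1, 1],
    [4, 4, 3, 4],
    [2, 2, 2, 3],
    [3, 4, 3, 3],
]
print(round(cronbach_alpha(example_rows), 2))
```

Items that move together perfectly give an alpha of 1, while items that vary independently of each other pull alpha toward 0.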

Results

The inter-rater reliability of our four adapted quality instruments was very good, with intraclass correlation coefficients ranging from 0.81 to 0.92 (Table 4a). Internal consistency was moderate, with Cronbach’s alpha values ranging from 0.66 to 0.76 (Table 4a), comparable with the values of 0.64 to 0.88 obtained in the original AGREE validation study [1]. For preferences, the average absolute differences between the raters’ independent quantitative word and reference count measures were 1.8% and 2.0%, respectively. The mean discrepancies for the quantitative effectiveness measures were higher, at 3.4% and 6.3% for word count and reference count, respectively (Table 4b).
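The mean absolute difference between raters reported above can be computed as in the following sketch. It is a minimal illustration, assuming each rater’s percentages are held in parallel lists (one entry per CPG) and that the SD is the sample standard deviation, which the text does not specify; the function name and values are hypothetical.

```python
from statistics import mean, stdev


def mean_absolute_difference(rater_a, rater_b):
    """Return (mean, sample SD) of the per-CPG absolute differences."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same CPGs")
    if len(rater_a) < 2:
        raise ValueError("at least two CPGs are required for an SD")
    diffs = [abs(a - b) for a, b in zip(rater_a, rater_b)]
    return mean(diffs), stdev(diffs)


# Hypothetical % of text coded as preference-related by each rater (assumed data).
rater_a = [10.0, 4.0, 7.0]
rater_b = [8.0, 5.0, 7.0]
mad, sd = mean_absolute_difference(rater_a, rater_b)
print(mad, sd)  # prints 1.0 1.0
```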

Tables 5a and 5b present the evaluation of construct validity, that is, the extent to which results obtained from the various measures correlate with each other. The quality and quantity measures of preference integration are highly correlated throughout, with Spearman coefficients ranging from 0.68 to 0.84 (p < 0.001). Correlations for the effectiveness measures are not as high, although almost all are statistically significant (r = 0.24 to 0.82; p < 0.001 for all except the lowest coefficient, for which p = 0.057). For criterion validity, all four outcome measures were highly correlated with the reviewers’ subjective overall assessment (Spearman coefficients ranging from 0.44 to 0.84, p < 0.001; not all results shown).
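The Spearman coefficients reported above are Pearson correlations computed on ranks. The sketch below is a minimal pure-Python illustration, assuming ties receive the average of the ranks they span; it does not compute p-values, and the function names and data are hypothetical rather than the authors’ code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical domain scores and word-count percentages for five CPGs (assumed data).
quality = [0.25, 0.50, 0.75, 0.50, 1.00]
word_pct = [1.0, 2.5, 3.0, 2.0, 6.0]
rho = spearman(quality, word_pct)
print(round(rho, 2), rho >= 0.5)
```

The final comparison mirrors the pre-set construct validity hypothesis by checking the coefficient against a threshold of 0.5.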

Table 1. Items from AGREE adapted to apply specifically to how well evidence on patient preferences or treatment effectiveness is incorporated into CPGs. Items are scored on a scale of 1 (strongly disagree) to 4 (strongly agree).

Original AGREE item / Adapted to apply specifically to patient preferences / Adapted to apply specifically to treatment effectiveness
systematic methods were used to search for evidence / systematic methods were used to search for evidence on patient preferences / systematic methods were used to search for evidence on effectiveness
the criteria for selecting the evidence are clearly described / the criteria for selecting evidence on patient preferences are clearly described / the criteria for selecting evidence on effectiveness are clearly described
the health benefits, side effects and risks have been considered in formulating the recommendations / health benefits, side effects, and risks from a quality of life/patient preference perspective have been considered in formulating the recommendations / health benefits, side effects, and risks from a “% chance of x” or “average gain/loss of hard outcome x” perspective have been considered in formulating the recommendations
there is an explicit link between the recommendations and the supporting evidence / there is an explicit link between the recommendations and supporting patient preference evidence / there is an explicit link between the recommendations and supporting effectiveness evidence

Table 2. Items from Shaneyfelt et al. adapted to apply specifically to how well evidence on patient preferences or treatment effectiveness is incorporated into CPGs. Items are scored as a yes or no.

Original Shaneyfelt et al. item / Adapted to apply specifically to patient preferences / Adapted to apply specifically to treatment effectiveness
method of identifying scientific evidence is specified / method of identifying evidence on patient preferences is specified / method of identifying evidence on treatment effectiveness is specified
the evidence used is identified by citation and referenced / evidence on patient preferences is identified by citation and referenced / evidence on effectiveness is identified by citation and referenced
method of data extraction is specified / method of data extraction for evidence on patient preferences is specified / method of data extraction for evidence on effectiveness is specified
method for grading or classifying the scientific evidence or expert opinion are used and described / method for grading or classifying the evidence on patient preferences are used and described / method for grading or classifying the evidence on effectiveness are used and described
formal methods of combining evidence or expert opinion are used and described / formal methods of combining evidence on patient preferences are used and described / formal methods of combining evidence on effectiveness are used and described
benefits and harms of specific health practices are specified / benefits and harms of specific health practices are specified as impact on quality of life / benefits and harms of specific health practices are specified in hard outcomes
benefits and harms are quantified / benefits and harms are expressed using a standard index for quality of life measures (e.g. change in SF-36 or utility score) / benefits and harms are quantified as “% chance of x” or “average gain/loss of outcome x”

Table 3. Summary of text-coding guide.

Coding rule / Patient preferences / Treatment effectiveness
Guiding definition / discussion on the QOL for health outcomes, how patients choose therapies/make decisions, or patient expectation of/satisfaction with health care / discussions on the effectiveness of an intervention for achieving specific outcomes
Clarifying terms in working definition / “QOL” included related concepts of functional status, ability/disability and overall well-being, including text that described the subjective impact of symptoms such as pain, anxiety and depression; “how patients make decisions” included text advising individualized discussion of the risks and benefits, e.g. in the form of decision aids / “an intervention” included any health care intervention, whether drug, device, surgical or other; “outcomes” included measures such as knowledge, satisfaction, or well-being; “effectiveness” for screening tests means the yield of screening for directly influencing clinical outcome or management; “effectiveness” for diagnostic tests means the ability of the test to directly influence clinical outcome or management
Study characteristics / text describing characteristics of preference studies included / text describing characteristics of effectiveness studies included
Side effects / excluded text describing only the frequency of side effects / excluded text on side-effects
Summary recommendations / excluded unless explicit comment on preferences / excluded unless explicit comment on effectiveness

Table 4a. Inter-rater reliability (average intraclass correlation coefficient) and internal consistency (Cronbach’s alpha) for instruments measuring the integration of preference and effectiveness evidence into clinical practice guidelines (n = 65).

Measurement / intraclass correlation / Cronbach’s alpha
Adapted AGREE effectiveness / 0.90 / 0.71
Adapted AGREE preference / 0.81 / 0.75
Adapted Shaneyfelt effectiveness / 0.92 / 0.66
Adapted Shaneyfelt preference / 0.88 / 0.76

Table 4b. Mean absolute difference between raters for quantitative measurements.

Measurement / mean absolute difference (SD)
% text discussing effectiveness (relative to total word count) / 3.4% (3.3)
% text discussing preferences (relative to total word count) / 1.8% (3.0)
% references citing effectiveness evidence (relative to total reference count) / 6.3% (6.0)
% references citing preferences evidence (relative to total reference count) / 2.1% (3.2)

Table 5a. Construct and criterion validity of instruments measuring the quality of integrating effectiveness evidence -- Spearman correlation coefficients between each instrument and overall judgment (n = 65).

Measurement / adapted AGREE / adapted Shaneyfelt / % total words / % total references / overall judgment
adapted AGREE / 1.00
adapted Shaneyfelt / 0.82** / 1.00
% total words / 0.55** / 0.32** / 1.00
% total references / 0.52** / 0.24* / 0.71** / 1.00
overall judgment / 0.66** / 0.81** / 0.50** / 0.44** / 1.00

**p value < 0.001; *p = 0.057

Table 5b. Construct and criterion validity of instruments measuring the quality of integrating preference evidence -- Spearman correlation coefficients between each instrument and overall judgment (n = 65).

Measurement / adapted AGREE / adapted Shaneyfelt / % total words / % total references / overall judgment
adapted AGREE / 1.00
adapted Shaneyfelt / 0.71** / 1.00
% total words / 0.76** / 0.68** / 1.00
% total references / 0.77** / 0.73** / 0.84** / 1.00
overall judgment / 0.62** / 0.57** / 0.64** / 0.68** / 1.00

**p value < 0.001

REFERENCES

1. AGREE Collaboration. Development and validation of an international appraisal instrument for assessing the quality of clinical practice guidelines: the AGREE project. Qual Saf Health Care. 2003;12(1):18-23.

2. Shaneyfelt TM, Mayo-Smith MF, Rothwangl J. Are guidelines following guidelines? The methodological quality of clinical practice guidelines in the peer-reviewed medical literature. JAMA. 1999;281(20):1900-1905.

3. Guyatt GH, Haynes RB, Jaeschke RZ, et al. Users’ guides to the medical literature. XXV. Evidence-based medicine: principles for applying the users’ guides to patient care. JAMA. 2000;284(10):1290-1296.

4. Grol R, Grimshaw J. From best evidence to best practice: effective implementation of change in patients’ care. Lancet. 2003;362(9391):1225-1230.

5. Grilli R, Magrini N, Penna A, Mura G, Liberati A. Practice guidelines developed by specialty societies: the need for a critical appraisal. Lancet. 2000;355(9198):103-106.