Supplementary Table 1. Comparison of GRADE for Interventions and GRADE for Diagnostics

Category / GRADE for Interventions / GRADE for Diagnostics
Question formulation / PICO formulation / PICO formulation, but may be inadequate due to lack of distinction in type of “patients” and stage of “disease”; both elements may affect judgments on evidence quality
Tests need to be considered in the context of test-treatment pathway; development of clinical pathway
Outcomes / Patient important outcomes / Patient important outcomes
- Measured directly (randomizing patients to different test-treatment strategies; measurement of patient important outcomes); use GRADE for Interventions approach
or, in the absence of such direct evidence:
- Measured indirectly by using diagnostic test accuracy (DTA) outcomes (true positives, false positives, true negatives, false negatives) as surrogate markers for patient important outcomes. Evidence synthesis of DTA data and linkage to patient important outcomes is needed[1]; use GRADE for Diagnostics approach
Rating the quality of evidence / Intervention studies: by outcome, across studies / DTA studies: by outcome, across studies
GRADE criteria for downgrading
1. Risk of bias (RoB) /
  1. First, assess RoB for each study. Tool depends on study design. For RCTs, for example, the Cochrane Risk of Bias tool is recommended
  2. Downgrading is based on considered judgment on the extent of bias in all studies reporting the outcome
/
  1. First, assess RoB for each study. Use RoB domains of the QUADAS-2 tool
  2. Downgrading is based on considered judgment on the extent of bias in all studies reporting the outcome

2. Indirectness / Downgrading based on applicability issues. E.g. differences between population studied (seasonal flu) and population for whom recommendation is intended (avian flu), surrogate outcomes (bone density vs fractures) or indirect comparisons (A vs placebo and B vs placebo instead of A vs B). / Downgrading based on applicability issues. E.g. differences between populations studied (e.g. secondary care) and those for whom recommendation or test is intended (e.g. primary care), differences in tests studied or diagnostic expertise of people applying them. Use Applicability domains of the QUADAS-2 tool.
When two or more tests are each compared to its reference standard but not directly to one another:
Evidence quality may be downgraded for indirect comparisons (e.g. A vs reference test and B vs reference test instead of A vs B directly).
If focus is on patient important outcomes: quality of evidence downgraded because DTA evidence is considered a surrogate marker for patient important outcomes.
3. Inconsistency / Downgrading based on evaluation of unexplained heterogeneity.
Criteria include judgment on similarity of point estimates, extent of overlap of confidence intervals, and statistical criteria (tests of heterogeneity). / Downgrading based on evaluation of unexplained heterogeneity.
Criteria are less clear compared to intervention studies, but include judgment on similarity of point estimates and extent of overlap of confidence intervals. Lack of appropriate statistical methods for assessing study heterogeneity.
4. Imprecision / Downgrading based on evaluation of 95% confidence intervals of effect estimate, optimal information size and number of events. / Downgrading is based on evaluation of 95% confidence intervals around sensitivity and specificity. Formal guidance and methods on how this can be assessed are not yet available. One method might be calculating the projected number of patients who would be classified as false negatives (FN) and false positives (FP), based on a defined prevalence of the target condition.
5. Publication bias / Judgment of this criterion is challenging.
Different approaches exist such as funnel plot and statistical tests. Though these approaches have their own limitations, they provide some guidance on how to make a judgment for this criterion. / Judgment of this criterion is challenging.
Existing methods used in intervention studies are not applicable to diagnostic studies.
GRADE criteria for upgrading / Examples of situations in which upgrading is appropriate (e.g. large magnitude of effect) are available. / Specific examples of situations in which upgrading is appropriate are not yet available.

[1] Linkage of DTA evidence to patient important outcomes is not covered in this table.
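
Illustration for the Imprecision row (GRADE for Diagnostics): the table suggests projecting the number of patients classified as FN and FP at a defined prevalence. The arithmetic might be sketched as below, in Python. This is a minimal sketch, not a GRADE-endorsed method: the function names, the cohort size of 1000, and every numeric input are hypothetical assumptions chosen for illustration only.

```python
def projected_counts(sensitivity, specificity, prevalence, cohort=1000):
    """Project TP, FN, TN and FP counts for a hypothetical tested cohort."""
    with_condition = cohort * prevalence
    without_condition = cohort - with_condition
    tp = with_condition * sensitivity
    tn = without_condition * specificity
    return {
        "TP": tp,
        "FN": with_condition - tp,
        "TN": tn,
        "FP": without_condition - tn,
    }


def projected_range(ci_sensitivity, ci_specificity, prevalence, cohort=1000):
    """Project FN and FP ranges from the 95% CI limits, given as (low, high)."""
    sens_low, sens_high = ci_sensitivity
    spec_low, spec_high = ci_specificity
    # Higher accuracy gives fewer misclassified patients, lower accuracy more.
    best = projected_counts(sens_high, spec_high, prevalence, cohort)
    worst = projected_counts(sens_low, spec_low, prevalence, cohort)
    return {
        "FN": (best["FN"], worst["FN"]),
        "FP": (best["FP"], worst["FP"]),
    }


if __name__ == "__main__":
    # Hypothetical pooled estimates and CI limits; not taken from any study.
    point = projected_counts(sensitivity=0.90, specificity=0.85, prevalence=0.10)
    spread = projected_range((0.80, 0.95), (0.80, 0.90), prevalence=0.10)
    print("Point estimate per 1000 tested:", point)
    print("Range implied by CI limits:", spread)
```

Whether a projected range of FN or FP is wide enough to justify downgrading remains a considered judgment; as the table notes, formal guidance on this is not yet available.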