Reporting Standards

Version 1, 17 Feb 2016, prepared by Thomas Hartung

This background document summarizes frameworks for assessing reporting standards that might be considered for evidence evaluation in the context of GRAS evaluations. It addresses cell culture, animal, and computer models; it does not extend to clinical or ecotoxicological studies. An obvious standard is the set of OECD Good Laboratory Practice (GLP) guidelines, but these are documentation standards rather than reporting standards and do not address scientific reporting in journals and similar outlets. They also do not cover non-animal methods to the same extent. For existing evidence considered under GRAS evaluations in particular, an assessment of reporting quality will also be needed.

Cell Culture Work

Good Cell Culture Practice (GCCP)

The limited applicability of GLP to in vitro studies was first addressed in a European Centre for the Validation of Alternative Methods (ECVAM) workshop in 1998 (Cooper-Hannan et al. 1999). Parallel initiatives (1996 in Germany and 1999 in Bologna at the Third World Congress on Alternatives and Animal Use in the Life Sciences) led to a declaration toward Good Cell Culture Practice (GCCP) (Gstraunthaler and Hartung 1999):

“The participants … call on the scientific community to develop guidelines defining minimum standards in cell and tissue culture, to be called Good Cell Culture Practice … should facilitate the interlaboratory comparability of in vitro results … encourage journals in the life sciences to adopt these guidelines...”

A GCCP task force was then established, which produced two reports (Hartung et al. 2002; Coecke et al. 2005). The maintenance of high standards is fundamental to all good scientific practice, and it is essential for ensuring the reproducibility, reliability, credibility, acceptance, and proper application of any results produced. The aim of GCCP is to reduce uncertainty in the development and application of in vitro procedures by encouraging the establishment of principles for the greater international harmonization, standardization, and rational implementation of laboratory practices, nomenclature, quality control systems, safety procedures, and reporting, linked, where appropriate, to the application of the principles of Good Laboratory Practice (GLP). GCCP addresses issues related to:

– Characterization & maintenance of essential characteristics

– Quality assurance

– Recording

– Reporting

– Safety

– Education and training

– Ethics

The GCCP documents formed a major basis for a GLP advisory document on in vitro studies by the Organisation for Economic Co-operation and Development (OECD) (OECD, 2004), which addresses:

– Test Facility Organization and Personnel

– Quality Assurance Program

– Facilities

– Apparatus, Materials, and Reagents

– Test Systems

– Test and Reference Items

– Standard Operating Procedures

– Performance of the Study

– Reporting of Study Results

– Storage and Retention of Records and Materials

Both guidance documents therefore have much in common: the inherent variation of in vitro test systems calls for standardization, and both the GLP advisory document and the GCCP guidance are intended to support best practice in all aspects of the use of in vitro systems, including the use of cells and tissues. Notably, ECVAM and the OECD are currently developing guidance on Good In Vitro Method Practice (GIVIMP), but details have not yet been published. The envisaged international guidance is intended to support the implementation of in vitro methods within a GLP environment for the regulatory human safety assessment of chemicals. GIVIMP is expected to contribute to increased standardization and harmonization in the generation of in vitro information on test item safety. The guidance should further facilitate the application of the OECD Mutual Acceptance of Data agreement to data generated by in vitro methods and thereby help avoid unnecessary additional testing. GIVIMP will take into account the requirements of the existing OECD guidelines and advisory documents to ensure that the guidance is complementary to, and fully in line with, these documents.

There are also some major differences between GLP and GCCP. GLP still gives only limited guidance for in vitro work and cannot normally be implemented in academia because of its cost and lack of flexibility. GCCP, on the other hand, also aims to give guidance to journals and funding bodies.

All quality assurance of an in vitro system starts with its definition and standardization, which include:

– A definition of the scientific purpose of the method

– A description of its mechanistic basis

– The case for its relevance

– The availability of an optimized protocol, including:

· standard operating procedures

· specification of endpoints and endpoint measurements

· derivation, expression, and interpretation of results (preliminary prediction model)

· the inclusion of adequate controls

– An indication of limitations (preliminary applicability domain)

– Quality assurance measures

This standardization forms the basis for formal validation, as developed by ECVAM, adapted and expanded by ICCVAM and other validation bodies, and, finally, internationally harmonized by OECD (OECD, 2005). Validation is the independent assessment of the scientific basis, the reproducibility, and the predictive capacity of a test. It was redefined in 2004 in the Modular Approach (Hartung et al. 2004) but needs to be seen as a continuous adaptation of the process to practical needs and a case-by-case assessment of what is feasible (Hartung 2007a; Leist et al. 2012).

Animal Work

Five articles on reporting the results of animal experiments were identified in a review in press (Samuel et al., in press). One was specific to toxicology (Beronius et al., 2014).

• Beronius, Molander, Rudén, Hanberg (2014)

This work proposed criteria for assessing the reliability and relevance of in vivo studies not conducted according to standardized toxicity test guidelines. A two-tiered approach for assessing reliability was developed. Tier I reliability criteria comprise 11 items, such as the chemical name/CAS number and source of the test compound, the number of animals per dose group, the dose levels/concentrations, the duration and frequency of administration, and the statistical methods. Studies that satisfy all Tier I criteria are then evaluated for reliability using the Tier II criteria, while those that fail are regarded as having poor reporting quality and are therefore excluded from the evidence used in risk assessment. The proposed Tier II reliability criteria comprise items in seven categories, for example purpose (e.g., description of endpoints to be investigated); test substance (e.g., description of toxicokinetic properties); and animals, housing, and feed. A web-based tool was developed for appraising reliability with the Tier II criteria; it translates an assessor's marks on the fulfillment of each criterion into a color scale. Finally, relevance is evaluated, guided by eight items that address aspects such as the relevance of the route of administration to human exposure, the appropriateness of exposure timing for the investigated endpoints, and the use of a test substance representative of the substance being risk assessed. Furthermore, the authors proposed a 16-criteria reporting checklist to support researchers in the design, conduct, and reporting of in vivo toxicity studies.
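The gating logic of the two-tiered approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' web-based tool: the class `StudyAppraisal`, the function names, and the criterion labels used as dictionary keys are hypothetical, and the sketch reports a simple count of fulfilled Tier II criteria rather than the color scale the tool produces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StudyAppraisal:
    """An assessor's marks for one in vivo study (hypothetical structure)."""
    tier1: Dict[str, bool]                                # criterion label -> fulfilled?
    tier2: Dict[str, bool] = field(default_factory=dict)  # only used if Tier I passes


def failed_tier1(appraisal: StudyAppraisal) -> List[str]:
    """Return the Tier I criteria that were not fulfilled, sorted by label."""
    return sorted(name for name, ok in appraisal.tier1.items() if not ok)


def appraise(appraisal: StudyAppraisal) -> dict:
    """Apply the gate: every Tier I criterion must be met before Tier II is considered."""
    failed = failed_tier1(appraisal)
    if failed:
        # Poor reporting quality: the study is excluded from the evidence base.
        return {"status": "excluded", "failed_tier1": failed}
    return {
        "status": "tier2",
        "fulfilled": sum(appraisal.tier2.values()),
        "total": len(appraisal.tier2),
    }
```

A study with any unmet Tier I criterion is returned as excluded together with the list of unmet items, which keeps the reason for exclusion transparent to a second assessor.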

• Festing & Altman (2002)

Festing and Altman developed a checklist of criteria for reporting animal experiments. Their objective was to promote the “3Rs” framework (Replacement, Reduction, and Refinement) for the ethical use of animals. The checklist consists of three categories that should be addressed in a paper: animals, environment, and statistical software. For example, the following items should be reported with respect to the “animals” category: source (e.g., species and gender), transportation (e.g., period of acclimatization), genotype (e.g., strain name), and microbiological status (e.g., specified pathogen-free).

• Kilkenny, Browne, Cuthill, Emerson & Altman (2010)

The ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines address the reporting of animal experiments. The goal of the guidelines is not to establish a standardized procedure or to mandate procedures for reporting but rather to improve the quality and utility of animal research through enhanced reporting of what was done and found during a study. The guidelines were developed by researchers, statisticians, and journal editors, and funded by the United Kingdom-based National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs). The elements of the 20-item checklist are categorized under headings that follow the typical format of a scientific paper: Title, Abstract, Introduction, Methods, Results, and Discussion. The included items address ethical issues; study design; experimental procedures and specific characteristics of the animals used; details of housing and husbandry; sample size; experimental, statistical, and analytical methods; and scientific implications, generalizability, and funding.

• Hooijmans, de Vries, Leenaars & Ritskes-Hoitinga (2011)

The Gold Standard Publication Checklist (GSPC) provides a distillation of guidelines on the proper design and reporting of animal experiments, and reflects feedback from experts in the field of animal science. The GSPC is intended to improve the quality of research involving animals, to help researchers to replicate results, to reduce the number of animals used in research, and to improve animal welfare. The checklist comprises several items under four categories similar to those of the ARRIVE guidelines: Introduction, Methods, Results and Discussion. For example, the guidelines recommend that the methods section address the following topics: the experimental design used; the experimental groups and controls used (such as species, genetic background, housing and housing conditions, nutrition, etc.); the ethical and regulatory principles followed; the intervention employed (such as dose and/or frequency of intervention, administration route, etc.); and the desired outcome (such as descriptions of parameters of interest and statistical methods).

• Landis, Amara, Asadullah, Austin, Blumenstein, Bradley, Crystal, Darnell, Ferrante, Fillit, Finkelstein, Fisher, Gendelman, Golub, Goudreau, Gross, Gubitz, Hesterlee, Howells, Huguenard, Kelner, Koroshetz, Krainc, Lazic, Levine, Macleod, McCall, Moxley, Narasimhan, Noble, Perrin, Porter, Steward, Unger, Utz & Silberberg (2012)

This guideline was proposed by major stakeholders convened by the US National Institute of Neurological Disorders and Stroke. The objective was to improve the quality of reporting of animal studies in grant applications and publications. The authors reached a consensus on reporting criteria that are regarded as prerequisites for authors of grant applications and scientific publications. These criteria comprise four items: randomization (e.g., data should be collected and processed randomly), blinding (animal caretakers and investigators should be blinded), sample-size estimation (e.g., use of an appropriate sample size), and data handling (e.g., a priori description of inclusion and exclusion criteria).

Cell Culture and Animal Work

• Schneider, Schwarz, Burkholder, Kopp-Schneider, Edler, Kinsner-Ovaskainen, Hartung & Hoffmann (2009)

This paper proposed the Toxicological Data Reliability Assessment Tool (ToxRTool) as a means of introducing more objectivity into the assignment of Klimisch categories to individual studies. The ToxRTool provides comprehensive criteria and guidance for these assignments. This software-based tool comprises two parts, one for in vivo studies and the other for in vitro studies. There are five groups of evaluation criteria: (1) test substance identification, (2) test system characterization, (3) study design description, (4) study results documentation, and (5) plausibility of study design and data. Studies are assigned scores that determine their Klimisch category. Criteria that are considered essential (e.g., test substance identification and test concentration description) are given greater weight in the evaluation. The ToxRTool is distributed as a Microsoft Office Excel® 2003 file containing spreadsheets for the reliability evaluation of in vivo and in vitro toxicity studies, optional documentation of observations that bear on relevance (e.g., was the study conducted according to recent OECD or EU guidelines?), and detailed explanations of the criteria. The goal of this design is to improve transparency in reliability evaluations of studies. The ToxRTool prototype was tested and improved through inter-rater testing (available for download at https://eurl-ecvam.jrc.ec.europa.eu/about-ecvam/archive-publications/toxrtool).
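To make the score-to-category step concrete, the following Python sketch shows one way such a mapping could work. It is a sketch under stated assumptions rather than a reproduction of the ToxRTool spreadsheet: the function name, the criterion labels, and the two threshold parameters are hypothetical, and the "greater weight" of essential criteria is modeled here as a gate (any unmet essential criterion leads to category 3), which is an assumption of this example. The category meanings follow the usual Klimisch scheme (1 = reliable without restriction, 2 = reliable with restrictions, 3 = not reliable).

```python
from typing import Dict, Iterable


def klimisch_category(
    marks: Dict[str, int],
    essential: Iterable[str],
    reliable_min: int,
    restricted_min: int,
) -> int:
    """Map per-criterion marks (1 = fulfilled, 0 = not) to a Klimisch category.

    reliable_min and restricted_min are caller-supplied score thresholds;
    the values used in any example are placeholders, not the tool's own.
    """
    # Assumption: an unmet (or unmarked) essential criterion overrides the total score.
    if any(marks.get(name, 0) == 0 for name in essential):
        return 3
    total = sum(marks.values())
    if total >= reliable_min:
        return 1
    if total >= restricted_min:
        return 2
    return 3
```

Passing the thresholds as arguments keeps the in vivo and in vitro parts separate, since the two parts of the tool use different criteria lists.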

Computational Toxicology

Guidance relevant to assessing the reporting quality of (Q)SAR studies is provided in the OECD and ECHA guidelines described in the “Mixed Guidance (Methodological and Reporting Quality)” section below.

Mixed Guidance (Methodological and Reporting Quality)

• OECD 2007 (Guidance document on the validation of (Quantitative) Structure-Activity Relationship [(Q)SAR] models) (http://www.oecd.org/officialdocuments/publicdisplaydocumentpdf/?cote=env/jm/mono%282007%292&doclanguage=en)

Increasing calls for (Q)SARs to be developed and applied for regulatory purposes beyond screening/prioritization, such as under REACH, led to a number of activities to promote (Q)SARs. A 2002 workshop organized by the International Council of Chemical Associations’ Long-Range Research Initiative (ICCA-LRI) and held in Setubal, Portugal, brought together a diverse group of international stakeholders to formulate guiding principles for the development and application of (Q)SARs. These were named the Setubal principles (Jaworska, Comber, Auer, & Van Leeuwen, 2003); they were subsequently discussed and endorsed by the OECD and are now known as the OECD Principles for (Q)SAR Validation (OECD, 2004a). There are five such principles that should be assessed to evaluate the scientific validity (quality) of a (Q)SAR model: a defined endpoint, an unambiguous algorithm, a defined domain of applicability, statistical validation, and a mechanistic interpretation (where feasible). Each of these five principles has numerous components. Preliminary guidance for interpreting these principles was drafted by the European Commission Joint Research Centre (Worth et al., 2005). This guidance was taken up by the OECD and published as its own guidance in 2007 (OECD, 2007). The OECD guidance can be used retrospectively to evaluate the quality of (Q)SAR studies that apply these models.

Reporting formats underpinned by the OECD validation principles were developed under the auspices of the QSAR Working Group of the EU’s then Technical Committee for New and Existing Substances to characterize pertinent information about a given (Q)SAR model and its predictions. Of the three reporting formats that were created and incorporated into the OECD and ECHA guidance on (Q)SARs (OECD, 2007; ECHA, 2008), two are of relevance:

• (Q)SAR Model Reporting Format (QMRF): The information captured within a QMRF includes the QMRF author, the (Q)SAR model developer, the model type, the model algorithm, the endpoint being modeled and the descriptors used, the approach used to characterize the domain of applicability, the performance characteristics from internal/external validation, the mechanistic interpretation, and possible applications of the model. The level of detail will vary with different types of (Q)SAR models.

• (Q)SAR Prediction Reporting Format (QPRF): The QPRF addresses the question of how a predicted value is generated for a substance using the model described in the QMRF; it also addresses the evaluation of its reliability. The information includes the substance identity and its structural representation, a description of how well the substance falls within the defined domain of applicability, and the extent to which there is agreement between the (Q)SAR predictions and the experimental data for relevant analogues.
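The items listed for the two reporting formats lend themselves to a simple structured record, which in turn allows a basic completeness check of reporting quality. The Python sketch below is illustrative only: the field names are shorthand for the items named above and do not reproduce the section numbering or wording of the official QMRF and QPRF templates, and `missing_items` is a hypothetical helper.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class QMRF:
    """Model-level information (field names are illustrative shorthand)."""
    qmrf_author: Optional[str] = None
    model_developer: Optional[str] = None
    model_type: Optional[str] = None
    algorithm: Optional[str] = None
    endpoint: Optional[str] = None
    descriptors: Optional[str] = None
    applicability_domain_approach: Optional[str] = None
    validation_performance: Optional[str] = None
    mechanistic_interpretation: Optional[str] = None
    possible_applications: Optional[str] = None


@dataclass
class QPRF:
    """Prediction-level information for one substance (illustrative shorthand)."""
    substance_identity: Optional[str] = None
    structural_representation: Optional[str] = None
    applicability_domain_fit: Optional[str] = None
    analogue_agreement: Optional[str] = None


def missing_items(report) -> List[str]:
    """List the fields of a QMRF or QPRF record that are empty or unset."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]
```

Because the level of detail varies with the type of (Q)SAR model, an empty field is better read as a prompt for the assessor than as an automatic failure.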