ANDS Guide

Research Data for Journal Editors (DRAFT)

This ANDS Guide provides a starting point for Journal Editors considering developing or improving data policies for their journals.

This Guide is currently in draft form (version 1.0). ANDS is seeking your feedback. Please provide comments or send the edited MS Word version to:

Table of Contents

1. Overview

2. Why do journals need data policies?

3. What should a data policy include?

3.1 What data to deposit

3.2 Where to deposit the data

3.3 How to deposit the data

3.4 When to deposit the data

3.5 If, or how, the data should be peer-reviewed

3.6 Options for data sharing and access

3.7 How the data should be licensed

3.8 How researchers should cite the data

3.9 Including code and methods

3.10 Help with compliance and consequences of non-compliance

4. Further references

1. Overview

Why do journals need data policies?

An increasing number of journals are implementing policies that require published articles to be accompanied by the underlying research data. Journal policies on research data and software code availability are an important part of the ongoing shift toward publishing reproducible research, supporting statements, mandates and principles issued by research funders, governments and scientific societies around the world.

What should a data policy include?

Journal publisher policies that define data availability in relation to published articles vary widely in both approach and scope. Some recommend data deposit while others mandate data deposit; some specify where the data should be deposited and when, while others do not; some detail the consequences of non-compliance while others make no mention of it.

In developing or refining a data policy, publishers could consider including the following:

  1. What data to deposit
  2. Where to deposit the data
  3. How to deposit the data
  4. When to deposit the data
  5. If or how the data should be peer-reviewed
  6. Options for data sharing and access
  7. How the data should be licensed
  8. How researchers should cite the data
  9. Including code and methods
  10. Help with compliance and consequences of non-compliance

How can ANDS help?

  • Provide example policies and links to reference material, e.g. Journal Open Data Policies
  • Email to start the discussion for your journal data policy

2. Why do journals need data policies?

An increasing number of journals are implementing policies that require published articles to be accompanied by the underlying research data. These policies provide a platform for replication and verification of the authors’ published claims. Journal policies on research data are an important part of the ongoing shift toward publishing reproducible research, supporting numerous statements and principles issued by research funders, governments and scientific societies around the world.

A condition of publication in a Nature journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications.

- Nature policy on availability of data, material and methods

PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.

- PLOS policy on data availability

3. What should a data policy include?

Journal publisher policies that define data availability in relation to published articles vary widely in both approach and scope. Some recommend data deposit while others mandate data deposit; some specify where the data should be deposited and when, while others do not; some detail the consequences of non-compliance while others make no mention of it.

In developing or refining a data policy, publishers could consider including the following:

  1. What data to deposit
  2. Where to deposit the data
  3. How to deposit the data
  4. When to deposit the data
  5. If or how the data should be peer-reviewed
  6. Options for data sharing and access
  7. How the data should be licensed
  8. How researchers should cite the data
  9. Including code and methods
  10. Help with compliance and consequences of non-compliance
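The ten elements above can also be used as a checklist when reviewing a draft policy. The following is a minimal sketch only, not an ANDS tool: the names POLICY_ELEMENTS and missing_elements, and the short keys given to each element, are hypothetical and chosen purely for illustration.

```python
# Hypothetical checklist of the ten policy elements listed above.
# The short keys are illustrative assumptions, not terms from the Guide.
POLICY_ELEMENTS = {
    "what": "What data to deposit",
    "where": "Where to deposit the data",
    "how": "How to deposit the data",
    "when": "When to deposit the data",
    "peer_review": "If or how the data should be peer-reviewed",
    "access": "Options for data sharing and access",
    "licence": "How the data should be licensed",
    "citation": "How researchers should cite the data",
    "code_methods": "Including code and methods",
    "compliance": "Help with compliance and consequences of non-compliance",
}


def missing_elements(covered):
    """Return descriptions of the elements a draft policy does not yet cover.

    `covered` is an iterable of keys from POLICY_ELEMENTS.
    """
    covered = set(covered)
    unknown = covered - set(POLICY_ELEMENTS)
    if unknown:
        raise ValueError(f"Unknown policy elements: {sorted(unknown)}")
    # Dict order is preserved, so gaps are reported in the order of the list above.
    return [text for key, text in POLICY_ELEMENTS.items() if key not in covered]


# Example: a draft that so far only says what to deposit and where.
draft = ["what", "where"]
for gap in missing_elements(draft):
    print("Not yet covered:", gap)
```

A publisher could keep such a list alongside the draft policy and tick off each element as the corresponding section is written.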

3.1 What data to deposit

Journal data policies vary widely in their definition of terms such as ‘the data’ and ‘the data set’. Stating the definition of these terms within a data policy will provide clarity and guidance for authors. Policies need to specify that both the data and associated metadata required to validate the article should be deposited. Providing examples of the types of data accepted would also be beneficial. If certain types of data should not or cannot be accepted then this should be specified in the policy.

PLOS defines the “minimal data set” to consist of the data set used to reach the conclusions drawn in the manuscript with related metadata and methods, and any additional data required to replicate the reported study findings in their entirety. Authors do not need to submit their entire data set if only a portion of the data were used in the reported study. Also, authors do not need to submit the raw data collected during an investigation if the standard in the field is to share data that have been processed.

- PLOS policy on data availability

Publishers may need to refer authors to advice on how to share data sets that have been derived from clinical studies or other studies involving human participants. A data policy may refer to internal or external sources that researchers can consult for guidance in this area. For example, PLOS suggests:

  • US National Institutes of Health: Protecting the Rights and Privacy of Human Subjects
  • Canadian Institutes of Health Research Best Practices for Protecting Privacy in Health Research
  • UK Data Archive: Anonymisation Overview
  • Australian National Data Service: Ethics, Consent and Data Sharing

Data policies should specify a course of action in the event that authors did not collect data themselves but used another source, for example, crediting the source of this data.

3.2 Where to deposit the data

Journal data policies should include clear instructions for authors on where to deposit the data accompanying their article. Policies commonly refer to depositing data in “an appropriate data repository”. However, authors may be unfamiliar with using a data repository and may have concerns about the trustworthiness of repositories. A policy is therefore likely to be more effective if it directs authors to deposit data in specific repositories. The PLOS data policy, for example, identifies established repositories “which are recognized and trusted within their respective communities”.

A good place to start in identifying possible data repositories is the Registry of Research Data Repositories (Re3Data) which lists over 1,500 research data repositories, making it the largest and most comprehensive registry of data repositories available on the internet. Policies may also advise data deposit in “institutional repositories” which refers to repositories that are managed by research institutions such as universities.

The policy may also include additional criteria for selecting an appropriate repository, such as suitability for the discipline, use of open licences, inclusion of metadata, adherence to standards and best practice, a data preservation policy, a mechanism for data citation, and accessibility.

A journal’s policy about data sharing often suggests a location/repository for the data to be archived. For examples of this see PLOS Medicine and Scientific Data.

Some journals have relationships with specific repositories, and have integrated data submission to that repository into their manuscript submission system, e.g.

  • Dryad has a Journal Lookup page which provides information on over 100 journals with integrated data submission and/or sponsored data publishing charges.
  • Wiley is piloting this process with figshare in 2016.

Some publishers may wish to advise authors to deposit data into their own repository.

The preferred way to share large data sets is via public repositories … Nature journals encourage authors to consider the publication of a Data Descriptor in Scientific Data to increase transparency and enhance the re-use value of data sets used in their papers.

- Nature policy on availability of data, material and methods

3.3 How to deposit the data

Authors will benefit from clear instructions on how to deposit the data that accompanies their article. If a data policy specifies that the data is to be deposited into an external data repository, it may direct authors to follow the submission guidelines provided by that repository. If the policy advises data deposit into a publisher-managed repository, it should refer authors to instructions on how to deposit their data and what metadata needs to be provided.

3.4 When to deposit the data

It is recommended that data policies provide details to authors of when they need to deposit the data and associated metadata needed to validate the results presented in their publication. Generally, the data policy should require authors to provide data to the editorial office prior to the publication of an article. The exact timing of deposit will differ depending on whether the data is to be peer reviewed or not.

Ideally, the data will be deposited after the initial acceptance of the article and while publication peer review is taking place. Deposited data may then be optionally reviewed, issued with a Digital Object Identifier (DOI) and a citation for the data. The data citation can be included in the article itself, either in the references, data or supplementary material sections.

Some journals integrate article submission with data repositories. For example, the Dryad Data Repository offers submission integration as a free service that allows journal publishers to co-ordinate the submission of manuscripts with submission of data to Dryad. It includes an option of making data available for editorial or peer review, via secure access for editors and reviewers.

3.5 If, or how, the data should be peer-reviewed

If a publisher decides that supporting data is to be peer-reviewed then the data policy should specify this. It should also provide information on the peer-review process for data including timing, purpose, and preferred method of sharing the data with peer-reviewers and whether the review will be anonymous.

Supporting data must be made available to editors and peer-reviewers at the time of submission for the purposes of evaluating the manuscript … Some of these repositories offer authors the option to host data associated with a manuscript confidentially, and provide anonymous access to peer-reviewers before public release. These repositories then coordinate public release of the data with the journal's publication date. This option should be used when possible but it remains the author's responsibility to communicate with the repository to ensure that public release is made on time for online publication of the paper.

- Nature policy on availability of data, material and methods

3.6 Options for data sharing and access

Journal policies on sharing of research data associated with articles currently vary widely:

  • No mention of sharing or publication of research data
  • Requiring a statement on the authors’ willingness to share the data, e.g. Annals of Internal Medicine, the BMJ
  • A statement encouraging data sharing, e.g. Monthly Notices of the Royal Astronomical Society
  • Requiring all data underlying a journal article to be made available with no or minimal restrictions, e.g. PLOS Medicine, Nature, and PNAS.

Access to research data may vary from being openly available to mediated or even closed access, depending on the sensitivity of the data.

3.7 How the data should be licensed

Defining permissions, terms, and conditions for reuse of data is essential so that prospective reusers know exactly what they can and cannot do when reusing data. Lack of clarity around reuse of data can have the same result as forbidding reuse of the data.

AusGOAL (Australian Governments Open Access and Licensing Framework) is:

  • An open access and licensing framework based on the international suite of Creative Commons licences
  • Designed to help select the least restrictive licence to apply to published datasets and other works
  • A common approach to data licensing across research and government

Data should be covered by a CC BY license or a less restrictive license.

- PLOS policy on data availability

3.8 How researchers should cite the data

Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to outputs such as journal articles, reports and conference papers. Citing data is increasingly being recognised as one of the key practices leading to recognition of data as a primary research output. This is important because:

  • when datasets are routinely cited they will achieve greater validity and significance within the scholarly communications cycle
  • citation of data enables recognition of scholarly effort with the potential for reward based on data outputs
  • the use of data should be appropriately attributed in scholarly outputs as with other types of publication.

Style guides for citing data exist. The UK Digital Curation Centre’s How to Cite Datasets and Link to Publications covers styles including APA, Chicago, MLA and Oxford.

Assigning a Digital Object Identifier (DOI) to data facilitates data citation and is considered best practice. A DOI is a type of persistent identifier that indicates a dataset will be well managed and accessible for long term use. It is now routine practice for publishers to assign DOIs to journal articles and for authors to include them in article citations. Journal data policies should include information about how researchers should cite their data and recommend use of DOIs.

Please provide a data sharing statement such as:
"Technical appendix, statistical code, and dataset available from the Dryad repository, DOI: [include DOI for dataset here]"

- BMJ Instructions for Authors
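As an illustration of the citation practice described in this section, the sketch below assembles a data citation that ends in a resolvable DOI link. It is a minimal example under stated assumptions: the function names doi_url and format_data_citation, the field order (creators, year, title, publisher, DOI) and all sample values are hypothetical, and a journal would substitute the citation style it actually recommends. The only fixed point is that a DOI can be resolved by appending it to https://doi.org/.

```python
def doi_url(doi):
    """Turn a DOI such as '10.1234/abcd' into a resolvable https://doi.org/ link."""
    doi = doi.strip()
    # Accept common prefixed forms and reduce them to the bare DOI.
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    # Every DOI begins with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"Not a DOI: {doi!r}")
    return "https://doi.org/" + doi


def format_data_citation(creators, year, title, publisher, doi):
    """Assemble a citation in a hypothetical creators-year-title-publisher-DOI style."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. {doi_url(doi)}"


# Made-up values, for illustration only; this DOI does not identify a real dataset.
print(format_data_citation(
    ["Smith, J.", "Lee, K."],
    2016,
    "Example survey dataset",
    "Example Repository",
    "doi:10.1234/example",
))
```

Presenting the DOI as a full link lets readers of the article follow the citation directly to the dataset's landing page.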

3.9 Including code and methods

Given that reproducible research is a goal of many journal data policies, it is recommended that these policies include reference to code and method availability. As with data, the policy needs to provide information about how the code and methods should be made available and whether these will be included in peer-review.

Authors must make available upon request, to editors and reviewers, any previously unreported custom computer code used to generate results that are reported in the paper and central to its main claims. Any practical issues preventing code sharing will be evaluated by the editors who reserve the right to decline the paper if important code is unavailable. Upon publication, Nature Journals consider it best practice to release custom computer code in a way that allows readers to repeat the published results.

- Nature policy on availability of data, material and methods

3.10 Help with compliance and consequences of non-compliance

A data policy that clearly states the consequences of non-compliance for authors will be more effective than one which does not. However, publishers will need to give careful consideration to what the consequences of non-compliance may be and to their capacity to enforce such consequences. Some authors may require assistance in order to comply, particularly if their data is sensitive. Specifying a point of contact to help with compliance issues would be beneficial. A policy which specifies the consequences of non-compliance should also have a method of monitoring compliance failure, such as a procedure for registering complaints about non-compliance. Publishers could also consider stating their rights to post a correction to, or retraction of, the data following publication.

Refusal to share data and related metadata and methods in accordance with this policy will be grounds for rejection. PLOS journal editors encourage researchers to contact them if they encounter difficulties in obtaining data from articles published in PLOS journals. If restrictions on access to data come to light after publication, we reserve the right to post a correction, to contact the authors' institutions and funders, or in extreme cases to retract the publication.

- PLOS policy on data availability

4. Further references

ANDS overview on Data and journals

Journal Open Data Policies: a list of journals with data-sharing mandates for their published articles.

Sturges, Paul and Bamkin, Marianne and Anders, Jane H.S. and Hubbard, Bill and Hussain, Azhar and Heeley, Melanie (2014) Research data sharing: developing a stakeholder-driven model for journal policies. Journal of the Association for Information Science and Technology. ISSN 2330-1643 (In Press) (eprint)


This work is licensed under a Creative Commons Attribution 3.0 Australia License

ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy Program. Monash University leads the partnership with the Australian National University and CSIRO.

Version 1 - Last updated: 11 December 2018 ands.org.au/guides
