Data Sharing: Perspective from the National Institutes of Health
Dr Belinda Seto, Deputy Director, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, USA
The National Institutes of Health (NIH) implemented a policy on data sharing in 2003. The policy reaffirmed the principle that data should be made as widely and freely available as possible, while safeguarding the privacy of research participants and protecting confidential and proprietary data. Restricted availability of unique resources on which further studies depend can impede the advancement of research and the delivery of medical care. Therefore, research data supported with NIH funds should be made readily available for research purposes to qualified individuals within the scientific community.
The NIH data-sharing policy expects the timely release and sharing of final research data for use by other researchers. Applicants requesting $500,000 or more in direct costs from the NIH in any single year are expected to include a plan for sharing data or to state why data sharing is not possible. Data is generally shared in the form of publications. Although the policy does not specify a timeline, the NIH expects researchers to share data no later than the acceptance for publication of the main findings from the dataset. Data can also be shared under a data use agreement, by placement in public archives, or, for sensitive data, by placement in restricted-access data centres or data enclaves.
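The $500,000 threshold applies to the direct costs requested in any single year, not to the total across the award. A minimal Python sketch of that check follows; the function and parameter names are hypothetical and only illustrate how the rule reads.

```python
# Hypothetical helper: the threshold described in the 2003 policy,
# applied per budget year ("in any single year"), not to the award total.
THRESHOLD_USD = 500_000


def sharing_plan_expected(direct_costs_by_year: list[int]) -> bool:
    """Return True if any single year's requested direct costs reach the threshold."""
    return any(cost >= THRESHOLD_USD for cost in direct_costs_by_year)


# Three years at $300,000 total $900,000, but no single year reaches $500,000.
print(sharing_plan_expected([300_000, 300_000, 300_000]))  # False
print(sharing_plan_expected([200_000, 520_000]))           # True
```

The comparison uses `>=` because the policy wording is "$500,000 or more".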
How can this policy be reconciled with privacy laws and concerns? The NIH strongly upholds the importance of individual privacy and data confidentiality and offers two caveats for sharing data involving human research participants: 1) keep data secure, and 2) de-identify data. The current U.S. medical privacy rule, issued under the Health Insurance Portability and Accountability Act (HIPAA), lists 18 identifiers, such as name, social security number, and health plan beneficiary number. To de-identify health information, researchers may either remove all 18 identifiers or statistically de-identify the information so that there is a very small risk that it could be used to identify the subject.
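The first route (removing the listed identifiers) can be sketched as a simple record filter. The sketch below assumes flat dictionary records with hypothetical field names and covers only a few of the 18 identifier categories. It also reduces dates to the year, since HIPAA's "Safe Harbor" method permits keeping only the year of dates related to an individual. It is an illustration, not a compliant de-identification tool; for example, it does not aggregate ages over 89 or handle free-text fields.

```python
from datetime import date

# Hypothetical field names standing in for a SUBSET of the 18 HIPAA identifiers;
# a real implementation must cover every category.
IDENTIFIER_FIELDS = {
    "name",
    "ssn",
    "health_plan_beneficiary_number",
    "medical_record_number",
    "phone",
    "email",
    "street_address",
}


def deidentify(record: dict) -> dict:
    """Drop identifier fields and reduce any date values to their year (sketch only)."""
    cleaned = {}
    for key, value in record.items():
        if key in IDENTIFIER_FIELDS:
            continue  # remove direct identifiers entirely
        if isinstance(value, date):  # also matches datetime, a subclass of date
            cleaned[key] = value.year  # keep only the year
        else:
            cleaned[key] = value
    return cleaned


patient = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "diagnosis": "asthma",
    "visit_date": date(2020, 5, 17),
}
print(deidentify(patient))  # {'diagnosis': 'asthma', 'visit_date': 2020}
```

The second route, statistical de-identification, depends on an expert's assessment of re-identification risk for the specific dataset and does not reduce to a fixed field list like this one.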
To advance the data-sharing policy, the NIH has launched initiatives to develop informatics tools that overcome barriers arising from fundamental differences among databases and informatics infrastructures. To the extent that commonalities can be implemented and data and tools shared, subsequent studies and secondary analyses can be initiated more quickly. Furthermore, database access and data mining require more user-friendly informatics tools. Approaches that combine imaging, genomic, and gene-expression data with patient medical records will ultimately deliver patient-specific information at the time and place where clinical decisions are made regarding risk, diagnosis, treatment, and follow-up. The overall strategy involves the development and standardised validation of application-specific software for integrating heterogeneous, clinically relevant data and extracting knowledge from them.