Supplementary Table 1: Sharing Initiatives
Project name / Sharing functionality / Sharing trends
Commercial
DNAnexus
/ Hosts National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) for raw sequence data from next-gen sequencing platforms. / Built user interface and mirrors 300-400 terabytes of SRA data (no medical data) on Google’s cloud. Offers proprietary cloud-based analysis and visualization tools that users can share.
Illumina
/ Data-sharing, analysis and storage for Illumina platform users / BaseSpace, genomics data-sharing space on Amazon’s AWS cloud infrastructure, in beta testing phase. Requires user registration.
Life Technologies Ion Torrent Community
/ User portal to share data, protocols and code / Sharing portal Ion Torrent Community requires user registration.
Complete Genomics
/ Offers sequencing services, data management, analysis, and results sharing. / Downstream analysis and data sharing services invade market of software service providers.
Genedata
/ Software has built-in sharing functionality for power-users and workflow users / Positioned for pharma outsourcing and public-private projects, such as Europe’s InnoMed PredTox.
GenomeQuest
/ Sharing functionality built into analysis tools / As customer sharing behavior changes, there is less interest in data storage and more sharing of analysis results than of raw data.
ID Business Solutions
/ Software and consulting firm with platform for data analysis and data integration, InforSense Suite / Lung Genomics Research Consortium expanded one of suite’s components, ClinicalSense, for its data analysis and sharing portal
Non-commercial alliances / Description / Projects
Pistoia Alliance
/ Collaborative group of pharma and life sciences companies exploring pre-competitive data-sharing. / Launches data-sharing projects for next-gen sequence data and biomarker exchange standards. Runs competitions, for example the Sequence Squeeze Competition, which seeks an algorithm to compress next-gen sequence data.
BioIT Alliance
/ Founded by Microsoft, now a non-profit organization. / Seeks to create standards—data models and transmission standards—to enable data-sharing in translational medicine
Non-profit initiatives
BioSharing
/ International network of organizations geared toward data-sharing and standardization in the life sciences / Developed standard called ISA Commons to streamline data sharing
crowdLabs
/ Repository for computational workflows (not only life sciences); offers access to high performance computing / Uses VisTrails, an open source workflow system.
Galaxy
/ Web and cloud-based open source sequence analysis tools / Galaxy Pages lets users see, re-use, and extend workflows
myExperiment Virtual Research Environment
/ Collaboration between the universities of Southampton, Manchester and Oxford in the UK. / Platform to share workflows. Users can share workflows openly or keep them private.
National Center for Biotechnology Information (NCBI)
/ Online resources with databases and analysis tools. A division of the National Library of Medicine at the National Institutes of Health. / DNA sequence resource, GenBank, run by NIH, EMBL and DNA Data Bank of Japan. Dozens of terabytes of data are downloaded from NCBI resources every day.
Sage Bionetworks
/ A non-profit focused on sharing science, founded by former Merck researchers Stephen Friend and Eric Schadt / Launches research collaborations, for example the public-private CommonMind Consortium to share neuropsychiatric disease data
World Wide Web Consortium Semantic Web
/ Part of the international community organization World Wide Web Consortium (W3C) / Has groups devoted to data-sharing in the life sciences, for example the Semantic Web Health Care and Life Sciences Interest Group
Workflow 4 Ever
/ Web-based resource to preserve and share methods and workflows. / Has partners in genomics and astronomy. Complementary to SHIWA (Sharing Interoperable Workflows for large-scale scientific simulations on distributed computing infrastructures)
Sharing networks and repositories
BioPortal
/ Repository run by The National Center for Biomedical Ontology, part of the National Centers for Biomedical Computing / Portal stores over 300 controlled vocabularies and ontologies in biomedicine. Users can download ontologies and upload them to share with others.
Concept Web Alliance
/ Group effort addressing semantic web applications, based at The Netherlands Bioinformatics Centre / Establishing a uniform, user-friendly online platform for text-mining from published texts, databases, and offline resources.
Cytoscape
/ Open-source software to analyze and visualize biological networks / Developers are working on a database for sharing network models.
DataCite
/ A non-profit, international organization of libraries / Offers service for data publishers to mint Digital Object Identifiers (DOIs) for data-sharing. DOIs are also available for datasets. DataCite is compiling a list of research data repositories and working on ways to use DOIs to retrieve metadata.
Force11
/ A group of editors, publishers, scientists, librarians, and research funders / Formed in 2011 to explore new ways to share, create, and communicate scholarly knowledge.
Genocoding Project
/ A data harvesting initiative based at the University of California at Santa Cruz and the University of Manchester / Software tool scans journal papers for genomic identifiers and maps them to the human genome.
Nanopublications
/ A venture seeking to use semantic tools to harvest assertions and to associate them with DOIs / Nanopublications are being tested in Open Pharmacological Concepts Triple Store (Open PHACTS), a European public-private venture
EMBL: European Molecular Biology Laboratory. Sources: Nature Biotechnology research, Frost & Sullivan, company data