Kuhn et al.

Filovirus RefSeq entries: decision on filovirus type variants, type sequences, and names

Jens H. Kuhn1*#¶‡, Kristian Andersen2, Yīmíng Bào3‡, Sina Bavari4, Stephan Becker5, Richard S. Bennett6, Nicholas G. Bergman6, Olga Blinkova3‡, Steven Bradfute7, J. Rodney Brister3‡, Alexander A. Bukreyev8#, Kartik Chandran9#, Aleksandr A. Chepurnov10, Robert A. Davey11, Ralf Dietzgen12¶, Norman A. Doggett13, Olga Dolnik5#, John M. Dye4#, Sven Enterlein14, Paul Fenimore13, Stephen Gire2, Jean-Paul Gonzalez14, Anthony Griffiths11, Pierre Formenty16, Alexander N. Freiberg8, Christian T. Happi, Lisa E. Hensley1, Andrew S. Herbert4, Michael C. Hevey6, Thomas Hoenen17, Anna N. Honko1, Georgy M. Ignatyev18, Peter B. Jahrling1, Joshua Johnson1, Karl M. Johnson19, Hans-Dieter Klenk5, Gary Kobinger20, Tadeusz J. Kochel6, Matthew G. Lackemeyer1, Nicole Lackemeyer4, Daniel F. Lackner6, Eric M. Leroy21#, Mark S. Lever22, Elke Mühlberger23#, Sergey V. Netesov24#, Gene G. Olinger1, Sunday Omilabu, Gustavo Palacios4, Jean L. Patterson11#, Rekha Panchal4, Danny Park2, Janusz T. Paweska25#, Clarence J. Peters8, James Pettitt1, Louise Pitt4, Sheli R. Radoshitzky4, Elena I. Ryabchikova26, Erica Ollmann Saphire27#, Pardis Sabeti2, Rachel Sealfon2, Aleksandr M. Shestopalov24, Sophie J. Smither22#, Nancy J. Sullivan28, Robert Swanepoel29, Ayato Takada30#, Jonathan S. Towner31#, Guido van der Groen32, Viktor E. Volchkov33#, Valentina A. Volchkova33, Victoria Wahl-Jensen6, Travis K. Warren4#, Kelly L. Warfield14, Manfred Weidmann34, Stuart T. Nichol31*

1Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, Maryland, USA; 2Center for Systems Biology, Harvard University, Cambridge, MA, USA; 3Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA; 4United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, Maryland, USA; 5Institut für Virologie, Philipps-Universität Marburg, Marburg, Germany; 6National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, Maryland, USA; 7University of New Mexico, Albuquerque, New Mexico, USA; 8Department of Pathology and Galveston National Laboratory, University of Texas Medical Branch, Galveston, Texas, USA; 9Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, New York, USA; 10xxx; 11Department of Virology and Immunology, Texas Biomedical Research Institute, San Antonio, Texas, USA; 12Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. 
Lucia, Australia; 13Los Alamos National Laboratory, Los Alamos, New Mexico, USA; 14Integrated BioTherapeutics, Inc., Gaithersburg, Maryland, USA; 15Metabiota, Inc., San Francisco, California, USA; 16World Health Organization, Geneva, Switzerland; 17Laboratory of Virology, Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health; 18Federal State Unitary Company "Microgen Scientific Industrial Company for Immunobiological Medicines", Ministry of Health of the Russian Federation, Moscow, Russia; 19Bozeman, Montana, USA; 20Special Pathogens Program, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada; 21Centre International de Recherches Médicales de Franceville, Franceville, Gabon; 22Biomedical Sciences Department, Dstl, Porton Down, Salisbury, Wiltshire, UK; 23Department of Microbiology and National Emerging Infectious Diseases Laboratory, Boston University School of Medicine, Boston, Massachusetts, USA; 24Novosibirsk State University, Novosibirsk, Novosibirsk Region, Russia; 25Center for Emerging and Zoonotic Diseases, National Institute for Communicable Diseases of the National Health Laboratory Service, Sandringham-Johannesburg, Gauteng, South Africa; 26Institute of Chemical Biology and Fundamental Medicine, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Novosibirsk Region, Russia; 27Department of Immunology and Microbial Science and The Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California, USA; 28Vaccine Research Center, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, USA; 29Zoonoses Research Unit, University of Pretoria, Pretoria, South Africa; 30Division of Global Epidemiology, Hokkaido University Research Center for Zoonosis Control, Sapporo, Japan; 31Viral Special Pathogens Branch, Division of High-Consequence Pathogens and Pathology, National 
Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA; 32Prins Leopold Instituut voor Tropische Geneeskunde, Antwerp, Belgium; 33Laboratoire des Filovirus, Inserm U758, Université de Lyon, UCB-Lyon-1, Ecole-Normale-Supérieure de Lyon, Lyon, France; 34Universitätsmedizin Göttingen, Abteilung Virologie, Göttingen, Germany

*Corresponding authors: JHK: Integrated Research Facility at Fort Detrick (IRF-Frederick), Division of Clinical Research (DCR), National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH), B-8200 Research Plaza, Fort Detrick, Frederick, MD 21702, USA; Phone: +1-301-631-7245; Fax: +1-301-619-5029; Email: ; STN: Centers for Disease Control and Prevention (CDC), National Center for Emerging and Zoonotic Infectious Diseases (NCEZID), Division of High-Consequence Pathogens and Pathology (DHCPP), Viral Special Pathogens Branch (VSPB), 1600 Clifton Road, Atlanta, GA 30333, USA; Phone: +1-404-639-1122; Email:

#Members of the 2012-2014 International Committee on Taxonomy of Viruses (ICTV) Filoviridae Study Group

¶NCBI Viral RefSeq Genomes Advisors for members of the order Mononegavirales

‡Members of the NCBI Genome Annotation Virus Working Group

Keywords: Bundibugyo virus; cDNA clone; cuevavirus; Ebola; Ebola virus; ebolavirus; filovirid; Filoviridae; filovirus; genome annotation; ICTV; International Committee on Taxonomy of Viruses; Lloviu virus; Marburg virus; marburgvirus; mononegavirad; Mononegavirales; mononegavirus; Ravn virus; RefSeq; Reston virus; reverse genetics; Sudan virus; Taï Forest virus; virus classification; virus isolate; virus nomenclature; virus strain; virus taxonomy; virus variant

Disclaimer: The content of this publication does not necessarily reflect the views or policies of the US Department of the Army, the US Department of Defense, or the US Department of Health and Human Services, or of the institutions and companies affiliated with the authors. JHK performed this work as an employee of Tunnell Government Services, Inc., and MGL as an employee of Lovelace Respiratory Research Institute, both subcontractors to Battelle Memorial Institute under its prime contract with NIAID, under Contract No. HHSN272200700016I. This research was further supported in part by the Intramural Research Program of the NIH, National Library of Medicine (YB, OB, and JRB), and the Intramural Research Program of the NIH, NIAID (TH). This work was also funded under Agreement No. HSHQDC-07-C-00020 awarded by the Department of Homeland Security Science and Technology Directorate (DHS/S&T) for the management and operation of the National Biodefense Analysis and Countermeasures Center (NBACC), a Federally Funded Research and Development Center. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security. In no event shall the DHS, NBACC, or Battelle National Biodefense Institute (BNBI) have any responsibility or liability for any use, misuse, inability to use, or reliance upon the information contained herein. The Department of Homeland Security does not endorse any products or commercial services mentioned in this publication.
ABSTRACT

Sequence determination of complete or at least coding-complete virus genomes is becoming increasingly common in support of the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing time and costs are rapidly decreasing, sequencing equipment is being adapted for use by non-experts, and software is continually being improved to simplify sequence data management. It is therefore now feasible to analyze virus disease outbreaks at the molecular level, including characterizing the evolution of individual virus populations within single individuals over time. The growing volume of sequence data will create a management problem for the curators of commonly used sequence and other databases, and an entry-retrieval problem for end users. Nomenclature and annotation standards for virus isolates and their genomic sequences are therefore essential if these data are to be used to their full potential. The National Center for Biotechnology Information (NCBI) RefSeq database is a non-redundant, curated collection of reference (also known as type) nucleotide sequence records that serve as source data for numerous other databases. Building on recently proposed templates for filovirus variant naming (<virus name> (<strain>/)<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>), we here present filovirus community-wide consensus decisions on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.

REFSEQ

The National Center for Biotechnology Information (NCBI) RefSeq project was initiated to create a non-redundant, curated set of genomic, transcript, and protein sequence records [44]. Genomic RefSeq records provide a reference nucleotide sequence in which individual protein-coding regions and other sequence features are annotated using the best available experimental data as a guide. Just as reference specimens are called type specimens in several taxonomic schemes, RefSeq reference sequences can be considered type sequences for type viruses.

For viral RefSeq records, only one genome sequence record was initially constructed to represent each viral species; all other genome records for members of the same species, or for different strains, variants, and isolates of the same member of that species, were linked to this record as “genome neighbors” [7]. In most cases, the rationale for choosing a particular virus isolate sequence as the reference sequence is unclear and has almost never been published. Individual RefSeq entries were annotated using PubMed-indexed experimental data through in-house NCBI curation and curation by individual experts; subspecialty-wide committees or expert groups had not been established.

Over the past decade, the number of sequenced viral genomes has increased exponentially [12], fundamentally altering the curation of genome sequence data. Little to no experimental data are available for most new virus genomes, and annotation is often computationally transferred from related genomes or predicted de novo [23]. Moreover, the use of reference genomes has expanded to include sequence assembly and pathogen detection pipelines [11, 18, 21, 24, 47]. With these changes, the data model has adapted, and multiple RefSeq records are now maintained for many members of many viral species (http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10239). This approach represents the extant sequence diversity (or genotypes) within a particular species and also provides a mechanism for maintaining well-annotated records of both experimentally important laboratory isolates and less-studied isolates found in the wild.

CURRENT FILOVIRUS REFSEQ ENTRIES

The mononegaviral family Filoviridae includes three genera: Cuevavirus, Ebolavirus, and Marburgvirus. Eight distinct filoviruses are recognized as members of seven species distributed among these three genera (Table 1) [2, 3, 14, 26, 27]. These eight viruses are distinguished from one another by biological characteristics [26] and genomic sequence divergence [8, 26, 27, 32]. This divergence is determined from the sequences of well-characterized variants of the root virus of each taxon (hereafter called type variants of type viruses) [26]; de facto, these sequences therefore become type sequences. Using type sequences allows filovirus relationships to be represented algorithmically and, in principle, newly isolated filoviruses to be assigned automatically and preliminarily to existing or novel taxa (Figure 1).
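As a rough illustration of how type sequences could support automatic preliminary assignment, the sketch below compares a query sequence with each type sequence by uncorrected pairwise distance (p-distance) and reports the closest type virus. It assumes the sequences are already aligned to equal length and skips gap columns. The 0.30 cutoff is a hypothetical placeholder, not an ICTV demarcation criterion, and the function names and toy sequences are invented for illustration.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are not compared
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differing / compared


def assign_preliminary(
    query: str,
    type_sequences: Dict[str, str],
    cutoff: float = 0.30,  # hypothetical threshold, NOT an ICTV criterion
) -> Tuple[str, float, bool]:
    """Return (closest type virus, its p-distance, whether that distance is below the cutoff)."""
    distances = {name: p_distance(query, seq) for name, seq in type_sequences.items()}
    closest = min(distances, key=distances.get)
    return closest, distances[closest], distances[closest] < cutoff


# Toy, invented sequences for illustration only.
types = {"type virus A": "ACGTACGTAC", "type virus B": "TTGTACCTAG"}
name, dist, within = assign_preliminary("ACGTACGTAT", types)
print(name, round(dist, 2), within)  # type virus A 0.1 True
```

A query whose distance to every type sequence exceeds the cutoff would be flagged as a candidate member of a novel taxon; a real pipeline would also need properly aligned coding-complete genomes and expert review.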

Temporary type filovirus variants were previously established by the 2010-2012 ICTV Filoviridae Study Group [26]. These temporary type variants are largely consistent with those chosen for RefSeq (Table 2), but it is unclear by whom and on what grounds the RefSeq choices were made at NCBI; they therefore need to be re-evaluated by filovirus experts. In addition, the current RefSeq entries have to be relabeled to conform to ICTV taxonomy, type filovirus variant designations have to be chosen, and individual isolate names have to be adjusted to the recently established filovirus strain/variant/isolate schemes [29] to achieve uniformity and consistency.

REFSEQ ENTRY REEVALUATION

The “gold standard” filovirus type RefSeq entry should serve as a repository of functional information about a particular filovirus and should be selected on the basis of experimental importance and accessibility. It is crucial that any functional annotation of a RefSeq entry, e.g., a reference to the functions of particular genome regions or of genome-encoded proteins, be linked to the actual sequence used in the underlying experiments. Thus, the better characterized a particular virus/variant/isolate/sequence is, the more appropriate it is as a RefSeq entry, regardless of whether it was the first to be discovered or is the most widely used. Importantly, decisions on RefSeq entries do not imply a mandate that future experiments be performed with the viruses associated with these entries. However, direct comparisons with RefSeq-associated viruses are highly recommended to add detail to the RefSeq entries, which should be continually updated and, if necessary, corrected.

Consequently, the authors of this article confirmed or replaced the current taxonomic type virus variants and isolates and the current filovirus RefSeq entries based on the availability of scientific information characterizing a particular virus or, where such information is scarce for the variants of an entire taxon, on other criteria such as availability, passaging history, or medical importance. Decisions were reached by consensus or simple majority vote, with the understanding that all authors will apply the final decisions reached by the entire group and enforce them in their roles as authors, peer reviewers, or editors.

Cuevavirus RefSeq entries

Only one cuevavirus, Lloviu virus (LLOV), has been described [41]. At the time of writing, LLOV has not been isolated in culture, and its sequence diversity has been defined in only a single study that applied deep-sequencing techniques to samples from deceased Schreibers’ long-fingered bats [41]. Only one additional study on this virus has been published, characterizing the molecular-biological properties of the LLOV glycoprotein [39]. The coding-complete genome of one LLOV has been determined (GenBank #JF828358), and this sequence therefore automatically became the current RefSeq sequence (#NC_016144) (see [31] for the sequencing nomenclature used in this article). In the absence of additional deposited LLOV sequences and characterization data, this RefSeq entry should be upheld but considered temporary.

In line with the filovirus strain/variant/isolate definitions outlined previously [29], we propose the variant designation “Asturias” (after the Principality of Asturias in Spain, where Cueva del Lloviu, the cave in which LLOV was discovered, is located [41]) and the “isolate” name “Bat86” (instead of “MS-Liver-86/2003”) for this virus:

Full name: Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86

Shortened name: LLOV/M.sch/ESP/03/Ast-Bat86

Abbreviated name: LLOV/Ast-Bat86
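Once the abbreviations are fixed, the three name forms follow mechanically from the template components. The minimal sketch below assumes the abbreviated host (“M.sch”) and variant (“Ast”) are supplied by curators rather than derived by rule, and that, as in the LLOV example, the host suffix is dropped and the year reduced to two digits in the shortened form. It omits the optional (<strain>/) element because LLOV has none. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilovirusVariantName:
    """Hypothetical container for the components of a filovirus variant name."""
    virus_name: str      # e.g., "Lloviu virus"
    virus_abbrev: str    # e.g., "LLOV"
    host: str            # isolation host, e.g., "M.schreibersii"
    host_abbrev: str     # curator-supplied host abbreviation, e.g., "M.sch"
    host_suffix: str     # e.g., "wt"
    country: str         # country of sampling, e.g., "ESP"
    year: int            # year of sampling
    variant: str         # genetic variant designation, e.g., "Asturias"
    variant_abbrev: str  # curator-supplied variant abbreviation, e.g., "Ast"
    isolate: str         # isolate designation, e.g., "Bat86"

    def full(self) -> str:
        # <virus name> <isolation host-suffix>/<country>/<year>/<variant>-<isolate>
        return (f"{self.virus_name} {self.host}-{self.host_suffix}/"
                f"{self.country}/{self.year}/{self.variant}-{self.isolate}")

    def shortened(self) -> str:
        # Abbreviated virus, host, and variant; host suffix dropped; two-digit year.
        return (f"{self.virus_abbrev}/{self.host_abbrev}/{self.country}/"
                f"{self.year % 100:02d}/{self.variant_abbrev}-{self.isolate}")

    def abbreviated(self) -> str:
        # Virus abbreviation plus abbreviated variant and isolate only.
        return f"{self.virus_abbrev}/{self.variant_abbrev}-{self.isolate}"


llov = FilovirusVariantName("Lloviu virus", "LLOV", "M.schreibersii", "M.sch", "wt",
                            "ESP", 2003, "Asturias", "Ast", "Bat86")
print(llov.full())         # Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86
print(llov.shortened())    # LLOV/M.sch/ESP/03/Ast-Bat86
print(llov.abbreviated())  # LLOV/Ast-Bat86
```

Keeping the abbreviations as explicit inputs avoids guessing an abbreviation rule that this section does not state.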

Accordingly, in RefSeq #NC_016144 the title should be changed to “Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86, [coding-]complete genome”; the RefSeq <strain> field should be cleared; and the RefSeq <isolate> field should contain “Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86”. The same changes should be applied to GenBank #JF828358.
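The proposed record changes can be expressed as a small, pure transformation applied identically to both accessions. In this sketch a record is modeled as a plain dictionary with hypothetical keys "title", "strain", and "isolate" (not the actual NCBI record schema). The bracketed “[coding-]” in the title is read as meaning “coding-complete genome” for coding-complete sequences and “complete genome” otherwise. The starting field values are placeholders, not the current contents of either record.

```python
from typing import Any, Dict


def apply_type_variant_name(record: Dict[str, Any], full_name: str,
                            coding_complete: bool) -> Dict[str, Any]:
    """Return a copy of `record` with the title, strain, and isolate fields updated."""
    updated = dict(record)  # leave the input record untouched
    completeness = "coding-complete genome" if coding_complete else "complete genome"
    updated["title"] = f"{full_name}, {completeness}"
    updated.pop("strain", None)     # the <strain> field is cleared (removed here)
    updated["isolate"] = full_name  # the <isolate> field carries the full name
    return updated


FULL_NAME = "Lloviu virus M.schreibersii-wt/ESP/2003/Asturias-Bat86"
records = [  # placeholder values, not the real record contents
    {"accession": "NC_016144", "title": "<current title>", "strain": "<current strain>"},
    {"accession": "JF828358", "title": "<current title>", "strain": "<current strain>"},
]
# The LLOV genome is coding-complete, so the same change goes to both records.
updated = [apply_type_variant_name(r, FULL_NAME, coding_complete=True) for r in records]
for r in updated:
    print(r["accession"], "|", r["title"])
```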

Ebolavirus RefSeq entries