
Supplementary Information – Materials and Methods

RNA isolation, PCR, and sequencing. Virus RNA was extracted from aliquots of original respiratory material or from cell-grown viral isolates using either the NucliSENS® easyMag® system (bioMérieux UK Ltd, Hampshire, England) or the QIAxtractor robot (Qiagen Ltd, West Sussex, England). Amplification and sequencing of whole influenza genomes were performed from original clinical respiratory material at CfI or from cell-grown isolates at the Wellcome Trust Sanger Institute (WTSI), using the methods described below. Clinical samples were sequenced by performing two-step RT-PCR of full or half segments, followed by direct sequencing of the products. Reverse transcription was performed using SuperScript® III reverse transcriptase (Invitrogen Ltd, Paisley, England) and the universal influenza primer Uni12 (3), following the manufacturer’s instructions. Twelve fragments covering the eight influenza genes were amplified using Platinum® Pfx, a proof-reading DNA polymerase (Invitrogen). Primers for amplification of complete open reading frames (ORFs) of pandemic H1N1 influenza were supplied by Eurofins MWG Operon (Ebersberg, Germany) (Table S3). PCR clean-up and sequencing reactions were performed by the Genomic Services Unit at the Department for Bioanalysis and Horizon Technologies, CfI, HPA. RT-PCR products were prepared for sequencing by purification with Ampure magnetic beads on the Biomek® NxP robot (Beckman Coulter, High Wycombe, England). Forward and reverse sequencing primers were designed in a gene-walking approach to hybridise approximately every 400 bp (Table S3). The cleaned PCR products were sequenced with the ABI BigDye® Terminator Kit v3.1 (Applied Biosystems, Warrington, UK), followed by clean-up of the products with CleanSeq magnetic beads on the Biomek® NxP robot (Beckman Coulter). Automated sequence detection was performed on a 48-capillary ABI 3730 DNA Analyser (Applied Biosystems).
Raw sequencing data were edited and assembled into contigs covering the complete ORF of each gene using Sequencher® 4.9 (Gene Codes Corporation, Ann Arbor, MI, USA).

Cultured viruses were sequenced using a two-step RT-PCR approach employing H1N1/09-specific PCR primers (Sigma-Aldrich, Gillingham, Dorset, UK) containing 5’ extensions identical to the M13F and M13R sequencing primers, similar to the method described by Ghedin et al. (2). cDNA was generated by reverse-transcribing RNA from cell-grown virus using SuperScript® III reverse transcriptase and either the Uni12 or the M13F uni12 primer (see below), and the cDNA was PCR-amplified in 80 or 86 reactions in 96-well plates using Phusion® Hot Start High-Fidelity DNA polymerase (Finnzymes/NEB, Hitchin, UK) and the primers shown in Table S4. Reactions were performed in a volume of 10 µl under an overlay of 10 µl Vapor-Lock (Qiagen), and contained 0.04 µl of cDNA and final concentrations of 200 µM each dNTP, 0.5 µM each primer, and 0.02 U/µl Phusion® Hot Start High-Fidelity DNA polymerase. Thermal cycling conditions were: initial denaturation at 98°C for 30 seconds; five cycles of 98°C for 5 seconds, 60°C for 20 seconds, 72°C for 10 seconds; 30 cycles of 98°C for 5 seconds, 68°C for 20 seconds, 72°C for 10 seconds; and final extension at 72°C for 5 minutes. Following cycling, 2.5 µl of each reaction was transferred to a new 96-well plate and treated with 1.0 µl ExoSAP-IT® (USB, High Wycombe, UK) at 37°C for 15 minutes, followed by inactivation at 80°C for 15 minutes. The final products were diluted 1/50 before sequencing. All primer-walking PCR products were sequenced with the M13F (5’-TGTAAAACGACGGCCAGT-3’) and M13R (5’-CAGGAAACAGCTATGAC-3’) primers and BigDye® Terminator v3.1 sequencing chemistry (Applied Biosystems) on ABI3730xl capillary sequencers. Sequencing reads were trimmed of primer sequences and poor-quality regions, and were assembled into contigs using in-house scripts. Assemblies were viewed and consensus sequences generated using Gap4 (1). Ambiguous bases were manually detected in Gap4 and converted to IUBMB ambiguity codes in the final consensus sequences.
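As a worked illustration of the primer-walking PCR composition given above, the following minimal sketch converts the stated final concentrations into absolute amounts per 10 µl reaction and per plate. It relies only on the identity 1 µM = 1 pmol/µl; the function and variable names are hypothetical, and stock concentrations are not stated in the text, so no pipetting volumes are derived.

```python
# Final concentrations and reaction volume as stated in the Methods text.
REACTION_VOLUME_UL = 10.0
DNTP_UM = 200.0               # each dNTP
PRIMER_UM = 0.5               # each primer
POLYMERASE_U_PER_UL = 0.02    # Phusion Hot Start polymerase


def per_reaction_amounts(volume_ul=REACTION_VOLUME_UL):
    """Return absolute amounts in one reaction (1 uM x 1 ul = 1 pmol)."""
    return {
        "dntp_pmol_each": DNTP_UM * volume_ul,
        "primer_pmol_each": PRIMER_UM * volume_ul,
        "polymerase_units": POLYMERASE_U_PER_UL * volume_ul,
    }


def per_plate_amounts(n_reactions):
    """Scale the per-reaction amounts to a plate of n_reactions (hypothetical helper)."""
    return {name: amount * n_reactions
            for name, amount in per_reaction_amounts().items()}


if __name__ == "__main__":
    print(per_reaction_amounts())
    # The text mentions 80 or 86 reactions per sample.
    for n in (80, 86):
        print(n, per_plate_amounts(n))
```

Under these figures each reaction contains 2 nmol of each dNTP, 5 pmol of each primer, and 0.2 U of polymerase.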

Whole-genome amplification by 8-segment PCR and sequencing by Illumina second-generation methods. For the second wave samples, RNA from clinical samples was RT-PCR amplified using the 8-segment PCR method of Zhou et al. (7), with some modifications: we replaced the primers used by Zhou et al. (7) with the M13F uni12A (5’-TGTAAAACGACGGCCAGTAGCAAAAGCAGG-3’), M13F uni12G (5’-TGTAAAACGACGGCCAGTAGCGAAAGCAGG-3’), and M13R uni13 (5’-CAGGAAACAGCTATGACAGTAGAAACAAGG-3’) primers. We performed two reactions for each sample, one containing the primers at concentrations of 0.25 µM M13F uni12A, 0.25 µM M13F uni12G, and 0.5 µM M13R uni13, and another containing primers at concentrations of 0.5 µM M13F uni12G and 0.5 µM M13R uni13 (i.e., omitting the M13F uni12A primer); in our experience, the latter greatly improves amplification of the PB2, PB1, and PA segments. Reactions were performed in a volume of 50 µl under an overlay of 20 µl Vapor-Lock (Qiagen), and contained 5.0 µl of RNA isolated from clinical material, and final concentrations of 1X SuperScript® III One-Step RT-PCR reaction buffer, 0.5 µM each primer, and 1.0 µl SuperScript® III RT / Platinum® Taq High Fidelity Enzyme Mix. Thermal cycling conditions were: reverse transcription at 42°C for 15 minutes, 55°C for 15 minutes, 60°C for 5 minutes; initial denaturation/enzyme activation at 94°C for 2 minutes; five cycles of 94°C for 30 seconds, 45°C for 30 seconds, slow ramp (0.5°C/sec) to 68°C, 68°C for 3 minutes; 30 cycles of 94°C for 30 seconds, 57°C for 30 seconds, 68°C for 3 minutes; and final extension at 68°C for 5 minutes. Equal volumes of both reactions were combined and either used as template for primer-walking PCR as above or sequenced on an Illumina Genome Analyzer IIe (Illumina, Little Chesterford, UK) (see below) (Table S2).
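The structure of the tagged primers listed above can be checked with a short sketch. It assumes only the sequences quoted in the text: each tagged primer should begin with the M13F or M13R sequencing-primer sequence, and the remainder is the influenza-specific part. The helper names are hypothetical.

```python
# Sequencing-primer tags and tagged primers, copied from the Methods text (5'->3').
M13F = "TGTAAAACGACGGCCAGT"
M13R = "CAGGAAACAGCTATGAC"

TAGGED_PRIMERS = {
    "M13F uni12A": (M13F, "TGTAAAACGACGGCCAGTAGCAAAAGCAGG"),
    "M13F uni12G": (M13F, "TGTAAAACGACGGCCAGTAGCGAAAGCAGG"),
    "M13R uni13": (M13R, "CAGGAAACAGCTATGACAGTAGAAACAAGG"),
}


def strip_tag(primer, tag):
    """Return the part of a tagged primer that follows its 5' tag."""
    if not primer.startswith(tag):
        raise ValueError("primer does not start with the expected tag")
    return primer[len(tag):]


def mismatch_positions(a, b):
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Influenza-specific 3' portions of each tagged primer.
CORES = {name: strip_tag(seq, tag) for name, (tag, seq) in TAGGED_PRIMERS.items()}

if __name__ == "__main__":
    for name, core in CORES.items():
        print(name, core, len(core))
    print(mismatch_positions(CORES["M13F uni12A"], CORES["M13F uni12G"]))
```

The two forward primers differ at a single base in the influenza-specific portion (A versus G at position 4), which is the only sequence difference between the two reactions described above.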

Products from the 8-segment PCR were sheared to a length of 200-400 bp using a Covaris AFA (Covaris, Woburn, Massachusetts), end-repaired, A-tailed, and ligated with Illumina sequencing adaptors containing identifying tags allowing multiplex sequencing of 12 samples per lane. The products were sequenced on an Illumina GAIIe using a single-end 54 bp run, following the manufacturer’s instructions. The resulting reads were assembled against a reference sequence (the genome of A/California/04/2009) using SSAHA2 (5) with the options ‘-rtype solexa -skip 1’, and the resulting assemblies were evaluated for coverage and quality using samtools (4). For consensus sequence calling, we applied the following criteria: a) the majority high-quality base (quality score ≥23) was used as the consensus base, and b) at positions where minority high-quality bases were present at a frequency of ≥20% of the total coverage at that position, these bases were included in a calculation of the IUBMB ambiguity code.
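The consensus-calling criteria above can be illustrated for a single alignment column with the following minimal sketch. It is not the pipeline used in the study: the function name and input format are hypothetical, "total coverage" is assumed here to mean all reads covering the position regardless of quality, and a position with no high-quality base is reported as N.

```python
from collections import Counter

MIN_QUALITY = 23             # threshold for a "high-quality" base (from the text)
MIN_MINOR_FRACTION = 0.20    # minority-base threshold (from the text)

# Standard nucleotide ambiguity codes, keyed by the set of bases represented.
AMBIGUITY_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def call_consensus_base(bases, qualities):
    """Call one consensus base from the read bases and quality scores at a position."""
    if len(bases) != len(qualities):
        raise ValueError("bases and qualities must have the same length")
    total_coverage = len(bases)  # assumption: all reads count towards coverage
    high_quality = [b.upper() for b, q in zip(bases, qualities)
                    if q >= MIN_QUALITY and b.upper() in "ACGT"]
    if not high_quality:
        return "N"  # assumption: no usable base at this position
    counts = Counter(high_quality)
    # Criterion a): the majority high-quality base is the consensus base.
    majority = max(sorted(counts), key=counts.get)
    included = {majority}
    # Criterion b): add minority high-quality bases at >=20% of total coverage.
    for base, n in counts.items():
        if base != majority and n / total_coverage >= MIN_MINOR_FRACTION:
            included.add(base)
    return AMBIGUITY_CODES[frozenset(included)]


if __name__ == "__main__":
    print(call_consensus_base("AAAAAAAGGG", [30] * 10))
    print(call_consensus_base("AAAAAAAAAG", [30] * 10))
```

In the first call G accounts for 3 of 10 high-quality reads and is merged with the majority A into an ambiguity code; in the second call G falls below the 20% threshold and only the majority base is reported.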

Supplementary Information – Results

Several amino acid substitutions were observed: (i) cluster D isolates (and 5 other isolates, including two from the UK second wave) with a serine (S) to asparagine (N) change at position 162 of the HA gene (179 in Supplementary Information Table S1); (ii) cluster H isolates with a leucine (L) (591 isolates) or isoleucine (I) (1 isolate) to phenylalanine (F) change at position 161 of the HA gene (178 in Supplementary Information Table S1); and (iii) cluster K isolates (plus 10 closely related isolates and one more distantly related UK isolate) with an aspartic acid (D) to glutamic acid (E) change at position 222 of the HA gene (239 in Supplementary Information Table S1). Amino acid positions 161 and 162 lie within antigenic site Sa of HA, proximal to the receptor-binding pocket (6), and amino acid 222 lies within antigenic site Ca2.

Supplementary Information – References

1. Bonfield, J. K., K. Smith, and R. Staden. 1995. A new DNA sequence assembly program. Nucleic Acids Res. 23: 4992-4999.

2. Ghedin, E., N. A. Sengamalay, M. Shumway, J. Zaborsky, T. Feldblyum, V. Subbu, D. J. Spiro, J. Sitz, H. Koo, P. Bolotov, D. Dernovoy, T. Tatusova, Y. Bao, K. St George, J. Taylor, D. J. Lipman, C. M. Fraser, J. K. Taubenberger, and S. L. Salzberg. 2005. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution. Nature 437: 1162-1166.

3. Hoffmann, E., J. Stech, Y. Guan, R. G. Webster, and D. R. Perez. 2001. Universal primer set for the full-length amplification of all influenza A viruses. Arch. Virol. 146: 2275-2289.

4. Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078-2079.

5. Ning, Z., A. J. Cox, and J. C. Mullikin. 2001. SSAHA: A fast search method for large DNA databases. Genome Res. 11: 1725-1729.

6. Xu, R., D. C. Ekiert, J. C. Krause, R. Hai, J. E. Crowe, and I. A. Wilson. 2010. Structural basis of preexisting immunity to the 2009 H1N1 pandemic influenza virus. Science 328: 357-360.

7. Zhou, B., M. E. Donnelly, D. T. Scholes, K. St George, M. Hatta, Y. Kawaoka, and D. E. Wentworth. 2009. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and swine origin human influenza A viruses. J. Virol. 83: 10309-10313.

Supplementary Information – Figure legends

Figure S1. Full BEAST tree showing distribution of UK isolates in the context of global pandemic H1N1/09 isolates. UK lineages are shown with red lines; all other lineages have black lines. UK-specific clusters are indicated by grey boxes with adjacent cluster labels. Red dots indicate nodes with ≥95% posterior support; orange dots indicate nodes with posterior support <95% but ≥80%. Amino acid changes that characterise global cluster 7 are shown adjacent to the branches on which they likely occurred. Known (or suspected) importations are indicated with an asterisk adjacent to the isolate name. Some groups with high posterior support are collapsed for clarity, with locations of isolates (and numbers of isolates from each location) shown. The UK isolates are colour coded according to the geographical region from which the sample was obtained, namely: London, Red; South East England, Pink; South West England, Dark Red; East of England, Yellow; West Midlands, Grey; East Midlands, Orange; Yorkshire, Dark Green; North West England, Blue; North East England, Light Blue; Republic of Ireland, Green; and Scotland, Dark Blue. Blue, pink, and yellow bars represent the pre-detection, first wave, and second wave phases of the pandemic in the UK.

Figure S2. ML tree showing distribution of UK isolates in the context of unfiltered global pandemic H1N1/09 isolates. UK lineages are shown with red lines; all other lineages have black lines. UK-specific clusters are indicated by grey boxes with adjacent cluster labels. Bootstrap support from 100 replicates is shown at nodes.

Figure S3. Pictorial representation of amino acid changes at selected sites. Only cluster-defining sites are shown. Position numbers surrounded by boxes indicate signature positions in global isolates. Remaining positions are signature positions for UK isolates.