Supplementary Material 1. Details on the selection of samples for each sub-database

Based on the comprehensive phylogenetic analysis of subtype B sequences in China and beyond (Fig. 2), we generated individual sub-databases according to the geographical linkages of the Shanghai sequences.

Sub-database 1 was used to explore the origin of, and the transmission route to Shanghai for, the monophyletic lineage of Shanghai sequences (SH-L). To construct this sub-database, we first performed a BLAST search with the 36 Shanghai sequences from SH-L and selected the 100 sequences with the highest similarity to each Shanghai reference sequence. In a preliminary Bayesian analysis, SH-L branched with Liaoning and Argentine sequences in the maximum clade credibility (MCC) tree with high posterior probability. We then downloaded all available subtype B pol sequences from Liaoning and Argentina from the LANL database and used them to construct an MCC tree together with the 36 Shanghai sequences. After manually removing closely related sequences from the same areas (Liaoning and Argentina), without compromising the genetic or geographic heterogeneity of the alignment according to the MCC tree, we compiled the final sub-database 1 as described in Fig. 1.
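The top-100 selection step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes the BLAST results were exported in the standard 12-column tabular format (`-outfmt 6`), that "similarity" is ranked by percent identity with bit score as a tie-breaker, and that self-hits are discarded. The function names `top_hits_per_query` and `pooled_candidates` are hypothetical.

```python
import csv
from collections import defaultdict
from io import StringIO


def top_hits_per_query(tabular_text, n=100):
    """Return {query_id: [subject_id, ...]} with the n most similar subjects per query.

    Expects BLAST tabular output (outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    Ranking by (percent identity, bit score) is an assumption of this sketch.
    """
    best = defaultdict(dict)  # query -> subject -> (pident, bitscore)
    for row in csv.reader(StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        if query == subject:
            continue  # assumed: a query matching itself is not a candidate
        score = (float(row[2]), float(row[11]))
        # A subject can appear several times (multiple HSPs); keep its best score.
        if subject not in best[query] or score > best[query][subject]:
            best[query][subject] = score
    return {
        query: [s for s, _ in sorted(hits.items(), key=lambda kv: kv[1], reverse=True)[:n]]
        for query, hits in best.items()
    }


def pooled_candidates(top_hits):
    """Union of the per-query top hits, i.e. the candidate set before manual curation."""
    pooled = set()
    for subjects in top_hits.values():
        pooled.update(subjects)
    return sorted(pooled)
```

Because the 36 query sequences are closely related, their top-100 lists are expected to overlap heavily, so the pooled set is likely far smaller than 36 × 100 sequences.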

Sub-databases 2, 3 and 4 were generated accordingly to identify the transmission linkage between HIV-1 subtype B in Shanghai and other countries (Japan and Korea) and Taiwan, China. Sub-database 2 (104 sequences) was mainly selected from JP-L1 and JP-L2, while sub-database 3 (70 sequences) was chiefly from KR-L. Sub-database 4 (103 sequences) was selected based on TW-L1 and TW-L2. All three of these sub-databases contain a set of subtype B reference sequences from other (more distant) countries that are scattered in the phylogenetic tree.

Sub-database 5 was used to identify the transmission linkage between HIV-1 subtype B in Shanghai and other domestic areas of China. Sub-database 5 mainly included sequences from BJ-L, CC-L and other sporadic minor lineages (shown in the tree, but not labeled in Fig. 2).

Supplementary Material 2. Bayesian skyline plot of subtype B strains in Shanghai. Molecular clock analysis was performed using BEAST v1.7. The Bayesian skyline plot output was analyzed using Tracer v1.5. Markov chain Monte Carlo (MCMC) chains were run for at least 200 million generations and sampled every 1000 steps. The x-axis represents time in years, and the y-axis represents the effective population size. The mean estimate is shown as a thick solid line, and the 95% highest posterior density (HPD) credible region is shown as the blue area.

Supplementary Material 3. Characteristics of 13 supraregional transmission networks with 30 contributing Shanghai subtype B sequences.

Network / Network size / tMRCA / Shanghai sequences (No.) / Shanghai sequences (diagnosed time)
Japan
1 / 24 / 2001.0 (1998.7-2003.3) / 3 / 2010-2013
2 / 10 / 2003.0 (2001.7-2004.4) / 1 / 2010
3 / 9 / 2001.4 (1998.5-2004.3) / 1 / 2012
4 / 4 / 2000.0 (1995.3-2004.8) / 1 / 2010
Korea
1 / 8 / 1991.9 (1990.4-1993.4) / 6 / 2010-2012
2 / 7 / 1998.6 (1995.5-2001.7) / 1 / 2004
Taiwan
1 / 8 / 2001.8 (1998.3-2005.3) / 6 / 2012-2014
2 / 7 / 2002.7 (2000.3-2005.1) / 2 / 2011
3 / 7 / 2000.8 (1997.5-2004.1) / 1 / 2013
4 / 4 / 1998.5 (1995.7-2001.4) / 1 / 2010
5 / 4 / 2004.6 (2002.3-2007.0) / 1 / 2013
6 / 6 / 1999.1 (1995.7-2002.5) / 1 / 2011
7 / 8 / 1997.0 (1991.2-2002.9) / 5 / 2009-2013
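
The totals stated in the title of this table (13 networks, 30 Shanghai sequences) can be checked against the rows above with a short tally. This is only an illustrative consistency check: the values are copied from the table as (network size, number of Shanghai sequences) pairs, and the names `networks` and `tally` are hypothetical.

```python
# (network size, number of Shanghai sequences) for each network, copied from the table.
networks = {
    "Japan": [(24, 3), (10, 1), (9, 1), (4, 1)],
    "Korea": [(8, 6), (7, 1)],
    "Taiwan": [(8, 6), (7, 2), (7, 1), (4, 1), (4, 1), (6, 1), (8, 5)],
}


def tally(networks):
    """Return {region: (number of networks, number of Shanghai sequences)}."""
    return {
        region: (len(rows), sum(n_shanghai for _, n_shanghai in rows))
        for region, rows in networks.items()
    }


summary = tally(networks)
total_networks = sum(n for n, _ in summary.values())
total_shanghai = sum(s for _, s in summary.values())
print(summary, total_networks, total_shanghai)
```

By region, the rows give 4 networks with 6 Shanghai sequences for Japan, 2 with 7 for Korea, and 7 with 17 for Taiwan, which matches the stated totals.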