Supplementary Fig. 1 Classification of the 144 bacteria based on the normalized amino acid compositions at the N-terminal region and on the amino acid compositions.

The dendrograms represent the results of the hierarchical clustering analysis of the 144 bacteria based on the normalized amino acid compositions at the N-terminal region (a) and on the amino acid compositions (b). The normalized amino acid composition represents the bias of an amino acid residue at a given position across all amino acid sequences of a bacterium. In the dendrograms, colored shapes represent the taxonomic classes (purple squares: Alphaproteobacteria; light-blue squares: Betaproteobacteria; blue squares: Gammaproteobacteria; green squares: Deltaproteobacteria; yellow squares: Epsilonproteobacteria; gray circles: Actinobacteria; pink circles: Bacilli; orange circles: Clostridia). The values in parentheses are the G+C contents of the DNA sequences of each bacterium.

Supplementary Fig. 2 Means of the normalized amino acid compositions in each taxonomic class.

The means of the normalized amino acid compositions (N b a p) were calculated for each taxonomic class. The means at the N-terminal region (2≤p≤41) are shown in 1-1 to 1-17 (1-1: Lys; 1-2: Asn; 1-3: Ile; 1-4: Arg; 1-5: Gln; 1-6: Met; 1-7: Asp; 1-8: Glu; 1-9: Ala; 1-10: Phe; 1-11: His; 1-12: Leu; 1-13: Val; 1-14: Tyr; 1-15: Gly; 1-16: Trp; 1-17: Cys). The means at the C-terminal region (n−39≤p≤n) are shown in 2-1 to 2-18 (2-1: Lys; 2-2: Asn; 2-3: Thr; 2-4: Ser; 2-5: Ile; 2-6: Arg; 2-7: Gln; 2-8: Met; 2-9: Asp; 2-10: Glu; 2-11: Phe; 2-12: His; 2-13: Leu; 2-14: Val; 2-15: Tyr; 2-16: Gly; 2-17: Trp; 2-18: Cys). The means for each taxonomic class are represented by colored lines (purple: Alphaproteobacteria; light-blue: Betaproteobacteria; blue: Gammaproteobacteria; green: Deltaproteobacteria; yellow-green: Epsilonproteobacteria; gray: Actinobacteria; pink: Bacilli; orange: Clostridia).
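The position ranges in this legend can be made concrete with a short sketch. It assumes that positions are 1-based and counted from the N-terminus, so that p = 1 is the initial residue and p = n is the last one; under that assumption the N-terminal region (2≤p≤41) and the C-terminal region (n−39≤p≤n) each span 40 residues. The function name and the example sequence are hypothetical and serve only to illustrate the indexing.

```python
def terminal_regions(seq):
    """Return the (N-terminal, C-terminal) regions of a protein sequence.

    Assumption: positions are 1-based and counted from the N-terminus.
    N-terminal region: 2 <= p <= 41
    C-terminal region: n-39 <= p <= n, where n is the sequence length
    """
    n = len(seq)
    if n < 41:
        raise ValueError("sequence must have at least 41 residues")
    n_term = seq[1:41]     # positions 2..41 (0-based indices 1..40)
    c_term = seq[n - 40:]  # positions n-39..n (the last 40 residues)
    return n_term, c_term


# Hypothetical 50-residue sequence, not taken from the study
example = "M" + "A" * 9 + "K" * 30 + "G" * 10
n_term, c_term = terminal_regions(example)
print(len(n_term), len(c_term))
```

For sequences shorter than 80 residues the two regions overlap, as they do in this example; how such sequences should be handled is not stated here, so the sketch simply returns both windows.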

Supplementary Fig. 3 S a p l for each type of amino acid residue at the terminal regions.

The S a p l values for each type of amino acid residue at the N-terminal region (2≤p≤41) are shown in 1-1 to 1-17 (1-1: Lys; 1-2: Asn; 1-3: Ile; 1-4: Arg; 1-5: Gln; 1-6: Met; 1-7: Asp; 1-8: Glu; 1-9: Ala; 1-10: Phe; 1-11: His; 1-12: Leu; 1-13: Val; 1-14: Tyr; 1-15: Gly; 1-16: Trp; 1-17: Cys). The S a p l values for each type of amino acid residue at the C-terminal region (n−39≤p≤n) are shown in 2-1 to 2-18 (2-1: Lys; 2-2: Asn; 2-3: Thr; 2-4: Ser; 2-5: Ile; 2-6: Arg; 2-7: Gln; 2-8: Met; 2-9: Asp; 2-10: Glu; 2-11: Phe; 2-12: His; 2-13: Leu; 2-14: Val; 2-15: Tyr; 2-16: Gly; 2-17: Trp; 2-18: Cys). The scores, S a p l, represent the correspondence between the result of a hierarchical clustering analysis and the classification according to the taxonomic classes. The variables a, p, l, and n are the amino acid residue, the position from the termini, the distance from position p, and the length of the amino acid sequence, respectively.

Supplementary Table 1 Genomes of the 144 bacteria

The chromosomal sequences of the 144 bacteria were obtained from a public database available at the National Center for Biotechnology Information (NCBI). The taxonomic classes of these 144 bacteria were Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Actinobacteria, Bacilli, and Clostridia.

Supplementary Table 2 Prediction of the subcellular localization for amino acid sequences of the 144 bacteria

Subcellular localization was predicted for all annotated amino acid sequences deduced from the genomic DNA of the 144 bacteria. The prediction was performed with PSORTb v.2.1.0. The percentages of amino acid sequences predicted for each localization were averaged within each taxonomic class.
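The averaging step can be sketched as follows. This is a minimal illustration that assumes an unweighted mean, with each bacterium contributing equally to its class, and that handles one localization category at a time. The function name and the input values are hypothetical and are not data from the study.

```python
from collections import defaultdict


def mean_percent_by_class(records):
    """Average per-bacterium percentages within each taxonomic class.

    records: iterable of (taxonomic_class, percent) pairs, one pair per
    bacterium, for a single predicted localization category.
    Assumption: every bacterium is weighted equally within its class.
    """
    totals = defaultdict(lambda: [0.0, 0])  # class -> [sum, count]
    for taxonomic_class, percent in records:
        totals[taxonomic_class][0] += percent
        totals[taxonomic_class][1] += 1
    return {cls: total / count for cls, (total, count) in totals.items()}


# Hypothetical percentages for one localization category
records = [("Bacilli", 50.0), ("Bacilli", 40.0), ("Clostridia", 30.0)]
means = mean_percent_by_class(records)
print(means)
```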
