Supplementary

Knowledge-based Discovery for Designing CRISPR-CAS Systems against Invading Mobilomes in Thermophiles

Chellapandi P* and Ranjani J

Genomic distribution of TRs and CRISPRs in extremophiles

The largest numbers of TRs are found in the firmicutes Clostridium thermocellum (400) and Symbiobacterium thermophilum (373). More than 100 TRs are detected in the actinobacterium Thermobifida fusca (132), the hyperthermophilic bacterium Sulfurihydrogenibium sp. YO3AOP1 (124), the cyanobacterium Thermosynechococcus elongatus (115) and the Deinococcus-Thermus member Deinococcus geothermalis (124). Thermophilic archaea (crenarchaeota and euryarchaeota) have fewer than 67 TRs in their genomes. The lowest number of TRs is predicted in Ignicoccus hospitalis (18). The largest number of CRISPRs is identified in actinobacteria, followed by crenarchaeota and hyperthermophilic bacteria. The lowest number of CRISPRs is predicted in thermophilic photosynthetic bacteria.

When the genomes of psychrophiles are surveyed, the highest numbers of TRs (above 280) are identified in γ-proteobacteria, notably Psychromonas ingrahamii 37 (608 TRs and 23 CRISPRs), followed by β-proteobacteria and bacteroidetes (fewer than 240). The number of CRISPRs in psychrophilic archaea is higher than in other psychrophiles, in which TRs and CRISPRs are distributed evenly throughout the genomes.

The number of TRs in acidophiles varies widely; it is distinctly high in the actinobacterium Acidothermus cellulolyticus (472) compared with other acidophiles (154-301). A moderate number of TRs (200-300) is distributed in α-proteobacteria, acidobacteria and β-proteobacteria, among which the α-proteobacterium Acidiphilium cryptum JF-5 has the highest (299). Acidophilic crenarchaeota, δ-proteobacteria, γ-proteobacteria and firmicutes have fewer than 60 TRs in their genomes. The highest number of CRISPRs is predicted in crenarchaeota, whereas CRISPRs are almost equally distributed across the proteobacterial groups.

When TRs are searched in halophiles and alkaliphiles, halophiles show a lower number of TRs in their genomes, particularly Haloquadratum walsbyi (172), than alkaliphiles. A larger number of CRISPRs is predicted in Natronomonas pharaonis (5). The lowest number of TRs is predicted in the alkaliphile Alkaliphilus oremlandii. TRs are abundantly distributed in Thioalkalivibrio sp. HL-EbGR7 (425 TRs and 8 CRISPRs), Alkalilimnicola ehrlichei, Desulfatibacillum alkenivorans and Alkaliphilus metalliredigens (above 250).

Figure S1. Genomic distribution of DNA repeats (CRISPR and TR) in extremophiles. Panels: Thermophiles; Psychrophiles; Acidophiles; Halophiles and alkaliphiles.

Genomic features of thermophiles

As shown in Table S1, the average GC content and coding-region fraction of the selected thermophiles are 46.33% and 88.91%, respectively. The highest number of protein-coding genes (4709) is found in Acidovorax avenae subsp. citrulli, whereas Hydrogenobaculum sp. has the lowest (1629). The greatest number of structural RNA genes (133) is reported in the Shewanella frigidimarina genome, and the overall average across these genomes is 59.75. Pseudogenes are absent from the genomes of Deinococcus deserti, Thermococcus gammatolerans EJ3 and Halobacterium sp., and the highest number of them is reported in A. avenae subsp. citrulli.

Table S1. Genomic features and predicted CRISPR system of recently sequenced genomes of thermophiles

Genome / Strain / Genome size (nts.) / GC content (%) / Protein coding genes / Structural RNA genes
Caldivirga maquilingensis / IC-167 / 2077567 / 43 / 1963 / 45
Deinococcus deserti / VCD115 / 2819842 / 63 / 2593 / 57
Hydrogenobaculum sp. / Y04AAS1 / 1559514 / 34 / 1629 / 54
Kosmotoga olearia / TBF 19.5.1 / 2302126 / 41 / 2118 / 55
Nitrosopumilus maritimus / SCM1 / 1645259 / 34 / 1795 / 43
Petrotoga mobilis / SJ95 / 2169548 / 34 / 1898 / 57
Thermococcus gammatolerans / EJ3 / 2045438 / 53 / 2156 / 54
Thermococcus sibiricus / MM 739 / 1845800 / 40 / 2035 / 50
Thermotoga lettingae / TMO / 2135342 / 38 / 2040 / 52
Shewanella frigidimarina / NCIMB 400 / 4845257 / 41 / 4029 / 133
Acidovorax avenae subsp. citrulli / AAC00-1 / 5352772 / 68 / 4709 / 65
Halobacterium sp. / NRC-1 / 2014239 / 67 / 2075 / 52
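The summary figures quoted for Table S1 can be rechecked directly from its rows. The following is a minimal sketch, assuming the rows are stored as slash-delimited text exactly as printed above; the names `TABLE_S1` and `parse_table_s1` are illustrative and not part of the original analysis.

```python
# Rows copied from Table S1:
# genome / strain / genome size (nt) / GC % / protein-coding genes / structural RNA genes
TABLE_S1 = """\
Caldivirga maquilingensis / IC-167 / 2077567 / 43 / 1963 / 45
Deinococcus deserti / VCD115 / 2819842 / 63 / 2593 / 57
Hydrogenobaculum sp. / Y04AAS1 / 1559514 / 34 / 1629 / 54
Kosmotoga olearia / TBF 19.5.1 / 2302126 / 41 / 2118 / 55
Nitrosopumilus maritimus / SCM1 / 1645259 / 34 / 1795 / 43
Petrotoga mobilis / SJ95 / 2169548 / 34 / 1898 / 57
Thermococcus gammatolerans / EJ3 / 2045438 / 53 / 2156 / 54
Thermococcus sibiricus / MM 739 / 1845800 / 40 / 2035 / 50
Thermotoga lettingae / TMO / 2135342 / 38 / 2040 / 52
Shewanella frigidimarina / NCIMB 400 / 4845257 / 41 / 4029 / 133
Acidovorax avenae subsp. citrulli / AAC00-1 / 5352772 / 68 / 4709 / 65
Halobacterium sp. / NRC-1 / 2014239 / 67 / 2075 / 52
"""


def parse_table_s1(text):
    """Split each row on ' / ' and convert the numeric columns to int."""
    rows = []
    for line in text.strip().splitlines():
        genome, strain, size, gc, proteins, rnas = [f.strip() for f in line.split(" / ")]
        rows.append({
            "genome": genome,
            "strain": strain,
            "size": int(size),
            "gc": int(gc),
            "proteins": int(proteins),
            "rnas": int(rnas),
        })
    return rows


rows = parse_table_s1(TABLE_S1)
mean_gc = sum(r["gc"] for r in rows) / len(rows)
mean_rnas = sum(r["rnas"] for r in rows) / len(rows)
most_proteins = max(rows, key=lambda r: r["proteins"])
fewest_proteins = min(rows, key=lambda r: r["proteins"])
most_rnas = max(rows, key=lambda r: r["rnas"])

print(f"mean GC content: {mean_gc:.2f}%")               # 46.33%
print(f"mean structural RNA genes: {mean_rnas:.2f}")    # 59.75
print(most_proteins["genome"], most_proteins["proteins"])
print(fewest_proteins["genome"], fewest_proteins["proteins"])
print(most_rnas["genome"], most_rnas["rnas"])
```

The coding-region percentage and pseudogene counts mentioned in the text are not columns of Table S1, so they cannot be rechecked this way.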

Table S2. Entry list of CRISPR systems in thermophilic microorganisms available in the CRISPR database

Organism / Accession / Genome size (nts.) / TR (P) / CRISPR (C) / CRISPR (Q)
Hyperthermophilic Bacteria
Thermosipho africanus / NC_011653 / 2016657 / 71 / 12 / 0
Thermosipho melanesiensis / NC_009616 / 1915238 / 43 / 5 / 0
Thermotoga neapolitana / NC_011978 / 1884562 / 31 / 8 / 0
Thermotoga sp. RQ2 / NC_010483 / 1877693 / 25 / 8 / 0
Dictyoglomus thermophilum / NC_011297 / 1959987 / 56 / 3 / 0
Dictyoglomus turgidum / NC_011661 / 1855560 / 50 / 2 / 1
Fervidobacterium nodosum / NC_009718 / 1948941 / 67 / 2 / 0
Persephonella marina / NC_012440 / 1930284 / 35 / 5 / 0
Thermodesulfovibrio yellowstonii / NC_011296 / 2003803 / 63 / 5 / 0
Sulfurihydrogenibium azorense / NC_012438 / 1640877 / 92 / 13 / 0
Sulfurihydrogenibium sp. YO3AOP1 / NC_010730 / 1838442 / 116 / 3 / 3
Firmicutes
Anaerocellum thermophilum / NC_012034 / 2919718 / 140 / 9 / 0
Anoxybacillus flavithermus / NC_011567 / 2846746 / 74 / 3 / 0
Clostridium thermocellum / NC_009012 / 3843301 / 400 / 5 / 0
Geobacillus thermodenitrificans / NC_009328 / 3550319 / 71 / 3 / 1
Halothermothrix orenii / NC_011899 / 2578146 / 172 / 1 / 0
Moorella thermoacetica / NC_007644 / 2628784 / 71 / 2 / 1
Natranaerobius thermophilus / NC_010718 / 3165557 / 164 / 0 / 0
Streptococcus thermophilus / NC_006449 / 1796226 / 63 / 1 / 0
Symbiobacterium thermophilum / NC_006177 / 3566135 / 373 / 3 / 0
Thermoanaerobacter pseudethanolicus / NC_010321 / 2362816 / 84 / 7 / 0
Thermoanaerobacter sp. X514 / NC_010320 / 2457259 / 85 / 4 / 0
Thermoanaerobacter tengcongensis / NC_003869 / 2689445 / 61 / 3 / 0
Green non-sulfur bacteria
Thermomicrobium roseum / NC_011959 / 2003006 / 29 / 1 / 2
Deinococcus-Thermus
Deinococcus geothermalis / NC_008025 / 2467205 / 124 / 6 / 0
Deinococcus deserti / NC_012526 / 2819842 / 73 / 0 / 0
Actinobacteria
Thermobifida fusca / NC_007333 / 3642249 / 132 / 14 / 2
Cyanobacteria
Thermosynechococcus elongatus / NC_004113 / 2593857 / 115 / 0 / 1
Crenarchaeota
Desulfurococcus kamchatkensis / NC_011766 / 1365223 / 19 / 2 / 0
Hyperthermus butylicus / NC_008818 / 1667163 / 26 / 2 / 0
Ignicoccus hospitalis / NC_009776 / 1297538 / 18 / 12 / 0
Metallosphaera sedula / NC_009440 / 2191517 / 48 / 4 / 0
Pyrobaculum arsenaticum / NC_009376 / 2121076 / 31 / 4 / 0
Pyrobaculum calidifontis / NC_009073 / 2009313 / 36 / 7 / 0
Pyrobaculum islandicum / NC_008701 / 1826402 / 26 / 5 / 0
Staphylothermus marinus / NC_009033 / 1570485 / 45 / 11 / 0
Thermofilum pendens / NC_008698 / 1781889 / 36 / 10 / 0
Thermoproteus neutrophilus / NC_010525 / 1769823 / 39 / 10 / 0
Euryarchaeota
Methanosaeta thermophila / NC_008553 / 1879471 / 67 / 2 / 17
Thermococcus onnurineus / NC_011529 / 1847607 / 26 / 6 / 0

Table S3. Entry list of CRISPR systems in psychrophiles available in the CRISPR database

Organism / Accession / Genome size (nts.) / TR (P) / CRISPR (C) / CRISPR (Q)
Gamma Proteobacteria
Colwellia psychrerythraea 34H / NC_003910 / 5373180 / 215 / 0 / 1
Psychrobacter arcticus / NC_007204 / 2650701 / 266 / 0 / 1
Psychrobacter cryohalolentis K5 / NC_007969 / 3059876 / 119 / 0 / 0
Psychrobacter sp. PRwf-1 / NC_009524 / 2978976 / 310 / 1 / 0
Psychromonas ingrahamii 37 / NC_008709 / 4559598 / 608 / 1 / 22
Shewanella frigidimarina / NC_008345 / 4845257 / 312 / 0 / 0
Shewanella piezotolerans / NC_011566 / 5396476 / 137 / 1 / 0
Beta Proteobacteria
Polaromonas naphthalenivorans / NC_008781 / 4410291 / 129 / 0 / 3
Polaromonas sp. JS666 / NC_007948 / 5200264 / 344 / 0 / 7
Delta Proteobacteria
Desulfotalea psychrophila / NC_006138 / 3523383 / 166 / 1 / 3
Bacteroidetes
Flavobacterium psychrophilum / NC_009613 / 2861988 / 236 / 1 / 2
Euryarchaeota
Methanococcoides burtonii / NC_007955 / 2575032 / 98 / 2 / 1
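The per-organism totals quoted in the text (for example, 608 TRs and 23 CRISPRs for Psychromonas ingrahamii 37) can be recovered from the rows of Table S3. The sketch below assumes that the CRISPR total reported in the narrative is the sum of the two CRISPR columns (C and Q); that reading is inferred from the numbers and is not stated explicitly in the source. The names `TABLE_S3` and `parse_repeat_rows` are illustrative.

```python
# Organism rows copied from Table S3 (group headings omitted):
# organism / accession / genome size (nt) / TR / CRISPR (C) / CRISPR (Q)
TABLE_S3 = """\
Colwellia psychrerythraea 34H / NC_003910 / 5373180 / 215 / 0 / 1
Psychrobacter arcticus / NC_007204 / 2650701 / 266 / 0 / 1
Psychrobacter cryohalolentis K5 / NC_007969 / 3059876 / 119 / 0 / 0
Psychrobacter sp. PRwf-1 / NC_009524 / 2978976 / 310 / 1 / 0
Psychromonas ingrahamii 37 / NC_008709 / 4559598 / 608 / 1 / 22
Shewanella frigidimarina / NC_008345 / 4845257 / 312 / 0 / 0
Shewanella piezotolerans / NC_011566 / 5396476 / 137 / 1 / 0
Polaromonas naphthalenivorans / NC_008781 / 4410291 / 129 / 0 / 3
Polaromonas sp. JS666 / NC_007948 / 5200264 / 344 / 0 / 7
Desulfotalea psychrophila / NC_006138 / 3523383 / 166 / 1 / 3
Flavobacterium psychrophilum / NC_009613 / 2861988 / 236 / 1 / 2
Methanococcoides burtonii / NC_007955 / 2575032 / 98 / 2 / 1
"""


def parse_repeat_rows(text):
    """Parse slash-delimited rows and add a combined CRISPR count."""
    records = []
    for line in text.strip().splitlines():
        organism, accession, size, tr, crispr_c, crispr_q = [
            field.strip() for field in line.split(" / ")
        ]
        records.append({
            "organism": organism,
            "accession": accession,
            "size": int(size),
            "tr": int(tr),
            # Assumption: narrative CRISPR totals are C + Q.
            "crispr_total": int(crispr_c) + int(crispr_q),
        })
    return records


records = parse_repeat_rows(TABLE_S3)
top_tr = max(records, key=lambda r: r["tr"])
print(top_tr["organism"], top_tr["tr"], top_tr["crispr_total"])
# Psychromonas ingrahamii 37 608 23

# Rank organisms by TR count, highest first.
for rec in sorted(records, key=lambda r: r["tr"], reverse=True):
    print(f'{rec["organism"]}: {rec["tr"]} TRs, {rec["crispr_total"]} CRISPRs')
```

The same parser applies unchanged to the organism rows of Tables S2 and S4 to S6, since they share the six-column layout.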

Table S4. Entry list of CRISPR systems in acidophiles available in the CRISPR database

Organism / Accession / Genome size (nts.) / TR (P) / CRISPR (C) / CRISPR (Q)
Acidobacteria
Acidobacteria bacterium / NC_008009 / 5650368 / 154 / 0 / 4
Acidobacterium capsulatum / NC_012483 / 4127356 / 213 / 0 / 0
Solibacter usitatus / NC_008536 / 9965640 / 301 / 0 / 3
Crenarchaeota
Sulfolobus acidocaldarius / NC_007181 / 2225959 / 46 / 4 / 0
Sulfolobus islandicus L.S.2.15 / NC_012589 / 2736272 / 76 / 4 / 1
Sulfolobus islandicus M.14.25 / NC_012588 / 2608832 / 54 / 3 / 2
Alpha Proteobacteria
Acidiphilium cryptum JF-5 / NC_009484 / 3389227 / 299 / 2 / 1
Delta Proteobacteria
Syntrophobacter fumaroxidans / NC_008554 / 4990251 / 151 / 2 / 10
Syntrophus aciditrophicus / NC_007759 / 3179300 / 55 / 2 / 1
Beta Proteobacteria
Acidovorax avenae / NC_008752 / 5352772 / 270 / 0 / 0
Acidovorax sp. JS42 / NC_008782 / 4448856 / 197 / 2 / 2
Gamma Proteobacteria
Acidithiobacillus ferrooxidans ATCC 23270 / NC_011761 / 2982397 / 86 / 4 / 2
Acidithiobacillus ferrooxidans ATCC 53993 / NC_011206 / 2885038 / 79 / 0 / 2
Actinobacteria
Acidothermus cellulolyticus / NC_008578 / 2443540 / 472 / 1 / 7
Firmicutes
Lactobacillus acidophilus / NC_006814 / 1993560 / 48 / 1 / 0

Table S5. Entry list of CRISPR systems in halophiles available in the CRISPR database

Organism / Accession / Genome size (nts.) / TR (P) / CRISPR (C) / CRISPR (Q)
Euryarchaeota
Halobacterium salinarum R1 / NC_010364 / 2000962 / 111 / 0 / 1
Halobacterium sp. NRC-1 / NC_002607 / 2014239 / 111 / 0 / 0
Haloquadratum walsbyi / NC_008212 / 3132494 / 172 / 2 / 0
Natronomonas pharaonis / NC_007426 / 2595221 / 153 / 4 / 1
Gamma Proteobacteria
Halorhodospira halophila / NC_008789 / 2678452 / 79 / 2 / 1

Table S6. Entry list of CRISPR systems in alkaliphiles available in the CRISPR database

Organism / Accession / Genome size (nts.) / TR (P) / CRISPR (C) / CRISPR (Q)
Gamma Proteobacteria
Alkalilimnicola ehrlichei / NC_008340 / 3275944 / 260 / 1 / 3
Thioalkalivibrio sp. HL-EbGR7 / NC_011901 / 3464554 / 414 / 3 / 5
Delta Proteobacteria
Desulfatibacillum alkenivorans / NC_011768 / 6517073 / 272 / 1 / 0
Firmicutes
Alkaliphilus metalliredigens / NC_009633 / 4929566 / 252 / 2 / 2
Alkaliphilus oremlandii / NC_009922 / 3123558 / 116 / 0 / 1

Table S7. List of CAS protein families and their descriptions

Accession / ID / Description
PF03378 / CAS_CSE1 / CAS/CSE protein, C-terminus
PF07779 / Cas1p / Cas1p-like protein
PF09344 / Cas_CT1975 / CT1975-like protein
PF09481 / CRISPR_Cse1 / CRISPR-associated protein Cse1 (CRISPR_cse1)
PF09485 / CRISPR_Cse2 / CRISPR-associated protein Cse2 (CRISPR_cse2)
PF09530 / Cas_TM1812 / CRISPR-associated protein (cas_TM1812)
PF09559 / Cas6 / Cas6 Crispr
PF09623 / Cas_NE0113 / CRISPR-associated protein NE0113 (Cas_NE0113)
PF09670 / Cas_Cas02710 / CRISPR-associated protein (Cas_Cas02710)
PF09704 / Cas_Cas5d / CRISPR-associated protein (Cas_Cas5d)
PF09705 / Cas_Cas5a / CRISPR-associated protein (Cas_Cas5a)
PF09707 / Cas_Cas2CT1978 / CRISPR-associated protein (Cas_Cas2CT1978)
PF09708 / Cas_Cas5e / CRISPR-associated protein (Cas_Cas5e)
PF09827 / CRISPR_Cas2 / CRISPR associated protein Cas2
PF09455 / Cas_DxTHG / CRISPR-associated (Cas) DxTHG family
PF09484 / Cas_TM1802 / CRISPR-associated protein TM1802 (cas_TM1802)
PF09609 / Cas_GSU0054 / CRISPR-associated protein, GSU0054 family (Cas_GSU0054)
PF09611 / Cas_Csy1 / CRISPR-associated protein (Cas_Csy1)
PF09614 / Cas_Csy2 / CRISPR-associated protein (Cas_Csy2)
PF09615 / Cas_Csy3 / CRISPR-associated protein (Cas_Csy3)
PF09617 / Cas_GSU0053 / CRISPR-associated protein GSU0053 (Cas_GSU0053)
PF09618 / Cas_Csy4 / CRISPR-associated protein (Cas_Csy4)
PF09620 / Cas_csx3 / CRISPR-associated protein (Cas_csx3)
PF09651 / Cas_APE2256 / CRISPR-associated protein (Cas_APE2256)
PF09652 / Cas_VVA1548 / Putative CRISPR-associated protein (Cas_VVA1548)
PF09657 / Cas_Csx8 / CRISPR-associated protein Csx8 (Cas_Csx8)
PF09658 / Cas_Csx9 / CRISPR-associated protein (Cas_Csx9)
PF09659 / Cas_Csm6 / CRISPR-associated protein (Cas_Csm6)
PF09700 / Cas_Cmr3 / CRISPR-associated protein (Cas_Cmr3)
PF09701 / Cas_Cmr5 / CRISPR-associated protein (Cas_Cmr5)
PF09702 / Cas_Csa5 / CRISPR-associated protein (Cas_Csa5)
PF09703 / Cas_Csa4 / CRISPR-associated protein (Cas_Csa4)
PF09706 / Cas_CXXC_CXXC / CRISPR-associated protein (Cas_CXXC_CXXC)
PF09709 / Cas_Csd1 / CRISPR-associated protein (Cas_Csd1)
PF09711 / Cas_Csn2 / CRISPR-associated protein (Cas_Csn2)
