Fgenesh_smo2 and genescan_smo2 gene predictions
Aligned the two gene predictions using Multalign
The two sequences are very similar for the most part. The genescan gene model has three stretches of AA that are not present in the fgenesh gene model (AA 502-523, 546-555, and 577-600). Two short sections (AA 524-545 and 556-567) have low sequence similarity between the two gene prediction models.
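Stretches that are present in one model but missing from the other can also be read off the alignment programmatically. The following is a minimal Python sketch, assuming the two aligned sequences have been exported as equal-length strings with "-" marking gaps. The function name gap_runs and the toy sequences are hypothetical and are not the real smo2 alignment.

```python
def gap_runs(aligned_ref, aligned_other, gap="-"):
    """Return 1-based (start, end) residue ranges of aligned_ref that sit
    opposite gaps in aligned_other, i.e. stretches found only in aligned_ref.
    Positions are counted on the ungapped reference sequence."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    pos = 0       # residue number in the ungapped reference
    start = None  # start of the run currently being extended, if any
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if ref_char == gap:
            continue  # column adds no reference residue
        pos += 1
        if other_char == gap:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos - 1))
            start = None
    if start is not None:
        runs.append((start, pos))
    return runs


# Toy example (hypothetical sequences, not the real gene models).
toy_runs = gap_runs("MKTAYIAKQR", "MK---IAK-R")
print(toy_runs)  # [(3, 5), (9, 9)]
```

Run on the real alignment with the genescan model as the reference, this should report ranges comparable to the three stretches listed above, provided the alignment export uses the same gap character.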
NCBI Blastp genescan_smo2
Putative conserved domains have been detected.
[Image: conserved domain graphic from the NCBI Blastp results page]
Best Blast hits
emb|CAO21886.1| unnamed protein product [Vitis vinifera]
/note="SPRY domain. SPRY Domain is named from SPla and the
RYanodine Receptor. Domain of unknown function. Distant
homologues are domains in butyrophilin/marenostrin/pyrin
homologues; cl02614"
Length=389
Score = 322 bits (825), Expect = 7e-86, Method: Compositional matrix adjust.
Identities = 161/271 (59%), Positives = 197/271 (72%), Gaps = 4/271 (1%)
Query 131 NTNVWSKSTTRKSSKKGGSKTQAKALENENSVYLNPVVP-KIEDGPDLPVLLSKFQKAEK 189
+ NVW+KST+RK KK + E+ + P P K +D PD+ + LSK KAEK
Sbjct 104 SNNVWTKSTSRKGKKKAKATANNAPTEDPVLITPVPRFPDKNDDAPDMKICLSKIYKAEK 163
Query 190 VELSADQLSAGSIKGYRMVRATRGVVEGAWYFEITVEHLGKTGHTRLGWCTQKGDVQAPV 249
VELS D++ A S KGYRMVRATRGVVEGAWYFEI V LG+TGHTRLGW T+KGD+QAPV
Sbjct 164 VELSDDRMRAASGKGYRMVRATRGVVEGAWYFEIRVLKLGETGHTRLGWSTEKGDLQAPV 223
Query 250 GYDSHGYGYRDLEGSKVHAALREPY-GEAYVEGDTIGFYINLPNGAALAPKPPEIVSFKG 308
GYD++ +GYRD++G+KVH ALRE Y GE YVEGD IGFYINLP+GA APKPP +V +KG
Sbjct 224 GYDANSFGYRDIDGTKVHKALRETYGGEGYVEGDVIGFYINLPDGAMYAPKPPHLVWYKG 283
Query 309 LPYTA--ETKEEPLKPLPGGEIVFFRNGVYQGCAYKDIYAGRYFPAASMYTLPNEPNCTV 366
Y +TKE+P K +PG EI FF+NGV QG A+KD+ GRY+PAASMYTLPN+P+C V
Sbjct 284 QRYVCATDTKEDPPKVVPGSEISFFKNGVCQGVAFKDLCGGRYYPAASMYTLPNQPHCVV 343
Query 367 RFNFGPDFAFPITDWKDHTPPQPMSAAPFAG 397
+FNFGPDF F + P+PM P+ G
Sbjct 344 KFNFGPDFEFFPEELNGRPVPRPMIEVPYHG 374
ref|NP_175556.1| SPla/RYanodine receptor (SPRY) domain-containing protein [Arabidopsis
thaliana]
gb|AAG52633.1|AC024261_20 unknown protein; 66348-64527 [Arabidopsis thaliana]
gb|AAY56421.1| At1g51450 [Arabidopsis thaliana]
dbj|BAF01046.1| hypothetical protein [Arabidopsis thaliana]
Length=509
GENE ID: 841570 AT1G51450 | SPla/RYanodine receptor (SPRY) domain-containing
protein [Arabidopsis thaliana]
Score = 314 bits (805), Expect = 1e-83, Method: Compositional matrix adjust.
Identities = 157/271 (57%), Positives = 189/271 (69%), Gaps = 9/271 (3%)
Query 134 VWSKSTTRKSSKKGGSKTQAKALENENSVYLNPVVPKI----EDGPDLPVLLSKFQKAEK 189
VW +TRK KK + T A E+ V + PV P+ +D PDL + LSK KAEK
Sbjct 226 VWVTKSTRKGKKKSKANTPNPAA-VEDKVLITPV-PRFPDKGDDTPDLEICLSKVYKAEK 283
Query 190 VELSADQLSAGSIKGYRMVRATRGVVEGAWYFEITVEHLGKTGHTRLGWCTQKGDVQAPV 249
VE+S D+L+AGS KGYRMVRATRGVVEGAWYFEI V LG+TGHTRLGW T KGD+QAPV
Sbjct 284 VEISEDRLTAGSSKGYRMVRATRGVVEGAWYFEIKVLSLGETGHTRLGWSTDKGDLQAPV 343
Query 250 GYDSHGYGYRDLEGSKVHAALREPYG-EAYVEGDTIGFYINLPNGAALAPKPPEIVSFKG 308
GYD + +G+RD++G K+H ALRE Y E Y EGD IGFYINLP+G + APKPP V +KG
Sbjct 344 GYDGNSFGFRDIDGCKIHKALRETYAEEGYKEGDVIGFYINLPDGESFAPKPPHYVFYKG 403
Query 309 LPYTA--ETKEEPLKPLPGGEIVFFRNGVYQGCAYKDIYAGRYFPAASMYTLPNEPNCTV 366
Y + KEEP K +PG EI FF+NGV QG A+ DI GRY+PAASMYTLP++ NC V
Sbjct 404 QRYICAPDAKEEPPKVVPGSEISFFKNGVCQGAAFTDIVGGRYYPAASMYTLPDQSNCLV 463
Query 367 RFNFGPDFAFPITDWKDHTPPQPMSAAPFAG 397
+FNFGP F F D+ P+PM P+ G
Sbjct 464 KFNFGPSFEFFPEDFGGRATPRPMWEVPYHG 494
ref|NP_569028.1| unknown protein [Arabidopsis thaliana]
dbj|BAB10414.1| unnamed protein product [Arabidopsis thaliana]
gb|AAM61578.1| unknown [Arabidopsis thaliana]
Length=210
GENE ID: 836741 AT5G66090 | hypothetical protein [Arabidopsis thaliana]
Score = 197 bits (502), Expect = 2e-48, Method: Compositional matrix adjust.
Identities = 88/129 (68%), Positives = 112/129 (86%), Gaps = 2/129 (1%)
Query 538 QHVLLPIIDKNPYLSDSTRQAAATATSLAKKYGAKITVVVIDEEKKEK--DYEQRLQTIR 595
+H+LLP+ID+NPYLS+ TRQAAAT TSLAKKYGA ITVVVIDEEK+E ++E ++ IR
Sbjct 82 KHLLLPVIDRNPYLSEGTRQAAATTTSLAKKYGADITVVVIDEEKRESSSEHETQVSNIR 141
Query 596 WHLEEGGIQDYGMLEKIGEGKKAAVVIGEVADDMGLDLVVLSMECIHSKHIDGNLLAEFV 655
WHL EGG +++ +LE++GEGKKA +IGEVAD++ ++LVV+SME IHSK+ID NLLAEF+
Sbjct 142 WHLSEGGFEEFKLLERLGEGKKATAIIGEVADELKMELVVMSMEAIHSKYIDANLLAEFI 201
Query 656 PCPVLLLPL 664
PCPVLLLPL
Sbjct 202 PCPVLLLPL 210
Blastp using TAIR database
Sequences producing significant alignments: Score (bits) E value
ref|NP_175556.1| SPla/RYanodine receptor (SPRY) domain-cont... 285 7e-77
ref|NP_569028.1| unknown protein [Arabidopsis thaliana] 187 2e-47
ref|NP_850020.1| zinc finger (C3HC4-type RING finger) famil... 54 4e-07
ref|NP_192672.2| SPla/RYanodine receptor (SPRY) domain-cont... 38 0.021
ref|NP_973963.1| SPla/RYanodine receptor (SPRY) domain-cont... 37 0.055
ref|NP_174777.2| SPla/RYanodine receptor (SPRY) domain-cont... 37 0.058
ref|NP_176888.2| unknown protein [Arabidopsis thaliana] 31 2.3
ref|NP_190612.3| unknown protein [Arabidopsis thaliana] 30 4.7
ref|NP_174172.1| unknown protein [Arabidopsis thaliana] 29 8.4
ref|NP_175556.1| SPla/RYanodine receptor (SPRY) domain-containing protein
[Arabidopsis thaliana]
Length = 509
Score = 285 bits (729), Expect = 7e-77, Method: Composition-based stats.
Identities = 148/245 (60%), Positives = 177/245 (72%), Gaps = 6/245 (2%)
Query: 159 ENSVYLNPVV---PKIEDGPDLPVLLSKFQKAEKVELSADQLSAGSIKGYRMVRATRGVV 215
E+ V + PV K +D PDL + LSK KAEKVE+S D+L+AGS KGYRMVRATRGVV
Sbjct: 250 EDKVLITPVPRFPDKGDDTPDLEICLSKVYKAEKVEISEDRLTAGSSKGYRMVRATRGVV 309
Query: 216 EGAWYFEITVEHLGKTGHTRLGWCTQKGDVQAPVGYDSHGYGYRDLEGSKVHAALREPYG 275
EGAWYFEI V LG+TGHTRLGW T KGD+QAPVGYD + +G+RD++G K+H ALRE Y
Sbjct: 310 EGAWYFEIKVLSLGETGHTRLGWSTDKGDLQAPVGYDGNSFGFRDIDGCKIHKALRETYA 369
Query: 276 -EAYVEGDTIGFYINLPNGAALAPKPPEIVSFKGLPY--TAETKEEPLKPLPGGEIVFFR 332
E Y EGD IGFYINLP+G + APKPP V +KG Y + KEEP K +PG EI FF+
Sbjct: 370 EEGYKEGDVIGFYINLPDGESFAPKPPHYVFYKGQRYICAPDAKEEPPKVVPGSEISFFK 429
Query: 333 NGVYQGCAYKDIYAGRYFPAASMYTLPNEPNCTVRFNFGPDFAFPITDWKDHTPPQPMSA 392
NGV QG A+ DI GRY+PAASMYTLP++ NC V+FNFGP F F D+ P+PM
Sbjct: 430 NGVCQGAAFTDIVGGRYYPAASMYTLPDQSNCLVKFNFGPSFEFFPEDFGGRATPRPMWE 489
Query: 393 APFAG 397
P+ G
Sbjct: 490 VPYHG 494
ref|NP_569028.1| unknown protein [Arabidopsis thaliana]
Length = 210
Score = 187 bits (475), Expect = 2e-47, Method: Composition-based stats.
Identities = 88/129 (68%), Positives = 112/129 (86%), Gaps = 2/129 (1%)
Query: 538 QHVLLPIIDKNPYLSDSTRQAAATATSLAKKYGAKITVVVIDEEKKEK--DYEQRLQTIR 595
+H+LLP+ID+NPYLS+ TRQAAAT TSLAKKYGA ITVVVIDEEK+E ++E ++ IR
Sbjct: 82 KHLLLPVIDRNPYLSEGTRQAAATTTSLAKKYGADITVVVIDEEKRESSSEHETQVSNIR 141
Query: 596 WHLEEGGIQDYGMLEKIGEGKKAAVVIGEVADDMGLDLVVLSMECIHSKHIDGNLLAEFV 655
WHL EGG +++ +LE++GEGKKA +IGEVAD++ ++LVV+SME IHSK+ID NLLAEF+
Sbjct: 142 WHLSEGGFEEFKLLERLGEGKKATAIIGEVADELKMELVVMSMEAIHSKYIDANLLAEFI 201
Query: 656 PCPVLLLPL 664
PCPVLLLPL
Sbjct: 202 PCPVLLLPL 210
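To separate likely homologs from weak matches in the TAIR hit table above, the hits can be filtered on E-value. This is a minimal Python sketch, assuming the table has been copied by hand into (accession, bit score, E-value) tuples. The function name significant_hits and the 1e-5 cutoff are illustrative assumptions, not a threshold used in these notes.

```python
# Hit table from the TAIR Blastp run, as (accession, bit score, E-value).
tair_hits = [
    ("NP_175556.1", 285, 7e-77),
    ("NP_569028.1", 187, 2e-47),
    ("NP_850020.1", 54, 4e-07),
    ("NP_192672.2", 38, 0.021),
    ("NP_973963.1", 37, 0.055),
    ("NP_174777.2", 37, 0.058),
    ("NP_176888.2", 31, 2.3),
    ("NP_190612.3", 30, 4.7),
    ("NP_174172.1", 29, 8.4),
]


def significant_hits(hits, max_evalue=1e-5):
    """Keep hits whose E-value is at or below max_evalue (assumed cutoff)."""
    return [hit for hit in hits if hit[2] <= max_evalue]


kept = significant_hits(tair_hits)
for accession, bits, evalue in kept:
    print(accession, bits, evalue)
```

With this cutoff only the first three rows of the table are kept. A stricter or looser cutoff changes which of the weaker SPRY hits are retained.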
Multalign to Arabidopsis SPRY protein
The Blast results show that known proteins match only stretches of up to about 200 AA of the gene prediction models. Most of the hits are to a SPla/RYanodine receptor (SPRY) protein. When aligned, sequence similarity is good in certain regions and low across the rest of the protein. The first 85 AA and the last 250 AA do not align to the known Arabidopsis protein.
Multalign to Vitis vinifera SPRY protein
This is another SPRY protein hit that turned up in Blastp. This alignment looks better than the Arabidopsis alignment. The first 20 AA and the last 250 AA do not align to the known protein. One section of the genescan gene prediction (AA 61-68) does not appear in the Vitis model. This could mean that the genescan model has an additional intron, or that one of its intron boundaries is misplaced (the intron ends late or starts early).
Multalign to Arabidopsis unknown protein
This unknown protein also showed up in the Blast results. It has fairly good sequence similarity to the gene prediction, particularly at the end of the sequence. The notes section of the unknown protein's RefSeq record states:
/note="similar to Os09g0541700 [Oryza sativa (japonica
cultivar-group)] (GB:NP_001063816.1); similar to unnamed
protein product [Vitis vinifera] (GB:CAO23263.1); contains
domain Adenine nucleotide alpha hydrolases-like
(SSF52402)"
My interpretation is that the genescan_smo2 gene model actually comprises two genes: a SPRY-type gene and a gene for a protein of unknown function.
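One way to check this interpretation against the Blast output is to compare the query coordinates covered by each hit. This is a minimal Python sketch: the coordinates are the query ranges from the NCBI Blastp alignments above, and the function name merge_intervals is hypothetical. Separate, non-overlapping regions are consistent with a fused gene model but do not prove it.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) ranges into sorted, disjoint regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Query ranges on genescan_smo2 covered by each Blastp hit.
query_ranges = {
    "CAO21886.1 (Vitis vinifera SPRY)": (131, 397),
    "NP_175556.1 (Arabidopsis SPRY)": (134, 397),
    "NP_569028.1 (Arabidopsis unknown protein)": (538, 664),
}

regions = merge_intervals(query_ranges.values())
print(regions)  # [(131, 397), (538, 664)]
```

The two SPRY hits collapse into one region (AA 131-397), while the unknown protein covers a distinct region near the end of the model (AA 538-664), leaving a stretch between them with no hit.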