Analysis of Col1a1 promoter and partial intron 1 sequences in Homo sapiens (Hs), Bos taurus (Bt) and Cervus elaphus (Ce) orthologues

Table of Contents

Osx motifs

Promoter analysis with Fuzznuc program

Homo sapiens

Bos taurus

Cervus elaphus

Promoter analysis with Dialign program

Partial intronic sequence analysis with Fuzznuc program

Homo sapiens

Bos taurus

Cervus elaphus

Partial intronic sequence analysis with Dialign program

Osx motifs:

Symbol / Binding site / Transfac ID

y / NNGGGGCGGGGNN / V$SP1_Q4_01

Promoter

Fuzznuc

########################################

# Program: fuzznuc

# Rundate: Sun 1 Feb 2010 18:29:40

# Commandline: fuzznuc

# -filter

# [-sequence] ../fuzznuc/source/HsPrE1COL1A1_1kE1.fasta

# -complement

# -pattern @SP1_1.pat

# Report_format: seqtable

# Report_file: stdout

########################################

#======

#

# Sequence: HsPrE1COL1A1 from: 1 to: 1223

# HitCount: 31

#

# Pattern_name Mismatch Pattern

# V$SP1_Q4_01 2 NNGGGGCGGGGNN

#

# Complement: Yes

#

#======

Start End Pattern_name Mismatch Sequence

779 791 V$SP1_Q4_01 2 GCTGGGCTGGGGG

780 792 V$SP1_Q4_01 2 CTGGGCTGGGGGG

787 799 V$SP1_Q4_01 1 GGGGGGCTGGGGA

788 800 V$SP1_Q4_01 2 GGGGGCTGGGGAG

877 889 V$SP1_Q4_01 1 TGGGGGCCGGGCC

878 890 V$SP1_Q4_01 2 GGGGGCCGGGCCA

906 918 V$SP1_Q4_01 2 CTGGGGCACGGGC

937 949 V$SP1_Q4_01 1 GAGGGGCAGGGTT

979 991 V$SP1_Q4_01 2 AAGGGGCCCGGGC

980 992 V$SP1_Q4_01 2 AGGGGCCCGGGCC

1022 1034 V$SP1_Q4_01 2 CGGGGTCGGAGCA

145 133 V$SP1_Q4_01 2 CAGAGGTGGGGAT

168 156 V$SP1_Q4_01 2 AGGGGTGGGGGAT

169 157 V$SP1_Q4_01 1 TAGGGGTGGGGGA

247 235 V$SP1_Q4_01 2 GAGGTGCAGGGTG

263 251 V$SP1_Q4_01 2 ATGGAGTGGGGAG

373 361 V$SP1_Q4_01 2 TGAGGGAGGGGTG

545 533 V$SP1_Q4_01 2 TAAGGGAGGGGCA

602 590 V$SP1_Q4_01 2 TGGGGAGGGGGTC

603 591 V$SP1_Q4_01 1 TTGGGGAGGGGGT

617 605 V$SP1_Q4_01 2 AAGGGTTGGGGGT

644 632 V$SP1_Q4_01 2 GCTGGGTGGGGAG

649 637 V$SP1_Q4_01 2 GGGGAGCTGGGTG

717 705 V$SP1_Q4_01 2 GTGGGTAGGGGTG

856 844 V$SP1_Q4_01 2 AGGGGGAGGAGGA

862 850 V$SP1_Q4_01 2 ATGGAGAGGGGGA

938 926 V$SP1_Q4_01 2 TCGGAGAGGGGGA

1087 1075 V$SP1_Q4_01 2 TGGGGAGGGGGTT

1088 1076 V$SP1_Q4_01 1 CTGGGGAGGGGGT

1094 1082 V$SP1_Q4_01 2 TTGTGGCTGGGGA

1175 1163 V$SP1_Q4_01 2 GGAGGGCGGTGGC

#------

#------

#------

# Total_sequences: 1

# Total_hitcount: 31

#------

########################################

# Program: fuzznuc

# Rundate: Sun 1 Feb 2010 18:29:58

# Commandline: fuzznuc

# -filter

# [-sequence] ../fuzznuc/source/BtPrE1COL1A1_1kE1.fasta

# -complement

# -pattern @SP1_1.pat

# Report_format: seqtable

# Report_file: stdout

########################################

#======

#

# Sequence: BtPrE1COL1A1 from: 1 to: 1223

# HitCount: 22

#

# Pattern_name Mismatch Pattern

# V$SP1_Q4_01 2 NNGGGGCGGGGNN

#

# Complement: Yes

#

#======

Start End Pattern_name Mismatch Sequence

782 794 V$SP1_Q4_01 1 GGGGGGCTGGGGA

783 795 V$SP1_Q4_01 2 GGGGGCTGGGGAG

877 889 V$SP1_Q4_01 1 TGGGGGCCGGGCC

878 890 V$SP1_Q4_01 2 GGGGGCCGGGCCA

907 919 V$SP1_Q4_01 2 TGGGGGACGGGCA

908 920 V$SP1_Q4_01 2 GGGGGACGGGCAG

937 949 V$SP1_Q4_01 1 CGGGGGCAGGGTT

979 991 V$SP1_Q4_01 2 AAGGGGCCCGGGC

980 992 V$SP1_Q4_01 2 AGGGGCCCGGGCC

986 998 V$SP1_Q4_01 2 CCGGGCCGGTGGT

1022 1034 V$SP1_Q4_01 2 CGGGGTCGGAGCA

131 119 V$SP1_Q4_01 2 AAGGGCCGTGGTC

447 435 V$SP1_Q4_01 2 GTGTGGCTGGGGT

667 655 V$SP1_Q4_01 2 TCAGGGAGGGGAC

712 700 V$SP1_Q4_01 2 GTGGGTAGGGGCG

856 844 V$SP1_Q4_01 2 AGGGGGAGGAGGA

938 926 V$SP1_Q4_01 2 CGGGAGAGGGGGA

963 951 V$SP1_Q4_01 2 GGAGAGCGGGGAG

1087 1075 V$SP1_Q4_01 2 TGGGGTGGGGGTT

1088 1076 V$SP1_Q4_01 1 CTGGGGTGGGGGT

1094 1082 V$SP1_Q4_01 2 TTGCGGCTGGGGT

1175 1163 V$SP1_Q4_01 2 GGAGGGCGGTGGC

#------

#------

#------

# Total_sequences: 1

# Total_hitcount: 22

#------

########################################

# Program: fuzznuc

# Rundate: Sun 1 Feb 2010 18:30:24

# Commandline: fuzznuc

# -filter

# [-sequence] ../fuzznuc/source/CePrE1COL1A1_1k_Pr_rev.fasta

# -complement

# -pattern @SP1_1.pat

# Report_format: seqtable

# Report_file: stdout

########################################

#======

#

# Sequence: CePrE1COL1A1_1k_Pr_rev from: 1 to: 1175

# HitCount: 25

#

# Pattern_name Mismatch Pattern

# V$SP1_Q4_01 2 NNGGGGCGGGGNN

#

# Complement: Yes

#

#======

Start End Pattern_name Mismatch Sequence

716 728 V$SP1_Q4_01 1 GGGGGGCTGGGGA

717 729 V$SP1_Q4_01 2 GGGGGCTGGGGAG

811 823 V$SP1_Q4_01 1 TGGGGGCCGGGCC

812 824 V$SP1_Q4_01 2 GGGGGCCGGGCCA

841 853 V$SP1_Q4_01 2 TGGGGGACGGGCA

842 854 V$SP1_Q4_01 2 GGGGGACGGGCAG

871 883 V$SP1_Q4_01 1 CGGGGGCAGGGTT

913 925 V$SP1_Q4_01 2 AAGGGGCCCGGGC

914 926 V$SP1_Q4_01 2 AGGGGCCCGGGCC

920 932 V$SP1_Q4_01 2 CCGGGCCGGTGGT

976 988 V$SP1_Q4_01 2 CGGGGTCGGAGCA

54 42 V$SP1_Q4_01 2 AAGGGCCGTGGTG

203 191 V$SP1_Q4_01 2 GGGGTGAGGGGGG

205 193 V$SP1_Q4_01 2 ATGGGGTGAGGGG

381 369 V$SP1_Q4_01 2 GTGTGGCTGGGGT

473 461 V$SP1_Q4_01 2 CAGGGATGGGGAT

600 588 V$SP1_Q4_01 2 TCAGGGAGGGGAC

646 634 V$SP1_Q4_01 2 GTGGGTAGGGGCG

687 675 V$SP1_Q4_01 2 CTGGGGTGCGGCT

790 778 V$SP1_Q4_01 2 AGGGGGAGGAGGA

872 860 V$SP1_Q4_01 2 CGGGAGAGGGGGA

897 885 V$SP1_Q4_01 2 GGAGAGCGGGGAG

1039 1027 V$SP1_Q4_01 2 TGGGGTGGGGGTT

1040 1028 V$SP1_Q4_01 1 TTGGGGTGGGGGT

1127 1115 V$SP1_Q4_01 2 GGAGGGCGGTGGC

#------

#------

#------

# Total_sequences: 1

# Total_hitcount: 25

#------
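
The three reports above list every 13-bp window that matches V$SP1_Q4_01 (NNGGGGCGGGGNN) with at most 2 mismatches, on either strand. The following is a minimal Python sketch of that kind of search, useful for spot-checking individual hits. It is not fuzznuc's implementation: the function names are hypothetical, and the reverse-strand convention (Start greater than End, with the reverse-complement window shown) is an assumption inferred from the tables above.

```python
# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(pattern, window):
    """Number of positions where the window base is not allowed by the pattern."""
    return sum(1 for p, b in zip(pattern, window) if b not in IUPAC[p])


def find_hits(seq, pattern, max_mismatch):
    """Return (start, end, mismatches, window) tuples, 1-based coordinates.

    Forward hits have start < end. Reverse-strand hits are given in
    forward-strand coordinates with start > end (assumed convention).
    """
    seq = seq.upper()
    n, k = len(seq), len(pattern)
    hits = []
    # Forward strand.
    for i in range(n - k + 1):
        window = seq[i:i + k]
        mm = count_mismatches(pattern, window)
        if mm <= max_mismatch:
            hits.append((i + 1, i + k, mm, window))
    # Reverse strand: scan the reverse complement, then map positions back.
    rc = reverse_complement(seq)
    for i in range(n - k + 1):
        window = rc[i:i + k]
        mm = count_mismatches(pattern, window)
        if mm <= max_mismatch:
            hits.append((n - i, n - i - k + 1, mm, window))
    return hits
```

For example, find_hits("TGGGGGCCGGGCC", "NNGGGGCGGGGNN", 2) returns a single forward hit with 1 mismatch, which agrees with the window reported at 877-889 in the Homo sapiens table.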

Dialign

DIALIGN 2.2.1

*************

Program code written by Burkhard Morgenstern and Said Abdeddaim

e-mail contact:

Published research assisted by DIALIGN 2 should cite:

Burkhard Morgenstern (1999).

DIALIGN 2: improvement of the segment-to-segment

approach to multiple sequence alignment.

Bioinformatics 15, 211 - 218.

For more information, please visit the DIALIGN home page at

http://bibiserv.techfak.uni-bielefeld.de/dialign/

program call: dialign2-2 -n Hs_Bt_Ce_PrE1_1k_20100112.dfasta

Aligned sequences: length:

======

1) HsPrE1COL1A1 1223

2) BtPrE1COL1A1 1223

3) CePrE1COL1A1 1175

Average seq. length: 1207.0

Please note that only upper-case letters are considered to be aligned.

Alignment (DIALIGN format):

======

HsPrE1COL1A1 1 cc------

BtPrE1COL1A1 1 aagtgctctt caatacagtt ctctccagtt tgactgtgct gggtagaagg

CePrE1COL1A1 1 ------

0000000000 0000000000 0000000000 0000000000 0000000000

HsPrE1COL1A1 3 ------C TGGCCACAGC CATGGC------AAACAAAACT

BtPrE1COL1A1 51 gtgtctaaaC AGGCCATGAC CATGGCcACG ACAGACCCTA ACACAAGACC

CePrE1COL1A1 1 ------ACG ACAGACCCTA ACACAAGACC

0000000000 0000000000 0000000222 2222222222 3333333333

HsPrE1COL1A1 30 CTTCTCTAAG TCACCAATGA TCACAGGCCT cccactaaaa ATACTTCCCA

BtPrE1COL1A1 101 CTTTTCTAAA TCACCAGTGA CCACGGCCCT TCCGTGC--- AAACTTACTG

CePrE1COL1A1 24 CTTTTCTAAG TCACGAGTCA CCACGGCCCT TCCGTGC--- AAACTTACTG

3333333333 3333133333 3333333333 2222222000 2222222222

HsPrE1COL1A1 80 ACTCTGGGGT GGAAGAGTtt gggggatgaa tttttagggg attgcaagcc

BtPrE1COL1A1 148 CCCCTAGGga g------GC------

CePrE1COL1A1 71 CTCCTGGGGT GAAAAAATcc taGC------

2222222200 0000000000 0022000000 0000000000 0000000000

HsPrE1COL1A1 130 ccaatcccca cctctgtgtc cctagAATCC CCCACCCCTA CCTTgGCTGC

BtPrE1COL1A1 161 ------AGTTT TCCACCACCA CCCT-GCTGC

CePrE1COL1A1 95 ------AATTT CCCACCACCA CCCT-GCTGC

0000000000 0000000000 0000022222 2222222222 2222022222

HsPrE1COL1A1 180 TCCATCACCC AACcACCAAA GCTTTCTTCT GCAGAGGCCA CCTAGTCAtg

BtPrE1COL1A1 185 CCCATCACTC AAC-ACCAAA GCTTCCTTCT GGCGAGACCG CATACTCAAA

CePrE1COL1A1 119 CCCATCACTC AAC-ACTAAA GCGTCCTTCT GGAGAGACCG CATACTCAAA

2222222222 2220311333 3333333333 3333333333 3333333322

HsPrE1COL1A1 230 TTTCTCACCC TGCACCTCAG CCTCCCCACT C------CaT CTCTCAATCA

BtPrE1COL1A1 234 TTTTTCACTC AGTGCCTCAG CACCCCTCAT CACCCCATCT tTTTCAATCA

CePrE1COL1A1 168 TTTTTCACTC AGTGCCTCAG CAGCCCCCCT CACCCCATCT CTTTCAATCC

2222222111 1111111111 1111111111 1111111111 0222222222

HsPrE1COL1A1 274 TGcCTAGGGT TTGGAGGAAG GCATTTGATT CTGTTCTGGA GCacagcaga

BtPrE1COL1A1 284 TG-CTAGGGT TTAGGGGAGA GTATTTGAGT CTATCCTGGA GCCTCAGGCA

CePrE1COL1A1 218 TG-CTAGGGT TTAGGGGAGA GTATTTGAAT CTATCCTGGA GCCTCAGGCA

2204444444 4444444444 4444444414 4444444444 4433333333

HsPrE1COL1A1 324 ----AGAATT GACATCCTCA AAATTAAAAC tcccttgcct gcacccctcc

BtPrE1COL1A1 333 TGGCAGAATT GACATCCTCA AAACAAAAAC CCACCCTAAG G------

CePrE1COL1A1 267 TGGCAGAATT GACATCCTCA AAACAAAAAC CCACCCTAAG G------

3333555555 5555555555 5555555555 3333333333 3000000000

HsPrE1COL1A1 370 ctcagatATC TGATTCTTAA TGTCTAGAAA GGAATCTgta aaTTGTTCCC

BtPrE1COL1A1 374 ------ATC TGATTCTTAA CGTCTATAAA TGAATCTATC --TTGTTCCC

CePrE1COL1A1 308 ------ATC TGATTCTTAA CGTCTATAAA TGAATCTATG --TTGTTCCC

0000000555 5555555555 5555544444 4444444222 0044444444

HsPrE1COL1A1 420 CAAATATTCC TAAGCTCCAT CCCCtAGCCA CACCAGAAGA CACCCCCAAA

BtPrE1COL1A1 415 CAAATATTCC TAAGTTCCAT ACCCCAGCCA CACCAGAAGA CACCCCTAAA

CePrE1COL1A1 349 CAAATATTCC TACATTCCAT ACCCCAGCCA CACCAGAAGA CACCCCTAAA

4444444444 4422555555 5555388888 8888888888 8888888888

HsPrE1COL1A1 470 CAGGCACATC TTTTT-AATT CCCAGCTTCC TCTGTTTTGG AGAGGTCCTC

BtPrE1COL1A1 465 CAGGCACATC TTTTTAAATT CTCTGGTTCT CCTGCTTTAA AGTGGTCCCC

CePrE1COL1A1 399 CAGGCGCATC TTTTTAAATT CTCTGGTCCT CCCGCTTTAA AGTGGTCCCC

8888556666 6666612222 2222222222 2203333333 3333333333

HsPrE1COL1A1 519 AGCaTGCCTC TTTATGCCCC TCCCTTAGCT CTTGCca-GG ATATCAGagg

BtPrE1COL1A1 515 AGC-TGCCTC TCCATACCCA TCCCTGAGCT CTTGCTCTGG ATGTCAGGAG

CePrE1COL1A1 449 AGC-TGCCTC TCCATCCCCA TCCCTGAGCT CTTGCTCTGG ATATCAGGAG

3330444444 4444444444 4444333333 3333322222 2222222222

HsPrE1COL1A1 568 gtgactgggG -CACAGCCAG GAGGACCCCC TCCCCAACAC CCccaac--C

BtPrE1COL1A1 564 GGTGAACAAG -CAGGGCGAG CAGGACCCTT TCCCAATCAC TCaaAGAGTC

CePrE1COL1A1 498 GGTGAACAAG aCATGGCGGG CAGGACCCTC TCCCAATCAC CC--AGAGTC

2222222222 0111111111 1111111111 1111111111 0000222224

HsPrE1COL1A1 615 CTTCCACCTT TGGAAGTCTC CCCACCCAGC TCCCCAGTTc cccagttcca

BtPrE1COL1A1 613 CTTCCATCTT TGAAAGCTTC CCTAACCAGC TCCCCAATTC CAGTCCCCTC

CePrE1COL1A1 546 CTTCCGTCTT TGAAAGCCTC CCTAACCAGC TCCACAATTC CAGTCCCCTC

4444444444 4444444444 4444444444 4442444441 1111111111

HsPrE1COL1A1 665 cttcttctaG ATTGGAGG-T CCCAGGAAGA GAGCAG-AGG GGCACCCCTA

BtPrE1COL1A1 663 CCTGA----G ACTGGGGG-T CCAAGAAAGA AAGCCGAAGA GGCGCCCCTA

CePrE1COL1A1 596 CCTGA----G ACTGGGGGgT CCTAGAAAGA AAGCCGAAGA GGCGCCCCTA

1111000022 2222222200 0003333333 3333333444 4444444444

HsPrE1COL1A1 713 CCCACTGGTT AGCCCACgcc attcTGAGGA CCCAGCTGCA CCCCtaccaC

BtPrE1COL1A1 708 CCCACTATTT AGCCGACAGT ATGTTGAGGA CCTAGCTGCA CCCCAGCGGC

CePrE1COL1A1 642 CCCACTATTT AGCCGACAGT ATGTTGAGGA CCTAGCCGCA CCCCAGCGGC

4444444444 4444444222 2222333333 3333333333 3333222225

HsPrE1COL1A1 763 AGCACCTCTG GCCCAGGCTG GGCTGGGGGG CTGGGGAGGC AGAGCTGCGA

BtPrE1COL1A1 758 AGCATCTCTG GCCCAGACTA AACTGGGGGG CTGGGGAGGC AGAGTTGCGA

CePrE1COL1A1 692 AGCATCTCTG GCCCAGACTG AACTGGGGGG CTGGGGAGGC AGAGCCGCGA

5555555555 5555555555 5555555555 5555666666 6664447777

HsPrE1COL1A1 813 AGAGGGGAGA TGTGGGGTGG ACTCCC---- -TTCCCTCCT CCTCCCCCTC

BtPrE1COL1A1 808 AGGGGGGAGA TGTGCGGTGG ACTCCCTTTC CTTCCCTCCT CCTCCCCCTC

CePrE1COL1A1 742 AGGGGGGAGA TGTGCGGTGG ACTCCCTTTC CTTCCCTCCT CCTCCCCCTC

7777777777 7777777777 7777553333 3666666666 6666666666

HsPrE1COL1A1 858 TCCATTCCAA CTCCCAAATT GGGGGCCGGG CCAGGCAGCT CTGATTGGCT

BtPrE1COL1A1 858 TCGGTTCCGA CTCCCAAATT GGGGGCCGGG CCAGGCAACT CTGATTGGCT

CePrE1COL1A1 792 TCGGTTCCGA CTCCCAAATT GGGGGCCGGG CCAGGCAACT CTGATTGGCT

6655777777 7777777777 7777777777 7777777577 7777777777

HsPrE1COL1A1 908 GGGGCACGGG CGGCCGGCTC CCCCTCTCCG AGGGGCAGGG TTCCTCCCTG

BtPrE1COL1A1 908 GGGGGACGGG CAGCCGGCTC CCCCTCTCCC GGGGGCAGGG TTCCTCCCCG

CePrE1COL1A1 842 GGGGGACGGG CAGCCGGCTC CCCCTCTCCC GGGGGCAGGG TTCCTCCCCG

7777777777 7577777777 7777777744 4666666666 6666666646

+1

HsPrE1COL1A1 958 CTCTCCATCA GGACAGTATA AAAGGGGCCC GGGCCAGTCG TCGGAGC---

BtPrE1COL1A1 958 CTCTCCATCA GGATGGTATA AAAGGGGCCC GGGCCGGTGG TCGGAGC---

CePrE1COL1A1 892 CTCTCCGTCA GGATGGTATA AAAGGGGCCC GGGCCGGTGG TCGGAGCaga

6666663666 6664466666 6666666666 6666666647 7777777000

HsPrE1COL1A1 1005 ------AGA CGGGAGTTTC TCCTCGGGGT CGGAGCAGGA

BtPrE1COL1A1 1005 ------AGA CGGGAGTTTC TCCTCGGGGT CGGAGCAGGA

CePrE1COL1A1 942 cgggagtttc tcctcgcAGA CGGGAGTTTC TCCTCGGGGT CGGAGCAGGA

0000000000 0000000555 9999999999 9999999999 9999999999

HsPrE1COL1A1 1038 GGCACGCGGA GTGTGAGGCC ACGCATGAGC GGACGCTAAC CCCCTCCCCA

BtPrE1COL1A1 1038 GGCACGCGGA GTGTGAGGCC ACGCATGAGC GGACGCTAAC CCCCACCCCA

CePrE1COL1A1 992 GGCACGCGGA --GTGAGGCC ACGCATGAGC GGACGCTAAC CCCCACCCCA

9999999555 2288888888 8888888888 8888888887 7777777777

HsPrE1COL1A1 1088 GCCACAAAGA GTCTACATGT CTAGGGTCTA GACATGTTCA GCTTTGTGGA

BtPrE1COL1A1 1088 GCCGCAAAGA GTCTACATGT CTAGGGTCTA GACATGTTCA GCTTTGTGGA

CePrE1COL1A1 1040 aCCGCAAAGA GTCTACATGT CTAGGGTCTA GACATGTTCA GCTTTGTGGA

2555888888 8888888888 8888888999 9999999999 9999999999

HsPrE1COL1A1 1138 CCTCCGGCTC CTGCTCCTCT TAGCGGCCAC CGCCCTCCTG ACGCACGGCC

BtPrE1COL1A1 1138 CCTCCGGCTC CTGCTCCTCT TAGCGGCCAC CGCCCTCCTG ACGCACGGCC

CePrE1COL1A1 1090 CCTCCGGCTC CTGCTCCTCT TAGCGGCCAC CGCCCTCCTG ACGCACGGCC

9999999999 9999999999 9999999999 9888666666 6666666666

HsPrE1COL1A1 1188 AAGAGGAAGG CCAAGTCGAG GGCCAAGACG AAGACA

BtPrE1COL1A1 1188 AAGAGGAGGG CCAGGAAGAA GGCCAAGAAG AAGACA

CePrE1COL1A1 1140 AAGAGGAGGG CCAAGAAGAA GGCCAAGAAG AAGACA

6666666344 4442311222 2222222222 222222

Alignment (FASTA format):

======

>HsPrE1COL1A1

cc------

------CTGGCCACAGCCATGGC------AAACAAAACT

CTTCTCTAAGTCACCAATGATCACAGGCCTcccactaaaaATACTTCCCA

ACTCTGGGGTGGAAGAGTttgggggatgaatttttaggggattgcaagcc

ccaatccccacctctgtgtccctagAATCCCCCACCCCTACCTTgGCTGC

TCCATCACCCAACcACCAAAGCTTTCTTCTGCAGAGGCCACCTAGTCAtg

TTTCTCACCCTGCACCTCAGCCTCCCCACTCCa------TCTCTCAATCA

TGcCTAGGGTTTGGAGGAAGGCATTTGATTCTGTTCTGGAGCacagcaga

----AGAATTGACATCCTCAAAATTAAAACtcccttgcctgcacccctcc

ctcagatATCTGATTCTTAATGTCTAGAAAGGAATCTgtaaaTTGTTCCC

CAAATATTCCTAAGCTCCATCCCCtAGCCACACCAGAAGACACCCCCAAA

CAGGCACATCTTTTT-AATTCCCAGCTTCCTCTGTTTTGGAGAGGTCCTC

AGCaTGCCTCTTTATGCCCCTCCCTTAGCTCTTGCca-GGATATCAGagg

gtgactgggG-CACAGCCAGGAGGACCCCCTCCCCAACACCCccaac--C

CTTCCACCTTTGGAAGTCTCCCCACCCAGCTCCCCAGTTccccagttcca

cttcttctAGATTGGAGG-TCCCAGGAAGAGAGCAG-AGGGGCACCCCTA

CCCACTGGTTAGCCCACgccattcTGAGGACCCAGCTGCACCCCtaccaC

AGCACCTCTGGCCCAGGCTGGGCTGGGGGGCTGGGGAGGCAGAGCTGCGA

AGAGGGGAGATGTGGGGTGGACTCCC-----TTCCCTCCTCCTCCCCCTC

TCCATTCCAACTCCCAAATTGGGGGCCGGGCCAGGCAGCTCTGATTGGCT

GGGGCACGGGCGGCCGGCTCCCCCTCTCCGAGGGGCAGGGTTCCTCCCTG

CTCTCCATCAGGACAGTATAAAAGGGGCCCGGGCCAGTCGTCGGAGC---

------AGACGGGAGTTTCTCCTCGGGGTCGGAGCAGGA

GGCACGCGGAGTGTGAGGCCACGCATGAGCGGACGCTAACCCCCTCCCCA

GCCACAAAGAGTCTACATGTCTAGGGTCTAGACATGTTCAGCTTTGTGGA

CCTCCGGCTCCTGCTCCTCTTAGCGGCCACCGCCCTCCTGACGCACGGCC

AAGAGGAAGGCCAAGTCGAGGGCCAAGACGAAGACA

>BtPrE1COL1A1

aagtgctcttcaatacagttctctccagtttgactgtgctgggtagaagg

gtgtctaaaCAGGCCATGACCATGGCcACGACAGACCCTAACACAAGACC

CTTTTCTAAATCACCAGTGACCACGGCCCTTCCGTGC---AAACTTACTG

CCCCTAGGgag------GC------

------AGTTTTCCACCACCACCCT-GCTGC

CCCATCACTCAAC-ACCAAAGCTTCCTTCTGGCGAGACCGCATACTCAAA

TTTTTCACTCAGTGCCTCAGCACCCCTCATCACCCCATCTtTTTCAATCA

TG-CTAGGGTTTAGGGGAGAGTATTTGAGTCTATCCTGGAGCCTCAGGCA

TGGCAGAATTGACATCCTCAAAACAAAAACCCACCCTAAGG------

------ATCTGATTCTTAACGTCTATAAATGAATCTATC--TTGTTCCC

CAAATATTCCTAAGTTCCATACCCCAGCCACACCAGAAGACACCCCTAAA

CAGGCACATCTTTTTAAATTCTCTGGTTCTCCTGCTTTAAAGTGGTCCCC

AGC-TGCCTCTCCATACCCATCCCTGAGCTCTTGCTCTGGATGTCAGGAG

GGTGAACAAG-CAGGGCGAGCAGGACCCTTTCCCAATCACTCaaAGAGTC

CTTCCATCTTTGAAAGCTTCCCTAACCAGCTCCCCAATTCCAGTCCCCTC

CCTG----AGACTGGGGG-TCCAAGAAAGAAAGCCGAAGAGGCGCCCCTA

CCCACTATTTAGCCGACAGTATGTTGAGGACCTAGCTGCACCCCAGCGGC

AGCATCTCTGGCCCAGACTAAACTGGGGGGCTGGGGAGGCAGAGTTGCGA

AGGGGGGAGATGTGCGGTGGACTCCCTTTCCTTCCCTCCTCCTCCCCCTC

TCGGTTCCGACTCCCAAATTGGGGGCCGGGCCAGGCAACTCTGATTGGCT

GGGGGACGGGCAGCCGGCTCCCCCTCTCCCGGGGGCAGGGTTCCTCCCCG

CTCTCCATCAGGATGGTATAAAAGGGGCCCGGGCCGGTGGTCGGAGC---

------AGACGGGAGTTTCTCCTCGGGGTCGGAGCAGGA

GGCACGCGGAGTGTGAGGCCACGCATGAGCGGACGCTAACCCCCACCCCA

GCCGCAAAGAGTCTACATGTCTAGGGTCTAGACATGTTCAGCTTTGTGGA

CCTCCGGCTCCTGCTCCTCTTAGCGGCCACCGCCCTCCTGACGCACGGCC

AAGAGGAGGGCCAGGAAGAAGGCCAAGAAGAAGACA

>CePrE1COL1A1

------

------ACGACAGACCCTAACACAAGACC

CTTTTCTAAGTCACGAGTCACCACGGCCCTTCCGTGC---AAACTTACTG

CTCCTGGGGTGAAAAAATcctaGC------

------AATTTCCCACCACCACCCT-GCTGC

CCCATCACTCAAC-ACTAAAGCGTCCTTCTGGAGAGACCGCATACTCAAA

TTTTTCACTCAGTGCCTCAGCAGCCCCCCTCACCCCATCTCTTTCAATCC

TG-CTAGGGTTTAGGGGAGAGTATTTGAATCTATCCTGGAGCCTCAGGCA

TGGCAGAATTGACATCCTCAAAACAAAAACCCACCCTAAGG------

------ATCTGATTCTTAACGTCTATAAATGAATCTATG--TTGTTCCC

CAAATATTCCTACATTCCATACCCCAGCCACACCAGAAGACACCCCTAAA

CAGGCGCATCTTTTTAAATTCTCTGGTCCTCCCGCTTTAAAGTGGTCCCC

AGC-TGCCTCTCCATCCCCATCCCTGAGCTCTTGCTCTGGATATCAGGAG

GGTGAACAAGaCATGGCGGGCAGGACCCTCTCCCAATCACCC--AGAGTC

CTTCCGTCTTTGAAAGCCTCCCTAACCAGCTCCACAATTCCAGTCCCCTC

CCTG----AGACTGGGGGgTCCTAGAAAGAAAGCCGAAGAGGCGCCCCTA

CCCACTATTTAGCCGACAGTATGTTGAGGACCTAGCCGCACCCCAGCGGC

AGCATCTCTGGCCCAGACTGAACTGGGGGGCTGGGGAGGCAGAGCCGCGA

AGGGGGGAGATGTGCGGTGGACTCCCTTTCCTTCCCTCCTCCTCCCCCTC

TCGGTTCCGACTCCCAAATTGGGGGCCGGGCCAGGCAACTCTGATTGGCT

GGGGGACGGGCAGCCGGCTCCCCCTCTCCCGGGGGCAGGGTTCCTCCCCG

CTCTCCGTCAGGATGGTATAAAAGGGGCCCGGGCCGGTGGTCGGAGCaga

cgggagtttctcctcgcAGACGGGAGTTTCTCCTCGGGGTCGGAGCAGGA

GGCACGCGGA--GTGAGGCCACGCATGAGCGGACGCTAACCCCCACCCCA

aCCGCAAAGAGTCTACATGTCTAGGGTCTAGACATGTTCAGCTTTGTGGA

CCTCCGGCTCCTGCTCCTCTTAGCGGCCACCGCCCTCCTGACGCACGGCC

AAGAGGAGGGCCAAGAAGAAGGCCAAGAAGAAGACA

Sequence tree:

======

Tree constructed using UPGMA

(HsPrE1COL1A1:0.001846,

(BtPrE1COL1A1:0.000979,

CePrE1COL1A1:0.000979):0.000867);
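
A UPGMA tree is ultrametric, so every leaf should lie at the same distance from the root. The sketch below parses the Newick string printed above and sums branch lengths per leaf as a quick consistency check. It is an illustrative helper with hypothetical function names, it is not part of DIALIGN, and it assumes a Newick string without internal node labels or quoted names.

```python
PROMOTER_TREE = """
(HsPrE1COL1A1:0.001846,
(BtPrE1COL1A1:0.000979,
CePrE1COL1A1:0.000979):0.000867);
"""


def _parse_node(text, pos):
    """Parse one node starting at text[pos].

    Returns ({leaf name: distance to this node's parent}, next position).
    """
    leaves = {}
    if text[pos] == "(":
        pos += 1  # skip '('
        while True:
            child, pos = _parse_node(text, pos)
            leaves.update(child)
            if text[pos] == ",":
                pos += 1
            else:
                break
        pos += 1  # skip ')'
    else:
        start = pos
        while pos < len(text) and text[pos] not in ":,)":
            pos += 1
        leaves[text[start:pos]] = 0.0
    # Optional branch length after ':'; the root has none.
    length = 0.0
    if pos < len(text) and text[pos] == ":":
        start = pos + 1
        pos = start
        while pos < len(text) and text[pos] not in ",)":
            pos += 1
        length = float(text[start:pos])
    return {name: d + length for name, d in leaves.items()}, pos


def leaf_depths(newick):
    """Root-to-leaf distance for every leaf in a Newick string."""
    text = "".join(newick.split()).rstrip(";")
    depths, _ = _parse_node(text, 0)
    return depths


def is_ultrametric(newick, tol=1e-6):
    """True if all leaves are equally far from the root, within tol."""
    depths = leaf_depths(newick).values()
    return max(depths) - min(depths) <= tol
```

For the promoter tree, the Bos taurus and Cervus elaphus leaves sit at 0.000979 + 0.000867 = 0.001846 from the root, the same as the Homo sapiens branch, so is_ultrametric(PROMOTER_TREE) holds.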

Intron

Fuzznuc

########################################

# Program: fuzznuc

# Rundate: Sun 5 Feb 2010 15:47:16

# Commandline: fuzznuc

# -filter

# [-sequence] HsI1_part_COL1A1.fas

# -pattern @../../SP1_1.pat

# Report_format: seqtable

# Report_file: stdout

########################################

#======

#

# Sequence: HsI1_part_COL1A1.seq from: 1 to: 172

# HitCount: 2

#

# Pattern_name Mismatch Pattern

# V$SP1_Q4_01 2 NNGGGGCGGGGNN

#

# Complement: No

#

#======

Start End Pattern_name Mismatch Sequence

133 145 V$SP1_Q4_01 2 ATGGGGGCGGGAT

134 146 V$SP1_Q4_01 1 TGGGGGCGGGATG

#------

#------

#------

# Total_sequences: 1

# Total_hitcount: 2

#------

########################################

# Program: fuzznuc

# Rundate: Sun 5 Feb 2010 15:46:55

# Commandline: fuzznuc

# -filter

# [-sequence] BtI1_part_COL1A1.fas

# -pattern @../../SP1_1.pat

# Report_format: seqtable

# Report_file: stdout

########################################

#======

#

# Sequence: BtI1_part_COL1A1.seq from: 1 to: 173

# HitCount: 2

#

# Pattern_name Mismatch Pattern

# V$SP1_Q4_01 2 NNGGGGCGGGGNN

#

# Complement: No

#

#======

Start End Pattern_name Mismatch Sequence

59 71 V$SP1_Q4_01 2 TCTGGGCGGGATC

135 147 V$SP1_Q4_01 2 ATGGGGCGGAATC

#------

#------

#------

# Total_sequences: 1

# Total_hitcount: 2

#------

########################################

# Program: fuzznuc

# Rundate: Sun 5 Feb 2010 15:47:56

# Commandline: fuzznuc

# -filter

# [-sequence] CeI1COL1A1.fas

# -pattern @../../SP1_1.pat

# Report_format: seqtable

# Report_file: stdout

########################################

#======

#

# Sequence: CeI1COL1A1.seq from: 1 to: 173

# HitCount: 2

#

# Pattern_name Mismatch Pattern

# V$SP1_Q4_01 2 NNGGGGCGGGGNN

#

# Complement: No

#

#======

Start End Pattern_name Mismatch Sequence

59 71 V$SP1_Q4_01 2 TCTGGGCGGGATC

135 147 V$SP1_Q4_01 2 ATGGGGCGGAATC

#------

#------

#------

# Total_sequences: 1

# Total_hitcount: 2

#------

Dialign

DIALIGN 2.2.1

*************

program call: dialign2-2 -n Hs_Bt_Ce_col1a1_intron.dfasta

Aligned sequences: length:

======

1) HsI1_part_CO 172

2) BtI1_part_CO 173

3) CeI1COL1A1.s 173

Average seq. length: 172.7

Please note that only upper-case letters are considered to be aligned.

Alignment (DIALIGN format):

======

HsI1_part_CO 1 aAGATGTCTA GGTGCTGGAG GTTAGGGTGT CTCCTAATTT TgagGTACAT

BtI1_part_CO 1 GAGATGTCTG GGCGCCGGAG GTTAGGGCGT ACCCTATTTT TACCGTACAT

CeI1COL1A1.s 1 GAGATGTCTG GGCGCCGGAG GTTAGGGCGT ACCCTATTTT TACCGTATAT

4888888888 8888888888 8888888888 8888888888 6222444444

HsI1_part_CO 51 TTCAAGTCTT GGGGGGGCCT CCCtt-CCAA TCAGCCGCTC CCatt-CTCC

BtI1_part_CO 51 TTCAGGTCTC TGGGCGGGAT CCCACGCCAA TCAGCCCCAC CCCATCCTCT

CeI1COL1A1.s 51 TTCAAGTCTC TGGGCGGGAT CtCACGCCAA TCAGCCCCAC CCCATCCTCT

4444444444 4444444444 4043334444 4444444444 4433337777

*1245

HsI1_part_CO 99 TAGCCCCGCC CCCGCCACCC CACCTGCCCA GGGAATgGGG GCGGGATGAG

BtI1_part_CO 101 TAGCCCCGCC CACGCCGTCC CACCTGCCCC GGGAAT-GGG GCGGAATCTG

CeI1COL1A1.s 101 TAGCGCCGCC CACTCCATCC CACCTGCCCC GGGAAT-GGG GCGGAATCTG

7777466666 6666666666 6666666666 6669990888 8888888888

HsI1_part_CO 149 GGCTGGACCT CCCTTCTCTC CTCC

BtI1_part_CO 150 GGTTGAACCT CCCATCTCTC CTCC

CeI1COL1A1.s 150 GGTTGAACCT CCCATCTCTC CTCC

8888888888 8888888888 8888

Alignment (FASTA format):

======

>HsI1_part_CO

aAGATGTCTAGGTGCTGGAGGTTAGGGTGTCTCCTAATTTTgagGTACAT

TTCAAGTCTTGGGGGGGCCTCCCtt-CCAATCAGCCGCTCCCatt-CTCC

TAGCCCCGCCCCCGCCACCCCACCTGCCCAGGGAATgGGGGCGGGATGAG

GGCTGGACCTCCCTTCTCTCCTCC

>BtI1_part_CO

GAGATGTCTGGGCGCCGGAGGTTAGGGCGTACCCTATTTTTACCGTACAT

TTCAGGTCTCTGGGCGGGATCCCACGCCAATCAGCCCCACCCCATCCTCT

TAGCCCCGCCCACGCCGTCCCACCTGCCCCGGGAAT-GGGGCGGAATCTG

GGTTGAACCTCCCATCTCTCCTCC

>CeI1COL1A1.s

GAGATGTCTGGGCGCCGGAGGTTAGGGCGTACCCTATTTTTACCGTATAT

TTCAAGTCTCTGGGCGGGATCtCACGCCAATCAGCCCCACCCCATCCTCT

TAGCGCCGCCCACTCCATCCCACCTGCCCCGGGAAT-GGGGCGGAATCTG

GGTTGAACCTCCCATCTCTCCTCC
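
DIALIGN marks aligned residues in upper case and leaves unaligned residues in lower case, so the FASTA-format alignment can be summarised by counting columns in which every sequence carries the same upper-case base. The sketch below shows one way to do that. The function names are hypothetical, and it assumes the alignment rows all have the same length, as in an intact DIALIGN FASTA output.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def conserved_columns(records):
    """Count columns where all rows hold the same upper-case (aligned) base.

    Gaps ('-') and lower-case (unaligned) residues never count.
    """
    rows = list(records.values())
    count = 0
    for column in zip(*rows):
        if all(c.isupper() for c in column) and len(set(column)) == 1:
            count += 1
    return count


# Toy input for illustration only; not taken from the alignment above.
TOY = """
>a
aAGc-T
>b
GAGC-T
>c
GAGCAT
"""
print(conserved_columns(read_fasta(TOY)))
```

In the toy input only the second, third and sixth columns are upper-case and identical in all three rows, so the count is 3. Dividing the count by the alignment length gives a rough conservation fraction for comparing the promoter and intron alignments.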

Sequence tree:

======

Tree constructed using UPGMA

(HsI1_part_CO:0.012235,

(BtI1_part_CO:0.005867,

CeI1COL1A1.s:0.005867):0.006367);
