Supplementary Text A

H3: The DNA sequence data allowed clarification of some of the Leu/Ile residue assignments in the peptide sequences (Peptides 2, 8, 10, 11); these two residues have the same mass, and thus cannot be resolved by mass spectrometry (Table 1). The DNA data also allowed the occasional correction of the mass spectrometry data. For peptide 8, residue 4 is a Lys rather than a Gln; these residues have nearly identical masses and can be difficult to distinguish in mass spectrometry. For peptide 11, the N-terminal residues VQD, which were the least confident assignments from mass spectrometry, were LDN in the cDNA-derived sequence. An alternative explanation for such discrepancies would be the presence of multiple isoforms of the protein.

H8: The cDNA-predicted amino acid sequence contained both of the H8 peptides identified by mass spectrometry, and allowed correct assignment of the Leu/Ile residues as Leu. It also allowed the revision of the sequence for Peptide 1, where an extra Thr is present at residue 3. An alternative explanation for such discrepancies would be the presence of multiple isoforms for the protein.

H9: The predicted N-terminal sequence from the cDNA clone differs slightly from Peptide 1 (Table 1) and the Edman data, with the cloned sequence suggesting that residue 5 may be a Ser and that residue 8 may be a Cys. Peptide 3 could also be identified in the cDNA-derived sequence, which allowed its Leu and Ile residues to be resolved from one another.

Supplementary Text B. Ambiguities in peptide sequencing by mass spectrometry

A residue assigned as phenylalanine (F) may instead be methionine sulphoxide, an oxidation adduct of methionine (M), which has almost the same mass. Likewise, a residue assigned as glutamine (Q) may be lysine (K), as both residues have very similar masses. Trypsin cleaves C-terminal to arginine (R) and lysine (K) residues, but either residue can be present internally in a tryptic peptide if digestion is incomplete, and will remain internal if it is immediately followed by a proline residue.
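The mass ambiguities and the trypsin cleavage rule described above can be illustrated with a minimal sketch. The residue masses are standard monoisotopic values and are not taken from this supplement; the function name `tryptic_peptides` and the example sequence are hypothetical, and the digestion is idealised as complete.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
# These values are general reference values, not data from this study.
RESIDUE_MASS = {
    "L": 113.08406, "I": 113.08406,  # Leu and Ile are isomers: identical mass
    "Q": 128.05858, "K": 128.09496,  # Gln and Lys differ by about 0.036 Da
    "F": 147.06841, "M": 131.04049,
}
OXYGEN = 15.99491  # one added oxygen converts Met to Met sulphoxide

met_sulphoxide = RESIDUE_MASS["M"] + OXYGEN
delta_leu_ile = abs(RESIDUE_MASS["L"] - RESIDUE_MASS["I"])
delta_gln_lys = abs(RESIDUE_MASS["Q"] - RESIDUE_MASS["K"])
delta_phe_metox = abs(RESIDUE_MASS["F"] - met_sulphoxide)


def tryptic_peptides(sequence):
    """Idealised complete digestion: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


if __name__ == "__main__":
    print(f"Leu vs Ile:            {delta_leu_ile:.5f} Da")
    print(f"Gln vs Lys:            {delta_gln_lys:.5f} Da")
    print(f"Phe vs Met sulphoxide: {delta_phe_metox:.5f} Da")
    # Hypothetical sequence: the K before P stays internal to the peptide.
    print(tryptic_peptides("AKPGRSK"))
```

Whether a difference of a few hundredths of a dalton can be resolved depends on the mass accuracy of the instrument used, which this sketch does not model.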

Supplementary Table S1.

Amino acid composition of cDNA-predicted polypeptide segments compared with actual composition of H. dofleinii bands.

Band / H3 / H8 / H9
cDNA b / AAA c / cDNA b / AAA c / cDNA b,d / AAA c
% length a / 40% / 20% / 100%
Asx / 10.5 / 8.7 / 12.2 / 9.7 / 17.9 / 15.0
Ser / 7.3 / 11.6 / 3.0 / 10.2 / 8.3 / 8.0
Glx / 7.7 / 13.3 / 16.6 / 15.0 / 6.2 / 11.6
Gly / 8.9 / 16.8 / 7.6 / 13.6 / 6.2 / 10.5
His / 3.2 / 1.0 / 1.5 / 0.7 / 2.8 / 1.3
Arg / 3.2 / 5.2 / 3.0 / 5.5 / 4.8 / 4.4
Thr / 5.7 / 5.2 / 3.0 / 4.9 / 1.4 / 4.6
Ala / 10.9 / 7.0 / 4.5 / 6.8 / 9.0 / 8.0
Pro / 3.6 / 4.9 / 3.0 / 6.3 / 4.8 / 3.1
Tyr / 4.5 / 2.5 / 3.0 / 3.0 / 6.2 / 2.4
Val / 6.5 / 5.5 / 6.1 / 5.5 / 7.6 / 5.7
Met / 2.8 / 0.4 / 1.5 / 0.4 / 0 / 5.0
Lys / 6.9 / 4.9 / 4.5 / 5.0 / 5.5 / 7.0
Ile / 2.8 / 3.8 / 9.1 / 3.5 / 4.1 / 4.8
Leu / 8.9 / 7.0 / 13.6 / 7.5 / 4.8 / 5.8
Phe / 3.6 / 2.4 / 4.5 / 2.5 / 4.1 / 2.6
Correlation with composition of H. dofleinii band (R) e and statistical significance (P) f
R (cDNA) / 0.82 / 0.62 / 0.65
P (cDNA) / < 0.0001 / 0.011 / 0.006
R (SP ave) / 0.89 / 0.88 / 0.89
P (SP ave) / < 0.0001 / < 0.0001 / < 0.0001

a % length is an estimate of the completeness of the clone, i.e. the number of amino acid residues predicted from the cDNA segment as a percentage of the total number of residues in the best-matching S. purpuratus protein (H3, H8) or in the complete H. dofleinii ORF (H9).

b Amino acid compositions of cDNA-predicted polypeptides were calculated using

c Amino acid analysis data of Peng et al. (2011).

d Excluding signal peptide.

e Spearman correlation coefficients (R-values) for linear regression of amino acid composition of each H. dofleinii band against that of the cognate cDNA-derived polypeptide segment (“cDNA”) or against the average composition of the SwissProt database (“SP ave”; Tompa 2002), calculated using GraphPad InStat v3.01.

f Significance (two-tailed non-parametric P-values) of the correlation of amino acid composition of each H. dofleinii band with that of the cognate cDNA-derived polypeptide segment (“cDNA”) or with the average composition of the SwissProt database (“SP ave”; Tompa 2002), calculated using GraphPad InStat v3.01.
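For readers who want to see what a Spearman coefficient of this kind measures, the following is a minimal sketch: each composition column is converted to ranks (tied values share the mean rank) and a Pearson correlation is taken on the ranks. The helper names are hypothetical, the example lists are the H9 columns of Table S1 as printed, and the result is not guaranteed to match the GraphPad InStat output exactly, since the rounding of the tabulated values and the handling of ties may differ.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# H9 columns of Table S1, in row order Asx ... Phe.
H9_CDNA = [17.9, 8.3, 6.2, 6.2, 2.8, 4.8, 1.4, 9.0,
           4.8, 6.2, 7.6, 0.0, 5.5, 4.1, 4.8, 4.1]
H9_AAA = [15.0, 8.0, 11.6, 10.5, 1.3, 4.4, 4.6, 8.0,
          3.1, 2.4, 5.7, 5.0, 7.0, 4.8, 5.8, 2.6]

if __name__ == "__main__":
    print(f"Spearman R (H9, cDNA vs AAA): {spearman(H9_CDNA, H9_AAA):.2f}")
```

The P-values in the table additionally require a significance test on the coefficient, which is outside the scope of this sketch.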

Supp. Table S2. Amino acid composition of best-match S. purpuratus proteins compared with actual composition of H. dofleinii bands.

Band a / H2 / H3 / H6 / H7 / H8 / H9
Type / Peptide/nitrate transporter / Transketolase / FBP aldolase / Transaldolase-like / Exportin-6-like / C-type lectin
Species / Sp b / Hd / Sp b / Hd / Sp b / Hd / Sp b / Hd / Sp b / Hd / Sp b / Hd
Asx / 6.9 / 9.7 / 10.1 / 8.7 / 9.9 / 12.1 / 10.3 / 13.1 / 8.5 / 9.7 / 11.3 / 15.0
Ser / 8.3 / 9.8 / 6.3 / 11.6 / 4.4 / 7.0 / 6.7 / 6.5 / 8.8 / 10.2 / 13.8 / 8.0
Glx / 6.5 / 16.2 / 6.9 / 13.3 / 11.3 / 13.2 / 12.1 / 12.7 / 11.8 / 15.0 / 8.1 / 11.6
Gly / 6.6 / 16.1 / 8.2 / 16.8 / 8.5 / 7.8 / 4.8 / 8.1 / 3.8 / 13.6 / 7.5 / 10.5
His / 1.3 / 1.1 / 2.1 / 1.0 / 1.4 / 1.3 / - / n.d. / 2.7 / 0.7 / 4.4 / 1.3
Arg / 3.4 / 3.6 / 4.3 / 5.2 / 3.8 / 4.9 / 3.3 / 4.1 / 3.8 / 5.5 / 1.3 / 4.4
Thr / 10.8 / 3.9 / 5.5 / 5.2 / 6.8 / 5.1 / 4.5 / 5.0 / 4.9 / 4.9 / 6.9 / 4.6
Ala / 9.0 / 7.8 / 13.0 / 7.0 / 12.3 / 9.2 / 10.6 / 9.4 / 4.4 / 6.8 / 3.1 / 8.0
Pro / 4.4 / 3.2 / 4.5 / 4.9 / 4.7 / 3.9 / 2.4 / 3.2 / 4.4 / 6.3 / 3.8 / 3.1
Tyr / 4.2 / 1.8 / 2.7 / 2.5 / 3.3 / 2.4 / 2.4 / 1.9 / 2.5 / 3.0 / 5.0 / 2.4
Val / 7.7 / 4.6 / 8.7 / 5.5 / 7.4 / 6.4 / 7.3 / 6.4 / 5.8 / 5.5 / 5.7 / 5.7
Met / - / n.d. / 1.6 / 0.4 / 0.8 / 3.2 / 3.0 / 3.8 / 4.4 / 0.4 / 1.9 / 5.0
Lys / 3.1 / 8.8 / 5.8 / 4.9 / 6.0 / 8.3 / 10.0 / 9.4 / 4.7 / 5.0 / 3.8 / 7.0
Ile / 7.6 / 4.1 / 5.9 / 3.8 / 4.4 / 5.6 / 6.4 / 5.3 / 8.2 / 3.5 / 3.8 / 4.8
Leu / 9.7 / 7.0 / 7.6 / 7.0 / 9.6 / 7.3 / 10.3 / 8.3 / 13.4 / 7.5 / 5.7 / 5.8
Phe / 5.1 / 2.4 / 3.7 / 2.4 / 2.7 / 2.5 / 2.7 / 3.0 / 6.8 / 2.5 / 3.8 / 2.6
Correlation with composition of H. dofleinii band (R) and statistical significance (P)
R (Echin) / 0.39 / 0.83 / 0.89 / 0.94 / 0.45 / 0.50
P (Echin) / 0.15 / < 0.0001 / < 0.0001 / < 0.0001 / 0.083 / 0.047
R (SwPrt ave) / 0.90 / 0.89 / 0.94 / 0.92 / 0.88 / 0.89
P (SwPrt ave) / < 0.0001 / < 0.0001 / < 0.0001 / < 0.0001 / < 0.0001 / < 0.0001

a H4 is omitted because no amino acid analysis data are available for this band.

b Amino acid compositions of S. purpuratus proteins (“Sp”) were calculated from translated ORF sequences. Amino acid analysis data for H. dofleinii proteins (“Hd”) are from Peng et al. (2011).
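The composition of a translated sequence can be tabulated in the same 16 rows as the tables in this supplement with a short sketch such as the one below. It is not the tool used by the authors. It assumes that Asp and Asn are pooled as Asx and Glu and Gln as Glx (as in amino acid analysis after acid hydrolysis), and that percentages are taken over all residues, with Cys and Trp counted in the total but not reported; the latter is an assumption suggested by the cDNA columns summing to less than 100%. The function name and example sequence are hypothetical.

```python
from collections import Counter

# Row order used in Tables S1 and S2.
ROWS = ["Asx", "Ser", "Glx", "Gly", "His", "Arg", "Thr", "Ala",
        "Pro", "Tyr", "Val", "Met", "Lys", "Ile", "Leu", "Phe"]

# One-letter code to table row; D/N and E/Q are pooled. C and W have no row.
ONE_TO_ROW = {
    "D": "Asx", "N": "Asx", "S": "Ser", "E": "Glx", "Q": "Glx",
    "G": "Gly", "H": "His", "R": "Arg", "T": "Thr", "A": "Ala",
    "P": "Pro", "Y": "Tyr", "V": "Val", "M": "Met", "K": "Lys",
    "I": "Ile", "L": "Leu", "F": "Phe",
}


def composition_percent(sequence):
    """Return {row: percent of all residues} for a one-letter protein sequence."""
    residues = [r for r in sequence.upper() if r.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    counts = Counter(ONE_TO_ROW[r] for r in residues if r in ONE_TO_ROW)
    # Assumption: the denominator is the full length, including Cys and Trp.
    return {row: 100.0 * counts[row] / len(residues) for row in ROWS}


if __name__ == "__main__":
    # Hypothetical six-residue sequence, for illustration only.
    for row, percent in composition_percent("DNEQAC").items():
        print(f"{row} / {percent:.1f}")
```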

Supp. Fig. S1. SignalP v4.1 prediction for cDNA-derived protein corresponding to H9.

Parameters are calculated as follows:

C-score (raw cleavage site score)

The output from the CS networks, which are trained to distinguish signal peptide cleavage sites from everything else. Note the position numbering of the cleavage site: the C-score is trained to be high at the position immediately after the cleavage site (the first residue in the mature protein).

S-score (signal peptide score)

The output from the SP networks, which are trained to distinguish positions within signal peptides from positions in the mature part of the proteins and from proteins without signal peptides.

Y-score (combined cleavage site score)

A combination (geometric average) of the C-score and the slope of the S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is due to the fact that multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The Y-score distinguishes between C-score peaks by choosing the one where the slope of the S-score is steep.

The graphical output from SignalP (above) shows the three different scores, C, S and Y, for each position in the sequence.

Parameter summary:

Measure / Position / Value / Cutoff / Signal peptide?
max. C / 21 / 0.548
max. Y / 21 / 0.699
max. S / 17 / 0.935
mean S / 1-20 / 0.891
D / 1-20 / 0.803 / 0.450 / YES

Cleavage site between pos. 20 and 21: NRA-ED D=0.803 D-cutoff=0.450 Networks=SignalP-noTM

In the parameter summary, the maximal values of the three scores are reported. In addition, the following two scores are shown:

mean S

The average S-score of the possible signal peptide (from position 1 to the position immediately before the maximal Y-score).

D-score (discrimination score)

A weighted average of the mean S and the max.Y scores. This is the score that is used to discriminate signal peptides from non-signal peptides.

For non-secretory proteins all the scores represented in the SignalP output should ideally be very low (close to the negative target value of 0.1).
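The relationship between the D-score and its two inputs can be sketched as follows. The text states only that D is a weighted average of mean S and max. Y; the weight itself is not given here, so the sketch treats it as an unknown and back-calculates the value implied by the reported numbers (which are rounded to three decimals, so the result is approximate). The function names are hypothetical and this is not the SignalP implementation.

```python
def d_score(mean_s, max_y, weight_s):
    """Weighted average of mean S and max. Y; weight_s is the weight on mean S."""
    return weight_s * mean_s + (1.0 - weight_s) * max_y


def implied_weight(d, mean_s, max_y):
    """Solve d = w * mean_s + (1 - w) * max_y for w."""
    return (d - max_y) / (mean_s - max_y)


# Values from the parameter summary for the H9 cDNA-derived protein.
MEAN_S, MAX_Y, D_REPORTED, D_CUTOFF = 0.891, 0.699, 0.803, 0.450

# Assumption: the weight is inferred from rounded values, not taken from SignalP.
weight = implied_weight(D_REPORTED, MEAN_S, MAX_Y)
has_signal_peptide = d_score(MEAN_S, MAX_Y, weight) > D_CUTOFF

if __name__ == "__main__":
    print(f"implied weight on mean S: {weight:.2f}")
    print(f"signal peptide predicted: {has_signal_peptide}")
```

Because D exceeds the cutoff by a wide margin here, the classification does not depend on the exact weight: any weighted average of 0.891 and 0.699 lies above 0.450.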

SpEchL2 ----MFNKIVTILVIASSAVMLPVLPGCQAGGCGCPPLWTAFQNNCYRYFSVKNITWHGA 56

TfCL MLLFLFLFGLALGAVAPSGVDQEHIEALLQ--VNCPLLWISFNNHCYKYVSNR-MTWVDA 57

Hd band 9 -----MASKFAVLLLLYTMVVVNRAEDKVS--FGCPKDWYAYNDNCYHYNAERRFTSTSG 53

: .:: : : * .** * :::::**:* : : :* ..

Consensus CYkff ktWdA

SpEchL2 EMHCSGFSVPCSDVDSTISLGHLTSIHSKEEMTFLSVLYESIRSKVVTSTTYVWIGLHDQ 116

TfCL ELHCVSQDA------NLVSIHSLEEHNFVKALIK----KSDITEERTWIGLSDI 101

Hd band 9 ASYCRDHGA------KLLYISSIEEFKFAGAIASS--RDADVLIPSLHIGLNNN 99

:* . .. :* * * ** .* .: . . *** :

Consensus e fC LVSI S eE Fl WIGL

SpEchL2 TTEDSWEWSDGSSL----DYEIWEPGQPSSHNGNQDCVMFFSSNKYKWNDLACDSDGDTA 172

TfCL HKEGTWMWSDGSKV----DFVTWNQGQPDNHLANENCVHTNYYIAKKWNDALCSELYAFV 157

Hd band 9 AREGDFVWSDGKKLSQAPNVVVWEPNQPNDLGHNQNCVVYRINN-YNVNDAPCSYVAGVF 158

*. : ****..: : *: .**.. *::** : ** *.

Consensus w WsDGs y nW gePnn e CV l g Wnd C fI

SpEchL2 YHGSAYVCKLPQW 185

TfCL CQSRTVFS----- 165

Hd band 9 CKKPRAN------ 165

:

Consensus Ck

Supp. Fig. S2. Multiple sequence alignment of C-type lectins with the polypeptide from H. dofleinii band 9 cDNA. TfCL is a C-type lectin from the fish Trachidermus fasciatus (AFW17073.1); SpEchL2 is echinoidin-like isoform 2 from the sea urchin S. purpuratus (XP_003726044.1). Consensus shows conserved motifs taken from the sequence logo for C-type lectin class 2, to which these sequences conform better than to class 1; capitals indicate more strongly conserved residues. The alignment was performed using Clustal-W2. Key: “*” identical; “:” highly conserved; “.” conserved. The invariant cysteines are highlighted in cyan (the pair that forms one disulphide bond) and magenta (the pair that forms the other). Predicted ligand-binding residues for the two known proteins (taken from their NCBI RefSeq entries) are shown in red.

Yeast TK 1 MTQFTDIDKLAVSTIRILAVDTVSKANSGHPGAPLGMAPAAHVL-WSQMR 49

:...:.:|..::::.:.|.||||...:..|....|| :..|:

Hd band 3 1 ------EDVANKLREDSIESTTAAGSGHPTTCMSAAEVMSVLFFHTMK 42

Yeast TK 50 MNPTNPDWINRDRFVLSNGHAVALLYSMLHLTGYDLSIEDLKQFRQLGSR 99

.....|.....|||::|.|||..:||:.....|. ..:|:||..|::.|.

Hd band 3 43 YKVDVPKDPANDRFIMSKGHAAPILYAAWAEAGL-FPVENLKNLRKIDSD 91

Yeast TK 100 TPGHPEFELPGVEVTTGPLGQGISNAVGMAMAQANLAATYNKPGFTLSDN 149

..|||...|..|:|.||.||||:|...||| |...... :|.

Hd band 3 92 LEGHPTPRLSFVDVATGSLGQGLSVGAGMA------YTGKYLDKADY 132

Yeast TK 150 YTYVFLGDGCLQEGISSEASSLAGHLKLGNLIAIYDDNKITIDGATSISF 199

.||..||||...||...||.:.|.:.||.||:||:|.|::.....||:..

Hd band 3 133 RTYCLLGDGESAEGSVWEAMAFASYYKLDNLVAIFDVNRLGQSQPTSLQH 182

Yeast TK 200 D-EDVAKRYEAYGWEVLYVENGN--EDLAGIAKAIAQAKLSKDKPTLIKM 246

| |....|.||:|:.. ||.:|: ||| |||:..|...||||:.|..

Hd band 3 183 DMETYRLRCEAFGFNT-YVVDGHSVEDL---AKALHDASTVKDKPSCILA 228

Yeast TK 247 TTTIGYGSLHAGSHSV---HGAPLKADDVKQLKSKFGFNPDKSFVVPQEV… 293

.|..|.|: .|...: ||

Hd band 3 229 KTYKGKGA--KGIEDLEGWHG------ 247

Supp. Fig. S3. Sequence alignment of yeast transketolase (partial) with the polypeptide from H. dofleinii band 3 cDNA. “Yeast TK” is the N-terminal portion of Saccharomyces cerevisiae transketolase (AAA35168.1), with catalytically essential residues highlighted in red and residues involved in cofactor or metal binding highlighted in cyan. These are as identified in Singleton et al. [1996, Biochemistry 35, 15865-15869], Wikner et al. [1997, Eur. J. Biochem. 233, 750-755] and Gerhardt et al. [2003, Plant Physiology 132, 1941–1949]. Sequence alignment is by EMBOSS Needle. Key: “|” identical; “:” highly conserved; “.” conserved.
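As an illustration of how the identity symbol in the alignment relates to the aligned rows, the sketch below counts identical columns in one block of the alignment above (Yeast TK residues 50-99 against Hd band 3 residues 43-91). Only this block is used because its two rows have the same number of columns as printed. The function name is hypothetical, and the similarity classes (“:” and “.”) depend on the scoring matrix used by the alignment program, so they are not reproduced here.

```python
def alignment_identity(row_a, row_b, gap="-"):
    """Count columns where both aligned rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same number of columns")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return identical, len(row_a)


# One block of the alignment, copied from the figure.
YEAST_TK_50_99 = "MNPTNPDWINRDRFVLSNGHAVALLYSMLHLTGYDLSIEDLKQFRQLGSR"
HD_BAND3_43_91 = "YKVDVPKDPANDRFIMSKGHAAPILYAAWAEAGL-FPVENLKNLRKIDSD"

identical, columns = alignment_identity(YEAST_TK_50_99, HD_BAND3_43_91)

if __name__ == "__main__":
    print(f"{identical} identical of {columns} columns "
          f"({100.0 * identical / columns:.1f}%)")
```

The count for this block agrees with the number of identity symbols in the corresponding match line of the figure.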
