A gene responsible for prolyl-hydroxylation of moss-produced recombinant human erythropoietin

Juliana Parsons1, Friedrich Altmann2, Manuela Graf1, Johannes Stadlmann2, Ralf Reski1,3,4 and Eva L. Decker1*

Supplementary Figure S1: RNA ligase-mediated amplification of 5’ cDNA ends (RACE) of prolyl-4-hydroxylases (P4H) 2, 3 and 6. The P4H gene models that were not supported at their 5’ ends by expressed sequence tag evidence were analysed via RACE-PCR. Arrows indicate bands that sequence analysis showed to contain upstream information of the respective P4H gene.

Supplementary Figure S2: Protein sequence comparison of P. patens putative prolyl-4-hydroxylases (P4Hs), Arabidopsis thaliana P4H1 (AT2G43080.1), Nicotiana tabacum P4H (BAD07294) and the α(I) subunit of the human collagen-P4H (NP_000908). The grey shading is based on the percentage of identity as analysed by the Jalview alignment editor. The residues are coloured according to the percentage of identity in each column (dark grey: >80 %, grey: 60 % and light grey: 40 % identity). The conserved residues responsible for binding Fe2+ and the C-5 carboxyl group of 2-oxoglutarate are marked with asterisks. The first 147 amino acids of the human α(I) subunit, which did not align with any other analysed sequence, are not shown.
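The shading rule described for Figure S2 can be illustrated with a minimal Python sketch. It is not Jalview's implementation: the toy alignment is invented, identity is taken as the share of sequences carrying the most common residue in a column, and the 60 % and 40 % figures are read as lower bounds in the same way as the >80 % one. All three points are assumptions.

```python
from collections import Counter

def column_identity(column):
    """Percent of sequences carrying the most common non-gap residue in a column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    top_count = Counter(residues).most_common(1)[0][1]
    # Multiply before dividing so that 3 of 5 gives exactly 60.0.
    return 100 * top_count / len(column)

def shade(pct):
    """Map a percent identity to a shade (thresholds read as strict lower bounds; assumption)."""
    if pct > 80:
        return "dark grey"
    if pct > 60:
        return "grey"
    if pct > 40:
        return "light grey"
    return "none"

def shade_alignment(seqs):
    """Return one shade per alignment column for equal-length aligned sequences."""
    return [shade(column_identity(col)) for col in zip(*seqs)]

# Hypothetical five-sequence alignment, NOT taken from the P4H sequences.
TOY_ALIGNMENT = ["HVDAG", "HVDSG", "HIDAG", "HVE-G", "HLDTG"]
shades = shade_alignment(TOY_ALIGNMENT)
print(shades)
```

Columns sitting exactly on a threshold (60 % or 80 % in the toy data) fall into the lower shade under this reading; a different convention would shift them up by one level.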

Supplementary Figure S3: Schematic representation of the P4H gene targeting constructs. On top, the general strategy for targeting the 6 P4H genes is outlined. The genomic structures of all P4H genes after integration of the respective knockout constructs are shown below. Exons are represented as rectangles and introns as lines. The genomic target locus is shown in black. The parts of the genes included in the KO constructs are marked in grey and striped rectangles indicate the selection cassette (zeo-cassette). Restriction sites (RS) used to insert the selection cassette are marked. Black arrowheads indicate oligonucleotides used for screening of genomic integration, grey arrowheads denote primers used to check the absence of the specific transcript and, among these, the empty arrowheads represent primers homologous to a deleted part of the gene.

Supplementary Figure S4: P4H gene expression analysis in recombinant moss lines.

a) Expression analysis of P4H1, P4H2, P4H3, P4H4, P4H5 and P4H6, respectively, in the putative knockout plants. As a control for efficient mRNA isolation, RT-PCR was performed with primers corresponding to the constitutively expressed gene for the ribosomal protein L21 (control).

b) Expression analysis of P4H1 in moss wild type (WT), the rhEPO producing line 174.16, and five putative moss lines overexpressing P4H1 (#12, 16, 32, 41 and 45). Semi-quantitative RT-PCR was performed with increasing cycle number (24, 26 and 28) and primers specific for P4H1 as well as a control with primers corresponding to the constitutively expressed gene encoding the TATA-box binding protein TBP.

Supplementary Figure S5: MS/MS analysis of the peptide EAISPPDAASAAPLR (144-158) from moss-produced rhEPO. The upper spectrum was derived from the non-oxidized peptide (m/z 933.45) and faithfully shows the partial sequence SPPDAAS. The lower spectrum was derived from one of the two oxidized peptides (m/z 941.45). It gave the apparent partial sequence SPLDAAS, which stands for SPODAAS as Hyp (O) and Leu are isobaric. A second, slightly smaller peak of m/z 941.45 eluted slightly later and probably arose from hydroxylation of the other proline of the hydroxylation motif SPP.
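The arithmetic behind the two m/z values can be sketched in a few lines of Python. The charge state is not stated in the legend; here it is inferred from the 8.00 spacing between the two precursor values, which is an assumption. The monoisotopic masses are standard values, and the function names are illustrative only.

```python
# Monoisotopic mass of one oxygen atom (the net gain on Pro -> Hyp), in Da.
OXYGEN_MONO = 15.994915

def mz_shift(delta_mass, charge):
    """m/z shift caused by a mass change on an ion of the given charge."""
    return delta_mass / charge

def infer_charge(observed_shift, delta_mass=OXYGEN_MONO):
    """Charge state that best explains an observed m/z shift (rounded to an integer)."""
    return round(delta_mass / observed_shift)

# Precursor m/z values quoted in the legend.
observed_shift = 941.45 - 933.45
charge = infer_charge(observed_shift)
print("inferred charge:", charge)
print("expected shift for one hydroxylation:", round(mz_shift(OXYGEN_MONO, charge), 4))

# Monoisotopic residue masses: Hyp (C5H7NO2) and Leu (C6H11NO), in Da.
HYP_RESIDUE = 113.047679
LEU_RESIDUE = 113.084064
hyp_leu_gap = LEU_RESIDUE - HYP_RESIDUE
print("Leu minus Hyp residue mass:", round(hyp_leu_gap, 4))
```

The two residues share the nominal mass 113 and differ by only a few hundredths of a dalton, which is why a fragment series containing Hyp can be read out as Leu, as in the apparent sequence SPLDAAS.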

Supplementary Figure S6: Analysis of the hydroxylation status of the N-terminal peptide of moss-produced rhEPO. The N-terminal sequence APP may also constitute a target sequence for moss prolyl-hydroxylase. Therefore, the N-terminus of moss-produced rhEPO was analysed by reverse-phase liquid chromatography coupled to electrospray ionization mass spectrometry (LC-ESI-MS) of chymotryptic peptides. Screening for the masses of the non-oxidized and the oxidized peptide APPRLIC*DSRVL (C* standing for carbamidomethyl-cysteine) from rhEPO produced in the moss control line 174.16, the P4H1 knockout line No. 192 and the overexpression line P4H1OE#45 revealed no indication of Pro hydroxylation of this peptide.
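As a rough illustration of what "screening for the masses" involves, the following Python sketch computes monoisotopic masses for the peptide with and without one hydroxylation. The residue masses are standard monoisotopic values; the charge state used in the example call is hypothetical, since the legend does not state which ions were monitored.

```python
# Standard monoisotopic residue masses (Da) for the residues in this peptide.
MONO = {
    "A": 71.03711, "R": 156.10111, "D": 115.02694, "C": 103.00919,
    "I": 113.08406, "L": 113.08406, "P": 97.05276, "S": 87.03203,
    "V": 99.06841,
}
WATER = 18.01056            # added once per peptide (free N- and C-terminus)
CARBAMIDOMETHYL = 57.02146  # fixed modification on every Cys (C*)
OXYGEN = 15.99491           # mass gained per Pro -> Hyp conversion
PROTON = 1.00728

def peptide_mass(sequence, n_hydroxy=0):
    """Neutral monoisotopic mass with carbamidomethylated Cys and n hydroxylations."""
    mass = WATER + sum(MONO[aa] for aa in sequence)
    mass += CARBAMIDOMETHYL * sequence.count("C")
    mass += OXYGEN * n_hydroxy
    return mass

def mz(mass, charge):
    """m/z of the protonated ion at the given charge."""
    return (mass + charge * PROTON) / charge

PEPTIDE = "APPRLICDSRVL"  # C is treated as C* (carbamidomethyl-cysteine)
plain = peptide_mass(PEPTIDE)
hydroxy = peptide_mass(PEPTIDE, n_hydroxy=1)

# Charge 2 is a hypothetical choice for illustration.
print(round(plain, 4), round(mz(plain, 2), 4))
print(round(hydroxy, 4), round(mz(hydroxy, 2), 4))
```

Extracting ion traces at both computed values and finding a signal only at the first one would correspond to the "no indication of Pro hydroxylation" outcome described in the legend.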

Supplementary Table S1: In silico localization prediction of Physcomitrella patens P4Hs using different programs.

P4H / P4H1 / P4H2 / P4H3 / P4H4 / P4H5 / P4H6a / P4H6b
SherLoc / ER / ER / ER / Golgi / ER / secreted / mitochondria
WoLFPSORT / vacuole / plastid / plastid / nucleus / vacuole / cytoplasm / plastid
MultiLoc / mitochondria / plastid / plastid / mitochondria / mitochondria / plastid / mitochondria
TargetP / SP / / / / / mitochondria / mitochondria / mitochondria / mitochondria

Supplementary Table S2: Oligonucleotides used in this work.

gene / oligonucleotide
P4H-GFP construct
P4H1 / fwd: 5´-GGGATGGAGTAATTCTACGAAGC-3´
rev: 5´-AATCAAAGGCTCGCTGCCTCAT-3´
P4H2 / fwd: 5´-GTGATGCGTGATCCTGTGC-3´
rev: 5´-GGCACACATGGCATGCTTTC-3´
P4H3 / fwd: 5´-GGTGTTATGTAGAGATTCGTCACAAC-3´
rev: 5´-GAAATTTGTCAGTGTTGCGAATC-3´
P4H4 / fwd: 5´-GACTCGGAAATCGCTCCTGA-3´
rev: 5´-GAAATTTGTCGGTATTGCGTATC-3´
P4H5 / fwd: 5´-GCCACATCTCGAAGTAGTCGGTAAT-3´
rev: 5´-CGGCTGCATAGTTTTCTACATGTAAC-3´
P4H6-a / fwd: 5´-CTCTTGCTCTTCACCGTCGACTC-3´
rev: 5´-ACCGTGCTGCGTATTTTTCAAC-3´
P4H6-b / fwd: 5´-GAGACGTACTATTAAACACGTAGG-3´
rev: 5´-ACCGTGCTGCGTATTTTTCAAC-3´
genomic DNA amplification for KO construct
P4H1 / fwd: 5´-TGAATTCTGAATGTCATAAGGCCTCTACTG-3´
rev: 5´-TGAATTCAGAGGGTAGGATTGTGTGAAG-3´
P4H2 / fwd: 5´-CGAATTCCTCTGCTCCCTGTTCTTGTTTG-3´
rev: 5´-CGAATTCCACAAACTTCATCGACTTGATCC-3´
P4H3 / fwd: 5´-GAATTCGTTGCAGTAATCCTTGGTGAT-3´
rev: 5´-GAATTCTCTCCACCCTCTTCCACATC-3´
P4H4 / fwd: 5´-TGAATTCCTGAGGGGATTGAAGAG-3´
rev: 5´-TGAATTCAGAACACAGGGATCAGC-3´
P4H5 / fwd: 5´-TGAATTCTGCAGCTTGTTACACTCCCAAT-3´
rev: 5´-ATGAATTCAGATAGGCACGAGGTGGT-3´
P4H6 / fwd: 5´-TGAATTCTGCAGTAGATGGCCAATCATGT-3´
rev: 5´-GTAATCCTGCAACAAGAATTCAAAGCAG-3´
screening of integration in the genome
P4H1 / 5´-integration / fwd: 5´-GGCTAATGATGAAGATGCGAGA-3´
rev: 5´-TGTCGTGCTCCACCATGTTG-3´
3´-integration / fwd: 5´-GTTGAGCATATAAGAAACCC-3´
rev: 5´-AGCATCCCCTCGTTTAGGTT-3´
P4H2 / 5´-integration / fwd: 5´-TGTGGTATTCTCGCAGATTAGGG-3´
rev: 5´-TGTCGTGCTCCACCATGTTG-3´
3´-integration / fwd: 5´-GTTGAGCATATAAGAAACCC-3´
rev: 5´-CGGTCATAATTTGAGTTTTGCT-3´
P4H3 / 5´-integration / fwd: 5´-CAACGGATGCCATTGACAGT-3´
rev: 5´-TGTCGTGCTCCACCATGTTG-3´
3´-integration / fwd: 5´-GTTGAGCATATAAGAAACCC-3´
rev: 5´-CATTTGGCAACTTAAGGGTGTA-3´
P4H4 / 5´-integration / fwd: 5´-GACTCGGAAATCGCTCCTGA-3´
rev: 5´-TGTCGTGCTCCACCATGTTG-3´
3´-integration / fwd: 5´-GTTGAGCATATAAGAAACCC-3´
rev: 5´-CATCGACAGTTGTTCGTGGA-3´
P4H5 / 5´-integration / fwd: 5´-GTAAAGGACATTCGTTTATGCATCG-3´
rev: 5´-TGTCGTGCTCCACCATGTTG-3´
3´-integration / fwd: 5´-GTTGAGCATATAAGAAACCC-3´
rev: 5´-TGTGGTGATTACAAGAAATGGTCGT-3´
P4H6 / 5´-integration / fwd: 5´-ATAGGTGTCGCTACAGCAATCG-3´
rev: 5´-TGTCGTGCTCCACCATGTTG-3´
3´-integration / fwd: 5´-GTTGAGCATATAAGAAACCC-3´
rev: 5´-ATGGACACTCGCTCCTTGTAA-3´
P4H1 / overexpression / fwd: 5´-GGCTAATGATGAAGATGCGAGA-3´
rev: 5´-AGCATCCCCTCGTTTAGGTT-3´
transcript screening
P4H1 / fwd: 5´-GGCTAATGATGAAGATGCGAGA-3´
rev: 5´-AGCATCCCCTCGTTTAGGTT-3´
P4H2 / fwd: 5´-AGGACAAGCTGGAGAAGTCAATG-3´
rev: 5´-CCTAGCACACATGGCATG-3´
P4H3 / fwd: 5´-GGTGTTATGTAGAGATTCGTCACAAC-3´
rev: 5´-GAAATTTGTCAGTGTTGCGAATC-3´
P4H4 / fwd: 5´-TTGGTCGGCTTTTACGGTTC-3´
rev: 5´-AAAGAAGAGCATCGCCTTGG-3´
P4H5 / fwd: 5´-TCCTGTTGTCTCTAGCGCTCAT-3´
rev: 5´-GGCTGCATAGTTTTCTACATGTAAC-3´
P4H6 / fwd: 5´-CCAGAGCTTTTCTGTATCACCAC-3´
rev: 5´-ACCGTGCTGCGTATTTTTCAAC-3´
tbp / fwd: 5´-GCTGAGGCAGTCTTGGAG-3´
rev: 5´-TCGAGCCGGATAGGGAAC-3´
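A primer table like this is easy to sanity-check programmatically. As a minimal sketch, the Python snippet below scans the primers listed under "genomic DNA amplification for KO construct" for the hexamer GAATTC, which is the EcoRI recognition sequence. Whether that site was actually used for cloning is not stated in this table, so the check is only an illustration; the dictionary keys are labels chosen here.

```python
# Primers copied from the "genomic DNA amplification for KO construct" rows.
KO_PRIMERS = {
    "P4H1 fwd": "TGAATTCTGAATGTCATAAGGCCTCTACTG",
    "P4H1 rev": "TGAATTCAGAGGGTAGGATTGTGTGAAG",
    "P4H2 fwd": "CGAATTCCTCTGCTCCCTGTTCTTGTTTG",
    "P4H2 rev": "CGAATTCCACAAACTTCATCGACTTGATCC",
    "P4H3 fwd": "GAATTCGTTGCAGTAATCCTTGGTGAT",
    "P4H3 rev": "GAATTCTCTCCACCCTCTTCCACATC",
    "P4H4 fwd": "TGAATTCCTGAGGGGATTGAAGAG",
    "P4H4 rev": "TGAATTCAGAACACAGGGATCAGC",
    "P4H5 fwd": "TGAATTCTGCAGCTTGTTACACTCCCAAT",
    "P4H5 rev": "ATGAATTCAGATAGGCACGAGGTGGT",
    "P4H6 fwd": "TGAATTCTGCAGTAGATGGCCAATCATGT",
    "P4H6 rev": "GTAATCCTGCAACAAGAATTCAAAGCAG",
}

ECORI_SITE = "GAATTC"  # palindromic, so one strand is enough to search

def site_positions(primers, site=ECORI_SITE):
    """Return the 0-based index of the first occurrence of site per primer (-1 if absent)."""
    return {name: seq.upper().find(site) for name, seq in primers.items()}

positions = site_positions(KO_PRIMERS)
for name, pos in positions.items():
    print(f"{name}: {'absent' if pos < 0 else f'site at index {pos}'}")
```

In this list the hexamer sits at or near the 5´ end of every primer except the P4H6 reverse primer, where it lies further inside the sequence.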

Supplementary Table S3: International Moss Stock Center (IMSC) accession numbers of plants used in this work.

Plant / IMSC No.
EPO 174.16 / 40216
P4H1 KO I-192 EPO / 40218
P4H2 KO IX-6 EPO / 40234
P4H3 KO VIII-21 EPO / 40230
P4H4 KO I-95 EPO / 40231
P4H5 KO IX-29 EPO / 40223
P4H6 KO X-31 EPO / 40239
P4H1 OE XIX-12 in P4H1 KO I-192 EPO / 40336
P4H1 OE XIX-16 in P4H1 KO I-192 EPO / 40337
P4H1 OE XIX-32 in P4H1 KO I-192 EPO / 40338
P4H1 OE XIX-41 in P4H1 KO I-192 EPO / 40339
P4H1 OE XIX-45 in P4H1 KO I-192 EPO / 40340