SUPPLEMENTARY DATA

Four different categories of GGT-related sequences

The loci identified as having homology to human GGT1 can be subdivided into four categories (see Supplementary Table 1). Category 1 contains four members that have substantial nucleic acid identity over the full length of the test GGT1 cDNA: GGT1, GGT2, GGT3P and GGT4P. The status of the GGT2 gene as an actively transcribed locus is currently not clear. Pawlak et al. (1998) cloned three cDNAs from a human kidney cDNA library and reported the sequence of a 0.8 kb clone, which was designated type II. We searched the human genome (build 36.1) and EST databases with this 800 bp sequence and found that it matches GGT2 and GGTLC3 most closely, but has 100% identity with neither sequence. In addition, there are no ESTs with 100% identity to this type II RNA. No other authentic mRNAs are currently listed in the databases for either GGT2 or GGTLC3. Both GGT2 (not shown) and GGTLC3 (Suppl. Fig. 1) have mutations in residues S451 (S→L) and D423 (D→T), which are proposed to interact with glutathione in GGT1 (Han et al., 2007; Okada et al., 2006). Based on this, the status of GGT2 and GGTLC3 as active loci is currently unclear. GGT3P and GGT4P are described in more detail below.

Category 2 includes the light chain genes, of which only two (GGTLC1 and GGTLC2) are associated with mRNAs. They are also represented on microarrays (see Table II, GEO Profile records). Category 3 includes one gene that contains only sequences homologous to GGT1 coding exons 1, 4, and 5. However, the reading frame of that gene encounters a stop codon at amino acid residue 25. Although an mRNA (NR_003503) was reported, it does not appear able to encode a protein with any GGT function, and this gene is therefore considered to be a pseudogene (GGT8P).

Category 4 contains genes with a deduced amino acid sequence exhibiting varying degrees of similarity to GGT1, namely GGT5 (formerly GGTLA1/GGT-rel, GGL), GGT6 and GGT7 (formerly GGTL3 or GGTL5). We also performed database searches with the GGT5 cDNA sequence. However, apart from the locus that encodes it on chr 22: 22.9 Mb, the human genome does not contain sequences with substantial nucleotide identity to this gene or parts of it. In 2005 Puente et al. reported that chimpanzee does not contain a GGT5 ortholog, but a database survey for GGTLA1 against the Pan troglodytes (build 2.1) genome now showed the presence of such a gene on chromosome 22. We also performed a database search with the GGT7 cDNA sequence; it likewise represents a single gene, with no other sequences of significant identity in the human genome. Finally, GGT6 is also a single-copy gene, located on chromosome 17p13.2.

Examination of genes with a frame shift in coding exon 9: GGT3P and GGT4P

GGT3P and GGT4P have very substantial nucleotide sequence homology over their entire length to the bona fide GGT1 gene. However, although the moieties encoding the heavy chain have an apparently open reading frame consistent with GGT1, the light chains have a frame shift that would render the amino acid sequence quite different. As shown in Supplementary Fig. 2, the GGT3P and GGT4P genes both lack one nucleotide in exon 9, which causes a frame shift, although a continued open reading frame that encodes a substantial number of extra amino acid residues remains possible.

Frame shifts are also present in GGTLC4P and GGTLC5P. GGTLC4P lacks one nucleotide in coding exon 9 after the -FGSKVRSPVSGILFNDEMDDFSSPNITNEFGVPP- string, causing a frame shift.

GGTLC5P lacks one nucleotide after the sequence -FGSKVCSPVSGILFNNEWTTSALPA-, leading to a terminal amino acid sequence similar to that of GGT3P and GGT4P.
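The consequence of a single missing nucleotide, as described for these loci, can be illustrated with a minimal Python sketch. The DNA string below is a hypothetical toy sequence chosen only for illustration; it is not taken from any GGT locus, and the helper names `translate` and `delete_nucleotide` are assumptions of this sketch. The standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons from position 0; stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete_nucleotide(dna: str, index: int) -> str:
    """Return the sequence with the nucleotide at 0-based `index` removed."""
    return dna[:index] + dna[index + 1:]


# Hypothetical toy coding sequence (NOT a real GGT sequence).
toy_dna = "ATGACCTCTGAATTCTTCGCCGCACAG"
shifted_dna = delete_nucleotide(toy_dna, 9)

print(translate(toy_dna))      # residues in the original frame
print(translate(shifted_dna))  # residues after the one-nucleotide deletion
```

Residues encoded upstream of the deletion are unchanged, while every residue downstream is read in a different frame, which is the pattern described for the light chain portions of GGT3P, GGT4P, GGTLC4P and GGTLC5P.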


Supplementary Table 1. Categories of GGT1-related sequences in the human genome

Category / gene / Chrom. location (chromosome: position in Mb)
1. substantial nucleic acid identity to GGT1 over the entire length / GGT1 / 22: 23.3
GGT2 / 22: 19.89
GGT3P / 22: 17.15
GGT4P / 13
2. substantial nucleic acid identity to the light chain encoding part of GGT1 / GGTLC1 / 20: 23.92
GGTLC2 / 22: 21.31
GGTLC3 / 22: 18.75
GGTLC4P / 22: 22.97
GGTLC5P / 22: 18.95
3. substantial nucleic acid identity to some GGT1 exons / GGT8P / 2: 91.3
4. similarity in deduced amino acid sequence to GGT1 / GGT5 / 22: 22.95
GGT6 / 17: 4.4
GGT7 / 20: 32.9

Supplementary Figure 1.

T381 N401

GGTLC1 MTSEFFSAQLRAQISDDTTHPISYYKPEFYMPDDGGTAHLSVVAEDGSAVSATSTINLY 8

GGTLC2 MTSEFFAAQLRAQISDDTTHPISYYKPEFYTPVDGGTAHLSVVAEDGSAVSATSTINLY 8

GGTLC3 MTSEFFAAQLRSQISDHTTHPISYYKPEFYTPDDGGTAHLSVVAEDGSAVSATSTINLY 8

E420

GGTLC1 FGSKVRSPVSGILLNNEMDDFSSTSITNEFGVPPSPANFIQP 9

GGTLC2 FGSKVRSPVSEILFNDEMDDFSSPNITNEFGVPPSPANFIQP 9

GGTLC3 FGSKVCSPVSGILFNNEWTTSALPAFTNEFGAPPSPANFIQP 9

D423

S451 G474

GGTLC1 GKQPLSSMCPTIMVGQDGQVRMVVGAAGGTQITMATAL 10

GGTLC2 GKQPLSSMCPTIMVGQDGQVRMVVGAAGGTQITTATAL 10

GGTLC3 GKQPLLSMCPTIMVGQDGQVRMVVGAAGGTQITTDTAL 10

S452 G473

GGTLC1 AIIYNLWFGYDVKWAVEEPRLHNQLLPNVTTVERNIDQ 11

GGTLC2 AIIYNLWFGYDVKRAVEEPRLHNQLLPNVTTVERNIDQ 11

GGTLC3 AIIYNLWFGYDVKRAVEEPRLHNKLLPNVTTVERNIDQ 11

GGTLC1 EVTAALETRHHHTQITSTFIAVVQAIVRMAGGWAAASDSRKGGEPAGY 12

GGTLC2 AVTAALETRHHHTQIASTFIAVVQAIVRTAGGWAAASDSRKGGEPAGY 12

GGTLC3 AVTAALETRHHHTQIASTFIAVVQAIVRTAGGWAAASDSRKGGEPAGY 12

Supplementary Fig. 1. Alignment of light chain only genes (GGTLC) that have a deduced amino acid sequence similar to the frame of GGT1 and GGT2. The amino acid residues that would differ from those found in the light chain of GGT1 are highlighted. The sequences are divided into segments encoded by different exons, following the exon structure of GGT1. The exon numbering indicated at the end of each line is that of the corresponding protein coding exons of the GGT1 gene. Han et al. (2007) proposed that residues T381, N401, E420, D423, G473, G474, S451 and S452 in the active site of the human GGT1 light chain interact with glutathione, based on analysis of the crystal structure of E. coli GGT residues T391, D433, S462 and S463 (Okada et al., 2006). Relevant sites are indicated above and below the sequence. The sequence underlined in exon 9 is one hallmark difference between GGT1 (similar to GGTLC1 and GGTLC2) and GGT2 (similar to GGTLC3). The GGTLC1 sequence is NP_842563 from NM_178311 (chr 20: 23.92 Mb; Wetmore et al., 1993); the GGTLC2 sequence is NP_543029 from NM_080839 (chr 22: 21.3 Mb, locus 129026, gene 1, GGTL4); the GGTLC3 sequence is predicted XP_001128310 from predicted XM_001128310 (chr 22: 18.75 Mb, gene 11). There are variant cDNAs for both GGTLC2 (NM_199127) and GGTLC3 (predicted, XM_001128302) that encode additional amino acid residues between exons 10 and 11 as the result of an in-frame read-through of the intron.

For GGTLC2 this sequence is -ICVTPFLPGRAHPAQPPSHADHTPMQP- and for GGTLC3 -VCVTPFLPGPAHSAQPPSHADHTPMPQ-. The type III amino acid sequence reported by Leh et al. (1996) is similar to the GGTLC2 sequence with inclusion of the read-through of the nucleotide sequence between exons 10 and 11; however, this type III cDNA has a C-terminal end (…GGVPATECSPGGQG*) that differs from that of GGTLC2 (…GGEPAGY). An additional difference is an E at position 414 in GGTLC2 versus a G at that position in the clone reported by Leh et al. (1996).
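The residue differences highlighted in the alignment can be listed programmatically. The following is a minimal sketch that compares the exon 10 segments of GGTLC1 and GGTLC3 as printed in Supplementary Fig. 1; the function name `substitutions` is an assumption of this sketch, and the reported positions are relative to the start of the segment, not GGT1 residue numbers.

```python
def substitutions(reference: str, query: str) -> list[tuple[int, str, str]]:
    """List (1-based position, reference residue, query residue) for each
    mismatch between two equal-length, already aligned segments."""
    if len(reference) != len(query):
        raise ValueError("segments must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, query), start=1)
        if ref != alt
    ]


# Exon 10 segments copied from Supplementary Fig. 1.
ggtlc1_exon10 = "GKQPLSSMCPTIMVGQDGQVRMVVGAAGGTQITMATAL"
ggtlc3_exon10 = "GKQPLLSMCPTIMVGQDGQVRMVVGAAGGTQITTDTAL"

for pos, ref, alt in substitutions(ggtlc1_exon10, ggtlc3_exon10):
    print(f"segment position {pos}: {ref} -> {alt}")
```

Because the segments are compared position by position, this approach only applies to gap-free stretches of equal length such as those shown in the figure; sequences with insertions or deletions would first need a proper alignment.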


Supplementary Figure 2.

GGT3P

MKKKLVVLGLLAVVLVLVIVGLCLWLPSASKEPDNHVYTRAVVAADAKQCLEIGR 1

DTLRDGGSAVDAAIAALLCVGLMNAHSMGIGVGLSSTIYNSTT 2

RKAEVINAREVAPRLAFASMFNSSEQSQK 3

GGLSVAVPGEIRGYELAHQRHGRLPWARLFQPSIQLARQGFPVGKGLAAVLENKRTVIEQQPVLC 4

EVFCRDRKVLREGERLTLPRLADTYEMLAIEGAQAFYNGSLMAQIVKDIQAA 5

GGIVTAEDLNNYCAELIEHPLNISLGDAVLYMPSARLSGPVLALILNILK 6

GYNFSRESVETPEQKGLTYHRIVEAFRAYAKRTLLGDPKFVDVTE 7

VVRNMTSEFFAAQLRSQISDHTTHPISYYKPEFYTPDDGGTAHLSVVAEDGSAVSATSTINLY 8

GKQPLLSMCLTIMVGQDGQVRMVVGAAGGTQITTDTAL (NM_002058) 10

AIIYNLWFGYDVKRAVEEPRLHNKLLPNVTTVERNIDQ (NM_002058) 11

AVTAALETRHHHTQIASTFIAVVQAIVRTAGGWAAASDSRKGGEPAGY* (NM_002058) 12

GGT4P

MKKKLVVLGLLAVVLVLVIVNLCLWLPSASKEPDNHVYTRAAVAADAKQCSEIGR 1

DTLRDGGSAVDAAIAALLCVGLMNAHSMSIGGGLFLTIYNSTS 2

GKAEVINAREVAPRLAFASMFNSLEQSQK 3

GGLSVAVPGEIRGYELAHQRHGRLPWARLFQPSIQLARQGFPVGKGLAAVLENKRTVIEQQPVLW 4

HVCGEVFCRDRKVLREGERLTLPRVADTYETLAIEGAQAFYNGSLMAQIVKDIQAA 5

VMVQPHPSAHSSCCPVAGGIVTAEDLNNYCAELIEHPLNISLGDAVLYMPSAPLSGPVLALILNILK 6

GYNFSWESVETPEQKGLTYHRIVEAFWFAYAKRTLLGDPKFVNVTE 7

VVRNMTSEFFAAQLWAQISDNTTHTISYYKPKFYTPDDRGTAHLSVITEDGSAVSATSTINLY 8

FGSKVCSPVSGILFNNEWTTSALPA^SPMSLGYPPHLPISSSQGSSRSRPCSQRSWWARTARSGW

WWELLGARRSPQPLHWPSSTTSGSAMT*

^ frame shift, one nucleotide missing

Supplementary Fig. 2. GGT3P and GGT4P genes: deduced amino acid sequences. Numbers at the end of each line indicate the coding exon in GGT1 that encodes these residues. For GGT3P, all exons and intron-exon junctions were compared to those in GGT1. Residues in green indicate amino acids in the deduced sequence that differ from those in GGT1. The boxed area in GGT3P indicates a one-nucleotide discrepancy between one cDNA record (NM_002058) and the sequence of genomic DNA and three other cDNAs. It is unclear whether this reflects a polymorphism or an error. Only the sequence that includes this residue (NM_002058) would encode a light chain with homology to that of GGT1. The other cDNAs have a shifted reading frame and encode amino acid residues thereafter unlike those of GGT1. The GGT3P gene sequence is from the genomic DNA and from several cDNAs (NR_003267, BC108264), whereas the GGT4P sequence is from a predicted mRNA, XR_016938.
