Phylogeny of the AGC kinases

Using the same methods as those used for the receptor tyrosine kinases, the sequence of the kinase region of the stem AGC kinase has been deduced as:

FEILKVIGRG AFGKVYLVRH KDTGKLYAMK VLKKQDMIER NQVQHVRAER

DILAFAESPF IVSLFYSFQT KRHLYLVMEY VPGGDLFSLL KNYGPFPEEM

ARLYIAEVVL ALEYLHSLGI IHRDLKPDNI LLDSDGHIKL TDFGLSKVLL

GdGTLAKTLC GTPEYLAPEV LLGQGYGKAV DWWSLGVILY ELLVGRPPFY

GDTPEELFGK IIKDEVRFPE KLSPEAKDLL RKLLQKDPEK RLGxxGAEEI

KRHPFF

Methods

AGC protein kinase domains were selected from the Swiss-Prot database (http://www.ncbi.nlm.nih.gov/entrez/) and arranged into families on the basis of homology. This corresponded to families defined using extracellular structure (ref. 1). A family tree (fig. 1) shows the sequence similarity between protein kinase domains, derived from public sequences and gene prediction methods (ref. 2). Domains were defined by hidden Markov model profile analysis and multiple sequence alignment. The initial branching pattern was built from a neighbour-joining tree derived from a ClustalW protein sequence alignment of the domains, prepared by pairwise comparison (in practice, a tree available on a commercial website was used: http://www.cellsignal.com/reference/kinase/tk.asp, accessed 6/03).

Assuming that each branch point represents a gene duplication event, the immediate ancestral gene as it was at the time of duplication was given a name (fig. 1). Its sequence was determined as a consensus of its progeny, using the nearest neighbour as an outgroup to decide which amino acid was the original where the progeny differed (‘x’ was used where this could not be determined). To enable this, the amino acid sequences of the gene products had to be aligned: sequences were ‘piled up’ to locate conserved stretches and variable inserts. Initially the Clustal alignment of the NCBI conserved domain database for kinases (http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi) was used to give each amino acid a number in the (longest aggregate) sequence (Supplementary Fig. 1 online), though some adjustments were made subsequently where necessary to improve the fit. The greatest need for adjustment was at the edges of the conserved domains.
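The consensus rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run for this study: the function name `infer_ancestor` and the toy sequences are hypothetical, and the sketch assumes the three sequences are already aligned to the same length and that an outgroup residue of ‘x’ cannot settle a disagreement.

```python
def infer_ancestor(child_a: str, child_b: str, outgroup: str) -> str:
    """Infer an ancestral sequence column by column from two aligned
    progeny sequences, using an aligned outgroup to break ties."""
    if not (len(child_a) == len(child_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    ancestor = []
    for a, b, o in zip(child_a, child_b, outgroup):
        if a == b:
            # Progeny agree: take their shared residue.
            ancestor.append(a)
        elif o != "x" and o in (a, b):
            # Progeny differ: the one matching the outgroup is taken as original.
            ancestor.append(o)
        else:
            # Neither progeny matches the outgroup: leave undetermined.
            ancestor.append("x")
    return "".join(ancestor)


# Toy sequences (hypothetical, not taken from the alignment).
print(infer_ancestor("GKVYL", "GRVFL", "GKIFM"))  # GKVFL
print(infer_ancestor("AK", "AR", "AQ"))           # Ax
```

Applying the function from the tips of the tree inwards, with each newly inferred ancestor serving as progeny or outgroup for the next node, mirrors the stepwise naming of stem sequences in fig. 1.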

For each amino acid, an evolutionary tree was constructed by using successive neighbours or derived neighbours as outgroups. The final stem sequence (S1) was rooted using, as outgroups, the stem sequences of the STE, TKL and RTK families, derived in the same way.

Where ‘x’s accumulated, a tentative assignment was made by looking for amino acids that appeared in progeny on both sides of a divide. Finally, the tree that required the fewest mutations overall was constructed. Where there was a choice of equal parsimony, it was assumed that the same mutation had occurred twice during the development of the family, rather than that a forward mutation had subsequently been reversed.
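Counting the mutations a given tree requires can be illustrated with Fitch's small-parsimony algorithm, a standard technique named here as a stand-in; the text does not state that this exact algorithm was used. In the sketch below the tree is a nested tuple of leaf names, and the leaf names `A` to `D`, their residues and the function names are all hypothetical.

```python
def fitch(tree, residues):
    """Return (candidate residues, minimum mutation count) for one
    alignment column on a rooted binary tree of nested tuples."""
    if isinstance(tree, str):
        # Leaf: the observed residue, no mutations needed.
        return {residues[tree]}, 0
    left, right = tree
    left_set, left_count = fitch(left, residues)
    right_set, right_count = fitch(right, residues)
    shared = left_set & right_set
    if shared:
        # Both subtrees can agree on a residue: no extra mutation.
        return shared, left_count + right_count
    # No shared residue: one mutation is needed at this node.
    return left_set | right_set, left_count + right_count + 1


def total_mutations(tree, alignment):
    """Sum the per-column mutation counts over an aligned set of sequences."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Toy alignment (hypothetical): two candidate topologies for four leaves.
toy = {"A": "KL", "B": "KL", "C": "RL", "D": "RM"}
print(total_mutations((("A", "B"), ("C", "D")), toy))  # 2
print(total_mutations((("A", "C"), ("B", "D")), toy))  # 3
```

Under these assumptions the first topology would be preferred, since it explains the same toy data with fewer mutations.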

Where a change occurred to an amino acid (or inserted stretch), it was assumed, where possible, to have happened only once: before the duplication if the change appeared in both downstream branches, and after the duplication if it appeared in only one.

Each (numbered) amino acid in turn was then tested in the family tree to assess its likely pattern of evolution. Amino acids were labelled black and bold if there was no doubt (present in all progeny); red if identical residues occurred in progeny on both branches, or if one progeny matched the immediate outgroup; blue if identical progeny occurred on only one branch; and plain black if inferred only from parentage. Amino acids were underlined if they represented a mutation.
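The labelling scheme above can be written as a small decision function. This is one reading of the rules, offered as a sketch: the function name `confidence_label`, its arguments and the returned strings are hypothetical, and "one progeny matched the immediate outgroup" is interpreted as the residue being present on at least one branch and equal to the outgroup residue.

```python
def confidence_label(residue, left_progeny, right_progeny, outgroup):
    """Classify the support for an inferred ancestral residue at one position.

    left_progeny / right_progeny: residues observed at this position in the
    descendants on each branch below the node; outgroup: the residue in the
    immediate outgroup.
    """
    in_left = residue in left_progeny
    in_right = residue in right_progeny
    if all(r == residue for r in list(left_progeny) + list(right_progeny)):
        return "bold black"   # present in all progeny: no doubt
    if in_left and in_right:
        return "red"          # supported by progeny on both branches
    if (in_left or in_right) and outgroup == residue:
        return "red"          # one branch plus the immediate outgroup
    if in_left or in_right:
        return "blue"         # supported on one branch only
    return "plain black"      # inferred only from parentage


# Toy residues (hypothetical).
print(confidence_label("K", ["K", "K"], ["K"], "R"))  # bold black
print(confidence_label("K", ["K", "R"], ["Q"], "K"))  # red
print(confidence_label("K", ["K", "R"], ["Q"], "Q"))  # blue
```

Underlining for mutations is a separate property and is not modelled here.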

1 Fantl, W.J., Johnson, D.E. & Williams, L.T. Signalling by receptor tyrosine kinases. Ann. Rev. Biochem. 62, 453-481 (1993).

2 Manning, G., Whyte, D.B., Martinez, R., Hunter, T. & Sudarsanam, S. The Protein Kinase Complement of the Human Genome. Science 298, 1912-1934 (2002).

Fig 1. AGC names for stem sequences

MRCKA ---( )

MRCKB ---( )---(MRCKAB)

DMPK2 ------( )---(S30)

DMPK1 ------( )---(S24)

ROCK1 ---( ) ()------(S20)

ROCK2 ---( )------(ROCK12)()------(S16)

CRIK ------( ) ()

NDR1 ---( ) ()------(S4)

NDR2 ---( )---(NDR12) () ()

LATS2 ---( ) ()------(S17) ()

LATS1 ---( )---(LAT12) ()

AKT1 ---( ) ()

AKT2 ---( )---(AKT12) ()

AKT3 ------( )---(S31) ()

SGK1 ---( ) ()------(S14) ()

SGK3 ---( )---(SGK13) () () ()

SGK2 ------( )---(S32) () ()

PKCA ---( ) () ()

PKCB ---( )---(PKCAB) ()------(S12) ()

PKCG ------( )---(S33) () () ()

PKC-L ---( ) ()------(S25) () () ()

PKCE ---( )------(PKCEL) ()------(S21) () () ()

nPKC-delta ( ) () () () () ()

nPKC-theta ( )------(PKCdt) ()------(S18) () () ()

nPKC-iota -( ) () () () () ()

PKC zeta -( )------(PKCiz) () () () ()

PKN1 ----( ) ()------(S15) ()-----(S10) ()

PKN2 ----( )---(PKN12) () () () ()

PKN3 ------( )------(S19) () () ()

MSK1 ----( ) () () ()

MSK2 ----( )------(MSK12) () () ()

MSK2b () () () ()

p70-S6K -( ) () () () ()

p70S6Kb -( )------(p70ab) ()------(S13) ()------(S8) ()

RSK2 ---( ) () () () () ()

RSK1 ---( )---(RSK12) ()------(S22) () () ()

RSK3 ------( )---(S34) () () () ()

RSK4 ------( )---(S26) () () ()

BARK2 ---( ) () () ()

BARK1 ---( )------(BAR12) () () ()

GRK5 ---( ) () () () ()------(S2)

GRK4 ---( )---(GRK54) ()------(S11) () () ()

GRK6 ------( )---(S35) () ()------(S6) () ()

GRK1 ---( ) ()------(S27) () () () ()

GRK7 ---( )------(GRK17) () () () ()

PRKY ---( ) () ()------(S5) ()

PRKX ---( )------(PRKYX) () () ()

PRKACA --( ) () () () ()

PRKACB --( )---(PRKAB) ()------(S28) () () ()

PRKACG ------( )---(S36) ()------(S23) () () ()

PRKG2 --( ) () () () () ()

PRKG1 --( )------(PRK21) ()------(S9) () ()

PDPK1 ------( ) () ()

YANK2 --( ) () ()

YANK1 --( )---(YANK12) () ()

YANK3 ------( )---(S37) () ()

SGK494 ------( ) ()------(S7) ()

RSKL1 --( ) ()------(S38) ()------(S1)

RSKL1b ()------(RS1ab) ()

RSKL2 --( ) ()

RSKL2b ()

MAST1 --( ) ()

MAST4 --( )---(MAST14) ()

MAST2 ------( )---(S39) ()

MAST3 ------( )---(S29) ()

MASTL ------( )------(S3)

MASTLb