The more the merrier? How a few SNPs predict pigmentation phenotypes in the Northern German population

Amke Caliebe1,*, Melanie Harder2,*, Rebecca Schuett2,4, Michael Krawczak1, Almut Nebel3, Nicole von Wurmb-Schwark2

1Institute of Medical Informatics and Statistics, 2Institute of Legal Medicine, 3Institute of Clinical Molecular Biology, all at Christian-Albrechts University Kiel, Germany; 4current address: State Criminal Investigation Department of Lower Saxony

Correspondence to:

PD Dr. rer. nat. Nicole von Wurmb-Schwark, Institute of Legal Medicine, Christian-Albrechts University Kiel, University Hospital Schleswig-Holstein, Arnold-Heller Str. 12, 24105 Kiel, tel.: +49 431 597 3633, fax: +49 431 597 3612, mail:

*These authors contributed equally to this work.

Material and Methods

Study population

A total of 400 unrelated individuals (197 male, 203 female) from Northern Germany were recruited for our study between 2010 and 2011. The median age was 27 years (inter-quartile range: 24-33 years). All individuals were born in Germany and had German parents and grandparents (self-report). All 400 participants were recruited and investigated in the same way. Whilst the first 300 participants were included in the ‘modelling sample’ of stage 1 (used for SNP selection), the remaining 100 individuals constituted the ‘estimation sample’ of stage 2 (prediction evaluation). All participants gave written informed consent prior to the study. Genotype and phenotype data were de-identified for analysis purposes in accordance with the Declaration of Helsinki. The project was approved by the Ethics Committee of the Medical Faculty of Christian-Albrechts University Kiel.

Phenotyping

Pigmentation phenotypes were documented by photographs taken under daylight conditions from a distance of 30 cm, using a Canon EOS 400D (18-55 mm focal length). For each participant, one photograph was taken of each eye, the scalp hair and the inner arm. Photographs were normalised using the standard functions in Photoshop 4.0, and consensus phenotype calls were reached by discussion between two raters. In the rare cases where no agreement was reached, a third party was involved.

Eye colour was divided into three categories, namely blue, green and brown (Fig. 1a). Individual skin type was classified by applying the Fitzpatrick scheme (1988) to the inner arm. Hair colour type was defined in a multi-tiered fashion. To this end, a collection of coloured hair strands obtained from a hairdresser was categorised into nine evenly graded types of shading, ranging from light blond (type I) to black (type IX) (Fig. 1b). Strands of red hair or with a red tint were omitted from this classification because it was intended to address the light-dark component only. Then, hair colour was divided into two sub-phenotypes, namely the red tint component (yes/no) and the light-dark component (I-IX). For each individual, the presence of red tint was ascertained (by questioning) in head hair, facial hair (beard), axillary hair or pubic hair. If an individual had red head hair, this was noted separately to enable a separate analysis for this special phenotype. The light-dark component was determined by reference to the hair strand collection mentioned above. Here, individuals with recognizable red tint were classified according to their basic hair colour type. For example, strawberry blonds were deemed class I (blond) whereas people with auburn hair were classified as one of IV, V or VI (brown). While this was possible for 22 red-haired individuals, four had no definable basic hair colour. These four were excluded from the analysis of the light-dark component. No participants with exclusively white hair were included in the study. When a participant had dyed hair, the original hair colour was determined from the hairline.

Genotyping

Buccal swabs (COPAN) were taken from all 400 participants and DNA was extracted using Chelex 100 (Walsh et al. 1991). In a comprehensive PubMed search, 12 SNPs were identified as promising candidates for further analysis using the following criteria (Table 1, Supplementary Table S1): large odds ratio, validation in several independent studies, large sample sizes and adequate population backgrounds, low to no linkage disequilibrium with other candidate markers and suitability for genotyping in a single assay. In addition to the 12 SNPs, participants were genotyped for rs1426654 (SLC24A5), rs1129038 (HERC2) and rs1667394 (OCA2). SNP rs1426654 is a European ancestry marker (Giardina et al. 2008) used to control population background. SNPs rs1129038 and rs1667394 served as genotyping quality markers because they are in perfect LD with candidate SNPs rs12913832 and rs916977 respectively (Mengel-From et al. 2010; Sturm et al. 2008).

Primers were designed and checked for possible dimer and hairpin structures using the DNAstar Lasergene v8.1.2 software and BLAST. PCR fragments had to be shorter than 200 bp in order to meet the standards of reliable forensic or ancient DNA analysis.

For DNA amplification, a Multiplex PCR Master Mix (Qiagen) was used in a total reaction volume of 12.5 µl, with 0.2-0.5 ng template DNA. PCR was performed with a thermal cycler 2700 (Life Technologies) under the following conditions: (1) 95 °C 15 min, (2) 35 cycles of 94 °C 30 sec, 64 °C (SNPs nos. 2-5, 7-8, 10 in Table 1) or 58 °C (SNPs nos. 1, 6, 9, 11-12) 90 sec, 72 °C 1 min and (3) 60 °C 30 min. PCR products were purified using ExoSAP-IT (Affymetrix) according to the manufacturer’s protocol. Single-base extension (SBE) was carried out in a total reaction volume of 7 µl, including 0.5 µl of cleaned PCR products, using the SNaPshot Multiplex Kit (Life Technologies) on the same PCR cycler as before. The SBE cycling conditions were as follows: 25 cycles of 96 °C 10 sec, 55 °C 5 sec, 60 °C 30 sec. Fragment analysis was performed with the ABI Prism 3130 Genetic Analyzer (Life Technologies) using GeneMapper v3.2. For more information on primer sequences and concentrations, see Table S2a.

The model proposed by Walsh et al. (2011) for the prediction of eye colour is based upon six SNPs. Five of these had been genotyped in all our study participants before (Table 1, SNPs nos. 1, 3, 10-12). The 100 individuals of stage 2 were additionally genotyped for rs16891982 (SLC45A2) as described above, with an annealing temperature of 58 °C in the first PCR. For more information on primer sequences and concentrations, see Table S2b.

The model devised by Branicki et al. (2011) for predicting hair colour is based upon 13 single or compound markers. Three of these were also included in our set of candidate SNPs (Table 1, SNPs nos. 1, 3, 10) and were genotyped in all participants. The 100 stage 2 individuals were also genotyped for the remaining 10 markers, namely two compound markers in MC1R and rs1042602 (TYR), rs4959270 (EXOC2), rs28777 (SLC45A2), rs683 (TYRP1), rs2402130 (SLC24A4), rs12821256 (KITLG), rs16891982 (SLC45A2) and rs2378249 (ASIP). SNPs were analyzed as described above, with an annealing temperature of 58 °C for the first PCR. The MC1R markers were analyzed by sequencing the whole locus. To this end, a 1080 bp fragment was amplified and sequenced with an ABI Prism 3130xl Genetic Analyzer using the Big Dye Terminator v3.1 Cycle Sequencing Kit (both Life Technologies), following the manufacturer’s protocols. See Table S2c for more information on primers used in this study.

Genotype and phenotype data of this study were submitted to the European Genome-phenome Archive (EGA) under study accession number EGAS00001001174 (sample/proband IDs EGAN00001268626-EGAN00001269025).

Statistical analysis

Genotypes for all markers of the two previously published models (or marker sets) (Branicki et al. 2011; Walsh et al. 2011) were only available for 100 individuals in our study (stage 2). To compare the predictive capability of the two marker sets to a model derived specifically for our target population, the 300 individuals of stage 1 were used to detect significant genotype-phenotype associations and to create an appropriate prediction model. Data from stage 2 then served for estimation and comparison of the predictive capability (sensitivity, specificity, predictive accuracy, area under the receiver operating characteristic curve AUC) for each new model and the two previously published models (Branicki et al. 2011; Walsh et al. 2011). For illustration, we also performed model selection and prediction evaluation on the whole data set (i.e. stages 1 and 2 combined), using cross-validation to estimate sensitivity and specificity.

Sample size calculations indicated that approximately 100 individuals per group would suffice to detect an OR of 3 as nominally significant, depending upon the minor allele frequency of the SNP of interest, and 150 individuals per group after Bonferroni adjustment (12 SNPs tested, 80% power, 5% significance level). Stage 1 therefore comprised 300 individuals. The association between a given trait (i.e. eye colour, hair colour/red tint, hair colour/light-dark component, skin colour) and a candidate SNP was tested for statistical significance using regression models. To allow for scarce genotypes, we performed permutation tests (100,000 permutations) in addition to standard asymptotic tests. Since the p values from the two approaches did not differ notably, only p values from permutation tests are given. Each SNP was analyzed both individually (simple regression) and in combination with other candidate SNPs (multiple regression with backward selection), also allowing for possible SNP-SNP interactions. Genotypic, additive allelic, dominant and recessive models were considered for each SNP. Results, however, are presented for the additive model only because this model required the fewest parameters but yielded consistently large effects. To derive robust prediction models, phenotypes were categorised in various ways. Depending on the scaling of the outcome, we performed logistic, linear, ordinal (proportional odds) and/or multinomial logistic regression. Since blue was by far the most frequent eye colour in our study, the analysis of eye colour was confined to the discrimination between blue and non-blue. For hair colour, red tint was treated as a dichotomous outcome whereas the light-dark component was treated in three different ways, either as dichotomous (blond vs. non-blond), ordinal, or quantitative (nine types of increasing darkness). Skin colour was treated either as dichotomous (types I-II vs. types III-IV) or as ordinal.

Model selection was performed differently for the four traits. For eye colour and red tint, SNPs that remained significant in the multiple logistic regression analysis after backward selection and adjustment for multiple testing were included in the final model. For the light-dark component of hair colour, and for skin colour, a SNP had to be significant in all or in all but one of the multiple regression analyses after backward selection using different outcome definitions (i.e. at least two of three analyses for the light-dark component, at least one of two analyses for skin colour).
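
The idea behind the permutation tests described above can be illustrated with a minimal sketch. It is not the glmperm procedure used in the study: it assumes a much simpler test statistic (the absolute difference in mean allele count between the two phenotype groups) instead of a regression coefficient, and the function name and the toy data are hypothetical.

```python
import random


def permutation_p_value(allele_counts, phenotypes, n_perm=10000, seed=0):
    """Two-sided permutation p value for a SNP-phenotype association.

    allele_counts: number of copies (0, 1 or 2) of one allele per individual.
    phenotypes:    1 if the dichotomous phenotype is present, 0 otherwise.
    Simplifying assumption: the statistic is the absolute difference in mean
    allele count between the two phenotype groups.
    """
    def statistic(labels):
        present = [g for g, y in zip(allele_counts, labels) if y == 1]
        absent = [g for g, y in zip(allele_counts, labels) if y == 0]
        return abs(sum(present) / len(present) - sum(absent) / len(absent))

    rng = random.Random(seed)
    observed = statistic(phenotypes)
    shuffled = list(phenotypes)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffling the phenotype labels breaks any genotype-phenotype link
        # while keeping both group sizes fixed.
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Adding 1 to numerator and denominator avoids reporting p = 0.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical toy data: the allele is carried only by individuals with the phenotype.
toy_genotypes = [2] * 20 + [0] * 20
toy_phenotypes = [1] * 20 + [0] * 20
print(permutation_p_value(toy_genotypes, toy_phenotypes, n_perm=2000, seed=1))
```

With this estimator, the smallest attainable p value is 1/(n_perm + 1), so the number of permutations limits the resolution of small p values.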

The relationships between traits were analyzed using logistic regression analysis, treating one trait as the dependent variable and the other traits as independent variables, both with and without the additional inclusion of SNP genotypes. All four traits were encoded as dichotomous variables in these analyses. Model selection was again performed by backward selection. Multidimensional scaling (MDS) was used to detect and visualise patterns in the phenotype data.

The predictive capability of a derived model was evaluated by means of the phenotype probability obtained from logistic regression analysis. This was done only for dichotomous outcomes (e.g. blue vs. non-blue eye colour). If this probability exceeded 0.5, the corresponding phenotype was assumed to be present. Predictive capability was quantified by the sensitivity, specificity, predictive accuracy and AUC of the model in question.
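
As a rough illustration of these measures, the sketch below applies the 0.5 threshold to a list of predicted probabilities and computes sensitivity, specificity, accuracy and AUC. The study itself used the R packages DiagnosisMed and pROC; the function names and the toy data here are hypothetical, and the AUC is computed by the pairwise-comparison (Mann-Whitney) formulation.

```python
def classification_metrics(probabilities, observed, threshold=0.5):
    """Sensitivity, specificity and accuracy when a phenotype is called
    present whenever its predicted probability exceeds the threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, observed):
        called_present = p > threshold
        if called_present and y:
            tp += 1
        elif called_present:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(probabilities, observed):
    """Probability that a randomly chosen individual with the phenotype has a
    higher predicted probability than one without it (ties count one half)."""
    present = [p for p, y in zip(probabilities, observed) if y]
    absent = [p for p, y in zip(probabilities, observed) if not y]
    score = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in present
        for b in absent
    )
    return score / (len(present) * len(absent))


# Hypothetical toy data: six individuals, 1 = phenotype present.
toy_probabilities = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_observed = [1, 1, 0, 1, 0, 0]
print(classification_metrics(toy_probabilities, toy_observed))
print(auc(toy_probabilities, toy_observed))
```

Unlike sensitivity, specificity and accuracy, the AUC does not depend on the chosen threshold.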

All statistical analyses were performed with R v2.10.1 (R Development Core Team 2009) unless indicated otherwise. Hardy-Weinberg equilibrium was assessed by means of the exact test implemented in R package genetics (Warnes et al. 2008). Package MASS was used for ordinal and multinomial regression (Venables and Ripley 2002). Permutation tests of the linear and logistic regression models were performed with package glmperm (Werft et al. 2013). For ordinal regression models, permutation tests were programmed in house. The predictive capabilities of different models were evaluated with packages DiagnosisMed (Brasil 2010) and pROC (Robin et al. 2011). The proportion of phenotype heritability explained by a given marker was calculated according to So et al. (2011). Note that these estimates apply to single markers and do not take into account the characteristics of the respective regression models. Furthermore, these estimates refer to the liability scale and therefore tend to be higher than on the observation scale.

Sample size calculations were performed with the GPower software v3.0.8. All tests were two-sided and a p value smaller than 0.05 was considered nominally statistically significant. P values were adjusted for multiple testing using the Bonferroni method.
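
The Bonferroni adjustment amounts to multiplying each p value by the number of tests and capping the result at 1. The short sketch below (function name hypothetical) reproduces the adjustment for 12 tested SNPs, using the unadjusted value 0.0014 reported for rs1800407 in Supplementary Table 3a as an example.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p value: p times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# 12 candidate SNPs were tested; 0.0014 becomes roughly 0.017 after adjustment.
print(round(bonferroni_adjust(0.0014, 12), 4))
```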

Supplementary Table 1: Terms for PubMed search

Category 1: human
Category 2: pigmentation; eye colour/color; skin colour/color; hair colour/color; red hair/red tint; curly hair
Category 3: genotype/genotyping; single nucleotide polymorphism/SNP/SNPs; forensic phenotyping/forensic DNA phenotyping; Irisplex
Category 4: prediction; determination
Category 5: origin/ancestry/ancestral; forensic

The following terms were used for the PubMed search:

Always: one word each of categories 1-3

Often: one word of category 4

Sometimes: one word of category 5

Furthermore, articles citing already retrieved articles were also examined.

Supplementary Table 2: Genotyping information for all analyzed SNPs

Supplementary Table 2a: Genotyping information for the 12 analyzed candidate SNPs and 3 control SNPs

Multiplex I
gene / size (bp) / SNP-ID (genomic position) / primer sequences / [primer] in µM / SNP allele / SBE-primer sequence / tail / total length (bp)
MC1R / 174 / rs1805007 (NC_000016.10:g.89919709C>T) / forward: GCCGTGGACCGCTACATC; reverse: GAAGAAGACCACGAGGCACAG / 0.6 / C/G/T / TCTCCATCTTCTACGCACTG / (GACT)5 / 40
MC1R / 174 / rs1805008 (NC_000016.10:g.89919736C>T) / forward: GCCGTGGACCGCTACATC; reverse: GAAGAAGACCACGAGGCACAG / 0.6 / C/T / AGCATCGTGACCCTGCCG / (A)14 / 32
OCA2 / 167 / rs4778138 (NC_000015.10:g.28090674A>G) / forward: GGAAAATCTGCACACTTAGAAA; reverse: GCTGTAAATTTCCTCCCATCA / 0.8 / G/A / GTGAAAATATAACATATCAAAATTG / (GACT)6 / 49
HERC2 / 181 / rs1667394 (NC_000015.10:g.28285036C>T) / forward: TTGGCAGCTTTTCTGTCTTCT; reverse: AAAATGAGAACTTGGTCAATCC / 0.2 / G/A / CATTGTTTCTTTGTTTGTTTGGT / (A)7 / 30
OCA2 / 64 / rs7495174 (NC_000015.10:g.28099092A>G) / forward: GGCTCCGTCGCACCCGTCTG; reverse: GCGGCTTAGGAAGCAAGGCAAG / 0.2 / G/A (C/T) / AGGCAAGTTCCCCTAAAGGT / (A)5 / 25
OCA2 / 254 / rs1800407 (NC_000015.10:g.27985172C>T) / forward: CAGAGGTGCTTTGCGTACCTTATGGT; reverse: GGGGTAATGTTAGTTTGGCTCCCTGTTCTTA / 0.4 / G/A / AGGCATACCGGCTCTCCC / (GACT)9 / 54
IRF4 / 121 / rs12203592 (NC_000006.11:g.396321C>T) / forward: TCATGTGAAACCACAGGGCA; reverse: CTGGCACCAAAAGTACCACA / 0.4 / G/A (C/T) / ACTTTGGTGGGTAAAAGAAGG / (GACT)8 / 53
HERC2 / 75 / rs916977 (NC_000015.10:g.28268218T>C) / forward: CACAGTGGGGATGCAGTTTGAGTA; reverse: TTGGCCTTTCTGTTCTTCTTGACC / 0.1 / G/A / GTGCAGCCTTGGCCAGCCTTCT / (GACT)6 / 46
Multiplex II
TYR / 73 / rs1393350 (NC_000011.10:g.89277878G>A) / forward: TATCCACCAACTCCTACTCTT; reverse: TTATCATTTGTAAAAGACCACAC / 0.05 / G/A / CCTCAGTCCCTTCTCTGCAAC / (GACT)9 / 57
SLC24A5 / 137 / rs1426654 (NC_000015.10:g.48134287A>G) / forward: GAAGAAAATAAAAATCACACTGAGTAAGC; reverse: CCTTGGATTGTCTCAGGATGTTGC / 0.2 / G/A (C/T) / CTGAACTGCCCGCTGCCATGAAAGTTG / (GACT)4 / 43
MC1R / 143 / rs1805009 (NC_000016.10:g.89920138G>C) / forward: CACCGCGCTCACCAGGAGC; reverse: GGCTGCATCTTCAAGAACTTCAACC / 0.2 / G/C / CTCATCATCTGCAATGCCATCATC / (GACT)7 / 52
HERC2 / 176 / rs1129038 (NC_000015.10:g.28111713C>T) / forward: GCCGACGACAGCAGCGACGAT; reverse: CAGACACACCAGGCAGCCTACAGTCT / 0.1 / G/A / TGAGCCAGGCAGCAGAGC / (A)9 / 27
SLC24A4 / 114 / rs12896399 (NC_000014.9:g.92307319G>T) / forward: ATTGAGTATCCTATATTTTATCTG; reverse: TCTTGATGTTGTATTGATGAGG / 0.1 / G/T / CTTTAGGTCAGTATATTTTGGG / - / 22
OCA2 / 66 / rs4778241 (NC_000015.10:g.28093567A>C) / forward: CTGGAAAGCAGTTTGACAGTT; reverse: GTGCAATTGTTGGCTGGTAG / 0.1 / C/A (G/T) / TGTTGGCTGGTAGTTGCAATT / (GACT)6 / 45
HERC2 / 163 / rs12913832 (NC_000015.10:g.28120472A>G) / forward: AAGAGGCGAGGCCAGTTTC; reverse: AGAAACGACAAGTAGACCATTT / 0.05 / G/A / GCCAGTTTCATTTGAGCATTAA / (A)13 / 35
reverse SBE-primer
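
The "total length" column in Supplementary Table 2a appears to be the length of the SBE primer plus that of its non-binding tail, where a tail such as (GACT)5 denotes five repeats of GACT. The sketch below checks this reading for three entries of the table; the function names are hypothetical, and the interpretation of the tail notation is an assumption based on the listed values.

```python
import re


def tail_length(tail):
    """Length in nucleotides of a tail written as '(UNIT)n', e.g. '(GACT)5'.

    An empty string or '-' is taken to mean that the primer has no tail;
    a unit without a repeat number counts as one repeat.
    """
    tail = tail.strip()
    if tail in ("", "-"):
        return 0
    match = re.fullmatch(r"\(([ACGT]+)\)(\d*)", tail)
    if match is None:
        raise ValueError(f"unrecognised tail notation: {tail!r}")
    unit, repeats = match.groups()
    return len(unit) * (int(repeats) if repeats else 1)


def sbe_total_length(primer, tail):
    """Total length of an SBE primer including its tail."""
    return len(primer) + tail_length(tail)


# Entries taken from Supplementary Table 2a.
print(sbe_total_length("TCTCCATCTTCTACGCACTG", "(GACT)5"))  # rs1805007, listed as 40
print(sbe_total_length("AGCATCGTGACCCTGCCG", "(A)14"))      # rs1805008, listed as 32
print(sbe_total_length("CTTTAGGTCAGTATATTTTGGG", ""))       # rs12896399, listed as 22
```

Staggered total lengths of this kind allow the extension products of a multiplex to be separated by size during fragment analysis.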

Supplementary Table 2b: Genotyping information for additional SNP rs16891982 (SLC45A2) from Walsh et al. (2011)

Singleplex-PCR
gene / size (bp) / SNP-ID (genomic position) / primer sequences / [primer] in µM / SNP allele / SBE-primer sequence / tail / total length (bp)
SLC45A2 / 107 / rs16891982 (NC_000005.10:g.33951588C>G) / forward: AGAAACTTTTAGAAGACATCCTTAGGAGAGAGAAA; reverse: AAGAGGAGTCGAGGTTGGATGTTGG / 0.2 / C/G (G/C) / AGGTTGGATGTTGGGGCTT / (GACT)7 / 47
reverse SBE-primer

Supplementary Table 2c: Genotyping information for additional single or compound markers from Branicki et al. (2011)

Multiplex-PCR
gene / size (bp) / SNP-ID (genomic position) / primer sequences / [primer] in µM / SNP allele / SBE-primer sequence / tail / total length (bp)
TYR / 120 / rs1042602 (NC_000011.10:g.89178528C>A) / forward: ATGACCTCTTTGTCTGGATGC; reverse: CTATGCCAAGGCAGAAAAGC / 0.1 / A/C (T/G) / CAATGTCTCTCCAGATTTCA / (GACT)9 / 32
EXOC2 / 113 / rs4959270 (NC_000006.12:g.457748C>A) / forward: CTGGGGTTTACGATTCAACA; reverse: AGGATGGAAAAGAACCACCA / 0.2 / A/C / CCAAACTATGACACTATG / (GACT) / 22
SLC45A2 / 109 / rs28777 (NC_000005.10:g.33958854C>A) / forward: GTGGGAGTTCCATGCCTTT; reverse: TCCAAGAGTCGCATAGGACA / 0.2 / A/C / CATGTGATCCTCACAGCAG / (GACT)2 / 29
Duplex-PCR
SLC24A4 / 152 / rs2402130 (NC_000014.9:g.92334859G>A) / forward: ACCTGTCTCACAGTGCTGCT; reverse: TTCACCTCGATGACGATGAT / 0.2 / A/G / CATACGGAGCCCGTG / (GACT)7 / 43
KITLG / 106 / rs12821256 (NC_000012.12:g.88934558T>C) / forward: TTAAGCTCTGTGTTTAGGGTTTTT; reverse: TGAGTCATGAGTGCTTTGTTCC / 0.1 / C/T (G/A) / GGGCATGTTACTACGGCAC / (GACT)5 / 39
Singleplex-PCR
ASIP / 135 / rs2378249 (NC_000020.11:g.34630286G>A) / forward: GACCTCAGTTCTGGAGAAAGC; reverse: AAGGTGGCTGGTTTCAGTCT / 0.2 / A/G / CTAGGAACTACTTTGCACAGTA / (GACT)3 / 34
Singleplex-PCR
TYRP1 / 120 / rs683 (NC_000009.12:g.12709305C>A) / forward: TTTTCTTTCACTTTATTACCTTCTTTC; reverse: AAAGATTCTGAAAGGGTCTTCC / 0.2 / A/C (T/G) / GCCTAGAACTTTAAT / (GACT)3 / 27
Singleplex-PCR MC1R Exon
gene / size (bp) / primer sequences / concentration of each primer (µM)
MC1R / 1080 / forward: GCAGCACCATGAACTAAGCA; reverse: TGCCCAGCACACTTAAAGC / 0.4, 3.2 in the sequencing reaction

Supplementary Table 3: Genotype-phenotype association analysis of eye colour

Supplementary Table 3a: Association between SNP genotype and blue eye colour (stage 1)

SNP-ID (gene) / ref. allele / fav. allele / simple regression p value / multiple regression p value (a)
rs12913832 (HERC2) / A / G / <1.0×10^-5 / <1.0×10^-5 (1.2×10^-4)
rs916977 (HERC2) / A / G / <1.0×10^-5 / n.s.
rs1800407 (OCA2) / G / A / n.s. / 0.0014 (0.017)
rs7495174 (OCA2) / G / A / <1.0×10^-5 / n.s.
rs4778138 (OCA2) / G / A / <1.0×10^-5 / n.s.
rs4778241 (OCA2) / A / C / <1.0×10^-5 / n.s.
rs1805007 (MC1R) / - / - / n.s. / n.s.
rs1805008 (MC1R) / - / - / n.s. / n.s.
rs1805009 (MC1R) / - / - / n.s. / n.s.
rs12203592 (IRF4) / - / - / n.s. / n.s.
rs12896399 (SLC24A4) / - / - / n.s. / n.s.
rs1393350 (TYR) / - / - / n.s. / n.s.

ref.: reference allele, fav.: favourable allele, n.s.: not significant. (a) Figures in brackets are p values after Bonferroni adjustment (12 SNPs tested). SNPs included in the final prediction model are printed in bold (see Table 2). All SNPs were analysed using an additive allelic model on the logit scale.

Supplementary Table 3b: Combined rs12913832/rs1800407 genotype and blue eye colour (stage 2)

rs12913832 (HERC2) / rs1800407 (OCA2) / N / Probability (blue eyes) (a) / Observed prevalence (blue eyes) / Expected prevalence (blue eyes) (b)
GG / AA / 0 / 1.00 / 0 / 0
GG / GA / 6 / 0.98 / 5 / 5.9
GG / GG / 70 / 0.90 / 61 / 63.0
GA / AA / 0 / 0.84 / 0 / 0
GA / GA / 3 / 0.52 / 2 / 1.6
GA / GG / 19 / 0.18 / 5 / 3.4
AA / AA / 0 / 0.12 / 0 / 0
AA / GA / 0 / 0.027 / 0 / 0
AA / GG / 2 / 0.0055 / 0 / 0.0

(a) Conditional probability of blue eye colour, given the respective genotype, as calculated from the prediction model derived in stage 1 (see Table 2). (b) The expected prevalence equals the product of the genotype prevalence (N) and the probability of blue eyes.
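
The expected prevalence in Supplementary Table 3b can be recomputed directly from N and the model-based probability. The following sketch does so for the genotype combinations actually observed in stage 2; the variable and function names are hypothetical, and the numbers are copied from the table.

```python
# (rs12913832, rs1800407) -> (N, probability of blue eyes, observed blue-eyed count)
stage2_blue_eyes = {
    ("GG", "GA"): (6, 0.98, 5),
    ("GG", "GG"): (70, 0.90, 61),
    ("GA", "GA"): (3, 0.52, 2),
    ("GA", "GG"): (19, 0.18, 5),
    ("AA", "GG"): (2, 0.0055, 0),
}


def expected_count(n, probability):
    """Expected number of individuals with the phenotype in a genotype group."""
    return n * probability


expected = {
    genotype: expected_count(n, probability)
    for genotype, (n, probability, _) in stage2_blue_eyes.items()
}
total_expected = sum(expected.values())
total_observed = sum(observed for _, _, observed in stage2_blue_eyes.values())

for genotype, value in expected.items():
    print(genotype, round(value, 1))
print("total expected:", round(total_expected, 1), "total observed:", total_observed)
```

Summing over genotype groups gives a quick check of how well the stage 1 model is calibrated in the stage 2 sample.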

Supplementary Table 4: Genotype distribution, by eye colour, of the 12 candidate SNPs (all 400 individuals of stages 1 and 2)

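Eye colour groups: blue (n=287), green (n=52), brown (n=61). G1-G3: genotypes; A1-A2: alleles. Each cell gives the genotype or allele, its count and, in brackets, its relative frequency.

SNP-ID (gene) / eye colour / G1 / G2 / G3 / A1 / A2
rs12913832 (HERC2) / blue / GG: 258 (0.90) / GA: 29 (0.10) / AA: 0 (0) / G: 545 (0.95) / A: 29 (0.051)
rs12913832 (HERC2) / green / GG: 26 (0.5) / GA: 26 (0.5) / AA: 0 (0) / G: 78 (0.75) / A: 26 (0.25)
rs12913832 (HERC2) / brown / GG: 5 (0.082) / GA: 46 (0.75) / AA: 10 (0.16) / G: 56 (0.46) / A: 66 (0.54)
rs916977 (HERC2) / blue / GG: 270 (0.94) / GA: 17 (0.059) / AA: 0 (0) / G: 557 (0.97) / A: 17 (0.030)
rs916977 (HERC2) / green / GG: 39 (0.75) / GA: 13 (0.25) / AA: 0 (0) / G: 91 (0.88) / A: 13 (0.13)
rs916977 (HERC2) / brown / GG: 27 (0.44) / GA: 30 (0.49) / AA: 4 (0.066) / G: 84 (0.69) / A: 38 (0.31)
rs1800407 (OCA2) / blue / GG: 261 (0.91) / GA: 25 (0.087) / AA: 1 (0.0035) / G: 547 (0.95) / A: 27 (0.047)
rs1800407 (OCA2) / green / GG: 47 (0.90) / GA: 5 (0.096) / AA: 0 (0) / G: 99 (0.95) / A: 5 (0.048)
rs1800407 (OCA2) / brown / GG: 52 (0.85) / GA: 9 (0.15) / AA: 0 (0) / G: 113 (0.93) / A: 9 (0.074)
rs7495174 (OCA2) / blue / AA: 284 (0.99) / AG: 3 (0.010) / GG: 0 (0) / A: 571 (0.99) / G: 3 (0.0052)
rs7495174 (OCA2) / green / AA: 45 (0.87) / AG: 7 (0.13) / GG: 0 (0) / A: 97 (0.93) / G: 7 (0.067)
rs7495174 (OCA2) / brown / AA: 39 (0.64) / AG: 22 (0.36) / GG: 0 (0) / A: 100 (0.82) / G: 22 (0.18)
rs4778138 (OCA2) / blue / AA: 247 (0.86) / AG: 39 (0.14) / GG: 1 (0.0035) / A: 533 (0.93) / G: 41 (0.071)
rs4778138 (OCA2) / green / AA: 37 (0.71) / AG: 15 (0.29) / GG: 0 (0) / A: 89 (0.86) / G: 15 (0.14)
rs4778138 (OCA2) / brown / AA: 28 (0.46) / AG: 30 (0.49) / GG: 3 (0.049) / A: 86 (0.70) / G: 36 (0.30)
rs4778241 (OCA2) / blue / CC: 239 (0.84) / CA: 46 (0.16) / AA: 1 (0.0035) / C: 524 (0.92) / A: 48 (0.084)
rs4778241 (OCA2) / green / CC: 32 (0.62) / CA: 19 (0.37) / AA: 1 (0.019) / C: 83 (0.80) / A: 21 (0.20)
rs4778241 (OCA2) / brown / CC: 27 (0.44) / CA: 30 (0.49) / AA: 4 (0.066) / C: 84 (0.69) / A: 38 (0.31)
rs1805007 (MC1R) / blue / CC: 249 (0.87) / CT: 33 (0.11) / TT: 5 (0.017) / C: 531 (0.93) / T: 43 (0.075)
rs1805007 (MC1R) / green / CC: 42 (0.81) / CT: 9 (0.17) / TT: 1 (0.019) / C: 93 (0.89) / T: 11 (0.11)
rs1805007 (MC1R) / brown / CC: 51 (0.84) / CT: 9 (0.15) / TT: 1 (0.016) / C: 111 (0.91) / T: 11 (0.090)
rs1805008 (MC1R) / blue / CC: 232 (0.81) / CT: 50 (0.17) / TT: 5 (0.017) / C: 514 (0.90) / T: 60 (0.10)
rs1805008 (MC1R) / green / CC: 45 (0.87) / CT: 7 (0.13) / TT: 0 (0) / C: 97 (0.93) / T: 7 (0.067)
rs1805008 (MC1R) / brown / CC: 45 (0.74) / CT: 14 (0.23) / TT: 2 (0.033) / C: 104 (0.85) / T: 18 (0.15)
rs1805009 (MC1R) / blue / GG: 279 (0.97) / GC: 8 (0.028) / CC: 0 (0) / G: 566 (0.99) / C: 8 (0.014)
rs1805009 (MC1R) / green / GG: 51 (0.98) / GC: 1 (0.019) / CC: 0 (0) / G: 103 (0.99) / C: 1 (0.0096)
rs1805009 (MC1R) / brown / GG: 61 (1) / GC: 0 (0) / CC: 0 (0) / G: 122 (1) / C: 0 (0)
rs12203592 (IRF4) / blue / CC: 240 (0.84) / CT: 45 (0.16) / TT: 2 (0.0070) / C: 525 (0.91) / T: 49 (0.085)
rs12203592 (IRF4) / green / CC: 46 (0.88) / CT: 6 (0.12) / TT: 0 (0) / C: 98 (0.94) / T: 6 (0.058)
rs12203592 (IRF4) / brown / CC: 55 (0.90) / CT: 5 (0.082) / TT: 1 (0.016) / C: 115 (0.94) / T: 7 (0.057)
rs12896399 (SLC24A4) / blue / GG: 93 (0.33) / GT: 134 (0.47) / TT: 59 (0.21) / G: 320 (0.56) / T: 252 (0.44)
rs12896399 (SLC24A4) / green / GG: 20 (0.38) / GT: 22 (0.42) / TT: 10 (0.19) / G: 62 (0.60) / T: 42 (0.40)
rs12896399 (SLC24A4) / brown / GG: 28 (0.46) / GT: 25 (0.41) / TT: 8 (0.13) / G: 81 (0.66) / T: 41 (0.34)
rs1393350 (TYR) / blue / GG: 158 (0.55) / GA: 118 (0.41) / AA: 11 (0.038) / G: 434 (0.76) / A: 140 (0.24)
rs1393350 (TYR) / green / GG: 31 (0.60) / GA: 18 (0.35) / AA: 3 (0.058) / G: 80 (0.77) / A: 24 (0.23)
rs1393350 (TYR) / brown / GG: 34 (0.56) / GA: 21 (0.34) / AA: 6 (0.098) / G: 89 (0.73) / A: 33 (0.27)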
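
The allele counts and frequencies in Supplementary Table 4 follow from the genotype counts: each homozygote contributes two copies of an allele and each heterozygote one. A minimal sketch (function names hypothetical), using the counts for rs12913832 in blue-eyed individuals (GG 258, GA 29, AA 0):

```python
def allele_counts(n_hom_first, n_het, n_hom_second):
    """Counts of the two alleles, given the three genotype counts."""
    first = 2 * n_hom_first + n_het
    second = 2 * n_hom_second + n_het
    return first, second


def allele_frequencies(n_hom_first, n_het, n_hom_second):
    """Relative frequencies of the two alleles among all typed chromosomes."""
    first, second = allele_counts(n_hom_first, n_het, n_hom_second)
    total = first + second
    return first / total, second / total


g_count, a_count = allele_counts(258, 29, 0)
g_freq, a_freq = allele_frequencies(258, 29, 0)
print(g_count, a_count)                       # listed as 545 and 29
print(round(g_freq, 2), round(a_freq, 3))     # listed as 0.95 and 0.051
```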

Supplementary Table 5: Genotype-phenotype association analysis of red tint in hair colour

Supplementary Table 5a: Association between SNP genotype and red tint (stage 1)

SNP-ID (gene) / ref. allele / fav. allele / simple regression p value / multiple regression p value§
rs12913832 (HERC2) / - / - / n.s. / n.s.
rs916977 (HERC2) / - / - / n.s. / n.s.
rs1800407 (OCA2) / - / - / n.s. / n.s.
rs7495174 (OCA2) / - / - / n.s. / n.s.
rs4778138 (OCA2) / - / - / n.s. / n.s.
rs4778241 (OCA2) / - / - / n.s. / n.s.
rs1805007 (MC1R) / C / T / <1.0×10^-5 / <1.0×10^-5 (1.2×10^-4)
rs1805008 (MC1R) / C / T / 6.0×10^-4 / 1.0×10^-4 (0.0012)
rs1805009 (MC1R) / - / - / n.s. / n.s.
rs12203592 (IRF4) / - / - / n.s. / n.s.
rs12896399 (SLC24A4) / - / - / n.s. / n.s.
rs1393350 (TYR) / - / - / n.s. / n.s.

ref.: reference allele, fav.: favourable allele, n.s.: not significant. §Figures in brackets are p values after Bonferroni adjustment (12 SNPs tested). SNPs included in the final prediction model are printed in bold (see Table 2). The p value for the interaction between rs1805007 and rs1805008 equalled 0.020 (0.24). All SNPs were analyzed using an additive allelic model on the logit scale.

Supplementary Table 5b: Combined rs1805007/rs1805008 genotype and red tint in hair colour (stage 2)

rs1805007 (MC1R) / rs1805008 (MC1R) / N / Probability (red tint)$ / Observed prevalence (red tint) / Expected prevalence (red tint)
CC / CC / 62 / 0.14 / 10 / 8.68
CC / CT / 22 / 0.36 / 9 / 7.92
CC / TT / 4 / 0.67 / 3 / 2.68
CT / CC / 9 / 0.47 / 6 / 4.23
CT / CT / 1 / 0.76 / 1 / 0.76
CT / TT / 0 / 0.92 / 0 / 0
TT / CC / 2 / 0.82 / 2 / 1.64
TT / CT / 0 / 0.94 / 0 / 0
TT / TT / 0 / 0.98 / 0 / 0

$Conditional probability of red tint, given the respective genotype, as calculated from the prediction model derived in stage 1 (see Table 2). The expected prevalence equals the product of the genotype prevalence (N) and the probability of red tint.