Gene Expression Marker (GEM) detection and genotyping in a RIL population using microarray data
This method was used successfully to construct a haplotype map for the Arabidopsis Bay-0 × Shahdara RIL population (West et al. 2006, Genome Research). All scripts (Affy_ELP_Translator_V017_Numeric.py, Python_MadMapper_V248_RECBIT_007.py, and py_matrix_2D_V248_RECBIT.py), example input and output files, and detailed instructions can be found on our Expression Level Polymorphism Project web site (

For genes with non-overlapping parental distributions, the maximum expression value of the lower-expressing genotype (Max) and the minimum expression value of the higher-expressing genotype (Min) are determined.
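The Max/Min determination can be sketched as follows. This is a minimal illustration, not the code in Affy_ELP_Translator_V017_Numeric.py: the function name `parental_thresholds`, the returned dictionary keys, and the plain-list inputs are assumptions made for the example.

```python
def parental_thresholds(bay_values, sha_values):
    """Return Max and Min for one probe set, or None if the parents overlap.

    bay_values / sha_values: expression values from the parental microarrays
    (hypothetical input layout; the real script reads them from files).
    """
    bay_min, bay_max = min(bay_values), max(bay_values)
    sha_min, sha_max = min(sha_values), max(sha_values)

    if bay_max < sha_min:
        # Bay is the lower-expressing parent.
        return {"low_parent": "Bay", "Max": bay_max, "Min": sha_min}
    if sha_max < bay_min:
        # Sha is the lower-expressing parent.
        return {"low_parent": "Sha", "Max": sha_max, "Min": bay_min}
    # Parental distributions overlap: not usable as a marker in this sketch.
    return None


if __name__ == "__main__":
    print(parental_thresholds([10.0, 12.0], [30.0, 35.0]))
    print(parental_thresholds([10.0, 32.0], [30.0, 35.0]))
```

RIL values at or below Max would then point to the lower-expressing parent's allele and values at or above Min to the higher-expressing parent's allele; values falling in between are the ones the slicing scheme is meant to reduce.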

To minimize the number of missing data scores when genotyping the RILs, the distributions used for allele assignment are adjusted using a “slicing” scheme. The custom Python script Affy_ELP_Translator_V017_Numeric.py performs the slicing scheme and the allele assignments. This script uses five input files:
1. probe intensity range:
the 7th and 8th columns of this file contain the Min and Max values for the Bay parental microarrays; the 9th and 10th columns contain the Min and Max values for the Sha parental microarrays.

2. affy ID - ATH ID conversion:
first column - Affymetrix probe set ID; second column - Arabidopsis gene ID.

3. expression values:
scaled gene expression values for RILs and four (2 Bay + 2 Sha) parental microarrays. Affymetrix probe sets are in rows; chip IDs are in columns.

4. genotyping data for MS molecular markers:
genotyping data for microsatellite molecular markers - RIL IDs are in rows; marker IDs are in columns.

5. RIL keys (conversion RIL ID - Chip ID):
RIL keys: first column - numerical order; second column - RIL ID; third column - chip ID of biological replicate 1; fourth column - chip ID of biological replicate 2.

The *_master.tab output file with genotyping scores should be modified into a Master locus file by removing the last 4 columns with genotyping data corresponding to the two parental accessions.
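Removing the parental columns can be done by hand in a spreadsheet, or with a few lines of Python. The sketch below assumes the *_master.tab file is tab-delimited with one marker per row and the four parental columns last; the function name `drop_parental_columns` and the example rows are hypothetical.

```python
def drop_parental_columns(lines, n_parental=4):
    """Strip the last n_parental tab-delimited fields from every line."""
    trimmed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Keep everything except the trailing parental genotype columns.
        trimmed.append("\t".join(fields[:-n_parental]))
    return trimmed


if __name__ == "__main__":
    # Made-up rows: marker ID, three RIL scores, then four parental scores.
    example = [
        "marker_1\tA\tB\tA\tA\tA\tB\tB\n",
        "marker_2\tB\tB\t-\tA\tA\tB\tB\n",
    ]
    for row in drop_parental_columns(example):
        print(row)
```

To process a real file, the lines could be read with `open(path)` and the result written to a new Master locus file; check the output against the example files before using it downstream.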

The Master locus file is processed by the Python MadMapper program (Python_MadMapper_V248_RECBIT_007.py) to filter the dataset. Markers are removed if they have >10% missing data, or if they display pronounced allele distortion (>1:3; the expectation of allele segregation in a RIL population is 1:1). This Python program generates a "clean" locus file containing the 188 GEMs, which can be used for further mapping studies.
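The two filtering criteria can be illustrated with a short sketch. This is not MadMapper's own code: the genotype codes ("A", "B", and "-" for missing), the function name `keep_marker`, and the reading of ">1:3" as a minor-to-major allele ratio below 1/3 are assumptions made for the example.

```python
def keep_marker(scores, max_missing=0.10, min_ratio=1.0 / 3.0, missing="-"):
    """Return True if a marker passes the missing-data and distortion filters.

    scores: sequence of genotype calls for one marker across all RILs,
    coded "A", "B" or the missing symbol (assumed coding).
    """
    scores = list(scores)
    if not scores:
        return False

    # Criterion 1: no more than 10% missing calls.
    n_missing = sum(1 for s in scores if s == missing)
    if n_missing / len(scores) > max_missing:
        return False

    # Criterion 2: allele ratio not more extreme than 1:3.
    n_a = sum(1 for s in scores if s == "A")
    n_b = sum(1 for s in scores if s == "B")
    minor, major = sorted((n_a, n_b))
    if major == 0:
        return False
    return minor / major >= min_ratio


if __name__ == "__main__":
    print(keep_marker("AAAAABBBBB"))  # balanced, no missing data
    print(keep_marker("AAAAAAAAAB"))  # strongly distorted
    print(keep_marker("AAAABBBB--"))  # 20% missing
```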

The genotype scores of the RILs for the five linkage groups can be used to calculate pairwise distances between markers using the MadMapper software (Python_MadMapper_V248_RECBIT_007.py). The CheckMatrix software (py_matrix_2D_V248_RECBIT.py) can then be used to create a graphical genotyping map and a heat map of linkage values.
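As a rough illustration of what a pairwise distance between markers means, the sketch below computes the fraction of RILs whose genotype differs between two markers, ignoring RILs with a missing call at either one. MadMapper computes its own scores, which are not reproduced here; the function names, the "A"/"B"/"-" coding, and the dictionary input are assumptions.

```python
from itertools import combinations


def recombination_fraction(scores_1, scores_2, missing="-"):
    """Fraction of RILs with different calls at two markers (None if no data)."""
    informative = [
        (a, b) for a, b in zip(scores_1, scores_2)
        if a != missing and b != missing
    ]
    if not informative:
        return None
    n_different = sum(1 for a, b in informative if a != b)
    return n_different / len(informative)


def pairwise_fractions(markers):
    """Compute the fraction for every marker pair.

    markers: dict mapping marker ID -> genotype calls in the same RIL order.
    """
    return {
        (m1, m2): recombination_fraction(markers[m1], markers[m2])
        for m1, m2 in combinations(sorted(markers), 2)
    }


if __name__ == "__main__":
    example = {"m1": "AABB", "m2": "AABA", "m3": "BBAA"}
    for pair, value in pairwise_fractions(example).items():
        print(pair, value)
```

Low values suggest tightly linked markers; values near 0.5 suggest unlinked ones. A matrix of such values is the kind of input a heat map of linkage is drawn from.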

Detailed instructions, example input and output files, and all scripts can be found on our Expression Level Polymorphism Project web site (