Supplementary Material

We provide a simulated data set and two SAS/IML programs so that readers can demonstrate and test the new procedures for whole-genome evaluation. To analyze your own data, slight modifications of the programs are required. The necessary modifications are described in the comments of each program.

The Data Sets

The simulated data set:

“trait.txt” –

This file stores the phenotype data as comma-delimited text. The example data set has 2 variables and 360 observations. The 1st column stores the individual identification numbers (numerical variable) and the 2nd column gives the phenotype values of the trait. This is an input file.
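
If you want to check a phenotype file before passing it to the SAS programs, a short script can read it back and confirm the two-column layout. The following is a minimal Python sketch, not part of the provided programs. It assumes the file has no header row, and the function name read_trait and the three sample rows are hypothetical illustrations rather than values from the real “trait.txt”.

```python
import csv
import io


def read_trait(handle):
    """Read (id, phenotype) pairs from a comma-delimited file-like object.

    Assumption: no header row; column 1 is a numeric ID and column 2 is
    the phenotype value, as described for trait.txt.
    """
    records = []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        individual_id = int(row[0])
        phenotype = float(row[1])
        records.append((individual_id, phenotype))
    return records


# Hypothetical three-row excerpt, NOT taken from the real trait.txt.
sample = io.StringIO("1,10.52\n2,9.87\n3,11.03\n")
trait = read_trait(sample)
print(len(trait), "observations; first record:", trait[0])
```

To read a real file, the same function can be called with open("trait.txt", newline="") in place of the in-memory sample.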

“IBD.txt” –

This file stores the IBD (identity by descent) data as comma-delimited text. The sample data are stored in a matrix whose dimensions are determined by the number of sib-pairs and the number of markers. The 1st and 2nd columns are the identification numbers (numerical variables) of the sib-pairs. This is an input file.

“map.txt” –

This file stores the linkage map data of the marker loci as comma-delimited text. A valid map data set must always contain three variables in the following order: chromosome, marker-name and marker-position. Marker-name is a character variable. The position of a marker must be numerical and measured in centiMorgans (cM). This is an input file.
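
The column-order requirement above can be checked mechanically before running the SAS programs. The sketch below is one possible Python check, not part of the provided programs. It assumes there is no header row, keeps the chromosome field as text because its type is not specified here, and uses a hypothetical function name (read_map) and hypothetical marker names and positions.

```python
import csv
import io


def read_map(handle):
    """Read (chromosome, marker_name, position_cM) rows and validate them.

    Assumption: no header row. Raises ValueError if a row does not have
    exactly three fields or if the position is not numeric.
    """
    markers = []
    for line_no, row in enumerate(csv.reader(handle), start=1):
        if not row:  # skip blank lines
            continue
        if len(row) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 fields, got {len(row)}"
            )
        chromosome, name, position = (field.strip() for field in row)
        # float() raises ValueError when the position is not numeric.
        markers.append((chromosome, name, float(position)))
    return markers


# Hypothetical map excerpt, NOT taken from the real map.txt.
sample = io.StringIO("1,M1,0.0\n1,M2,12.5\n2,M3,3.0\n")
markers = read_map(sample)
print(markers)
```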

The SAS Programs

The ML Program:

“ML_sas” –

This is the program implementing the maximum likelihood (ML) method. The program takes “trait.txt”, “map.txt” and “IBD.txt” as the input files and generates an output file named “QTLEffect_Position.xls”.

“QTLEffect_Position.xls” –

This is an output file generated by the program. The file has five rows, explained as follows:

Row 1: the order of the estimated QTL,

Row 2: the identification of the chromosome that carries the estimated QTL,

Row 3: the estimated QTL positions,

Row 4: the estimated QTL variances,

Row 5: the AIC value, the number of iterations, the convergence error, the estimated population mean and the residual variance of the trait.

The Bayesian Program:

“Bayesian_sas” –

This is the program implementing the Bayesian method. The program takes the three input files described earlier and generates two output files named “QTLMCMC_Effect.xls” and “QTLMCMC_Position.xls”.

“QTLMCMC_Effect.xls” stores the posterior sample for the simulated population mean (first column), the residual variance (second column) and the QTL variance components (remaining columns).

“QTLMCMC_Position.xls” stores the posterior sample for the simulated QTL positions. Identifications of chromosomes that carry the simulated QTL are given in the second row of the file.

You need to write your own programs to conduct the post-MCMC analysis. The average of the sampled values of each variable is its posterior mean (point estimate), and the variance of the sampled values is its posterior variance (the square root of the variance is the posterior standard deviation).
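
The summary described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis program: it assumes the posterior sample has already been loaded into memory with one row per MCMC draw and one column per variable (as in “QTLMCMC_Effect.xls”), that every row is a draw you want to keep, and that the sample variance with divisor n - 1 is acceptable. The function name summarize_posterior and the numbers are hypothetical.

```python
import statistics


def summarize_posterior(draws):
    """Return (mean, variance, standard deviation) for each column.

    draws: list of rows, one row per MCMC draw, one column per variable.
    Assumption: the sample variance uses the n - 1 divisor.
    """
    summary = []
    for column in zip(*draws):  # transpose rows into columns
        mean = statistics.mean(column)
        variance = statistics.variance(column)
        summary.append((mean, variance, variance ** 0.5))
    return summary


# Hypothetical draws, NOT output of the Bayesian program:
# column 1 = population mean, column 2 = residual variance.
draws = [
    [10.0, 2.0],
    [12.0, 4.0],
    [14.0, 6.0],
]
summary = summarize_posterior(draws)
for index, (mean, variance, sd) in enumerate(summary, start=1):
    print(f"variable {index}: mean={mean}, variance={variance}, sd={sd}")
```

The same function applies to any block of numeric columns. Rows that are not posterior draws, such as the chromosome identification row in “QTLMCMC_Position.xls”, need to be removed before calling it.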