Cell-type data used in the paper: Hattab, M.W., A.A. Shabalin, S.L. Clark, M. Zhao, G. Kumar, R.F. Chan, L.Y. Xie, R. Jansen, L.K.M. Han, K.P. Magnusson, G. van Grootheest, C.M. Hultman, B.W.J.H. Penninx, K.A. Aberg, and E.J.C.G. van den Oord. 2016. "Correcting for cell-type effects in DNA methylation studies: Reference-based method outperforms latent variable approaches in empirical studies." Genome Biology.

Samples

The data include 30 reference methylomes (five cell populations from each of six US subjects):

Subject / Gender / Age (years)
1 / F / 21-35
2 / F / 36-50
3 / F / 51-65
4 / M / 21-35
5 / M / 36-50
6 / M / 51-65

Cell populations were isolated by positive selection using EasySep™ kits (STEMCELL Technologies), which use magnetic nanoparticles coated with antibodies against a specific surface antigen (CD molecule). Specifically, we used antibodies against CD3, CD19, CD20, CD14, and CD15 to isolate the common cell types in human blood. The purity of the sorted cells was confirmed by staining with a fluorescein (FITC)-conjugated sheep anti-mouse secondary antibody (STEMCELL Technologies) followed by flow cytometry on a BD FACSAria II (BD Biosciences). Purity ranged from 94.5% to 99.6%.

An optimized methyl-CpG-binding domain enrichment sequencing (MBD-seq) protocol was used to assay the methylome.

File content

To decompress the file, type: gunzip CellType_Means.csv.gz

The file CellType_Means.csv contains all the information needed to estimate cell-type proportions in a target data set assayed with MBD-seq, using Houseman's method (Houseman, E.A., et al., DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics, 2012. 13: p. 86).
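
Houseman's method estimates the cell-type proportions of one target sample by regressing its values at informative CpGs on the reference means, constraining the coefficients to be non-negative and to sum to at most 1. That constrained least-squares problem is usually solved as a quadratic program; the Python sketch below (numpy only) substitutes projected gradient descent, which converges to the same solution when the reference matrix has full column rank. The function names, the iteration count, and the assumption that the target sample's coverage values are on the same scale as the reference means are ours; this is not the code used in the paper.

```python
import numpy as np


def project_capped_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) <= 1}."""
    w = np.maximum(v, 0.0)
    if w.sum() <= 1.0:
        return w
    # The sum constraint is active: project onto the probability simplex (sort-based method).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def estimate_proportions(y, M, n_iter=5000):
    """Minimize ||y - M w||^2 subject to w >= 0 and sum(w) <= 1.

    y: values of one target sample at the selected CpGs, shape (n_cpg,)
    M: reference cell-type means at the same CpGs, shape (n_cpg, n_cell_types)
    """
    y = np.asarray(y, dtype=float)
    M = np.asarray(M, dtype=float)
    # Step size 1/L, where L = 2 * (largest singular value of M)^2 bounds the Hessian 2 M^T M.
    lipschitz = 2.0 * np.linalg.norm(M, 2) ** 2
    w = np.full(M.shape[1], 1.0 / M.shape[1])  # start from equal proportions
    for _ in range(n_iter):
        grad = 2.0 * M.T @ (M @ w - y)
        w = project_capped_simplex(w - grad / lipschitz)
    return w
```

Here M would hold the CD3, CD14, CD15, CD19, and CD20 columns for the selected CpGs, and y the target sample's values at the same CpGs; running estimate_proportions once per target sample gives one row of estimated proportions.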

File columns are:

Chr = chromosome of CpG

Coord = coordinate of CpG

CD3 = mean coverage in DNA from CD3-positive cells

CD14 = mean coverage in DNA from CD14-positive cells

CD15 = mean coverage in DNA from CD15-positive cells

CD19 = mean coverage in DNA from CD19-positive cells

CD20 = mean coverage in DNA from CD20-positive cells

P.value = P value testing the null hypothesis that all cell types have similar methylation at this CpG
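
A minimal Python sketch for loading the table and keeping the most informative CpGs. It assumes the file is comma-separated with a header row that uses exactly the column names listed above. The P-value cut-off and the optional cap on the number of CpGs in select_markers are hypothetical choices, not the ones used in the paper. Because gzip.open is used for paths ending in ".gz", the compressed file can also be read directly without running gunzip first.

```python
import csv
import gzip

CELL_TYPES = ["CD3", "CD14", "CD15", "CD19", "CD20"]


def read_cell_type_means(source):
    """Parse CellType_Means.csv into a list of dicts.

    source: a file path (gzip-compressed if it ends in ".gz") or any iterable of CSV lines.
    """
    if isinstance(source, str):
        opener = gzip.open if source.endswith(".gz") else open
        with opener(source, "rt", newline="") as handle:
            return read_cell_type_means(handle)
    rows = []
    for rec in csv.DictReader(source):
        rows.append({
            "Chr": rec["Chr"],
            "Coord": int(rec["Coord"]),
            "means": [float(rec[c]) for c in CELL_TYPES],  # one mean per cell type
            "P.value": float(rec["P.value"]),
        })
    return rows


def select_markers(rows, max_p=1e-8, max_n=None):
    """Keep CpGs with P.value below a (hypothetical) cut-off, most significant first."""
    hits = sorted((r for r in rows if r["P.value"] < max_p), key=lambda r: r["P.value"])
    return hits if max_n is None else hits[:max_n]
```

The "means" lists of the selected rows, stacked in order, form the reference matrix for Houseman's method.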

Each entry is the mean CpG coverage estimate across the six subjects; CpG coverage is a measure of the methylation status at that location. Coverage estimates were generated with the Bioconductor package RaMWAS, following a previously described approach developed specifically for MBD-seq data (van den Oord EJ, Bukszar J, Rudolf G, Nerella S, McClay JL, et al. (2013) Estimation of CpG coverage in whole methylome next-generation sequencing studies. BMC Bioinformatics 14: 50).

Laboratory protocol

Genomic DNA was extracted using the Gentra Puregene Blood/Tissue Kit (Qiagen, Valencia, CA) following the vendor's instructions. The DNA was fragmented to a median length of 150 base pairs by ultrasonication with the Covaris S2 instrument (Covaris, Woburn, MA, USA). The amount of starting material for each methylation enrichment reaction was 1.5 µg. Methylated fragments were extracted with MethylMiner, using a single elution with 500 mM buffer.

The methylation-enriched DNA from MethylMiner was used as input material for barcoded fragment libraries for NextSeq 500 deep sequencing. Each library was sequenced using single-end chemistry at a read length of 75 bp.

Reads from the human samples were aligned to genome build hg19/GRCh37 using Bowtie 2.