Effect Size Rankings for Population-Based Cohorts

Written by Derrek Hibar (), Theo van Erp () and Lianne Schmaal ()

for the ENIGMA-Major Depressive Disorder Working Group

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

Effect Size Rankings for Population-based Cohorts

ENIGMA-MDD Working Group

December 12, 2013

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

You will need three files to run the association analysis (described below). We recommend you keep these files in your working directory. Please, make sure to have exactly the same header labels and in the same order as shown in the example files so that the commands used in this protocol need not to be changed:

LandRvolumes.csv, which contains your imaging phenotypes (after quality control) for the entire sample.

· Make sure that missing values and individual volume measures that were excluded from the analysis during QC in the LandRvolumes.csv are coded as “NA” without the quotes. Note that we originally suggested marking these values with an “x” in the imaging protocol. The following R script handles excluded values better if they are marked with NA. Please do a “find and replace” in your favorite text editor for “x” and replace it with “NA” (again all without quotes).

Covariates.csv, which contains the full set of variables we will be controlling for in each of the models. Note that missing values for any of these covariates is not allowed. You will need to make this file.

o In Excel, or your favorite text editor, create a series of columns with the following headers: SubjID, Dx, Age, Sex, Site1, Site2, ...

o The IDs in the SubjID column should match the format of the IDs in the SubjID column in the LandRvolumes.csv file.

o Dx should be a column for diagnosis (where patients are coded as 1 and controls are coded as 0).

o Age is in years at time of scan

o Sex (Males=1, Females=2).

o The Site# columns are optional and only required if your sample requires correction for data collection at multiple sites. If you have no site variables to include, do not include any Site# columns. If you do have sites to include as covariates in the model, we will be correcting using dummy variables where the total number of different Site# columns is equal to (n-1) such that n is the total number of sites coded as dummy variables. For example, if data were collected at four different data centers then we would include 3 columns of dummy variables to indicate site (Site1, Site2, Site3).

o Recur refers to whether the MDD patient has a recurrent or first depressive episode. Code: recurrent patients=2, first episode patients=1, healthy controls=0.

o AD refers to current antidepressant use. Code: antidepressant users=2, antidepressant free MDD patients=1, healthy controls=0.

o Rem refers to whether the MDD patient is acutely depressed (6 month recency) or in remission. Code: acutely depressed=2, remitted patients=1, healthy controls=0.

o AO is age of onset in years, i.e. age at first depressive episode.

o Sev is severity in number of DSM-IV MDD criteria met (on basis of DSM-IV interview ranging from 0-9).

Missing values in the covariate columns are not allowed. If some variables are not applicable (e.g. remission), please code them anyway (for example for Rem: code all patients 2 if there are no remitted but only acutely depressed patients).

Beware that the columns in your csv files are separated by commas (,) not by semicolons (;). Check by opening csv files with texteditor or wordpad, replace all “;” by “,” and save file.

Run the analysis in R: The scripts below assume that you are calling R from the same location as your LandRvolumes.csv and Covariates.csv. If you are on Mac/Linux you can just cd to the directory on the command-line and then call R there and copy/paste the code below. On a PC with Windows you can open the R gui and use the setwd() command and give it the path to the folder that contains your files. For example, setwd("C:\\User\\Enigma\\") and then copy/paste the commands below.

At this point your LandRvolumes.csv and ventralDC.csv likely contains your freesurfer path as well as your SubjectIDs in the first column.

more LandRvolumes.csv # this will show something like this:

/raid/fbirn_phase3/freesurfer/000315778401,7676,8963,8773,7977,4120,4029,6153,5671,1712,1843,4399,4403,1407,1446,540,742,1.49191e+06

….

First we’ll get rid off the Freesurfer path using the commands below.

cp LandRvolumes.csv LandRvolumes.csv.orig

cat LandRvolumes.csv.orig | awk -F/ ' { print $NF } ' > LandRvolumes.csv

cp ventralDC.csv ventralDC.csv.orig

cat ventralDC.csv.orig | awk -F/ ' { print $NF } ' > ventralDC.csv

more LandRvolumes.csv # this will now show something like this:

000315778401,7676,8963,8773,7977,4120,4029,6153,5671,1712,1843,4399,4403,1407,1446,540,742,1.49191e+06

….

The following R script will produce mean, sd and range values of the subcortical volumes and ICV, demographic characteristics and runs the regression analyses for the whole sample and for each subgroup.

If your sample only includes first episode patients and not recurrent patients or vice versa, skip ONLY the regression analysis with numbers 2-4 (page 11 to 14) and number 13-16 (page 22 to 25).

If your sample only includes patients in remission and not acutely depressed patients or vice versa, skip ONLY the regression analysis with numbers 5-7 (page 14 to 17).

If your sample only includes antidepressant free patients and no patients on antidepressants or vice versa, skip ONLY the regression analysis with numbers 8-10 (page 17 to 20).

If you don’t have data for age of onset, skip ONLY the regression analysis with numbers 11 (page 20), 13-14 (page 22-24) and 17-22 (page 26-32).

If you don’t have data for severity (sum of 9 DSM-IV item scores), skip ONLY the regression analysis numbers 12 (page 21-22) and 15-16 (page 24-25).

# Before running the code the first time,

# you need to download & install the lsmeans package,

# using the install.packages command. The R "library" or "require" commands

# can then be used to make the package available within R.

# If you have write access to the directory where R is installed, you

# can just type 'install.packages("lsmeans")', if you do not you will

# want to specify a library location (lib.loc), see below.

# either

install.packages("lsmeans")

library(lsmeans)

# or

# install.packages("lsmeans", lib="/YourRlibDir")

# library("lsmeans",lib.loc="/YourRlibDir")

# Example:

# install.packages("lsmeans", lib="/master_raid/users/tvanerp/R")

# library("lsmeans",lib.loc="/master_raid/users/tvanerp/R")

# When you type 'search()', you should see "package:lsmeans" among the installed packages.

SubCort <- read.csv("LandRvolumes.csv", header=T);

# Check for duplicated SubjIDs that may cause issues with merging data sets.

if(anyDuplicated(SubCort[,c("SubjID")]) != 0) { stop('You have duplicate SubjIDs in your LandRvolumes.csv file.\nMake sure there are no repeat SubjIDs.') }

Covs <- read.csv("Covariates.csv",header=T);

# Check for duplicated SubjIDs that may cause issues with merging data sets.

if(anyDuplicated(SubCort[,c("SubjID")]) != 0) { stop('You have duplicate SubjIDs in your Covariates.csv file.\nMake sure there are no repeat SubjIDs.') }

n.covs <- ncol(Covs) - 1; #Total number of covariates, -1 removes the SubjectID column

n.sites <- n.covs - 8; #Find the number of site variables, subtract for Dx, Age, Sex, Recur, AD, Rem, AO, Sev from n.covs

n.sites #should return the correct number of Site variables that you included in your Covs.csv file

SubCort$Mvent <- rowMeans(SubCort[,c("LLatVent","RLatVent")]); #calculate mean Ventricle

SubCort$Mthal <- rowMeans(SubCort[,c("Lthal","Rthal")]); #calculate mean Thalamus

SubCort$Mcaud <- rowMeans(SubCort[,c("Lcaud","Rcaud")]); #calculate mean Caudate

SubCort$Mput <- rowMeans(SubCort[,c("Lput","Rput")]); #calculate mean Putamen

SubCort$Mpal <- rowMeans(SubCort[,c("Lpal","Rpal")]); #calculate mean Pallidum

SubCort$Mhippo <- rowMeans(SubCort[,c("Lhippo","Rhippo")]); #calculate mean Hippocampus

SubCort$Mamyg <- rowMeans(SubCort[,c("Lamyg","Ramyg")]); #calculate mean Amygdala

SubCort$Maccumb <- rowMeans(SubCort[,c("Laccumb","Raccumb")]); #calculate mean Accumbens

SubCort$MventDC <- rowMeans(SubCort[,c("Left.VentralDC","Right.VentralDC")]); #calculate mean ventral DC

#Order the SubCort file, we want ICV at the end

SubCort_ordered = SubCort[,c("SubjID", "LLatVent", "RLatVent", "Lthal", "Lcaud", "Lput",

"Lpal", "Lhippo", "Lamyg", "Laccumb", "Left.VentralDC", "Rthal", "Rcaud", "Rput", "Rpal", "Rhippo", "Ramyg", "Raccumb",

"Right.VentralDC", "Mvent", "Mthal", "Mcaud", "Mput", "Mpal", "Mhippo", "Mamyg", "Maccumb", "MventDC", "ICV")]

merged_ordered = merge(Covs, SubCort_ordered, by="SubjID");

#Check that the number of rows after merging is the same

if(nrow(SubCort) != nrow(merged_ordered)){

cat('WARNING: The LandRvolumes.csv and Covariates.csv have non-matching SubjIDs.','\n')

cat('Please make sure the number of subjects in your merged data set are as expected.','\n')

cat('The number of SubjIDs in LandRvolumes.csv is: ',nrow(SubCort),'\n')

cat('The number of SubjIDs in the merged_ordered data set is: ',nrow(merged_ordered),'\n')

}

#Get raw means for each of the structures

raw.means=colMeans(merged_ordered[(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans.Rdata")

#Get raw means for each of the structures for the patients

patients=which(merged_ordered$Dx==1)

raw.means=colMeans(merged_ordered[patients,(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures for the patients

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[patients,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[patients,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[patients,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[patients,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans_patients.Rdata")

#Get raw means for each of the structures for the controls

controls=which(merged_ordered$Dx==0)

raw.means=colMeans(merged_ordered[controls,(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures for the patients

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[controls,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[controls,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[controls,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[controls,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans_controls.Rdata")

#Get raw means for each of the structures for the recurrent patients

recurpatients=which(merged_ordered$Recur==2)

raw.means=colMeans(merged_ordered[recurpatients,(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures for the recurrent patients

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[recurpatients,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[recurpatients,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[recurpatients,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[recurpatients,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans_recurpatients.Rdata")

#Get raw means for each of the structures for the first episode patients

firstpatients=which(merged_ordered$Recur==1)

raw.means=colMeans(merged_ordered[firstpatients,(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures for the first eisode patients

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[firstpatients,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[firstpatients,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[firstpatients,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[firstpatients,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans_firstpatients.Rdata")

#Get raw means for each of the structures for the acutely depressed patients

acutepatients=which(merged_ordered$Rem==2)

raw.means=colMeans(merged_ordered[acutepatients,(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures for the acutely depressed patients

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[acutepatients,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[acutepatients,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[acutepatients,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[acutepatients,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans_acutepatients.Rdata")

#Get raw means for each of the structures for the remitted patients

rempatients=which(merged_ordered$Rem==1)

raw.means=colMeans(merged_ordered[rempatients,(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures for the remitted patients

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[rempatients,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[rempatients,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[rempatients,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[rempatients,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans_rempatients.Rdata")

#Get raw means for each of the structures for the antidepressant users

ADpatients=which(merged_ordered$AD==2)

raw.means=colMeans(merged_ordered[ADpatients,(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures for the antidepressant users

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[ADpatients,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[ADpatients,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[ADpatients,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[ADpatients,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans_ADpatients.Rdata")

#Get raw means for each of the structures for the non-antidepressant users

noADpatients=which(merged_ordered$AD==1)

raw.means=colMeans(merged_ordered[noADpatients,(ncol(Covs)+1):ncol(merged_ordered)], na.rm=T)

#Get raw sd and number of subjects included for each of the structures for the non-antidepressant users

sd.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

n.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

min.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

max.raw=rep(NA,(ncol(merged_ordered)-ncol(Covs)))

for(z in (ncol(Covs)+1):ncol(merged_ordered)){

sd.raw[z-ncol(Covs)]=sd(merged_ordered[noADpatients,z], na.rm=T)

n.raw[z-ncol(Covs)]=length(merged_ordered[which(!is.na(merged_ordered[noADpatients,z])),z])

min.raw[z-ncol(Covs)]=min(merged_ordered[noADpatients,z], na.rm=T)

max.raw[z-ncol(Covs)]=max(merged_ordered[noADpatients,z], na.rm=T)

}

#Save raw values

save(raw.means, sd.raw, n.raw, min.raw, max.raw, file="RawMeans_noADpatients.Rdata")

#Get demographics

age.mu=mean(merged_ordered$Age) #raw age mean

age.sd=sd(merged_ordered$Age) #raw age sd

age.range=range(merged_ordered$Age) #raw age range

age.mu.dx0=mean(merged_ordered$Age[which(merged_ordered$Dx==0)]) #age mean for ctls

age.sd.dx0=sd(merged_ordered$Age[which(merged_ordered$Dx==0)]) #age sd for ctls

age.range.dx0=range(merged_ordered$Age[which(merged_ordered$Dx==0)]) #age range for ctls

age.mu.dx1=mean(merged_ordered$Age[which(merged_ordered$Dx==1)]) #age mean for patients

age.sd.dx1=sd(merged_ordered$Age[which(merged_ordered$Dx==1)]) #age sd for patients

age.range.dx1=range(merged_ordered$Age[which(merged_ordered$Dx==1)]) #age range for patients

age.mu.recur=mean(merged_ordered$Age[which(merged_ordered$Recur==2)]) #age mean for recurrent patients

age.sd.recur=sd(merged_ordered$Age[which(merged_ordered$Recur==2)]) #age sd for recurrent patients

age.range.recur=range(merged_ordered$Age[which(merged_ordered$Recur==2)]) #age range for recurrent patients

age.mu.first=mean(merged_ordered$Age[which(merged_ordered$Recur==1)]) #age mean for first episode patients

age.sd.first=sd(merged_ordered$Age[which(merged_ordered$Recur==1)]) #age sd for first episode patients

age.range.first=range(merged_ordered$Age[which(merged_ordered$Recur==1)]) #age range for first episode patients

age.mu.acute=mean(merged_ordered$Age[which(merged_ordered$Rem==2)]) #age mean for acutely depressed patients

age.sd.acute=sd(merged_ordered$Age[which(merged_ordered$Rem==2)]) #age sd for acutely depressed patients

age.range.acute=range(merged_ordered$Age[which(merged_ordered$Rem==2)]) #age range for acutely depressed patients

age.mu.rem=mean(merged_ordered$Age[which(merged_ordered$Rem==1)]) #age mean for remitted patients

age.sd.rem=sd(merged_ordered$Age[which(merged_ordered$Rem==1)]) #age sd for remitted patients

age.range.rem=range(merged_ordered$Age[which(merged_ordered$Rem==1)]) #age range for remitted patients

age.mu.AD=mean(merged_ordered$Age[which(merged_ordered$AD==2)]) #age mean for antidepressant users

age.sd.AD=sd(merged_ordered$Age[which(merged_ordered$AD==2)]) #age sd for antidepressant users

age.range.AD=range(merged_ordered$Age[which(merged_ordered$AD==2)]) #age range for antidepressant users

age.mu.noAD=mean(merged_ordered$Age[which(merged_ordered$AD==1)]) #age mean for non-antidepressant users

age.sd.noAD=sd(merged_ordered$Age[which(merged_ordered$AD==1)]) #age sd for non-antidepressant users

age.range.noAD=range(merged_ordered$Age[which(merged_ordered$AD==1)]) #age range for non-antidepressant users

AO.mu.dx1=mean(merged_ordered$AO[which(merged_ordered$Dx==1)]) #age of onset MDD mean for patients

AO.sd.dx1=sd(merged_ordered$AO[which(merged_ordered$Dx==1)]) #age of onset MDD sd for patients

AO.range.dx1=range(merged_ordered$AO[which(merged_ordered$Dx==1)]) #age of onset MDD range for patients

AO.mu.first=mean(merged_ordered$AO[which(merged_ordered$Recur==1)]) #age of onset MDD mean for first episode patients

AO.sd.first=sd(merged_ordered$AO[which(merged_ordered$Recur==1)]) #age of onset MDD sd for first episode patients

AO.range.first=range(merged_ordered$AO[which(merged_ordered$Recur==1)]) #age of onset MDD range for first episode patients

AO.mu.recur=mean(merged_ordered$AO[which(merged_ordered$Recur==2)]) #age of onset MDD mean for recurrent patients

AO.sd.recur=sd(merged_ordered$AO[which(merged_ordered$Recur==2)]) #age of onset MDD sd for recurrent patients

AO.range.recur=range(merged_ordered$AO[which(merged_ordered$Recur==2)]) #age of onset MDD range for recurrent patients

AO.mu.AD=mean(merged_ordered$AO[which(merged_ordered$AD==2)]) #age of onset MDD mean for first episode patients

AO.sd.AD=sd(merged_ordered$AO[which(merged_ordered$AD==2)]) #age of onset MDD sd for first episode patients

AO.range.AD=range(merged_ordered$AO[which(merged_ordered$AD==2)]) #age of onset MDD range for first episode patients

AO.mu.noAD=mean(merged_ordered$AO[which(merged_ordered$AD==1)]) #age of onset MDD mean for recurrent patients