List of Supplemental Digital Content

List of Supplemental Digital Content:

Supplemental Digital Content 1

Supplemental methods

Table S1. Descriptive statistics for each module identified using WGCNA.

Figure S1: Stacked bar charts indicating the relative abundances of the most abundant taxa detected in the gut microbiomes of PTSD subjects and trauma-exposed (TE) controls.

Figure S2. Alpha diversity measures of gut microbial communities of PTSD subjects and trauma-exposed (TE) controls.

Figure S3. Principal coordinate analysis (PCoA) plots of Bray-Curtis compositional dissimilarity (on genus level) of microbiota communities among PTSD participants and trauma-exposed (TE) controls.

Figure S4. Distance comparison plots with box plots illustrating weighted and unweighted UniFrac distances within and between sample groupings using the β-diversity distance matrix.

Figure S5. Biplot illustrating genus-level gut microbial community analysis and composition analysis using principal component analysis of centre log ratio-transformed and standardized data from 18 PTSD participants and 12 trauma-exposed (TE) controls.

Figure S6. Dendrogram displaying module assignment within weighted gene co-expression network analysis (WGCNA).

Figure S7. Bar chart illustrating predicted functional genomic content of the microbiota in PTSD participants and trauma-exposed (TE) controls.

Figure S8. Partial plots of the random forests predicted values for the estimated probability of PTSD from the random forest versus the relative abundance of the three phyla identified as important for distinguishing PTSD status.

Figure S9. Relationship between the random forests prediction model, relative abundance of [Actinobacteria, Verrucomicrobia] and PTSD scores (CAPS Total Score). PTSD was negatively associated with the relative abundance of Actinobacteria and Verrucomicrobia phyla

Figure S10. Relationship between the random forests prediction model, relative abundance of [Actinobacteria,Verrucomicrobia] and childhood trauma scores (Childhood Trauma Questionnaire (CTQ), Total Score).

Supplemental Digital Content 1

Supplemental methods

Clinical and metabolic measures

Lifetime trauma exposure was assessed with the Life Events Checklist for DSM-5 (LEC-5) (64). The LEC-5 assesses for exposure to 16 types of events that often meet criteria for a traumatic stressor and includes one additional item for any other potentially traumatic events not covered in the first 16 items. For each of the 17 items, the LEC-5 specifies whether it happened to the individual personally, whether they witnessed it happen to someone else, whether they learned about it happening to someone close to them or whether they were exposed to the event in the course of their occupation. Participants identify the traumatic experience that is still affecting them the most and this index trauma is used to assess for the symptoms of current posttraumatic stress disorder (PTSD).

All participants were evaluated for current and lifetime psychiatric and substance use disorders using the MINI International Neuropsychiatric Interview, version 6.0 (MINI) (63), which is a clinician-administered short, semi-structured diagnostic interview that takes about 15 minutes to administer.

Participants with metabolic syndrome (MetS) were excluded to eliminate additional confounding factors that could influence gut microbiome results. We assessed for the presence of MetS based on the harmonized Joint Interim Statement (JIS) criteria (110), with three or more out of five criteria indicating the presence of MetS. The criteria used were (i) raised waist circumference according to population- and country-specific definitions (men and women ≥ 90 cm) based on a recent validation study in the population assessed (111); (ii) raised triglycerides (≥ 1.7 mmol/l); (iii) low high-density lipoprotein cholesterol (HDL-C) (men <1.0 mmol/l, women <1.3 mmol/l); (iv) elevated blood pressure (≥130/85 mmHg) or on antihypertensive treatment; and (v) raised fasting glucose (≥5.6 mmol/l) or on treatment for diabetes (97).

Microbiome analyses

Microbial DNA was extracted from 1.4 ml stabilized stool (stool specimen homogenized in stool DNA stabilizing buffer) using the PSP® Spin Stool DNA Plus Kit (STRATEC Molecular, Birkenfeld, Germany) according to the manufacturer’s protocol 2 (“Isolation of total DNA from 1.4 ml stabilized stool homogenate with enrichment of bacterial DNA”). This protocol includes an initial cell lysis at 95 °C for 10 minutes to increase efficiency of DNA extraction of gram-positive bacteria. Quality and integrity of the DNA was evaluated on the NanoDrop2000c (ThermoFisher Scientific, Wilmington, DE, USA) and the Quant-iT™ PicoGreen® dsDNA Assay Kit with the Quantus™ Fluorometer (ThermoFisher Scientific). A minimum of 20 ng of purified DNA was used for the library preparation. The 16S ribosomal RNA (rRNA) gene amplicons were generated for the V3 and V4 regions of the 16S rRNA bacterial gene, which were recommended by Klindworth et al. (69). Illumina adapter overhang nucleotide sequences were added to the gene‐specific sequences. The full-length primer sequences targeting this region were

341 forward primer (5’-CCTACGGGNGGCWGCAG-3’) and

785 reverse primer (5’-GACTACHVGGGTATCTAATCC-3’)

Libraries were prepared using the 16S Metagenomic Sequencing Library Preparation Kit from Illumina, according to the manufacturer’s instructions. After amplification of the V3 and V4 region and, using a limited cycle PCR, adding Illumina sequencing adapters and dual-index barcodes to the amplicon target, amplicon sizes were measured using an Agilent Technologies 2100 Bioanalyzer trace (Agilent, Part # G2940CA). Library quantification, normalization, and pooling were performed according to the manufacturer’s specifications. Libraries were sequenced using multiplexed Illumina HiSeq paired-end 100 base pair sequencing according to the manufacturer’s instructions. Base calling was performed and FASTQ sequence reads generated using Illumina Casava Pipeline 1.8.2. Initial quality assessment was based on data passing the Illumina Chastity filter. Subsequently, reads containing adaptors and/or PhiX control signal were removed. The second quality assessment was based on the remaining reads using the FASTQC quality control tool version 0.10.0.

Taxonomy from DNA sequences

The operational taxonomic unit (OTU) table was prepared using QIIME v. 1.9 (70). Forward reads were demultiplexed using default parameters with a minimum quality score threshold set to 25. Following this step, 1,738,164 of 1,959,124 HiSeq reads passed quality control. These reads were assigned to OTUs using the closed-reference OTU picking method with Greengenes 97% reference database (Aug 13) (71). 1,690,568 reads mapped to the Greengenes database, identifying a total of 7,556 OTUs. The reference Greengenes taxonomy assignments and phylogenetic tree were used for all analysis where a tree or taxonomy assignment was required. The samples were rarefied at 30,000 sequences per person and then filtered to retain only OTUs that were present in at least 20% of the participants. This resulted in a total of 1,565 OTUs that were used in subsequent analyses of microbial diversity and taxonomic abundances.

Microbial diversity analyses

For analysis of α- and β-diversity, the rarefied and filtered data set, i.e., excluding OTUs that were present in <20% of participants, generating a raw OTU table containing 1,565 OTUs, was used. While rarefaction is a conservative approach that limits power for discovery of differences, it is essential to correct for different library sizes for α- and β-diversity analysis (112). Beta diversity was determined using QIIME for Bray-Curtis and UniFrac (unweighted and weighted) analyses (70, 113-115) and was visualized using EMPeror (116). The beta diversity measurements were compared between 18 cases (diagnosed with PTSD) and 12 trauma-exposed (TE) controls (not diagnosed with PTSD) using analysis of similarities (ANOSIM) using Bray Curtis, weighted UniFrac, and unweighted UniFrac metrics. Alpha diversity metrics were determined after rarefaction for Chao1, observed OTUs, phylogenetic diversity (PD)-whole tree, and Shannon diversity index using QIIME (70, 115, 117-120).

Comparison of taxa abundances

Operational taxonomic units of the rarefied data set were collapsed by taxonomic assignment and compared using QIIME (70). The abundances of each taxon within the cases and controls were compared for all taxa levels shown. The significance of the difference between groups was evaluated using the Kruskal-Wallis test, correcting for multiple testing using a Bonferroni correction (121-123).

Biplot analyses

To explore correlations among the different phyla and genera present in the samples, and their relationships to microbial community structure in PTSD participants and TE controls, compositional biplot analyses were performed.The full set of reads mapped to Greengenes, identifying a total of 7,556 OTUs, was used for this analysis. A multiplicative zero replacement technique was used to determine pseudo counts for zeros (124).A centre log ratio transformation (125) was conducted to transform the data anda singular value decomposition was performed to obtain a 2 rank approximation of the phyla and genera and the samples.

Weighted gene co-expression network analysis (WGCNA)

Network-based analysis was conducted using weighted gene co-expression network analysis (WGCNA). The full set of reads mapped to Greengenes, identifying a total of 7,556 OTUs, was used for this analysis. Modules were tested for association with variables of interest (age, Childhood Trauma Questionnaire (CTQ) score, time since index trauma (in months), plasma C-reactive protein (CRP) concentration (mg/L)) using general linear models. All statistical analyses were performed in R via the R Studio platform and used the WGCNA statistical package (126, 127). Initial analyses to check for potential outliers and excessive missingness in our data set were performed using the WGCNA package. In addition, hierarchical average linkage cluster analysis was performed to identify outlying samples. Neither of these resulted in the removal of any participants.

We used WGCNA to identify clusters, or modules, of OTUs that are highly co-expressed as well as a hub OTU that is the most highly connected. The nodes were the relative abundance values of OTUs, and we used this analysis to build modules of these nodes. To build the modules we used a “soft threshold” for assigning module membership, as suggested and thoroughly reviewed by Zhang and Horvath (128). A topological overlap matrix (TOM) was built and used to develop modules of connected nodes. Average linkage hierarchical cluster analysis was used to develop a dendrogram, and the “cutreeDynamic” function in the WGCNA package was used to assign modules based on this dendrogram (127, 128). The minimum module size of thirty nodes was assigned. Eigenvalues were generated for each of the modules. The eigenvalues were organized by treatment group and Student’s t-tests with Bonferroni correction were used to analyze the modules for differences between PTSD participants and TE controls.

In order to test the relationship of these modules to traits of the participants, ordinary least squares (OLS) regression models were generated. Twenty-two models were constructed, in which the module eigenvalues were regressed against age, CTQ total score, time since index trauma (in months), and plasma CRP concentration (mg/L) for each participant, using Bonferroni correction.

Phylogenetic investigation of communities by reconstruction of unobserved states (PICRUSt)

The functional potential of microbial communities was assessed using PICRUSt (129). The full set of reads mapped to Greengenes, identifying a total of 7,556 OTUs, was used for this analysis. Kyoto Encyclopedia of Genes and Genomes (KEGG) has been organized into 4 levels of hierarchies. Level one is the most general level of categories and level four is the most specific level of categories, each with KEGG Orthology (KO) terms. The analyses based on PICRUSt prediction were done on level 3 after grouping predicted KO terms into a higher level of categorization. Pairwise distances between samples were calculated using the Bray-Curtis metric on level 3 KO categories. The level 3 PICRUSt output was filtered to include only metabolism-related KO categories, and a second set of principal coordinates was calculated.

Table S1. Descriptive statistics for each module identified using WGCNA

Module / Cluster coefficient1 / Network density2 / Network centrality3 / Number of nodes4 / P value5 / Most connected node by phylum6
Black / 0.166 / 0.047 / 0.114 / 87 / 0.67 / Bacteroidetes
Blue / 0.182 / 0.05 / 0.137 / 123 / 0.55 / Firmicutes
Brown / 0.188 / 0.046 / 0.089 / 104 / 0.93 / Bacteroidetes
Cyan / 0.132 / 0.041 / 0.072 / 56 / 0.32 / Lentisphaerae
Dark Green / 0.228 / 0.07 / 0.109 / 35 / 0.60 / Firmicutes
Dark Red / 0.254 / 0.062 / 0.128 / 40 / 0.09 / Firmicutes
Green / 0.175 / 0.054 / 0.106 / 98 / 0.50 / Firmicutes
Green Yellow / 0.153 / 0.043 / 0.065 / 61 / 0.38 / Firmicutes
Grey 60 / 0.188 / 0.05 / 0.087 / 47 / 0.26 / Firmicutes
Light Cyan / 0.295 / 0.064 / 0.112 / 49 / 0.67 / Proteobacteria
Light Green / 0.138 / 0.055 / 0.115 / 47 / 0.25 / Firmicutes
Light Yellow / 0.175 / 0.06 / 0.064 / 45 / 0.95 / Firmicutes
Magenta / 0.203 / 0.052 / 0.068 / 75 / 0.40 / Bacteroidetes
Midnight Blue / 0.175 / 0.049 / 0.129 / 54 / 0.30 / Bacteroidetes
Pink / 0.201 / 0.065 / 0.144 / 82 / 0.36 / Bacteroidetes
Purple / 0.296 / 0.086 / 0.145 / 66 / 0.51 / Bacteroidetes
Red / 0.166 / 0.055 / 0.1 / 91 / 0.47 / Firmicutes
Royal Blue / 0.227 / 0.055 / 0.105 / 41 / 0.51 / Firmicutes
Salmon / 0.238 / 0.073 / 0.122 / 57 / 0.65 / Bacteroidetes
Tan / 0.241 / 0.65 / 0.087 / 58 / 0.35 / Bacteroidetes
Turquoise / 0.205 / 0.064 / 0.14 / 135 / 0.44 / Firmicutes
Yellow / 0.243 / 0.072 / 0.15 / 99 / 0.69 / Firmicutes

1The cluster coefficient is a measure of localized network density or “cliquishness” within the modules. 2Network density is a measure of the total amount of possible connections that exist within a given module. 3Network centrality is a measure of how much one individual node dominates the module’s connectedness. 4The P value was generated from the Student’s t-test comparison between the eigenvalues of the posttraumatic stress disorder (PTSD) participants and trauma-exposed (TE) controls, using Bonferroni correction. 5The most connected node by phylum was determined by the scaled connectivity coefficient.

Figure S1: Stacked bar charts indicating the relative abundances of the most abundant taxa detected in the gut microbiomes of posttraumatic stress disorder (PTSD) participants and trauma-exposed (TE) controls. A) Genus level. B) Family level. C) Class level. D) Order level. E) Phylum level.

Figure S2. Alpha diversity measures of gut microbial communities of PTSD subjects and trauma-exposed (TE) controls. A) Chao1. B) Observed species. C) Phylogenetic diversity (PD)-whole tree. D) Shannon diversity index. Analyses were performed on 16S rRNA V3/V4 amplicon data (a single amplicon of approximately 460 base pairs), with different rarefaction depths, represented by the x-axis. Values represent means ± SD.