Title: Host-Microbial interactions in Idiopathic Pulmonary Fibrosis

Authors: Philip L Molyneaux1&2, Saffron A G Willis Owen1, Michael J Cox1, Phillip James1, Steven Cowman1&2, Michael Loebinger1&2, Andrew Blanchard3, Lindsay M Edwards3, Carmel Stock1&2, Cécile Daccord1&2, Elisabetta A Renzoni1&2, Athol U Wells2, Miriam F Moffatt1*, William OC Cookson1&2*, Toby M Maher1&2*

* These three senior authors contributed equally to the study.

Affiliations:

1 National Heart and Lung Institute, Imperial College London, London, UK.

2 Royal Brompton Hospital, London, UK.

3 Fibrosis Discovery Performance Unit, GlaxoSmithKline R&D, GlaxoSmithKline Medicines Research Centre, Stevenage, UK

Corresponding author:

Dr Philip L Molyneaux

National Heart and Lung Institute,

Imperial College London,

Guy Scadding Building,

Royal Brompton Campus,

London,

SW3 6LY,

UK.

Telephone: +44 (0)20 7594 2943

Email:

Author contributions: PLM planned the project, designed and performed experiments, analysed the data and wrote the manuscript; SAGWO, MJC & PJ performed statistical analyses, interpreted the results and helped write the manuscript; ER, CS, AUW, SC and ML recruited patients and healthy controls; CS & CD performed genotyping; AB & LE helped analyse the data; TMM, MFM and WOC conceived the studies of IPF, planned the project, designed experiments, analysed the data and wrote the manuscript. All authors reviewed, revised and approved the manuscript for submission.

Funding: PLM was an Asmarley Clinical Research Fellow. Funding was from the Asmarley Trust and the Wellcome Trust. TMM is supported by an NIHR Clinician Scientist Fellowship (NIHR Ref: CS-2013-13-017). The project was supported by the NIHR Respiratory Disease Biomedical Research Unit at the Royal Brompton and Harefield NHS Foundation Trust, the AHSC Biomedical Research Centre at Imperial College London, the National Institutes of Health (HL097163 and HL092870) and an unrestricted academic industry research grant from GSK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Financial Disclosure: The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Running Title: Host Microbe IPF

Descriptor: 9.23 Interstitial Lung Disease

Key Words: usual interstitial pneumonia, acute lung injury, microbiome, IPF, expression

Word Count: 3155

This article has an online data supplement, which is accessible from this issue's table of contents online at www.atsjournals.org.

At a Glance Commentary:

Scientific Knowledge on the Subject: Idiopathic pulmonary fibrosis (IPF) is a progressive and fatal disease of unknown cause. Changes in the respiratory microbiome and bacterial burden have been associated with disease progression in IPF. The role of the host response to the respiratory microbiome, however, remains unknown.

What this study adds to the field:

Integrated analysis of the host transcriptome and microbial signatures demonstrates an interaction between host and environment in IPF. The response to an altered and more abundant microbiome remains during longitudinal follow up, suggesting that the bacterial communities of the lower airways may be acting as persistent stimuli for repetitive alveolar injury in IPF.

Abstract

Rationale: Changes in the respiratory microbiome are associated with disease progression in idiopathic pulmonary fibrosis (IPF). The role of the host response to the respiratory microbiome, however, remains unknown.

Objectives: To explore the host-microbial interaction in IPF.

Methods: Sixty patients diagnosed with IPF were prospectively enrolled, together with 20 matched controls. All subjects underwent bronchoalveolar lavage (BAL) at baseline, and peripheral whole blood was collected into PAXgene tubes. For IPF subjects, additional blood samples were taken at 1, 3 and 6 months and, for those still alive, at 1 year. Gene expression profiles were generated using Affymetrix Human Gene 1.1 ST Arrays.

Measurements and Main Results: Network analysis of gene expression data identified two gene modules that were strongly associated with a diagnosis of IPF, BAL bacterial burden (determined by 16S quantitative PCR) and specific microbial OTUs, as well as lavage and peripheral blood neutrophilia. Genes within these modules that are involved in the host defence response include NLRC4, PGLYRP1, MMP9 and DEFA4. The modules also contain two genes encoding specific antimicrobial peptides (SLPI and CAMP). Many of these transcripts were associated with survival and showed longitudinal overexpression in subjects experiencing disease progression, further strengthening their relationship with disease.

Conclusions: Integrated analysis of the host transcriptome and microbial signatures demonstrates an apparent host response to the presence of an altered or more abundant microbiome. These responses remain elevated on longitudinal follow up, suggesting that the bacterial communities of the lower airways may be acting as persistent stimuli for repetitive alveolar injury in IPF.

Abstract word count: 249

Introduction

Idiopathic pulmonary fibrosis (IPF) is a progressive disease of unknown aetiology with a 5-year survival of only 20% (1). Current evidence suggests that IPF develops in genetically susceptible individuals with dysfunctional alveolar epithelial repair mechanisms, following repeated episodes of alveolar injury (2). While our understanding of both the underlying genetics and the potential environmental stimuli causing alveolar injury has progressed over recent years, the link between the two remains unclear (3).

Active infection in IPF is known to carry a high morbidity and mortality (4). In individuals with IPF, immunosuppression is clearly deleterious (5), while treatment-adherent subjects in a large trial of prophylactic co-trimoxazole in IPF experienced a reduction in overt infections and mortality (6). Recent transcriptomic studies have hinted at the role of disordered host defence, and thus susceptibility to infection, as an important contributor to disease progression in IPF (7–9). Indeed, polymorphisms within two genes, TOLLIP and MUC5B, have recently been associated with IPF susceptibility and linked to alterations in the lung immune response (10, 11).

The toll-interacting protein (TOLLIP) gene encodes an adaptor protein that is an important regulator of innate immune responses mediated through pattern recognition toll-like receptors. Polymorphisms in the TOLLIP gene have now been linked to both IPF susceptibility and mortality (12). To date, the most significant genetic association with IPF is with mucin 5B (MUC5B) polymorphisms, which have been linked to higher IPF risk but, paradoxically, slower disease progression (8). In mice, Muc5b appears essential for normal macrophage function and effective mucociliary clearance of bacteria (13), and evidence is building to suggest a similar role in humans. Impaired mucociliary clearance would allow bacteria to persist in the lower airways, potentially acting as a trigger for alveolar injury. Indeed, the recent characterisation of the respiratory microbiome in IPF has suggested that an increased bacterial burden and the presence of specific organisms could drive disease progression (14, 15).

While these observations strengthen the epidemiological argument that infective environmental factors may be integral to the pathogenesis of IPF in genetically susceptible individuals, to date there has been no assessment of the host-microbial interaction. We therefore set out to explore, in individuals with IPF, the relationship between the peripheral whole blood transcriptome, MUC5B and TOLLIP genotypes and the respiratory microbiome. We used unbiased network analysis to cluster similarly expressed genes into modules, allowing us to dissect large transcriptomic datasets into easy-to-interpret functional clusters. Some of these data have previously been presented in abstract form (16).


Methods

Study Design

Patients were prospectively recruited from the Interstitial Lung Disease Unit at the Royal Brompton Hospital, London, England between November 2010 and January 2013. A diagnosis of IPF was made, according to international guidelines (17), following multi-disciplinary team discussion. Healthy control subjects, including non-smokers and smokers, were recruited using the same protocols. Subjects were excluded if they had a history of self-reported upper or lower respiratory tract infection, antibiotic use in the prior 3 months, acute IPF exacerbation, or other respiratory disorders. Written informed consent was obtained from all subjects and the study was approved by the Local Research Ethics Committee (Ref 10/H0720/12 and 12/LO/1034).

Following recruitment, patients were re-assessed in clinic at 1, 3, 6 and 12 months. At baseline and at each subsequent visit, peripheral blood samples were collected into PAXgene RNA tubes (PreAnalytiX). Pulmonary function testing was performed at baseline, 6 and 12 months. At baseline, subjects underwent fibre-optic bronchoscopy with bronchoalveolar lavage (BAL) as previously described (18). Genomic DNA was extracted and the V3-V5 region of the bacterial 16S rRNA gene amplified using the 357F forward primer and the 926R reverse primer for 16S qPCR as previously described (18).

Genotyping

Genotypes of the MUC5B SNP rs35705950 and TOLLIP SNPs rs3750920 and rs5743890 were determined using TaqMan assays (Life Technologies, Carlsbad, CA). Reactions were performed in 384-well plates, and fluorescence was read using an Applied Biosystems Viia7 Sequence Detection System.

RNA Extraction, Quality Assessment and Expression

The PAXgene Blood RNA Kit (PreAnalytiX) was used to isolate RNA according to the manufacturer’s protocol. Total RNA was quantified using the Nanodrop ND 1000 UV-Vis spectrophotometer (Thermo Scientific) and the quality and integrity assessed using the 2100 Bioanalyser (Agilent) by ratio comparison of the 18S and 28S rRNA bands.

Thirty nanograms of each RNA sample was used to synthesize double-stranded (ds) cDNA using the Ovation® Pico WTA System V2 Kit (NuGEN). Exogenous poly-A positive controls were added to monitor the efficiency of the ds cDNA synthesis and target labelling process. The Encore® Biotin Module Kit (NuGEN) was used to fragment 2.8 µg of the purified cDNA template, which was then hybridised, washed and scanned on the GeneTitan system (Affymetrix) using HuGene 1.1 ST 16- or 24-PEG array plates (Affymetrix). The complete data sets are available in the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/).

Host transcriptome analysis

Raw expression data were background adjusted, quantile normalised and median polished using the RMA algorithm as implemented in the Affymetrix Power Tools software suite (APT, version 1.12.0). Linear Models for Microarray Data (LIMMA) was used to identify differentially expressed genes between each pair of sample groups and, for longitudinal samples, by applying a single contrast between each later time point and the zero (baseline) time point. Significance analysis of microarrays (SAM) was used to test the association between microarray gene expression and survival in IPF patients. P values were adjusted for multiple testing using the Benjamini & Hochberg method for control of the expected False Discovery Rate (FDR). Genes with significant differential expression were then selected using cut-offs for the FDR-adjusted P value and the fold change in expression.
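
The Benjamini & Hochberg adjustment named above can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the code used in the study; the function name benjamini_hochberg and the example P values are assumptions made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up P values, not taken from the study.
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(example_p)
```

A transcript would then be called significant at a 1% FDR if its adjusted value is at most 0.01, typically in combination with a fold-change cut-off as described above.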

Weighted Gene Co-expression Network Analysis (WGCNA) was used to discover correlation patterns among differentially expressed genes (19). Groups of transcript modules exhibiting high topological overlap were identified. A minimum module size of 40 was specified. A representative variable (module eigengene) was calculated for each module as the first principal component (20). WGCNA was conducted in R 3.0.2 [Bioconductor (www.bioconductor.org)].
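
To make the idea of a module eigengene concrete, the following is a minimal sketch in standard-library Python. It is not the WGCNA implementation: it assumes a small genes-by-samples matrix with no constant genes, standardises each gene, and finds the first principal component across samples by power iteration. The function names and the toy expression values are hypothetical.

```python
import math


def standardise(row):
    """Scale one gene's expression values to mean 0 and unit variance."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
    return [(x - mean) / sd for x in row]  # assumes sd > 0


def module_eigengene(expression, iterations=200):
    """First principal component (one value per sample) of a module.

    `expression` is a list of genes, each a list of per-sample values.
    """
    scaled = [standardise(gene) for gene in expression]
    n_samples = len(scaled[0])
    # Sample-by-sample cross-product matrix of the scaled data.
    gram = [[sum(g[i] * g[j] for g in scaled) for j in range(n_samples)]
            for i in range(n_samples)]
    # Power iteration converges to the dominant eigenvector of `gram`.
    vec = [float(i + 1) for i in range(n_samples)]
    for _ in range(iterations):
        vec = [sum(gram[i][j] * vec[j] for j in range(n_samples))
               for i in range(n_samples)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Orient the eigengene so it follows the average scaled expression.
    mean_profile = [sum(g[i] for g in scaled) / len(scaled)
                    for i in range(n_samples)]
    if sum(v * m for v, m in zip(vec, mean_profile)) < 0:
        vec = [-v for v in vec]
    return vec


# Toy module: three co-expressed genes measured in four samples.
toy_module = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 2.5, 2.9, 4.2],
]
eigengene = module_eigengene(toy_module)
```

The result is a single profile across samples that summarises the module and can be carried forward into trait correlations in place of hundreds of individual transcripts.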

Module eigengenes generated by WGCNA were correlated with phenotypic and microbial traits of interest. Genes in each cluster were analysed using DAVID (21) with the stringency level set to medium to allow functional annotation clustering and Gene Ontology (GO) term enrichment analysis. Cytoscape 3.2.158 was used to visualize the network with a prefuse force-directed layout (22). Survival analysis was performed with a Cox proportional hazards model to assess the association between continuous explanatory variables and overall survival. The statistical significance of the association of variables with a diagnosis of IPF was assessed using stepwise backward logistic regression to select the most parsimonious model from potential covariates.
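
As an illustration of correlating a module eigengene with a trait, the sketch below computes a Pearson correlation in standard-library Python. The text does not state which correlation coefficient was used, so Pearson is an assumption here, and both the eigengene values and the bacterial burden values are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-subject values, not taken from the study.
module_eigengene_values = [-0.5, -0.1, 0.2, 0.4]
log_bacterial_burden = [3.0, 4.0, 4.4, 5.0]

r = pearson(module_eigengene_values, log_bacterial_burden)
```

In practice one coefficient would be computed per module and per trait, giving a module-by-trait table of correlations.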

Results

Subjects and Sampling

Sixty patients with IPF and 20 controls (Table 1) were enrolled into the study. The IPF subjects were predominantly male (65%) with a mean age of 67.8 years and had moderately severe disease (Carbon Monoxide Diffusing Capacity [DLCO] 40.9% predicted; Forced Vital Capacity [FVC] 73.4% predicted). Twenty-four IPF subjects died during follow up and a further 13 experienced a decline in FVC ≥10% and/or a decline in DLCO ≥15% over a 6-month period (Figure E1). None of the IPF subjects were receiving immunosuppressive therapy. The 20 controls were matched for age, sex and smoking history.
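
The progression criterion above (a decline in FVC of at least 10% and/or a decline in DLCO of at least 15% over 6 months) can be written as a short rule. This standard-library Python sketch assumes the declines are relative to each subject's baseline value, which the text does not specify; the function name and the example measurements are hypothetical.

```python
def has_progressed(fvc_baseline, fvc_6m, dlco_baseline, dlco_6m):
    """Apply the FVC >=10% and/or DLCO >=15% decline rule.

    Declines are computed relative to baseline (an assumption).
    """
    fvc_decline = (fvc_baseline - fvc_6m) / fvc_baseline * 100
    dlco_decline = (dlco_baseline - dlco_6m) / dlco_baseline * 100
    return fvc_decline >= 10 or dlco_decline >= 15


# Invented measurements for three subjects (baseline, then 6 months).
fvc_only = has_progressed(3.0, 2.6, 5.0, 4.8)    # FVC falls by about 13%
dlco_only = has_progressed(3.0, 2.9, 5.0, 4.0)   # DLCO falls by 20%
stable = has_progressed(3.0, 2.9, 5.0, 4.8)      # neither threshold met
```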

Half of the IPF subjects were sampled longitudinally: 30 subjects were sampled at time point 1 (1 month), 29 at time point 2 (3 months), 30 at time point 3 (6 months) and 21 of the 30 at time point 4 (1 year).

Respiratory Microbiome

All of the subjects underwent bronchoalveolar lavage and we have previously reported the differences between the microbiota of IPF subjects and those of age-, sex- and smoking-matched controls (14). We found a two-fold increase in bacterial burden in the lavage of IPF subjects and significant differences in a number of bacterial operational taxonomic units (OTUs) compared to controls. There were 464 bacterial OTUs identified across the IPF and control subjects. IPF subjects had significantly higher sequence reads of four OTUs (a Haemophilus sp., a Neisseria sp., a Streptococcus sp. and a Veillonella sp.) compared to control subjects. The impact of these OTUs on the host transcriptome was therefore investigated further.

Baseline Gene Expression

At baseline, 1,358 transcript clusters were found to be differentially expressed between IPF and control subjects (1% False Discovery Rate [FDR]). Gene Ontology (GO) analysis revealed that the top GO biological processes most enriched within the transcript clusters were related to host defence and stress (Table E1).

The top five differentially expressed genes were thioredoxin (TXN, Fold Change [FC] 2.2, P=2.75E-09), cystatin A (CSTA, FC 2.1, P=5.81E-09), chemokine-like factor superfamily member 2 (CMTM2, FC 1.7, P=1.78E-08), S100 calcium binding protein A12 (S100A12, FC 1.94, P=8.75E-08) and retinol binding protein 7 (RBP7, FC 1.86, P=1.45E-06) (Table E2). The largest fold change observed among the differentially expressed genes was 3.62 for ORM1. Two other notable genes with large fold changes in expression encode specific antimicrobial peptides: secretory leukocyte peptidase inhibitor (SLPI, FC 2.29, P=7.05E-05) and cathelicidin antimicrobial peptide (CAMP, FC 2.11, P=3.0E-04). Up-regulation of two transcripts previously associated with IPF, MMP9 and DEFA4, was also seen (Table 2).

The 1,358 differentially expressed transcript clusters identified were included in a signed weighted gene co-expression network analysis (WGCNA) (19). A total of five modules were identified and each was assigned a unique colour identifier: turquoise (690 members), blue (289 members), brown (186 members), yellow (131 members) and green (54 members). Eight transcript clusters were unassigned.
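
As a quick consistency check on the module sizes reported above, the short standard-library Python sketch below confirms that the five modules plus the unassigned transcript clusters account for all 1,358 inputs and that every module meets the minimum size of 40 given in the Methods. The variable names are illustrative only.

```python
# Module sizes as reported in the text.
module_sizes = {
    "turquoise": 690,
    "blue": 289,
    "brown": 186,
    "yellow": 131,
    "green": 54,
}
unassigned = 8
total_input = 1358
min_module_size = 40

assigned = sum(module_sizes.values())
all_accounted_for = assigned + unassigned == total_input
smallest_ok = min(module_sizes.values()) >= min_module_size
```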