Supplementary methods
Manual search for X chromosome synaptic genes
To search for a link between X-linked genes and the synaptic process, we queried PubMed for the whole list of X chromosome genes (except pseudogenes) with the following type of query: (names of the gene) AND (synapse OR synaptic OR synaptogenesis OR neurites). After carefully discarding all results concerning immunological synapses, we examined the link between each gene and the synapse more closely, reading the abstract alone or the entire paper when necessary. For this study we defined a neuronal synapse as the communication point between two neurons, an entity comprising the presynaptic release apparatus in the first neuron, the synaptic cleft, and the postsynaptic element, which relays the signal both electrically and molecularly. We also included the third element of the tripartite synapse, which comprises presynaptic oligodendrocytes and synaptically associated astrocytes, as glia are now accepted to play a role in synapse formation and neurotransmission. Including this last synaptic component in our definition is particularly relevant for SCZ because several studies have shown a possible involvement of these cells in its pathology. We kept genes for which a synaptic localization or function had been demonstrated (annotated “synaptic”), but also those that could be involved in the synaptic process because of the protein family they belong to (annotated “potentially synaptic”).
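The query pattern described above can be sketched as a small helper. This is only an illustration: the function name `build_pubmed_query` is hypothetical, and joining a gene's symbol and aliases with OR and quoting each name are assumptions, since the text only specifies the overall form "(names of the gene) AND (...)".

```python
# Hypothetical helper: builds one PubMed query string per gene.
# Assumption: the symbol and aliases of a gene are quoted and combined with OR.
SYNAPSE_TERMS = ["synapse", "synaptic", "synaptogenesis", "neurites"]


def build_pubmed_query(gene_names):
    """Return '(name1 OR name2 ...) AND (synapse OR synaptic OR ...)'."""
    if not gene_names:
        raise ValueError("at least one gene name is required")
    names = " OR ".join(f'"{name}"' for name in gene_names)
    terms = " OR ".join(SYNAPSE_TERMS)
    return f"({names}) AND ({terms})"


if __name__ == "__main__":
    # Example with a gene symbol and one alias.
    print(build_pubmed_query(["SYN1", "synapsin I"]))
```

The resulting string can be pasted into the PubMed search box; the results still need the manual filtering described above, for example to remove hits on immunological synapses.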
Criteria used for the candidate gene selection
We built a scoring system based on different criteria:
Involvement in a human disease: Using the OMIM database and the literature, we found that a substantial proportion (32%) of the genes on our list was already associated with a disease (Supplementary Table 2). Different mutations in the same gene can lead to different diseases; for example, mutations in the ARX gene have been shown to cause either ASD with MR or lissencephaly. We therefore kept for our study the genes for which the reported mutations were controversial or unique. This applied to 6 genes, which were integrated into the list of genes without any associated disease. Only genes in which several mutations had been demonstrated, in different families, to cause a disease unrelated to ASD or SCZ (non-related disease, NRD) were discarded (33 genes). The 25 genes causing diseases relevant to our study (related disease, RD: ASD or SCZ themselves, RTT or RTT-like syndrome, or NS-MR) were included directly in the candidate gene list for the screening.
Nature of the synaptic involvement (2, 1 or 0 points): Our list combines genes encoding proteins with evidence of synaptic localization (from proteomic studies and manual search) and genes with a demonstrated or putative synaptic function (SynDB and manual search). To evaluate the synaptic involvement of each gene, we considered its synaptic function (2, 1 or 0 points) on the one hand and its synaptic localization (1 point) on the other. For synaptic function, we distinguished a basic function in the normal synaptic process (1 point) from more relevant functions in synapse modification, such as a role in synaptogenesis, neurite outgrowth or synaptic plasticity (2 points), because impairment of these processes is more likely to cause cognitive impairment.
Expression in brain tissues (2, 1, 0 or -2 points): As we were looking for candidate genes for two mental illnesses and would screen individuals without comorbidity, we included a tissue expression criterion in order to favor genes highly expressed in the brain. For the tissue expression pattern we compiled data from different sources: SymAtlas microarray data, EST profiles given by SynDB, and data from the literature (PubMed). Genes expressed in all tissues but which also encode a brain-specific isoform were considered to have brain-dominant expression.
Impairment of cognition in animal models (2 points): We took into account the data available on animal models for each gene. Genes for which a disruption or mutation is known to cause learning and memory or behavioral impairments in mice or another animal model were favored. PubMed, OMIM and the Jackson Laboratory resources were used to find data.
Genetics (1 point): Although different studies have identified X-linked regions for ASD and SCZ, we decided not to include linkage data in the scoring system, because the linked regions found on the X chromosome are very large. For example, using the LOD-1 method to define ASD-linked regions from the Liu et al. and Auranen et al. studies, more than 70% of our genes were located in a linked region, so linkage was not informative for the ranking. We therefore assigned this point only to genes located in a region where a chromosomal abnormality had been described in ASD, SCZ or MR, and to genes in which polymorphisms were associated with ASD or SCZ. PubMed and OMIM were used to find data.
Involvement in a relevant pathway for the disease or other relevant information (1 point): This criterion was added to favor genes involved in a pathway already described or assumed to be involved in ASD, SCZ or, more generally, in learning and memory processes. It also covered other relevant information, for example interaction with a protein involved in ASD or SCZ, or expression data from microarray studies of postmortem tissue from ASD or SCZ patients. Such data were found using PubMed.
For each gene, a score was calculated as the sum of all the points attributed under the five criteria described above. Sorting the genes by score gave a ranking by candidate properties (Supplementary Table 3). We selected 113 synaptic genes for the variant screening: 25 based on their involvement in related diseases and 88 that had a score greater than zero. Two of the 113 genes were not screened: SYN1, which had been sequenced and analyzed before the beginning of this project (Cossette et al., unpublished data), and NXF5, because we were unable to design specific primers for this gene.
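The scoring and selection logic can be summarized in a minimal sketch. All names here (`ALLOWED_POINTS`, `gene_score`, `select_candidates`, the dictionary keys and the placeholder gene names) are hypothetical and are not the authors' implementation. Two assumptions are made: the synaptic function and synaptic localization points are stored separately and summed, and the points for each criterion are entered by hand from the curated evidence, because the text does not state which expression pattern receives which value.

```python
# Allowed point values per criterion, taken from the criteria headings above.
# Assumption: synaptic involvement is split into function and localization.
ALLOWED_POINTS = {
    "synaptic_function": {0, 1, 2},
    "synaptic_localization": {0, 1},
    "brain_expression": {-2, 0, 1, 2},
    "animal_model": {0, 2},
    "genetics": {0, 1},
    "pathway": {0, 1},
}


def gene_score(points):
    """Sum the points of one gene; criteria that are not listed count as 0."""
    total = 0
    for criterion, allowed in ALLOWED_POINTS.items():
        value = points.get(criterion, 0)
        if value not in allowed:
            raise ValueError(f"{criterion}: {value} is not an allowed value")
        total += value
    return total


def select_candidates(genes):
    """Keep RD genes directly, drop NRD genes, keep other genes with score > 0."""
    selected = []
    for gene in genes:
        status = gene.get("disease_status")  # "RD", "NRD" or None
        if status == "RD":
            selected.append(gene["name"])
        elif status != "NRD" and gene_score(gene["points"]) > 0:
            selected.append(gene["name"])
    return selected


if __name__ == "__main__":
    # Placeholder genes, not real data.
    example = [
        {"name": "GENE_A", "disease_status": "RD", "points": {}},
        {"name": "GENE_B", "disease_status": None,
         "points": {"synaptic_localization": 1, "animal_model": 2}},
    ]
    print(select_candidates(example))
```

Genes with controversial or unique mutations would be entered with no disease status, so that they are scored like genes without any associated disease.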