Supplementary information
Methods
Recombination-based cloning. FLAG-tagged yeast open reading frames (ORFs) were cloned using the Gateway™ recombination-based cloning system (Invitrogen). A galactose-inducible, C-terminal FLAG tag Gateway™ destination vector, called pGAL1-CFLAG, was constructed by inserting annealed FLAG-1/2 oligonucleotides (FLAG-1: 5'-GATCCCCCGGGATGGATTACAAGGATGACGACGATAAGTAACTGCA-3', FLAG-2: 5'-GTTATCCGCCCGGGCTCTTATCGTCGTCATCCTTGTAATCCATCCCGGGG-3'; FLAG = DYKDDDDK, Sigma-Aldrich) into a <GAL1 LEU2 CEN> base vector (MT2250) cut with BamHI and PstI, followed by insertion of conversion cassette B into the SmaI site. A doxycycline-inducible, C-terminal FLAG tag Gateway™ destination vector, called ptet-CFLAG, was constructed by inserting the conversion cassette B-FLAG tag region from pGAL1-CFLAG, removed as a SpeI-ClaI fragment, into pCM251 between the BamHI site and ClaI site1. Both destination vectors were propagated in the E. coli DB3.1 strain to prevent lethality caused by the ccdB gene in the Gateway™ conversion cassette. Yeast ORFs were amplified by PCR using a 5' primer that included the attB1 recombinational site (5'-GGGGACAAGTTTGTACAAAAAAGCAGGCTTA-3'), followed by the start codon and 18-24 bp of gene-specific sequence, and a 3' primer that included the attB2 recombinational site (5'-GGGGACCACTTTGTACAAGAAAGCTGGGTC-3'), followed by 18-24 bp of gene-specific sequence immediately upstream of the stop codon. PCR amplification was performed according to the Platinum Taq High Fidelity DNA polymerase protocol using 100 ng of S288C yeast genomic DNA. PCR products were purified using a Millipore Multiscreen-PCR system and inserted into pGAL1-CFLAG using recombinational cloning as recommended (Invitrogen).
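The primer layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy ORF are hypothetical, the gene-specific stretch of the 5' primer is taken to be the n bases that follow the start codon, and the 3' primer is taken to carry the reverse complement of the n bases immediately upstream of the stop codon. Only the attB1 and attB2 sequences come from the text.

```python
# Illustrative sketch only; helper names and the toy ORF are hypothetical.
ATTB1 = "GGGGACAAGTTTGTACAAAAAAGCAGGCTTA"  # attB1 site, as given in the text
ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGTC"   # attB2 site, as given in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_primers(orf, n=20):
    """Return (forward, reverse) primers for an ORF written 5'->3', ATG to stop.

    Assumption: the forward primer is attB1 + start codon + the next n bases;
    the reverse primer is attB2 + the reverse complement of the n bases
    immediately upstream of the stop codon (stop codon omitted so that the
    C-terminal tag stays in frame).
    """
    orf = orf.upper()
    if not 18 <= n <= 24:
        raise ValueError("gene-specific length must be 18-24 bases")
    if not orf.startswith("ATG") or orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must run from ATG to a stop codon")
    if len(orf) < n + 6:
        raise ValueError("ORF is too short for the requested primer length")
    forward = ATTB1 + orf[:3 + n]
    reverse = ATTB2 + reverse_complement(orf[-3 - n:-3])
    return forward, reverse


# Toy 13-codon ORF, invented for the example (not a real yeast gene).
toy_orf = "ATGGCTAAAGGTGAACTGCCATCTGACGTTAAAGAATAA"
fwd, rev = design_primers(toy_orf, n=18)
```

Leaving the stop codon out of the 3' primer is what allows read-through into the C-terminal FLAG tag of the destination vector; any frame adjustment between the attB sites and the tag is determined by the vector and is not modelled here.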
Capture of protein complexes. Strain BY4742 (MATα his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0 pep4Δ::KANR) from the international yeast deletion consortium2, or a variant strain YP2 (BY4742 pep4Δ::KANR trp1Δ::HIS3), was used for protein expression. Standard yeast techniques were used essentially as described3. XY medium contains 2% bactopeptone, 1% yeast extract, 0.01% adenine and 0.02% tryptophan. To overcome difficulties in expression, such as for poorly expressed genes or developmentally regulated genes, all baits were expressed from either the inducible GAL1 or tet promoters for short induction periods. To maximize recovery of delicate protein complexes, we used concentrated cell extracts, from which the FLAG epitope could be captured with 50-100% efficiency (data not shown). Yeast culture volumes of 500 mL or less were used to prepare cell extracts for capture on anti-FLAG resin (Sigma-Aldrich), according to either protocol A or protocol B, as follows. Where specified in Supplementary Table S2, cell cultures were treated after 1 hour of galactose induction with one of the following: i) 4 mg/mL α-factor mating peptide (WHWLQLKPGQPMY, Research Genetics), ii) 3 mg/mL 4-nitroquinoline-1-oxide (Sigma), iii) 100 mM hydroxyurea (Sigma), iv) 15 mg/mL nocodazole (Fluka), or v) 0.1% methyl methane sulfonate (Sigma).
Protocol A: BY4742 bearing pGAL1-CFLAG expressing the ORF of interest was grown in XY medium containing 2% raffinose and 0.1% glucose to an OD600 of 1.3 to 1.5. Expression was induced with 2% galactose for 1-1.5 hours, after which cells were centrifuged and washed in lysis buffer (LB: 50 mM HEPES pH 7.5, 150 mM NaCl, 1 mM EDTA, 10 mM MgCl2, 50 mM β-glycerophosphate, 20 mM NaF, 2 mM benzamidine, 0.5% Triton X-100, 0.5 mM DTT, 10 µg/mL leupeptin, 2 µg/mL aprotinin, 0.2 mM AEBSF, 1 µg/mL pepstatin A). Where specified in Supplementary Table S2, LB contained 0.5% sodium deoxycholate. The cell pellet was resuspended in 1 mL LB per gram of cells and lysed by the glass bead method4. Cell extracts were clarified by centrifugation at 14,000 rpm for 20 min in a microcentrifuge. Clarified extracts were incubated with 50-80 µL of anti-FLAG-Sepharose resin (Sigma-Aldrich) for 1 h at 4 ºC, then washed three times with wash buffer (WB: 50 mM HEPES pH 7.5, 150 mM NaCl, 1 mM EDTA, 10 mM MgCl2, 50 mM β-glycerophosphate, 5% glycerol, 0.1% Triton X-100, 0.5 mM DTT, 0.2 mM AEBSF) and once with WB without Triton X-100. To help remove background proteins, beads were then incubated for 15 min at 4 ºC (referred to as the pre-elution step) in HBS (100 mM HEPES, 100 mM NaCl, 0.2 mM AEBSF) with 100 µg/mL non-specific HA competitor peptide (YPYDVPDYA, Research Genetics). FLAG-tagged protein complexes were eluted twice for 10 min at room temperature (referred to as the elution step) in HBS with 200 µg/mL FLAG peptide (DYKDDDDK, Sigma). Eluates and pre-eluates were precipitated with TCA/deoxycholate, washed with acetone, air-dried, resuspended in protein sample buffer and separated by SDS-PAGE on a 10-20% gradient gel (Novex). Proteins were detected by colloidal Coomassie stain (Gel-Code, Pierce) and selected for band-cutting based on their specific presence in the FLAG-tagged complex.
Small-scale immunoprecipitation and immunoblot analysis for confirmation of complexes detected by HMS-PCI were carried out according to standard methods3, using reciprocally tagged interaction partners in a destination vector that drives ORF expression as a C-terminal MYC3 epitope fusion from the constitutive CDC53 promoter (pMT3163).
Protocol B: YP2 cells bearing ptet-CFLAG constructs were grown to near saturation, diluted to an OD600 of 0.2 in DOB-Trp medium (QBIOgene) containing 2% glucose and 2 µg/mL doxycycline, and then grown for a further 6-8 hours to a final OD600 of 1.2-1.5. Alternatively, BY4742 cells bearing pGAL1-CFLAG constructs were induced as above. Capture onto anti-FLAG resin was carried out as in protocol A with the following exceptions. Cells were lysed in buffer containing 50 mM Tris pH 7.3, 150 mM NaCl, 1 mM EDTA, 10 mM MgSO4, 50 mM β-glycerophosphate, 0.5% Triton X-100 and complete protease inhibitor cocktail (Roche). Pre-elution was carried out twice for 10 min at 4 ºC in 50 mM Tris pH 7.3 with a mixture of Angiotensin (DDVYIHPFHL, Sigma-Aldrich) and Bradykinin (PPGFSPFR, Sigma-Aldrich) peptides at 50 µg/mL each or, alternatively, with 100 µg/mL of the peptide YDDKDKD (Schafer-N). FLAG-tagged protein complexes were eluted twice for 10 min at room temperature in 50 mM Tris pH 7.3 with 200 µg/mL FLAG peptide (Schafer-N). All wash and elution steps were by gravity flow in 2 mL columns (Mobitech), and eluates were either precipitated with TCA as above or dried under vacuum.
Mass Spectrometry. For part of the dataset, automated imaging and excision of protein bands was achieved with a robotics workstation; otherwise bands were visualized and cut manually. Excised gel slices were reduced with DTT and alkylated with iodoacetamide essentially as described4,5. In-gel digestion with porcine trypsin (Promega, Madison, WI) was carried out on an automated robotics system and the resulting peptides were extracted under basic and acidic conditions. Peptide mixtures were subjected to LC-MS/MS analysis on a Finnigan LCQ Deca® ion trap mass spectrometer (Thermo Finnigan, San Jose, CA) fitted with a Nanospray® source (MDS Proteomics). Chromatographic separation was accomplished using a Famos® autosampler and an Ultimate® gradient system (LC Packings, San Francisco, CA) over Zorbax® SB-C18 reverse phase resin (Agilent, Wilmington, DE) packed into 75 µm ID PicoFrit® columns (New Objective, Woburn, MA). A cluster of IBM NetFinity X330 computers was used to match MS/MS spectra against gene and protein sequence databases. Protein identifications were made from the resulting mass spectra using two commercially available search engines, Mascot® (Matrix Sciences, London, UK) and Sonar® (ProteoMetrics, Winnipeg, Canada). An additional in-house search engine called Pepsea (MDS Proteomics) was also used for some searches. A relational database system called Piranha was developed to store and process raw mass spectrometric protein identifications (MDS Proteomics).
We note that the sensitivity of our mass spectrometric platform, which is well below the 100 fmol level (and on the order of 20 fmol for purified protein standards), enabled the use of small-scale preparations for high throughput. However, because this process was near the detection limit for many proteins and because excised bands often contained many protein species, over a limited portion of the dataset we obtained an average reproducibility of 20% for the cohort of proteins associated with any given bait between any two experiments (data not shown). Although abundant, stoichiometric proteins were identified with near certainty, identification of sub-stoichiometric, low-abundance associated proteins was less reliable. A number of parameters contribute to this variability, including cell growth and lysis, immunoprecipitation conditions, digestion efficiency, recovery of peptides from gel slices, and run-to-run variations in mass spectrometry. Based on a 74% validation rate in direct co-immunoprecipitation/antibody detection tests of a random subset of HMS-PCI interactions (data not shown), we estimate that a majority of the interactions detected are reproducible. We stress that MS/MS analysis of multiple affinity purifications is needed to comprehensively identify the proteins associated with any given bait protein.
Background filtering criteria. As a consequence of both the gentle isolation methods used to recover protein complexes from concentrated extracts and the ultra-sensitive mass spectrometry used to identify proteins in each gel slice, we detected non-specific contaminants in each complex purification. These recurrent background species were filtered from the dataset according to the following criteria: (i) any protein found in association with 3% or more of the baits assayed (see the section Eliminating Background Proteins Through Frequency of Interaction below); (ii) structural components of the ribosome, which were detected in many preparations (see Supplementary Table S7C); (iii) all proteins that detectably bound to anti-FLAG resin in the absence of a FLAG-tagged bait protein6 (see Supplementary Table S6; excluded proteins listed in order of frequency). One distinct advantage of the HMS-PCI approach is that non-specific interactions are more readily identified as the size of the dataset increases. An inherent difficulty with any data filtering scheme is that proteins that participate in many bona fide interactions are at risk of being excluded from analysis. Proteins of note in this category included actin, tubulin, karyopherins, chaperonins and heat shock proteins, all of which are known to form numerous distinct and biologically relevant complexes. Application of these filtering criteria reduced the dataset to 3,617 distinct protein identifications in association with 493 baits (Supplementary Tables S1 and S2). The filtered interaction set contains 1,578 different proteins, or approximately 25% of the yeast proteome.
Eliminating Background Proteins Through Frequency of Interaction. Potential non-specific interactions, which are excluded from Supplementary Table S1, were identified from the number of different baits to which an interactor protein bound7. A 3% binding frequency exclusion was found to remove background interactions while retaining interactions that are meaningful, as defined by literature validation. In Figure S2, roughly 94% of the known interactions found in PreBIND and MIPS are retained (166/177) when the 3% frequency exclusion is used to eliminate frequently binding proteins (Figure S2, dotted line). Typically, the excluded proteins are abundant proteins that are involved in metabolic processes, cell structure or biogenesis.
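The frequency criterion can be illustrated with a short Python sketch. It assumes the data are available as simple (bait, interactor) pairs; the function name, argument names and toy data are hypothetical, and the ribosomal and resin-binding criteria described earlier are not modelled.

```python
from collections import defaultdict


def filter_frequent_interactors(pairs, n_baits=None, max_fraction=0.03):
    """Drop interactors found with max_fraction or more of the baits assayed.

    pairs: iterable of (bait, interactor) tuples (assumed input layout).
    n_baits: number of baits assayed; defaults to the baits seen in pairs.
    Returns (kept_pairs, excluded_interactors).
    """
    pairs = list(pairs)
    baits_per_interactor = defaultdict(set)
    for bait, interactor in pairs:
        baits_per_interactor[interactor].add(bait)
    if n_baits is None:
        n_baits = len({bait for bait, _ in pairs})
    excluded = {
        interactor
        for interactor, baits in baits_per_interactor.items()
        if len(baits) / n_baits >= max_fraction
    }
    kept = [(b, i) for b, i in pairs if i not in excluded]
    return kept, excluded


# Toy data: 100 baits, each with one private interactor, plus a "sticky"
# protein seen with 4 baits (4%) and a rarer one seen with 2 baits (2%).
toy = [(f"bait{n}", f"prey{n}") for n in range(100)]
toy += [(f"bait{n}", "sticky") for n in range(4)]
toy += [(f"bait{n}", "rare") for n in range(2)]
kept, excluded = filter_frequent_interactors(toy)
```

In this toy set only the interactor named "sticky" crosses the 3% threshold, so its four pairs are removed and every other pair is retained.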
Identification of Hypothetical Proteins. As a byproduct of HMS-PCI, we identified many proteins of unknown function whose existence had previously only been predicted from the genome sequence. Given the difficulty in prediction of coding regions from genome sequence information even in yeast, the direct identification of encoded peptides by mass spectrometry provides an important validation of putative coding regions. Supplementary Table S3 contains a list of 531 proteins identified by mass spectrometry that fall into MIPS categories other than known proteins. Tables of hypothetical and putative proteins were obtained from the MIPS classification of ORFs (http://mips.gsf.de/proj/yeast/tables/classes/)8.
Bioinformatics Analysis and Methods
Connectivity Distribution. Interaction data from immunoprecipitation experiments reflect a population of protein complexes with unknown topologies, which cannot be accurately represented as pairwise protein interactions. The HMS-PCI data are represented here as hypothetical direct interactions between bait and associated proteins. The connectivity distribution of this model network was calculated using the Pajek software package9 by partitioning the network by node (protein) degree (k). The resulting partition was exported to Microsoft Excel, where the probability that a node in the network interacts with k other nodes, P(k), was plotted versus k. The resulting graph could be fitted using a power-law with an R^2 value of 0.90. The power-law relationship was P(k) = 1042 k^{-1.8}. The fit of the connectivity distribution to this power-law was worse at higher values of k, most likely because of the filter that was applied to the raw HMS-PCI data to remove background and because the hypothetical model does not take indirect interactions in the immunoprecipitated protein complexes into account. Metabolic10 and protein interaction11,12 networks have previously been found to follow a power-law connectivity distribution13. Such networks are robust and maintain their integrity when subjected to random disruption of components14,15.
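For readers who wish to repeat this kind of analysis without Pajek or a spreadsheet, the following Python sketch partitions a bait-interactor network by node degree and fits a power-law by least squares on log-transformed values. It is not the authors' procedure: the function names are hypothetical, and fitting a straight line in log-log space is an assumption about how the trend line was obtained.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Return a Counter mapping degree k to the number of nodes with that degree.

    edges: iterable of (bait, interactor) pairs, treated as undirected.
    """
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return Counter(len(n) for n in neighbours.values())


def fit_power_law(values_by_k):
    """Fit y = A * k**(-gamma) by linear least squares on (log k, log y).

    values_by_k: mapping from degree k to a positive value (count or probability).
    Returns (A, gamma, r_squared).
    """
    xs = [math.log10(k) for k in values_by_k]
    ys = [math.log10(values_by_k[k]) for k in values_by_k]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return 10 ** intercept, -slope, r_squared


# Synthetic check: values generated from the reported relationship are recovered.
synthetic = {k: 1042 * k ** -1.8 for k in range(1, 11)}
A, gamma, r2 = fit_power_law(synthetic)
```

With real data, sparsely populated high-degree bins dominate the scatter in log-log space, which is consistent with the poorer fit at high k noted above.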
Gene Ontology Annotation. Yeast proteins in our dataset were annotated using terms from the Gene Ontology (GO) project (http://www.geneontology.org)16. A subset of terms from the Biological Process and Cellular Component GO ontologies was selected to form a generalized categorization of Saccharomyces cerevisiae cellular localizations and biological processes. Some related GO terms were collapsed into a single category. For example, "endoplasmic reticulum" and "Golgi apparatus" were combined to form the "endoplasmic reticulum/Golgi" category. Annotation was performed using the set of GO terms downloaded from the GO FTP site on November 6, 2001. The selected GO term subset is shown below for each ontology used here. If a category is the result of combining more than one GO term or changing the name of a GO term, the original individual term(s) are shown in brackets.
GO Cellular Component Ontology Selected Term Subset
ascus
bud
cell wall (external protective structure + cell wall)
cytoplasm
cytoskeleton
endoplasmic reticulum/Golgi (endoplasmic reticulum + Golgi apparatus)
extracellular
intracellular
lysosome/peroxisome/vacuole (lysosome + peroxisome + vacuole)
mitochondrion
nucleolus
nucleus
plasma membrane/nuclear membrane (plasma membrane + nuclear membrane)
shmoo
unknown (unknown + unlocalized + cell + obsolete)
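The collapsing of Cellular Component terms into the categories listed above can be expressed as a simple lookup, sketched here in Python. The mapping is copied from the bracketed terms in the list; the function name is hypothetical, traversal of the GO hierarchy is not modelled, and assigning "unknown" to a protein with no matching term is an assumption made for the example.

```python
# Combined categories, taken from the bracketed terms in the list above.
COLLAPSED = {
    "external protective structure": "cell wall",
    "cell wall": "cell wall",
    "endoplasmic reticulum": "endoplasmic reticulum/Golgi",
    "Golgi apparatus": "endoplasmic reticulum/Golgi",
    "lysosome": "lysosome/peroxisome/vacuole",
    "peroxisome": "lysosome/peroxisome/vacuole",
    "vacuole": "lysosome/peroxisome/vacuole",
    "plasma membrane": "plasma membrane/nuclear membrane",
    "nuclear membrane": "plasma membrane/nuclear membrane",
    "unknown": "unknown",
    "unlocalized": "unknown",
    "cell": "unknown",
    "obsolete": "unknown",
}

# Categories that correspond to a single, unrenamed GO term.
UNCHANGED = {
    "ascus", "bud", "cytoplasm", "cytoskeleton", "extracellular",
    "intracellular", "mitochondrion", "nucleolus", "nucleus", "shmoo",
}


def categorize(go_terms):
    """Map a protein's GO Cellular Component terms to generalized categories.

    Terms outside the selected subset are ignored; a protein with no
    matching term falls back to "unknown" (an assumption of this sketch).
    """
    categories = set()
    for term in go_terms:
        if term in COLLAPSED:
            categories.add(COLLAPSED[term])
        elif term in UNCHANGED:
            categories.add(term)
    return categories or {"unknown"}


example = categorize(["Golgi apparatus", "endoplasmic reticulum", "nucleus"])
```

Keeping the combined and unchanged terms in separate tables makes it easy to see which categories were renamed or merged relative to the original GO vocabulary.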
GO Biological Process Ontology Selected Term Subset
aging
autophagy
budding
carbohydrate metabolism
cell adhesion
cell cycle (cell cycle + cell proliferation)
cell growth and/or maintenance
cell organization and biogenesis
cell shape and cell size control
chromosome organization and biogenesis
DNA damage response and repair (DNA damage response + DNA repair)
DNA metabolism
DNA recombination
DNA replication
general metabolism (metabolism + respiration)
mating (mating (sensu Saccharomyces))
mating-type determination
nucleolar and ribosome biogenesis (nucleologenesis + nucleolus organization and biogenesis + ribosome biogenesis)