
BIOLOGICAL SCIENCES

X-ray structures of the N- and C-terminal domains of a coronavirus nucleocapsid protein: structural basis of helical nucleocapsid formation

Hariharan Jayaram, Hui Fan, Brian R. Bowman, Amy Ooi, Jyothi Jayaram, Ellen W. Collison, Lescar Julian, B.V. Venkataram Prasad

Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, U.S.A.; Department of Veterinary Pathobiology, Texas A&M University, College Station, Texas 77843, U.S.A.; School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551


Abstract

Members of the Coronaviridae cause a variety of respiratory and enteric diseases in animals and humans, including SARS, a disease with emerging global impact. Enveloped capsids of the virus enclose the single-stranded genome, which is associated with the nucleocapsid protein (N protein). Using limited proteolysis, we identified two stable globular domains of the nucleocapsid protein from infectious bronchitis virus (IBV). We present here the crystal structures of the N- and C-terminal domains (NTD and CTD) of the IBV N protein. The NTD, with basic residues concentrated on two long tethers, constitutes a possible RNA-wrapping module. The CTD exists as an intimate domain-swapped dimer that tends to organize into helical arrays. From interactions observed in crystals of the NTD and CTD grown at different pH values, we hypothesize that the CTD is the key determinant of helical nucleocapsid formation in the virus. Similarity between the CTD and the capsid-forming domain of a related virus family reveals that this fold constitutes a new class of viral capsid folds employed by viruses with helical nucleocapsids. The coronavirus nucleocapsid protein is thus made up of an N-terminal RNA-binding core connected to a C-terminal capsid-forming domain, which together organize the helical nucleocapsid in the virus.

Introduction

Global importance:

The Coronaviridae, a family within the order Nidovirales, are viruses with ssRNA genomes that are significant causative agents of human upper respiratory infections such as the common cold and of severe illnesses such as SARS (severe acute respiratory syndrome). The brief SARS outbreak has become an important model for evaluating scientific and social preparedness for, and responses to, future global pandemics (Hufnagel, Brockmann et al. 2004). Following the SARS outbreak there has been an explosion of research into coronavirus pathogenesis and biology. Structural studies have yielded structures of four of the SARS proteins, including some bound to potential drug leads. We present here results on the structure of the nucleocapsid protein from infectious bronchitis virus (IBV), a group III coronavirus. The structure provides insight into the molecular architecture of the genome-containing core of the virus and into the evolutionary relatedness between coronaviruses and other closely related positive-sense RNA viruses.

Coronavirus Background:

The coronaviruses are a family of enveloped positive-strand RNA viruses. Their capsids range in diameter from 80 to 160 nm and enclose a single 30 kb segment of positive-sense ssRNA (Siddell 1995). Upon infection and cell entry, the genomic RNA gives rise to a 3’ co-terminal set of four or more subgenomic mRNAs with a common leader sequence of 60-100 nucleotides attached post-transcriptionally at their 5’ ends. These subgenomic RNAs, with their consensus 5’ and 3’ termini, encode the various viral structural and nonstructural proteins required to replicate the virus and produce progeny virions.

Coronavirus General capsid architecture:

The enveloped capsid of the virus is made up predominantly of the membrane glycoprotein (M), a small transmembrane protein (E), and an array of spikes composed of the spike glycoprotein (S), which give the spherical particles their corona. Another significant protein component of the capsid is the nucleocapsid protein (N), which interacts with the genomic ssRNA to form the central core of the virion.

Capsid architecture and N protein:

Electron microscopic studies of detergent-permeabilized capsids of transmissible gastroenteritis virus (TGEV, a prototype coronavirus) revealed that the internal nucleocapsid is helical and is composed of the ssRNA genome tightly associated with the N protein (Risco, Anton et al. 1996; Risco, Muntion et al. 1998).

The N protein is typically a multifunctional phosphoprotein with a molecular weight of 50 kDa to 60 kDa. Its mRNA is a major subgenomic RNA species, and the protein is consequently synthesized in large amounts in an infected cell (Stohlman and Lai 1979; Lai and Cavanagh 1997). N proteins share about 70% homology with other N proteins within the same group and about 25% with N proteins from other groups.

One of the major functions of the N protein is binding ssRNA. The N protein has been shown to have a general RNA-binding ability with an increased affinity for the corresponding viral RNA (Cologna and Hogue 1998). During the virus life cycle the N protein interacts extensively with the genomic RNA as well as with the subgenomic RNAs that are synthesized. Both of these RNA species have multiple copies of the N protein tightly associated with them (Baric, Nelson et al. 1988; Narayanan, Kim et al. 2003). This specific binding is mediated by consensus sequences common to all viral RNAs at their 5’ and 3’ ends. Specifically, the IBV and MHV N proteins were shown to have an affinity for sequences at the 5’ and 3’ ends of viral RNA (Nelson, Stohlman et al. 2000; Zhou and Collisson 2000). By virtue of these interactions with consensus RNA sequences at the transcript termini, the N protein plays a role in controlling mRNA transcription, translation and replication (Lai and Cavanagh 1997; Tahara, Dietlin et al. 1998; Schelle, Karl et al. 2005). The genomic RNA replicated during the virus life cycle is selectively packaged through recognition by the M protein of a packaging signal, which has been identified for mouse hepatitis virus and bovine coronavirus (Fosmire, Hwang et al. 1992; Cologna and Hogue 2000). Although genome incorporation is driven by M protein recognition of the packaging signal, the N protein may serve more as a structural template guiding genome packaging (Narayanan, Maeda et al. 2000; Narayanan, Kim et al. 2003; He, Leeson et al. 2004). Accordingly, several mutations in the M protein could be rescued by compensatory mutations in the N protein (Kuo and Masters 2002). The M and N proteins thus interact closely via their C termini, an interaction that is very important for proper genome encapsidation and nucleocapsid formation.

Host Interactions and Nucleocapsid protein:

The abundance of N produced during an infection results in N playing an important role in host modulation during a coronavirus infection. The N protein is primarily cytoplasmic but has been reported to enter the nucleus in some coronaviruses (Wurm, Chen et al. 2001). The N protein from SARS has been shown to interact with cyclophilin, an immunomodulator; to activate the AP1 pathway, an important pathway involved in cell cycle control; and to induce apoptosis in certain cell types (He, Leeson et al. 2003; Luo, Luo et al. 2004; Surjit, Liu et al. 2004). The N protein is also a major immunogen and an important diagnostic marker for coronavirus disease (Leung, Tam et al. 2004), and can help improve the efficacy of coronavirus vaccines (Cavanagh 2003; Zhao, Cao et al. 2005).

Biochemistry background:

Considerable biochemical information has become available on the in vitro behavior of the N protein, especially with regard to its oligomerization and its interaction with RNA. The full-length N protein is prone to disorder and aggregation in solution, and its instability has been suggested to be important for its role in virus capsid formation (Wang, Wu et al. 2004). Early analysis of multiple N protein sequences suggested that the MHV N protein is made up of three domains: two basic domains followed by an acidic domain at the extreme C-terminus. The RNA-binding domain of the nucleocapsid protein has been localized to a region within the first 300 residues of the N protein for various homologs, with a core region in the N-terminus of MHV N representing the minimal region required to bind RNA by itself (sequence 177 to 321 in MHV corresponds to 144 to 198 in IBV-N) (Ziebuhr 2004). The dimerization domain of the N protein has been localized to the C-terminal 200 residues by several studies, which identified N protein dimers both for the isolated domain and for the full-length protein (He, Dobie et al. 2004; Surjit, Liu et al. 2004; Yu, Gustafson et al. 2005). An NMR structure of the N-terminal domain of SARS-N shows that this domain is largely composed of coiled structure and interacts with RNA in solution (Huang, Yu et al. 2004). The N protein therefore comprises two functional domains, an RNA-binding N-terminal domain and a C-terminal dimerization domain.

Results and Discussion: The full-length N protein from infectious bronchitis virus has been purified and characterized previously. The N protein interacts strongly with conserved 5’ and 3’ sequences of IBV RNA and also undergoes phosphorylation during infection to generate multiple isoforms. Our structural characterization of the full-length N protein was impeded by its aggregation and degradation on storage under a variety of conditions (lane zero, Figure 0b). Purified full-length N protein was also extremely polydisperse in solution and not amenable to detailed structural characterization.

We therefore took a divide-and-conquer approach to study the protein structurally. Using limited proteolysis, we sought regions of the protein that formed stable domains resistant to limiting amounts of the proteases trypsin (which cleaves after the basic residues Arg and Lys) and V8 protease (which cleaves after the acidic residues Glu and Asp). The digestion pattern with V8 protease was not very distinct and yielded several diffuse bands (data not shown). Trypsin digested the full-length protein to a single ~17 kDa band on a 17% denaturing SDS-PAGE gel within 15 minutes (Figure 0b). This “single” band was resistant to further degradation even after trypsinization for several hours and represented a stable region of the protein. N-terminal sequencing of the band identified four co-migrating tryptic fragments: two arising from major cleavage sites at residues 19 and 219, and two from secondary cleavage sites at residues 27 and 226. The optimized domain constructs, termed NTD (N-terminal domain) and CTD (C-terminal domain), were then cloned, expressed and purified to homogeneity. The NTD was monomeric at moderate concentrations, while the CTD was a dimer even at very low concentrations (Figure 0c). The CTD tended to aggregate during purification and was therefore purified at very low concentrations and concentrated only immediately before crystallization screening. The NTD and CTD proteins failed to interact at a variety of salt and protein concentrations, as assayed by gel-filtration co-fractionation and pull-down experiments (Figure 0c and data not shown). NTD and CTD therefore represent independent domains of the full-length protein and were suitable for separate structure determination.
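The cleavage rule behind the limited proteolysis experiment can be illustrated with a minimal sketch. It lists candidate tryptic sites (after Lys or Arg) in a sequence and the fragments a complete digest would give. The function names and the toy sequence are hypothetical and are not taken from the IBV N protein; the optional rule of skipping sites followed by proline is a commonly used convention and an assumption here, not something established in this study. In a limited digest only the exposed subset of these sites is actually cut, which is what makes the approach useful for mapping domain boundaries.

```python
def tryptic_sites(sequence, skip_before_proline=True):
    """Return 1-based positions of residues after which trypsin may cleave.

    Trypsin cleaves on the C-terminal side of Lys (K) and Arg (R).
    skip_before_proline is an assumed convention: sites followed by Pro
    are often treated as resistant to cleavage.
    """
    sites = []
    for i, residue in enumerate(sequence):
        if residue not in "KR":
            continue
        if i == len(sequence) - 1:
            # Nothing to release after the last residue.
            continue
        if skip_before_proline and sequence[i + 1] == "P":
            continue
        sites.append(i + 1)
    return sites


def fragments(sequence, sites):
    """Split the sequence after each of the given 1-based positions."""
    bounds = [0] + list(sites) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequence for illustration only (not IBV N).
toy = "MASGKAAERPLLKDE"
sites = tryptic_sites(toy)
print(sites)
print(fragments(toy, sites))
```

Comparing such a list of potential sites with the N termini found by sequencing shows which basic residues are protected within a folded domain.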

Crystals of both the N-terminal and C-terminal domains were obtained in a variety of conditions. Although diffraction data were obtained for both domains, we were successful in phasing only the CTD data using MAD techniques. The NTD contained no cysteines or methionines, and the several mutant proteins we crystallized failed to yield a structure owing to pseudo-body-centering in the crystals coupled with poor diffraction quality. We present here the crystal structure of the C-terminal domain of the IBV N protein, together with that of the NTD, which was phased by molecular replacement using the structure solved recently by some of the co-authors.

For the CTD, we obtained diffraction data in three different space groups and solved the structure in two of them (Table 1). One of these crystal forms (Crystal I) grows at an extremely low pH of 4.5, where the crystals have a distinct rod-like appearance in rare cases but form large needles or flat sheets in most cases. The other condition (Table 1, Crystal II) yielded flat, sheet-like crystals after several weeks. We obtained two-wavelength anomalous data from selenomethionine-substituted protein for Crystal I and native data for Crystal II. Crystals I and II grew at two different pH values and ionic strengths and had widely differing unit cell sizes (Table 1). The morphology of both crystals, i.e. rods or needles at acidic pH or flat sheets at basic pH, indicated a tendency of the protein to pack very well in two dimensions. In addition, we optimized a third, three-dimensional hexagonal-bipyramidal crystal form grown under conditions similar to those of Crystal I but at slightly elevated pH (pH 5.2) and in the absence of citrate or acetate. Despite the three-dimensional appearance of this crystal form, its diffraction pattern was extremely anisotropic, with almost no diffraction perpendicular to the principal long axis of the pyramid. This anisotropy, also characteristic of organization along only two dimensions, prevented us from solving the CTD structure under these conditions. We report the pH 4.5 structure of CTD with a dimer in the asymmetric unit and a pH 8.5 structure with four dimers, or eight molecules, in the asymmetric unit. The observation of dimers as the building block of both crystals at these widely different pH values, coupled with the dimer observed on gel filtration under extremely dilute conditions, indicates that the dimer is the physiologically relevant form of this domain.

Structure of the CTD dimer: The CTD exists in both crystal forms as an intimate domain-swapped dimer. The domain swapping is brought about by interaction between the β-strands of one monomer and the surrounding helices and loops of the other monomer, forming a reciprocated, closed domain-swapped dimer akin to those seen in crystal structures of cystatin A and RNase A (Janowski, Kozak et al. 2001; Newcomer 2001). Accordingly, a 12-residue-long β-strand, β2 (residues 295 to 307), constitutes the interface between the two monomers (Figure 2, bottom). The overall topology of the IBV-N dimer can be described as a concave β-stranded floor of ~400 Å² area, with the strand order β1B-β2B-β2A-β1A, surrounded by helices and loops. Helices 3 and 4, connected by a loop region, arch over this floor and constitute the roof of the dimer. The 12-residue-long α-helix α5, located at the extreme C-terminus of CTD, forms an angled wall that flanks either side of the dimer and is held in place by a tight turn made up of residues 307 to 310 (Figure 1 and Figure 2).

The dimerization interactions are very tight and bury a surface area of 5780 Å². Neither the serine-rich domain (residues 161 to 191, Figure 0) nor disulfide bonding is important for protein oligomerization, as was expected based on previous biochemical data. The two cysteine residues C228 and C281 lie in close proximity in the interior of the dimer and are not disulfide bonded to each other in this structure. Protein preparation and crystallization were performed in the absence of reducing agent, so the non-disulfide-mediated interaction seen here is probably identical to that in the virus nucleocapsid. The integrity of the dimer observed in solution is readily explained by the ~5000 Å² of buried surface area involved in dimerization.
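For readers less familiar with how a buried surface area is obtained, the usual arithmetic is sketched below: the solvent-accessible surface areas (SASA) of the two isolated monomers are summed and the SASA of the dimer is subtracted. The function name and the input values are hypothetical and chosen only to illustrate the calculation; they are not the values computed for the CTD. Conventions differ on whether the total or the per-monomer figure (half the total) is quoted, so a reported value should state which one is meant.

```python
def buried_surface_area(sasa_a, sasa_b, sasa_dimer):
    """Total surface area (in Å^2) buried when monomers A and B form a dimer.

    sasa_a, sasa_b: solvent-accessible surface areas of the isolated monomers.
    sasa_dimer: solvent-accessible surface area of the assembled dimer.
    """
    if min(sasa_a, sasa_b, sasa_dimer) < 0:
        raise ValueError("surface areas must be non-negative")
    buried = sasa_a + sasa_b - sasa_dimer
    if buried < 0:
        raise ValueError("dimer SASA exceeds the sum of the monomer SASAs")
    return buried


# Hypothetical SASA values in Å^2, for illustration only.
total = buried_surface_area(8000.0, 8000.0, 11000.0)
per_monomer = total / 2
print(total, per_monomer)
```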

The dimer structure observed at pH 4.5 was almost identical to all four dimers observed at pH 8.5, with an rmsd for Cα atoms in the core region (residues 233 to 328) of ~0.3 Å.
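The rmsd quoted here is the root-mean-square distance between equivalent Cα atoms after the two dimers have been optimally superposed. A minimal sketch of the final step is given below. It assumes the coordinates are already superposed and paired atom by atom (the superposition itself, for example by a least-squares fit, is not shown), and the coordinates used are hypothetical rather than taken from the CTD models.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equal-length lists of (x, y, z) coordinates.

    Assumes the two sets are already superposed and that atom i in
    coords_a is equivalent to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical Cα coordinates for illustration only.
dimer_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
dimer_2 = [(0.0, 0.0, 0.3), (3.8, 0.3, 0.0), (7.6, 0.0, -0.3)]
print(round(rmsd(dimer_1, dimer_2), 2))
```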

The N and C termini of the five dimers observed differed from dimer to dimer, depending on their stabilizing interactions with neighboring dimers in the crystal (pH 4.5 case) or within the asymmetric unit (pH 8.5 case). Further insight into the nature of the CTD in the context of the viral capsid can be obtained from the crystal packing interactions in both space groups. The presence of one dimer in the asymmetric unit (ASU) in one crystal form and four dimers in the ASU in the other allowed the analysis of dimer-dimer interactions not only at different pH values but also in the presence and absence of constraints imposed by crystal packing.

Crystal packing interactions in CTD and insights into the stability of helical packing interactions: The two structures presented here give rise to five kinds of inter-dimer interactions. Crystal packing in Crystal I is brought about by dimer-dimer interactions in which the nth dimer interacts with the (n-1)th and (n+1)th dimers from neighboring ASUs (Figure 4b). In Crystal II, with four dimers in the ASU, inter-dimer interactions both hold the four dimers in the ASU together and mediate crystal packing (Figure 4a), giving rise to four kinds of dimer-dimer interactions. Three of them, i.e. AB-CD, CD-EF and the crystal packing interaction of GH with AB(n+1), belong to one class; the fourth, a new class of “dimer-dimer” interaction, involves the GH dimer and a different interface formed by the CD-EF dimer (Figure 4a).

The uniformity of all but the last kind of dimer-dimer interaction is apparent from a superposition of all four types of dimer-dimer interactions observed in the two crystals, in which the dimers superpose with a minimum of 0.3 XXX Å rmsd and a maximum of 0.8 XXX Å rmsd (Figure 4c and Figure 4d). When the three dimers from Crystal I are superposed on the three dimers from Crystal II, the rmsd between them is ~1.0 Å (Figure 4c). This clearly indicates that the dimers swivel only slightly with respect to each other and constitute a module that is very prone to interacting with itself.