Instructions for whole genome assembly and annotation using CLC Genomics
- Log on to Virtual Machine by Remote Desktop
- Start -> All Programs -> Accessories -> Remote Desktop Connection
- IP address: inbrewin.unmc.edu
- ID: IBLab01 to IBLab10 [IDs will be assigned to avoid login conflicts]
- Password is written down on the blackboard
- Accept certificate
- Click on CLC Genomics icon (right click, run as administrator)
- Allow instructor to open software
- Import Example data (needed for some examples)
- Open the Help menu
- Import Example Data
- Working environment
- Data folders
- Toolbox and ongoing/finished processes
- Processes: stop, pause, resume
- Main view area (center)
- Sequence settings and formatting parameters (right)
- Toolbars (top left): New, Import, Export, Download, Toolbox
- Import data
- Download fastq files from
- The download takes about 15 minutes; in the meantime we will go through the PowerPoint slides
- When you’re done, unzip the .zip file (this also takes a few minutes)
- In data area at top left: New, Folder, Name: X5
- Also make folder X8 for strain 8, and further folders for the other exercises
- In folder area you can also right click and make a new folder
- Import file deNovoX8_contigs.fasta into the folder created for strain 8
- Import button (top left), Illumina, look for paired end set of fastq files
- Check “Paired reads”, Next, Save
- Download menu, Search for Sequences at NCBI
- Enter JF411744, Start Search
- Highlight sequence, Download and Save to folder X5
- Same for ATCV-1: NC_008724, save to folder X8
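The two NCBI downloads above can also be scripted outside CLC. The following is a minimal sketch that assumes the public NCBI E-utilities `efetch` endpoint; the helper names `efetch_url` and `download_fasta` and the output paths are hypothetical, and the GUI route above remains the one used in class.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for fetching records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="fasta"):
    """Build the efetch URL for one nucleotide accession (hypothetical helper)."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def download_fasta(accession, path):
    """Download one record and write it to `path` (needs network access)."""
    with urlopen(efetch_url(accession)) as response, open(path, "wb") as out:
        out.write(response.read())


# The accessions used in this exercise; calling download_fasta is left to the reader.
for accession in ("JF411744", "NC_008724"):
    print(efetch_url(accession))
```

For example, `download_fasta("JF411744", "JF411744.fasta")` would save the first reference, which can then be brought into folder X5 with the Import button.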
- Read mapping with guided assembly
- QC report: NGS Core tools, Create sequencing QC report
- Select two fastq files in strain 5 folder
- Results:
- Lengths distribution
- Quality distribution, PHRED score
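To make the two QC results above concrete, here is a small sketch of how a length distribution and a mean PHRED score can be computed from FASTQ records. It assumes the common Phred+33 quality encoding and four lines per record; the function names are illustrative, and this is not how CLC computes its report.

```python
from collections import Counter


def read_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer PHRED scores (assumes Phred+33)."""
    return [ord(c) - offset for c in qual]


def qc_summary(records):
    """Return (read-length distribution, mean PHRED score over all bases)."""
    lengths = Counter()
    total_q = 0
    n_bases = 0
    for _, seq, qual in records:
        lengths[len(seq)] += 1
        scores = phred_scores(qual)
        total_q += sum(scores)
        n_bases += len(scores)
    mean_q = total_q / n_bases if n_bases else 0.0
    return dict(lengths), mean_q


# Two made-up reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "ACGTAC", "+", "II5555"]
lengths, mean_q = qc_summary(read_fastq(example))
print(lengths, mean_q)
```

A PHRED score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly one error in 100 base calls and Q40 one in 10,000.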
- Trim sequences: NGS Core Tools, Trim Sequences
- Discard reads below certain length (50 bp)
- Save the trimmed sequences
- Just to double check, rerun the QC report on the trimmed sequences
- Select trimmed sequences, then run NGS Core tools, Create sequencing QC report
- Nucleotide contribution, Quality distribution, GC content should even out towards the end
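The length filter in the trimming step can be sketched as follows. This is only an illustration of "discard reads below 50 bp"; it ignores quality trimming, adapter trimming, and read pairing, all of which a real trimmer has to handle, and the function names are hypothetical.

```python
def filter_short_reads(records, min_length=50):
    """Keep reads with at least `min_length` bases; count the discarded ones."""
    kept = []
    discarded = 0
    for name, seq, qual in records:
        if len(seq) >= min_length:
            kept.append((name, seq, qual))
        else:
            discarded += 1
    return kept, discarded


def to_fastq(records):
    """Serialize (name, sequence, quality) tuples back to FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)


# Made-up reads of 60, 20, and exactly 50 bases.
reads = [
    ("long", "A" * 60, "I" * 60),
    ("short", "A" * 20, "I" * 20),
    ("edge", "C" * 50, "I" * 50),
]
kept, discarded = filter_short_reads(reads)
print([name for name, _, _ in kept], discarded)
```

With paired-end data, dropping only one mate of a pair leaves a broken pair, so a paired-aware tool either removes both mates or keeps the survivor as a single read.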
- Map Reads to Reference, select trimmed sequence, select ATCV-1 reference
- NGS Core tools, Remove Duplicate Mapped Reads
- Mapping report: NGS Core tools, Create detailed mapping report
- Look at Coverage statistics, mapped reads, read length distribution
- Get the whole genome sequence
- NGS Core tools, Extract Consensus Sequence
- Choose Insert ‘N’ ambiguity symbols
- There is also the option to fill in the sequence from the reference sequence
- Save and rename sequence
- Format sequence (right panel)
- Select Fixed wrap, every 80 residues to make sequence more viewable
- Set spacing (e.g. every 10 residues)
- Number on sequences, number of strands (single, double)
- Press Export menu button
- Select type Fasta, provide custom file name, select directory, finish
- Sequence analysis: Classical Sequence Analysis, General Sequence Analysis, Create Sequence Statistics
- Note length, %N
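The export and statistics steps above can be mirrored in a few lines: wrap the consensus at 80 residues per line and report length and %N. This is a sketch under simple assumptions; the sequence and the name `X5_consensus` are made up for illustration.

```python
def format_fasta(name, seq, width=80):
    """Return a FASTA record with the sequence wrapped at `width` residues."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


def sequence_stats(seq):
    """Return the length and the percentage of 'N' ambiguity symbols."""
    seq = seq.upper()
    length = len(seq)
    n_count = seq.count("N")
    percent_n = 100 * n_count / length if length else 0.0
    return {"length": length, "percent_N": percent_n}


# Made-up consensus: 180 called bases followed by 20 uncovered positions.
consensus = "ACGT" * 45 + "N" * 20
fasta_text = format_fasta("X5_consensus", consensus)
stats = sequence_stats(consensus)
print(stats)
```

A high %N indicates many positions without read coverage, which is what the "Insert 'N' ambiguity symbols" option records instead of borrowing bases from the reference.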
- Variant analysis
- Select mapping file that was produced in step 5
- Resequencing analysis, Variant detectors
- Basic
- Fixed ploidy (set ploidy to 1 for microbes) (we need to wait for a while)
- Low frequency
- Check off Create annotated table
- Results: the table lists detailed information for each variant: position, type, variant length, reference bp, allele bp, coverage, amino acid change, etc.
- Tabular results exportable in Excel format
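To illustrate what a row of the variant table represents, here is a toy sketch of a haploid (ploidy 1) call at one position. It uses a plain majority rule with hypothetical thresholds (`min_coverage`, `min_frequency`); CLC's variant detectors use statistical models, so this is not their method.

```python
from collections import Counter


def call_haploid_variant(position, ref_base, read_bases,
                         min_coverage=10, min_frequency=0.8):
    """Return a variant row if the majority allele differs from the reference.

    Thresholds are illustrative assumptions, not CLC defaults.
    """
    coverage = len(read_bases)
    if coverage < min_coverage:
        return None  # too few reads to say anything
    allele, count = Counter(read_bases).most_common(1)[0]
    frequency = count / coverage
    if allele == ref_base or frequency < min_frequency:
        return None  # reference call, or no clear majority
    return {
        "position": position,
        "type": "SNV",
        "reference": ref_base,
        "allele": allele,
        "coverage": coverage,
        "frequency": frequency,
    }


# Made-up pileup: 18 reads show G and 2 show the reference base A.
variant = call_haploid_variant(1024, "A", list("G" * 18 + "A" * 2))
same_as_ref = call_haploid_variant(2048, "C", list("C" * 20))
print(variant)
```

With ploidy 1, a true variant is expected in nearly all reads covering the position, which is why a high frequency cutoff makes sense for microbes.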
- De Novo Sequencing
- Import fastq files for X8 strain (top left menu item)
- Trim fastq files: NGS Core Tools, Trim sequences
- De Novo Sequencing, De Novo Assembly (at bottom)
- This takes a long time, so continue with file deNovoX8_contigs.fasta in step g
- Steps d-f are for producing this file
- Results table: consensus length, total read count, average coverage
- Highlight all rows in table, then press Extract Contigs button at bottom
- Rename file
- Classical seq. analysis, General seq. analysis, Join sequences
- Select contigs
- Save and change name
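What "Join sequences" does to the extracted contigs can be sketched as plain concatenation of a multi-FASTA file. The parser below is a minimal illustration with made-up contigs, and the function names are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def join_contigs(records, spacer=""):
    """Concatenate contig sequences in the order given."""
    return spacer.join(seq for _, seq in records)


# Made-up contigs standing in for deNovoX8_contigs.fasta.
text = ">contig_1\nACGTACGT\nAC\n>contig_2\nGGGCCC\n"
contigs = parse_fasta(text)
contig_lengths = {name: len(seq) for name, seq in contigs}
joined = join_contigs(contigs)
print(contig_lengths, len(joined))
```

Joining follows the order in which the contigs are listed, which is not necessarily their order or orientation in the genome.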
- Sequence similarity
- Open up the ATP8a1 ortholog alignment in the Example Data, Protein orthologs
- Classical seq. anal., Alignments and Trees, Create Pairwise Comparison
- Select the alignment that you just saved
- What is the number of similarities between Q29449 and P57792? Between P39524 and O94296?
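A pairwise comparison boils down to counting columns of the alignment. The sketch below counts identical, different, and gapped columns for two aligned rows. It assumes that "similarities" means identical residues, which may differ from the definition CLC uses, and the sequences are made up.

```python
def pairwise_counts(row_a, row_b, gap="-"):
    """Count identities, differences, and gap columns between two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identities = differences = gaps = 0
    for x, y in zip(row_a, row_b):
        if x == gap and y == gap:
            continue  # column is empty for this pair
        if x == gap or y == gap:
            gaps += 1
        elif x == y:
            identities += 1
        else:
            differences += 1
    compared = identities + differences
    percent_identity = 100 * identities / compared if compared else 0.0
    return {
        "identities": identities,
        "differences": differences,
        "gaps": gaps,
        "percent_identity": percent_identity,
    }


# Two made-up aligned protein fragments.
counts = pairwise_counts("MKT-LLA", "MRTALL-")
print(counts)
```

Here percent identity is taken over columns where both sequences have a residue; other conventions divide by the full alignment length.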
- Secondary structure prediction for mRNA
- Classical seq. anal., RNA structure, predict secondary structure
- Select ATP8a1 mRNA
- The prediction takes a while to run
- Look at sequence
- Take note of elements: e.g. stem, bulge, hairpin loop
- Element list
- Take note of region, type, and qualifiers (free energy)
- Take a look at the secondary structure itself
- Individual elements also listed
- Motif search
- Select ATP8a1 genomic sequence
- Classical seq. analysis, General seq. analysis, Motif search
- Search for the motif CAACGCCCAA with 80% accuracy
- Enter CAACGCCCAA into Search string
- Uncheck “Include negative strand” (checkbox)
- Exclude regions with N’s (checkbox)
- How many hits are there, and at what positions?
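The motif search settings above can be sketched as a sliding-window comparison: forward strand only, windows containing N skipped, and a hit reported when at least 80% of positions match. This assumes "accuracy" means the fraction of matching positions without gaps, which may not be exactly how CLC scores hits; the test sequence is made up.

```python
def motif_search(sequence, motif, min_accuracy_percent=80, skip_n=True):
    """Return (1-based start, matches) for forward-strand windows that pass."""
    sequence = sequence.upper()
    motif = motif.upper()
    k = len(motif)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if skip_n and "N" in window:
            continue  # exclude regions with N's
        matches = sum(1 for a, b in zip(window, motif) if a == b)
        # Integer comparison avoids floating-point rounding at the cutoff.
        if matches * 100 >= min_accuracy_percent * k:
            hits.append((start + 1, matches))
    return hits


# Made-up sequence: an exact copy, a copy with two mismatches, and a copy with an N.
sequence = "TT" + "CAACGCCCAA" + "TT" + "CAACGGGCAA" + "GG" + "NAACGCCCAA"
hits = motif_search(sequence, "CAACGCCCAA")
print(hits)
```

For a 10-base motif, 80% accuracy under this reading means at most two mismatches per hit.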