Instructions for whole genome assembly and annotation using CLC Genomics

  1. Log on to the virtual machine via Remote Desktop
     a. Start -> All Programs -> Accessories -> Remote Desktop Connection
     b. IP address: inbrewin.unmc.edu
     c. ID: IBLab01–10 (IDs will be assigned to avoid login conflicts)
     d. The password is written on the blackboard
     e. Accept the certificate
  2. Right-click the CLC Genomics icon and choose Run as administrator
     a. Allow the instructor to open the software
  3. Import the example data (needed for some exercises)
     a. Open the Help menu
     b. Select Import Example Data
  4. Working environment
     a. Data folders
     b. Toolbox and ongoing/finished processes
        i. Processes can be stopped, paused, and resumed
     c. Main view area (center)
     d. Sequence settings and formatting parameters (right)
     e. Toolbars (top left): New, Import, Export, Download, Toolbox
  1. Import data
     a. Download fastq files from
     b. The download takes about 15 minutes; in the meantime we will go through the PowerPoint slides
     c. When the download is done, unzip the .zip file (this also takes a few minutes)
     d. In the data area (top left): New, Folder, name: X5
        i. Also make a folder X8 for strain 8, and further folders for the other exercises
        ii. You can also right-click in the folder area to make a new folder
     e. Import the file deNovoX8_contigs.fasta into the folder created for strain 8
     f. Import button (top left), Illumina; look for the paired-end set of fastq files
        i. Check “Paired reads”, Next, Save
     g. Download menu, Search for Sequences at NCBI
        i. Enter JF411744, Start Search
        ii. Highlight the sequence, Download, and save it to folder X5
        iii. Do the same for ATCV-1 (NC_008724) and save it to folder X8
  2. Read mapping with guided assembly
     a. QC report: NGS Core Tools, Create Sequencing QC Report
        i. Select the two fastq files in the strain 5 folder
        ii. Results:
            1. Length distribution
            2. Quality distribution (PHRED score)
     b. Trim sequences: NGS Core Tools, Trim Sequences
        i. Discard reads below a minimum length (50 bp)
        ii. Save the trimmed sequences
     c. As a double check, rerun the QC report on the trimmed sequences
        i. Select the trimmed sequences, then run NGS Core Tools, Create Sequencing QC Report
        ii. Nucleotide contribution, quality distribution, and GC content should even out towards the end of the reads
     d. Map Reads to Reference: select the trimmed sequences and the ATCV-1 reference
     e. NGS Core Tools, Remove Duplicate Mapped Reads
     f. Mapping report: NGS Core Tools, Create Detailed Mapping Report
        i. Look at the coverage statistics, mapped reads, and read length distribution
     g. Get the whole genome sequence
        i. NGS Core Tools, Extract Consensus Sequence
        ii. Choose Insert ‘N’ ambiguity symbols
        iii. There is also an option to fill in the sequence from the reference sequence
        iv. Save and rename the sequence
        v. Format the sequence (right panel)
            1. Select Fixed wrap at every 80 residues to make the sequence easier to read
            2. Set spacing (e.g. every 10 residues)
            3. Numbers on sequences; number of strands (single or double)
        vi. Press the Export menu button
            1. Select type Fasta, provide a custom file name, select the directory, Finish
        vii. Sequence analysis: Classical Sequence Analysis, General Sequence Analysis, Create Sequence Statistics
            1. Note the length and %N
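The QC and trimming steps in the read mapping section are done entirely in the Workbench GUI. As a rough illustration of what the PHRED scores in the quality distribution mean, and of the 50 bp minimum-length filter, here is a minimal Python sketch. It assumes Phred+33 (Sanger / Illumina 1.8+) quality encoding; the 3' quality cutoff of 20 and all function names are hypothetical, and this simple end-trimming is not the algorithm the Trim Sequences tool itself uses.

```python
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33      # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
QUALITY_CUTOFF = 20    # hypothetical 3' trimming threshold
MIN_LENGTH = 50        # discard reads shorter than 50 bp, as in the exercise


def phred_scores(quality: str) -> List[int]:
    """Decode a FASTQ quality string into PHRED scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def error_probability(q: int) -> float:
    """A PHRED score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_read(seq: str, quality: str) -> Tuple[str, str]:
    """Remove bases from the 3' end while their score is below the cutoff."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < QUALITY_CUTOFF:
        end -= 1
    return seq[:end], quality[:end]


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from complete 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].rstrip(), lines[i + 1].rstrip(), lines[i + 3].rstrip()


def trim_and_filter(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Trim each read, then keep only reads of at least MIN_LENGTH bases."""
    kept = []
    for header, seq, qual in read_fastq(lines):
        seq, qual = trim_read(seq, qual)
        if len(seq) >= MIN_LENGTH:
            kept.append((header, seq, qual))
    return kept


if __name__ == "__main__":
    records = ["@r1", "A" * 60, "+", "I" * 60,              # 'I' = Q40 throughout
               "@r2", "C" * 60, "+", "I" * 40 + "#" * 20]   # '#' = Q2 tail
    for header, seq, _ in trim_and_filter(records):
        print(header, len(seq))  # prints "@r1 60"; r2 is trimmed to 40 bp and discarded
```

Rerunning the QC report on the trimmed reads checks exactly this effect: low-quality 3' tails are gone, so the quality distribution evens out towards the read ends.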
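The consensus, formatting, and statistics steps of the read mapping section boil down to two numbers worth understanding: the sequence length, and the percentage of ‘N’ symbols that Extract Consensus Sequence inserted where coverage was too low. The plain-Python sketch below computes both and writes a FASTA record with lines of at most 80 residues, similar in spirit to the Fixed wrap setting. The function names and the FASTA header are hypothetical; this is not a Workbench API.

```python
from typing import Dict


def sequence_statistics(seq: str) -> Dict[str, float]:
    """Return the length, N count and percentage of N symbols in a sequence."""
    seq = seq.upper()
    length = len(seq)
    n_count = seq.count("N")
    percent_n = 100.0 * n_count / length if length else 0.0
    return {"length": length, "n_count": n_count, "percent_n": percent_n}


def to_fasta(name: str, seq: str, width: int = 80) -> str:
    """Format a sequence as FASTA with lines of at most `width` residues."""
    lines = [">" + name]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    consensus = "ACGT" * 45 + "N" * 20   # toy 200 bp consensus with a 20 bp uncovered gap
    stats = sequence_statistics(consensus)
    print(stats["length"], stats["percent_n"])  # 200 10.0
    print(to_fasta("X5_consensus", consensus), end="")
```

A high %N means large parts of the genome had no usable read coverage, which is why the length and %N are worth noting before the consensus is used downstream.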
  3. Variant analysis
     a. Select the read mapping file produced in the read mapping section (after Remove Duplicate Mapped Reads)
     b. Resequencing Analysis, Variant Detectors:
        i. Basic
        ii. Fixed ploidy (set ploidy to 1 for microbes; this takes a while)
        iii. Low frequency
     c. Check Create annotated table
     d. Results: a wide range of variant information, including position, type, variant length, reference and allele bases, coverage, and amino acid change
     e. The tabular results can be exported in Excel format
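Setting the ploidy to 1 tells the detector that each position of a microbial genome carries a single allele. The sketch below is a deliberately naive haploid caller that only illustrates this idea: count the bases that reads show at one reference position and report a variant when the most frequent base differs from the reference with enough coverage and frequency. The thresholds (coverage 10, frequency 0.8), the function name, and the toy position are hypothetical; the Workbench's variant detectors use their own statistical models and quality filters.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

MIN_COVERAGE = 10      # hypothetical minimum read depth
MIN_FREQUENCY = 0.8    # hypothetical minimum allele frequency for a haploid call


def call_haploid_variant(
    position: int, reference_base: str, read_bases: Sequence[str]
) -> Optional[Tuple[int, str, str, int, float]]:
    """Return (position, reference, allele, coverage, frequency), or None."""
    coverage = len(read_bases)
    if coverage < MIN_COVERAGE:
        return None  # too few reads to make a call
    counts = Counter(base.upper() for base in read_bases)
    allele, allele_count = counts.most_common(1)[0]
    frequency = allele_count / coverage
    if allele == reference_base.upper() or frequency < MIN_FREQUENCY:
        return None  # matches the reference, or no dominant allele
    return position, reference_base.upper(), allele, coverage, frequency


if __name__ == "__main__":
    pileup = ["T"] * 18 + ["C"] * 2   # toy: 20 reads covering a reference 'C'
    print(call_haploid_variant(1042, "C", pileup))  # (1042, 'C', 'T', 20, 0.9)
```

The Low frequency detector, by contrast, is meant for alleles present in only a fraction of the reads, which a strict haploid rule like this one would ignore.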
  4. De novo sequencing
     a. Import the fastq files for strain X8 (Import, top left menu item)
     b. Trim the fastq files: NGS Core Tools, Trim Sequences
     c. De Novo Sequencing, De Novo Assembly (at the bottom)
        i. This takes a long time, so continue with the file deNovoX8_contigs.fasta in step g
        ii. Steps d-f show how this file is produced
     d. Results table: consensus length, total read count, average coverage
     e. Highlight all rows in the table, then press the Extract Contigs button at the bottom
     f. Rename the file
     g. Classical Sequence Analysis, General Sequence Analysis, Join Sequences
        i. Select the contigs
        ii. Save and change the name
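The de novo results table reports consensus length, total read count, and average coverage per contig, and Join Sequences then combines the contigs into one sequence. As a hedged illustration of how these numbers relate: if every read had the same length L, a contig's average coverage would be roughly (read count × L) / consensus length. The sketch assumes a fixed read length and plain end-to-end concatenation in the given order; the `Contig` record and function names are hypothetical, and the Workbench computes coverage from the bases actually aligned to each contig.

```python
from typing import List, NamedTuple


class Contig(NamedTuple):
    """Hypothetical record for one row of the assembly results table."""
    name: str
    sequence: str
    read_count: int


def average_coverage(contig: Contig, read_length: int) -> float:
    """Approximate mean depth, assuming every read has the same length."""
    return contig.read_count * read_length / len(contig.sequence)


def join_contigs(contigs: List[Contig]) -> str:
    """Concatenate contig sequences in the given order into one sequence."""
    return "".join(c.sequence for c in contigs)


def length_weighted_coverage(contigs: List[Contig], read_length: int) -> float:
    """Overall mean depth across all contigs: total read bases / total length."""
    total_bases = sum(c.read_count * read_length for c in contigs)
    total_length = sum(len(c.sequence) for c in contigs)
    return total_bases / total_length


if __name__ == "__main__":
    contigs = [Contig("contig_1", "A" * 1000, 400), Contig("contig_2", "C" * 500, 100)]
    for c in contigs:
        print(c.name, len(c.sequence), average_coverage(c, read_length=100))
    joined = join_contigs(contigs)
    print(len(joined), length_weighted_coverage(contigs, read_length=100))
```

Note that the overall depth is weighted by contig length: averaging the per-contig coverages directly would overweight short contigs.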
  5. Sequence similarity
     a. Open the ATP8a1 ortholog alignment in Example Data, Protein orthologs
     b. Classical Sequence Analysis, Alignments and Trees, Create Pairwise Comparison
        i. Select the alignment you just opened
        ii. What is the number of similarities between Q29449 and P57792? Between P39524 and O94296?
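Create Pairwise Comparison summarizes, for each pair of sequences in an alignment, how their aligned positions compare. The sketch below counts identities, differences, and gap positions between two rows of an alignment and derives a percent identity over the gap-free columns. It is a hedged illustration only: skipping gap-gap columns and excluding gaps from percent identity are assumptions, the sketch does not count similarities (those also depend on which residue pairs a scoring scheme treats as similar), and the two toy rows are made up rather than taken from the ATP8a1 alignment.

```python
from typing import Dict


def pairwise_counts(row_a: str, row_b: str) -> Dict[str, float]:
    """Compare two equal-length rows of a multiple alignment ('-' = gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identities = differences = gaps = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" and b == "-":
            continue  # gap in both sequences: column ignored (assumption)
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            identities += 1
        else:
            differences += 1
    compared = identities + differences  # gap-free columns only (assumption)
    percent_identity = 100.0 * identities / compared if compared else 0.0
    return {"identities": identities, "differences": differences,
            "gaps": gaps, "percent_identity": percent_identity}


if __name__ == "__main__":
    print(pairwise_counts("MKV-LLAG", "MRVQLL-G"))
```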
  6. Secondary structure prediction for mRNA
     a. Classical Sequence Analysis, RNA Structure, Predict Secondary Structure
        i. Select the ATP8a1 mRNA
        ii. Running the prediction takes a while
     b. Look at the sequence view
        i. Take note of structural elements, e.g. stem, bulge, hairpin loop
     c. Element list
        i. Take note of region, type, and qualifiers (free energy)
     d. Take a look at the secondary structure itself
        i. Individual elements are also listed
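The free-energy qualifiers in the element list reflect that the Workbench predicts structure with an energy model, which is far too large for a short example. Purely to show how dynamic programming turns a sequence into stems and hairpin loops, the sketch below uses a different and much simpler technique, the Nussinov base-pair maximization algorithm, and returns the structure in dot-bracket notation. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 nucleotides are common textbook assumptions; this sketch will not reproduce the Workbench's predictions.

```python
from typing import Tuple

# Allowed pairs: Watson-Crick plus G-U wobble (textbook assumption)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin


def nussinov(seq: str) -> Tuple[int, str]:
    """Maximize the number of base pairs; return (pairs, dot-bracket string)."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):                   # span = j - i
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])         # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)     # i pairs with j
            for k in range(i + 1, j):                      # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to enclose a hairpin loop
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))  # (3, '(((...)))'): a 3 bp stem closing an AAA hairpin loop
```

In the dot-bracket output, runs of matching brackets are stems, and dots enclosed by a single pair are a hairpin loop; bulges appear as dots on one side only between stacked pairs.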
  7. Motif search
     a. Select the ATP8a1 genomic sequence
     b. Classical Sequence Analysis, General Sequence Analysis, Motif Search
     c. Search for the motif CAACGCCCAA with 80% accuracy
        i. Enter CAACGCCCAA into Search string
        ii. Uncheck “Include negative strand” (checkbox)
        iii. Exclude regions with N’s (checkbox)
     d. How many hits are there, and at what positions?
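If 80% accuracy is read as "at least 80% of the motif positions must match", a 10-nucleotide search string tolerates up to 2 mismatches per hit. The sketch below implements that reading as a plain sliding-window scan of the positive strand only, skipping windows that contain N, mirroring the two checkbox settings in the motif search steps. The interpretation of accuracy, the substitutions-only matching (no insertions or deletions), the 1-based positions, and the function name are all assumptions; this is not the Workbench's implementation.

```python
from typing import List, Tuple


def motif_search(sequence: str, motif: str, accuracy: float = 0.8) -> List[Tuple[int, str, int]]:
    """Scan the positive strand for windows matching `motif` at >= `accuracy`.

    Returns (1-based start position, matched window, number of mismatches).
    Windows containing N are skipped; only substitutions are allowed.
    """
    sequence, motif = sequence.upper(), motif.upper()
    k = len(motif)
    # small epsilon guards against round-off (10 * (1 - 0.8) -> 2 mismatches)
    max_mismatches = int(k * (1 - accuracy) + 1e-9)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if "N" in window:
            continue  # mimic excluding regions with N's
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start + 1, window, mismatches))
    return hits


if __name__ == "__main__":
    toy = "T" * 10 + "CAACGGCCTA" + "T" * 10   # toy sequence: motif with 2 substitutions
    print(motif_search(toy, "CAACGCCCAA"))     # [(11, 'CAACGGCCTA', 2)]
```

Including the negative strand would mean also scanning the reverse complement of the sequence, which is why unchecking that box can reduce the number of hits.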