Problem No 1
Determination of GC Content
Among the four nucleotides {A, T, C, G}, the proportion of C and G in a DNA sequence carries important signals. This proportion is measured as “GC-content %” using the following formula.
GC-content % = ((n(G)+n(C))/(len(DNA)))*100%
Where
n(G) = number of G in the sequence
n(C) = number of C in the sequence
len(DNA) = length of the DNA sequence in base pairs (bp)
Write a Python program that behaves as described below.
Input
1. DNA sequence as a string
Output
1. Length of the sequence
2. GC-content %
Example
Input
ATCG
Output
Length of the sequence = 4 bp
GC-content % = 50%
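One possible approach to Problem No 1 is sketched below. The function name gc_content is hypothetical, and the sketch assumes the input contains only the letters A, T, C and G (in upper or lower case); it is a minimal illustration of the formula above, not a reference solution.

```python
def gc_content(dna):
    """Return (length in bp, GC-content in percent) for a DNA string.

    Assumption: the sequence contains only A, T, C, G (any case).
    """
    seq = dna.strip().upper()
    if not seq:
        # The formula divides by len(DNA), so an empty sequence is rejected.
        raise ValueError("empty DNA sequence")
    gc_count = seq.count("G") + seq.count("C")
    return len(seq), gc_count / len(seq) * 100


length, gc = gc_content("ATCG")
print(f"Length of the sequence = {length} bp")
print(f"GC-content % = {gc:g}%")
```

For the example input ATCG this counts one G and one C in 4 bp, which gives 50%.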
Problem No 2
Complement DNA strand
DNA forms a double helix made of two strands. When we work with a DNA sequence we usually refer to a single strand, but in the chromosome DNA exists in double-stranded form. The two strands are called complements of each other. One is called the 5’-3’ (forward) strand and the other the 3’-5’ (reverse) strand. When the strand type (or direction) is not stated explicitly, the DNA sequence is assumed to be the 5’-3’ (forward) strand.
5’---ACCGTA---3’
     ||||||
3’---TGGCAT---5’
In the complementary DNA strand, each base of the original DNA sequence is replaced according to the following interchange rule:
A is replaced by T and vice versa
C is replaced by G and vice versa
This is because, in the double helix, an A on one strand is joined to a T on the other strand by hydrogen bonds, and likewise C is joined to G.
Write a Python program that behaves as described below.
Input
1. DNA sequence as a string
Output
1. Complement of input DNA sequence
Example
Input
ACCGTA
Output
Complement DNA Sequence = TGGCAT
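The interchange rule of Problem No 2 can be sketched with Python's built-in str.maketrans and str.translate. The names COMPLEMENT_TABLE and complement are hypothetical, and the sketch assumes the input is upper- or lower-case A, T, C, G; any other character is passed through unchanged.

```python
# Translation table implementing the rule A<->T, C<->G.
COMPLEMENT_TABLE = str.maketrans("ATCG", "TAGC")


def complement(dna):
    """Return the complement of a DNA string, base by base.

    Assumption: characters other than A, T, C, G are left as they are.
    """
    return dna.strip().upper().translate(COMPLEMENT_TABLE)


print("Complement DNA Sequence =", complement("ACCGTA"))
```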
Problem No 3
Reverse Complement of a DNA Sequence
This problem is an extension of Problem No 2 (read Problem No 2 first). The reverse complement of a DNA sequence comes up very often in bioinformatics analysis. If the complement of a DNA sequence is reversed, the result is called the reverse complement of the original DNA sequence.
5’---ACCGTA---3’
     ||||||
3’---TGGCAT---5’
Here the complement of (5’---ACCGTA---3’) is (3’---TGGCAT---5’), and the reverse of (3’---TGGCAT---5’) is TACGGT, so the reverse complement of ACCGTA is TACGGT.
Write a Python program that behaves as described below.
Input
1. DNA sequence as a string
Output
1. Reverse Complement of input DNA sequence
Example
Input
ACCGTA
Output
Reverse Complement DNA Sequence = TACGGT
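A minimal sketch for Problem No 3 follows: complement first, then reverse with slicing. The function name reverse_complement is hypothetical, and the sketch assumes the input is a string of A, T, C, G in either case.

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string.

    Step 1: complement each base (A<->T, C<->G).
    Step 2: reverse the complemented string.
    """
    table = str.maketrans("ATCG", "TAGC")
    complemented = dna.strip().upper().translate(table)
    return complemented[::-1]


print("Reverse Complement DNA Sequence =", reverse_complement("ACCGTA"))
```

Because reversing and complementing are independent steps, applying them in the opposite order gives the same result.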
Problem No 4
Codon List from a DNA sequence
Triplets of nucleotides (for example ATT, TCG, CCC) are called codons. Through the processes of transcription and translation, each codon of a DNA sequence is individually responsible for producing an amino acid, and the resulting chain of amino acids builds a protein. 64 (4x4x4) different codons are possible.
Consider the DNA sequence ATTTCGAGGT. If we parse codons from left to right, the codons are ATT, TCG, AGG (ignore the rightmost leftover part shorter than 3 bp, in this case T).
Write a function/Python program that returns the list of codons for a DNA sequence. The program should return/print the codons as a Python “list”.
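The count of 64 can be checked by enumerating every triplet over the four bases. This is only an illustrative sketch; the variable name all_codons is hypothetical.

```python
from itertools import product

# Every ordered choice of 3 bases from {A, T, C, G}: 4 * 4 * 4 combinations.
all_codons = ["".join(triplet) for triplet in product("ATCG", repeat=3)]

print(len(all_codons))
```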
Input
1. DNA sequence as a string
Output
1. List of Codons
Example
Input
ATTTCGAGGT
Output
Codon-List = ['ATT', 'TCG', 'AGG']
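The left-to-right parsing described in Problem No 4 can be sketched with string slicing in steps of 3. The function name codon_list is hypothetical; the sketch assumes reading starts at the first base and drops any leftover of fewer than 3 bases.

```python
def codon_list(dna):
    """Split a DNA string into consecutive codons, left to right.

    A trailing fragment shorter than 3 bp is ignored.
    """
    seq = dna.strip().upper()
    usable = len(seq) - len(seq) % 3  # length of the part made of whole codons
    return [seq[i:i + 3] for i in range(0, usable, 3)]


print("Codon-List =", codon_list("ATTTCGAGGT"))
```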
Problem No 5
Translate a DNA Sequence
Each codon represents an amino acid (skim through Problem No 4). The standard codon-to-amino-acid mapping table is called the “Standard Genetic Code Table” or “Codon Table”. It is built for codons derived from RNA (the details will be discussed separately, beyond this problem), so you will find U instead of T. For simplicity, in this problem you should use the customized (for DNA) genetic code table.
Standard Genetic Code
First base / U / C / A / G / Third base
U / UUU / UCU / UAU / UGU / U
U / UUC / UCC / UAC / UGC / C
U / UUA / UCA / UAA / UGA / A
U / UUG / UCG / UAG / UGG / G
C / CUU / CCU / CAU / CGU / U
C / CUC / CCC / CAC / CGC / C
C / CUA / CCA / CAA / CGA / A
C / CUG / CCG / CAG / CGG / G
A / AUU / ACU / AAU / AGU / U
A / AUC / ACC / AAC / AGC / C
A / AUA / ACA / AAA / AGA / A
A / AUG / ACG / AAG / AGG / G
G / GUU / GCU / GAU / GGU / U
G / GUC / GCC / GAC / GGC / C
G / GUA / GCA / GAA / GGA / A
G / GUG / GCG / GAG / GGG / G
Customized (for DNA) Genetic Code
ttt: F tct: S tat: Y tgt: C
ttc: F tcc: S tac: Y tgc: C
tta: L tca: S taa: * tga: *
ttg: L tcg: S tag: * tgg: W
ctt: L cct: P cat: H cgt: R
ctc: L ccc: P cac: H cgc: R
cta: L cca: P caa: Q cga: R
ctg: L ccg: P cag: Q cgg: R
att: I act: T aat: N agt: S
atc: I acc: T aac: N agc: S
ata: I aca: T aaa: K aga: R
atg: M acg: T aag: K agg: R
gtt: V gct: A gat: D ggt: G
gtc: V gcc: A gac: D ggc: G
gta: V gca: A gaa: E gga: G
gtg: V gcg: A gag: E ggg: G
There are 20 different amino acids. The detailed table is given below.
20 Amino Acids and Their Codes
No / 1-Letter code / 3-Letter Code / Name
1 / A / Ala / Alanine
2 / R / Arg / Arginine
3 / N / Asn / Asparagine
4 / D / Asp / Aspartic acid
5 / C / Cys / Cysteine
6 / Q / Gln / Glutamine
7 / E / Glu / Glutamic acid
8 / G / Gly / Glycine
9 / H / His / Histidine
10 / I / Ile / Isoleucine
11 / L / Leu / Leucine
12 / K / Lys / Lysine
13 / M / Met / Methionine
14 / F / Phe / Phenylalanine
15 / P / Pro / Proline
16 / S / Ser / Serine
17 / T / Thr / Threonine
18 / W / Trp / Tryptophan
19 / Y / Tyr / Tyrosine
20 / V / Val / Valine
Write a function/program that takes a DNA sequence and returns/prints the translated protein sequence (using the customized codon table, and representing amino acids by their 1-letter codes). Ignore the rightmost incomplete codon shorter than 3 bp, as explained in Problem No 4.
Input
1. DNA sequence as a string
Output
1. Amino Acids Sequence of Protein
Example
Input
TTTCCTAATC
Output
Protein Sequence = FPN
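A possible sketch for Problem No 5 is given below. It builds the DNA codon table compactly from the base order T, C, A, G and a 64-letter string of amino acid codes in the standard genetic code, where TAA, TAG and TGA are stop codons (*) and TGG is W. The names BASES, AMINO_ACIDS, CODON_TABLE and translate are hypothetical. Two choices here are assumptions, not part of the problem statement: stop codons are written out as * rather than ending the translation, and any codon not in the table is written as X.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string into 1-letter amino acid codes.

    Assumptions: reading starts at the first base, stop codons are
    emitted as '*', and unrecognised codons are emitted as 'X'.
    A trailing fragment shorter than 3 bp is ignored.
    """
    seq = dna.strip().upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


print("Protein Sequence =", translate("TTTCCTAATC"))
```

For the example input, TTT, CCT and AAT map to F, P and N, and the final C is ignored.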