Determination of GC Content

Problem No 1

Determination of GC Content

Among the four nucleotides {A, T, C, G}, the proportion of C and G in a DNA sequence carries important biological signals. This proportion is measured as the “GC-content %” using the following formula.

GC-content % = ((n(G)+n(C))/(len(DNA)))*100%

Where

n(G) = number of G in the sequence

n(C) = number of C in the sequence

len(DNA) = length of the DNA sequence in base pairs (bp)

Write a Python program that behaves as described below.

Input

1. DNA sequence as a string

Output

1. Length of the sequence

2. GC-content %

Example

Input

ATCG

Output

Length of the sequence = 4 bp

GC-content % = 50%
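
The formula above can be sketched in Python as follows. This is one possible approach, not a reference solution: the function name `gc_content` is hypothetical, and the choices to accept lower-case input and to reject an empty sequence are assumptions that the problem statement does not specify.

```python
def gc_content(dna: str) -> float:
    """Return the GC-content of a DNA sequence as a percentage (0 to 100)."""
    dna = dna.upper()  # assumption: lower-case input is accepted
    if not dna:
        # assumption: an empty sequence has no defined GC-content
        raise ValueError("GC-content is undefined for an empty sequence")
    # (n(G) + n(C)) / len(DNA) * 100
    return (dna.count("G") + dna.count("C")) / len(dna) * 100


sequence = "ATCG"
print(f"Length of the sequence = {len(sequence)} bp")
print(f"GC-content % = {gc_content(sequence):g}%")
```

For the example input ATCG, n(G) + n(C) = 2 and len(DNA) = 4, so the result is 2/4 * 100 = 50%.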

Problem No 2

Complement DNA strand

DNA forms a double helix made of two strands. When we work with a DNA sequence, we usually talk about a single sequence (a single strand), but in the chromosome DNA exists in double-stranded form. The two strands are complements of each other. One is written 5’-3’ (the forward strand) and the other 3’-5’ (the reverse strand). When the strand type (or direction) is not explicitly mentioned, the DNA sequence is assumed to be the 5’-3’, or forward, strand.

5’---ACCGTA---3’

     ||||||

3’---TGGCAT---5’

In a complement DNA strand, each base of the original DNA sequence is replaced according to the following interchange rule:

A is replaced by T and vice-versa

C is replaced by G and vice-versa

This is because, in the double helix, an A on one strand is bound to a T on the other strand by hydrogen bonds, and likewise C is bound to G.

Write a Python program that behaves as described below.

Input

1. DNA sequence as a string

Output

1. Complement of input DNA sequence

Example

Input

ACCGTA

Output

Complement DNA Sequence = TGGCAT
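
The interchange rule (A with T, C with G) maps naturally onto a character translation table. The sketch below uses the built-in `str.maketrans` and `str.translate`; the names `COMPLEMENT` and `complement` are hypothetical, and converting the input to upper case first is an assumption.

```python
# Translation table implementing the rule: A <-> T and C <-> G.
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def complement(dna: str) -> str:
    """Return the complement strand, base by base, in the same order."""
    # assumption: lower-case input is normalised to upper case
    return dna.upper().translate(COMPLEMENT)


print("Complement DNA Sequence =", complement("ACCGTA"))
```

Characters outside A, T, C, G are left unchanged by `translate`, so a stricter program might validate the input first.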

Problem No 3

Reverse Complement of a DNA Sequence

This problem can be thought of as an extension of Problem No 2 (read Problem No 2 first). The reverse complement of a DNA sequence is encountered very often in bioinformatics analysis. If the complement of a DNA sequence is reversed, the result is called the reverse complement of the original DNA sequence.

5’---ACCGTA---3’

     ||||||

3’---TGGCAT---5’

Here the complement of (5’---ACCGTA---3’) is (3’---TGGCAT---5’), and the reverse of (3’---TGGCAT---5’) is TACGGT, so the reverse complement of ACCGTA is TACGGT.

Write a Python program that behaves as described below.

Input

1. DNA sequence as a string

Output

1. Reverse Complement of input DNA sequence

Example

Input

ACCGTA

Output

Reverse Complement DNA Sequence = TACGGT
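
The two steps described above (complement, then reverse) can be sketched as below. The function name `reverse_complement` and the table name `COMPLEMENT` are hypothetical; the reversal uses Python's slice notation `[::-1]`, and upper-casing the input is an assumption.

```python
# Translation table implementing the rule: A <-> T and C <-> G.
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement: complement every base, then reverse."""
    complemented = dna.upper().translate(COMPLEMENT)  # step 1: complement
    return complemented[::-1]                         # step 2: reverse


print("Reverse Complement DNA Sequence =", reverse_complement("ACCGTA"))
```

Because complementing and reversing are independent of each other, applying them in the opposite order gives the same result.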

Problem No 4

Codon List from a DNA sequence

Triplets of nucleotides (for example ATT, TCG, CCC) are called codons. Through the processes of transcription and translation, each codon of a DNA sequence is responsible for producing one amino acid, and the resulting chain of amino acids builds a protein. 64 (4x4x4) different codons are possible.

Consider the DNA sequence ATTTCGAGGT. If we parse codons from left to right, the codons are ATT, TCG, AGG (ignore the right-most remaining part of length < 3 bp, in this case T).

Write a function/Python program that returns the list of codons for a DNA sequence. The program should return/print the codons as a Python “list” data structure.

Input

1. DNA sequence as a string

Output

1. List of Codons

Example

Input

ATTTCGAGGT

Output

Codon-List = ['ATT', 'TCG', 'AGG']
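
Left-to-right parsing in steps of three can be sketched with a list comprehension over slice positions. The function name `codon_list` is hypothetical; the incomplete right-most part is dropped by limiting the range to the largest multiple of 3 that fits in the sequence.

```python
def codon_list(dna: str) -> list[str]:
    """Split a DNA sequence into codons, ignoring a trailing part shorter than 3 bp."""
    usable = len(dna) - len(dna) % 3  # largest multiple of 3 not exceeding len(dna)
    return [dna[i:i + 3] for i in range(0, usable, 3)]


print("Codon-List =", codon_list("ATTTCGAGGT"))
```

A sequence shorter than 3 bp yields an empty list under this approach.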

Problem No 5

Translate a DNA Sequence

Each codon represents an amino acid (skim through Problem No 4). The standard codon-to-amino-acid mapping table is called the “Standard Genetic Code Table” or “Codon Table”. It is built for codons derived from RNA (details will be discussed separately, beyond this problem), so you will find U instead of T. For simplicity, in this problem you should use the customized (for DNA) genetic code table.

Standard Genetic Code

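1st base / 2nd base U / 2nd base C / 2nd base A / 2nd base G / 3rd base
U / UUU / UCU / UAU / UGU / U
U / UUC / UCC / UAC / UGC / C
U / UUA / UCA / UAA / UGA / A
U / UUG / UCG / UAG / UGG / G
C / CUU / CCU / CAU / CGU / U
C / CUC / CCC / CAC / CGC / C
C / CUA / CCA / CAA / CGA / A
C / CUG / CCG / CAG / CGG / G
A / AUU / ACU / AAU / AGU / U
A / AUC / ACC / AAC / AGC / C
A / AUA / ACA / AAA / AGA / A
A / AUG / ACG / AAG / AGG / G
G / GUU / GCU / GAU / GGU / U
G / GUC / GCC / GAC / GGC / C
G / GUA / GCA / GAA / GGA / A
G / GUG / GCG / GAG / GGG / G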

Customized (for DNA) Genetic Code

ttt: F tct: S tat: Y tgt: C

ttc: F tcc: S tac: Y tgc: C

tta: L tca: S taa: * tga: *

ttg: L tcg: S tag: * tgg: W

ctt: L cct: P cat: H cgt: R

ctc: L ccc: P cac: H cgc: R

cta: L cca: P caa: Q cga: R

ctg: L ccg: P cag: Q cgg: R

att: I act: T aat: N agt: S

atc: I acc: T aac: N agc: S

ata: I aca: T aaa: K aga: R

atg: M acg: T aag: K agg: R

gtt: V gct: A gat: D ggt: G

gtc: V gcc: A gac: D ggc: G

gta: V gca: A gaa: E gga: G

gtg: V gcg: A gag: E ggg: G

In the customized table, * marks a stop codon, which does not code for an amino acid. There are 20 different amino acids. The detailed table is given below.

20 Amino Acids and Their Codes

No / 1-Letter Code / 3-Letter Code / Name
1 / A / Ala / Alanine
2 / R / Arg / Arginine
3 / N / Asn / Asparagine
4 / D / Asp / Aspartic acid
5 / C / Cys / Cysteine
6 / Q / Gln / Glutamine
7 / E / Glu / Glutamic acid
8 / G / Gly / Glycine
9 / H / His / Histidine
10 / I / Ile / Isoleucine
11 / L / Leu / Leucine
12 / K / Lys / Lysine
13 / M / Met / Methionine
14 / F / Phe / Phenylalanine
15 / P / Pro / Proline
16 / S / Ser / Serine
17 / T / Thr / Threonine
18 / W / Trp / Tryptophan
19 / Y / Tyr / Tyrosine
20 / V / Val / Valine

Write a function/program that takes a DNA sequence and returns/prints the translated protein sequence (using the customized codon table and representing amino acids by their 1-letter codes). Ignore the right-most incomplete codon of length < 3 bp, as explained in Problem No 4.

Input

1. DNA sequence as a string

Output

1. Amino acid sequence of the protein

Example

Input

TTTCCTAATC

Output

Protein Sequence = FPN
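
Translation combines the codon parsing of Problem No 4 with a dictionary lookup in the customized codon table. The sketch below is one possible approach: the names `BASES`, `AMINO_ACIDS`, `CODON_TABLE` and `translate` are hypothetical, and the table is built compactly by pairing the 64 codons (generated in t, c, a, g order) with a 64-letter string of 1-letter codes. It assumes that stop codons are written as * and that translation continues past them, since the problem statement does not say otherwise.

```python
from itertools import product

BASES = "TCAG"
# 1-letter codes for the 64 codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon, ignoring a trailing part < 3 bp."""
    dna = dna.upper()  # assumption: lower-case input is accepted
    usable = len(dna) - len(dna) % 3
    # assumption: stop codons appear as '*' and do not end the translation
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print("Protein Sequence =", translate("TTTCCTAATC"))
```

For the example input, TTT, CCT and AAT map to F, P and N, and the trailing C is ignored. A codon containing any character other than A, T, C, G raises a `KeyError` in this sketch.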