RIIMS BIOINFORMATICS AND BIO-MOLECULAR COMPUTING
1. INTRODUCTION:
1.1 What is BIOINFORMATICS?
Bioinformatics = Biology + Information technology.
It can be defined as the body of tools and algorithms needed to handle large and complex biological information.
Bioinformatics is a new scientific discipline created from the interaction of biology and computer science.
The NCBI defines bioinformatics as:
"Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline."
Fig 1.The interrelationship of the different subjects of sciences
Easy answer - Using computers to solve molecular biology problems.
Hard answer - Computational techniques for the management and analysis of biological data and knowledge.
What do you need to know?
It all depends on your background. Are you a biologist with some computer knowledge, or a computer scientist with some biology knowledge? Few do both well.
“Bioinformatics is not just using a computer to store data, or speed up biology. With bioinformatics, you do biological hypothesis testing on a computer.”
“Bioinformatics combines the tools of Biology, Chemistry, Mathematics, Statistics and Computer Science to understand Life & its processes.”
Bioinformatics would not be possible without advances in computing hardware and software: analysis of algorithms, data structures, and software engineering.
Fig 2.Revolution in biology
1.2 DNA ROLE:
DNA contains the instructions needed for a living organism to grow and function.
It tells cells exactly what role they should play in the body. It holds instructions to make your:
•Heart cells beat.
•Limbs form in the correct place.
•Immune system fight infection.
•Digestive system digest your dinner.
1.3 GENOMICS ERA:
High-throughput DNA sequencing. The first high-throughput genomics technology was automated DNA sequencing in the early 1990s.
Baker’s yeast, Saccharomyces cerevisiae (15 million bp), was the first eukaryotic genome to be sequenced. In September 1999, Celera Genomics completed the sequencing of the Drosophila genome. The 3-billion-bp human genome sequence was generated in a competition between the publicly funded Human Genome Project and Celera. Bioinformatics bridges many disciplines.
2. Biological Molecular Databases:
At present, a total of 387 biological databases are listed in the specialized database collections; these include primary and derived databases covering:
•Sequences and Structures
•Genomic, Proteomic and related data
•Intermolecular interactions
•Metabolic pathways and cellular regulation
•Mutation (SNPs and others)
•Pathology
•Transgenics etc.
3. BIOINFORMATICS AND COMPUTER SCIENCE CURRICULA
According to LeBlanc and Dyer,“Genomic research intersected with 10 of the 14 knowledge focus groups involving at least 36 of the 132 units.”
This means that a background in computer science is necessary for developing bioinformatics curricula and preparing specialists in bioinformatics.
For example, the computer science topics of algorithms, software engineering, and databases are linked with the biology topics of cell evolution and genetics. The group of algorithms from computer science most relevant for computational statistics comprises machine learning, artificial intelligence (AI), and knowledge discovery in databases, or data mining.
4. DATA MINING:
Data mining is the process of extracting knowledge hidden in large volumes of raw data. Data mining automates the process of finding relationships and patterns in raw data and delivers results that can be either utilized in an automated decision support system or assessed by a human analyst. Data mining techniques can be divided into two broad categories: “predictive data mining and discovery data mining”.
- Predictive techniques: Classification, Regression.
- Discovery techniques: Association Analysis, Sequence Analysis, Clustering.
4.1 Predictive data mining:
It is applied to a range of techniques that find relationships between a specific variable (called the target variable) and the other variables in your data.
Classification is about assigning data records to pre-defined categories. In this case the target variable is the category, and the techniques discover the relationship between the other variables and the category. Regression is about predicting the value of a continuous variable from the other variables in a data record.
The most familiar value-prediction techniques include linear and polynomial regression.
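As a minimal sketch of value prediction, a straight line can be fitted to invented data by least squares with NumPy (the data and numbers here are illustrative only):

```python
import numpy as np

# Invented example data: the target variable y depends on the predictor x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])  # roughly y = 2x

# Fit a degree-1 polynomial (linear regression) by least squares
slope, intercept = np.polyfit(x, y, deg=1)

# Predict the target for an unseen value of x
prediction = slope * 6.0 + intercept
print(round(float(slope), 2), round(float(prediction), 2))  # 1.99 12.03
```

Polynomial regression would use the same call with a higher `deg`.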
4.2 Discovery data mining:
It is applied to a range of techniques that find patterns in your data without any prior knowledge of what patterns exist. The following are examples of discovery mining techniques:
Clustering is the term for a range of techniques that attempt to group data records on the basis of how similar they are.
Association and sequence analysis describe a family of techniques that determine relationships between data records.
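As a minimal sketch of clustering, the following pure-Python k-means groups invented 1-D data into two clusters (illustrative only, not a production clusterer):

```python
# Minimal 1-D k-means with k=2 on invented data (illustrative only)
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centroids = [data[0], data[3]]  # naive initialisation: one low, one high point

for _ in range(10):  # a few refinement iterations
    # Assignment step: each point joins the cluster of its nearest centroid
    clusters = [[], []]
    for point in data:
        nearest = min((0, 1), key=lambda i: abs(point - centroids[i]))
        clusters[nearest].append(point)
    # Update step: move each centroid to the mean of its cluster
    centroids = [sum(c) / len(c) for c in clusters]

print(sorted(round(c, 1) for c in centroids))  # two well-separated group means
```

Real data would be multi-dimensional (e.g. gene-expression profiles), but the assign-then-update loop is the same.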
Where is the knowledge we have lost in information?
Where is the wisdom we have lost in knowledge?
--T.S. Eliot, "The Rock"
4.3 Knowledge Discovery:
Data mining is one stage in an overall knowledge discovery process. This process involves selection and sampling of the appropriate data from the database(s); preprocessing and cleaning of the data to remove redundancies, errors, and conflicts; transforming and reducing data to a format more suitable for the data mining; data mining; evaluation of the mined data; and visualization of the evaluation results.
Fig 3 Data mining in the larger context of the knowledge discovery process.
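The stages above can be sketched as a pipeline of plain functions (all names and the toy "mining" step are illustrative, not a real library):

```python
# Illustrative knowledge-discovery pipeline; each stage is a plain function
def select(records):          # selection and sampling of appropriate data
    return [r for r in records if r is not None]

def clean(records):           # preprocessing: remove redundancies and conflicts
    return sorted(set(records))

def transform(records):       # reduce to a format more suitable for mining
    return [float(r) for r in records]

def mine(values):             # "data mining": here, a trivial summary pattern
    return {"min": min(values), "max": max(values)}

def evaluate(pattern):        # evaluation of the mined result
    return pattern["max"] - pattern["min"]

raw = [3, 1, None, 3, 7]
result = evaluate(mine(transform(clean(select(raw)))))
print(result)  # range of the cleaned data
```

In practice each stage is far more involved, but the chained structure is the point of Fig 3.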
Development of new tools for data mining:
- Sequence alignment
- Genome sequencing
- Genome comparison
- Micro array data analysis
- Proteomics data analysis
- Small-molecule array analysis.
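As a taste of the first item, the following is a minimal global-alignment scorer in the style of Needleman-Wunsch (scoring parameters are illustrative; score only, no traceback):

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # leading gaps in the second sequence
    for j in range(1, cols):
        score[0][j] = j * gap          # leading gaps in the first sequence
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag, score[i-1][j] + gap, score[i][j-1] + gap)
    return score[-1][-1]

print(nw_score("GATTACA", "GATCA"))  # 5 matches, 2 gaps -> 5 - 4 = 1
```

Production aligners add traceback, substitution matrices, and affine gap penalties, but the dynamic-programming core is this table fill.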
The goal is to derive “information” and gain “knowledge” from the data.
Bioinformatics is a good career path for computer scientists: create tools adapted to individual needs, and access the vast amount of biological information stored in databases.
APPLICATIONS:
Bioinformatics is the use of IT in biotechnology for data storage, data warehousing, and analysis of DNA sequences. Bioinformatics requires knowledge of many branches: biology, mathematics, computer science, the laws of physics and chemistry, and of course a sound knowledge of IT.
Why Should Biologists Use Computers?
Computers are powerful devices for understanding any system that can be described in a mathematical way. As our understanding of biological processes has grown and deepened, it isn't surprising, then, that the disciplines of computational biology and, more recently, bioinformatics, have evolved from the intersection of classical biology, mathematics, and computer science.
BIOINFORMATICS FOR VARIOUS FIELDS:
- Molecular medicine, Antibiotic resistance, Forensic analysis of microbes
- Bio-weapon creation, Evolutionary studies, Crop improvement, Insect resistance
- Improved nutritional quality, Development of drought-resistant varieties
- Veterinary science, Personalised medicine, Preventive medicine
- Gene therapy, Drug development, Microbial genome applications
- Waste cleanup, Climate change studies, Alternative energy sources
- Biotechnology
5. ADVANTAGES:
As computing power increases and our databases of genetic and molecular information expand, the realm of bioinformatics is sure to grow and change drastically, allowing us to build models of incredible complexity and utility. It was through the combined use of information technology and computer science that scientists were able to create large databases capable of housing and securely storing the valuable work being done in DNA studies. These databases allow scientists to access millions of records of DNA sequences and molecules from different species and compare them with the work currently being done. Bioinformatics combines the opportunity for a flexible response with the ability to determine frequencies, correlations, and quantitative analyses. It is particularly useful for tapping attitudes, perceptions, and opinions.
6. LIMITATIONS:
Cannot discriminate sensitivity and subtlety from the data.
No assumption of equal intervals.
No check on whether respondents are telling the truth.
One person's ‘strongly agree’ may be another’s ‘weakly agree’.
7. CONCLUSION:
I collected information from a variety of sources regarding the composition of core bioinformatics curricula.
There is an overwhelming consensus that the next generation of bioinformaticians must be trained from scratch as Biologist + Computer Scientist, challenging the traditional orthogonal view.
8. BIBLIOGRAPHY:
LeBlanc, M. D. and Dyer, B. D. (2005). Bioinformatics and Computing Curricula 2001: Why Computer Science is well positioned in a post-genomic world.
BIO-MOLECULAR COMPUTING
1. DEFINITION:
Molecular computing is an emerging field to which chemistry, biophysics, molecular biology, electronic engineering, solid-state physics, and computer science contribute to a large extent.
It involves the encoding, manipulation, and retrieval of information at a macromolecular level, in contrast to current techniques, which accomplish these functions via miniaturization of bulk IC devices.
1.1 THE AIM OF THIS ARTICLE:
Is to exploit these characteristics to build computing systems which have many advantages over their inorganic counterparts.
DNA computing began in 1994, when Leonard Adleman proved that DNA computing was possible by solving a real problem, a Hamiltonian Path problem (a close relative of the Traveling Salesman Problem), with a molecular computer. In theoretical terms, some scientists say the actual beginnings of DNA computation should be attributed to Charles Bennett's work. Adleman, now considered the father of DNA computing, is a professor at the University of Southern California and spawned the field with his paper, "Molecular Computation of Solutions of Combinatorial Problems." Since then, Adleman has demonstrated how the massive parallelism of a trillion DNA strands can simultaneously attack different aspects of a computation to crack even the toughest combinatorial problems.
1.2 Adleman's Traveling Salesman Problem:
The objective is to find a path from start to end that goes through all the points exactly once. This problem is difficult for conventional computers to solve because it is a "non-deterministic polynomial time problem".
Such problems, when they involve large numbers, are intractable with conventional computers, but can be attacked using massively parallel computers like DNA computers. The Hamiltonian Path problem was chosen by Adleman because it is a well-known problem.
The following algorithm solves the Hamiltonian Path problem:
Generate random paths through the graph.
Keep only those paths that begin with the start city (A) and conclude with the end city (G).
If the graph has n cities, keep only those paths with n cities (n = 7).
Keep only those paths that enter all cities at least once.
Any remaining paths are solutions.
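On a conventional machine, the generate-and-filter steps above can be sketched by brute force on a tiny invented graph (this simulates the logic, not the DNA chemistry):

```python
import itertools

# Invented 4-city toy graph; edges are directed (from, to)
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")}
cities = ["A", "B", "C", "D"]
start, end = "A", "D"
n = len(cities)

# Step 1: generate candidate paths (here: every ordering whose steps are edges)
candidates = [p for p in itertools.permutations(cities)
              if all((p[i], p[i + 1]) in edges for i in range(len(p) - 1))]

# Steps 2-4: keep paths that start at A, end at D, have n cities, visit all
solutions = [p for p in candidates
             if p[0] == start and p[-1] == end
             and len(p) == n and set(p) == set(cities)]

print(solutions)  # any remaining path is a Hamiltonian path
```

The point of Adleman's experiment is that DNA performs the "generate" step for all paths at once, in parallel, while this simulation enumerates them one by one.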
The key was using DNA to perform the five steps in the above algorithm. Adleman's first step was to synthesize DNA strands of known sequences, each strand 20 nucleotides long. He represented each of the six vertices of the path by a separate strand, and further represented each edge between two consecutive vertices, such as 1 to 2, by a DNA strand consisting of the last ten nucleotides of the strand representing vertex 1 plus the first ten nucleotides of the vertex 2 strand. Then, through the sheer number of DNA molecules (3x10^13 copies for each edge in this experiment!) joining together in all possible combinations, many random paths were generated.
Adleman used well-established techniques of molecular biology to weed out the Hamiltonian path, the one that entered all vertices, starting at vertex 1 and ending at vertex 6. After generating the numerous random paths in the first step, he used the polymerase chain reaction (PCR) to amplify and keep only the paths that began at vertex 1 and ended at vertex 6. The next two steps kept only those strands that passed through six vertices, entering each vertex at least once. At this point, any paths that remained would code for a Hamiltonian path, thus solving the problem.
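Adleman's edge-encoding scheme can be illustrated with strings standing in for strands (the 20-nucleotide sequences below are invented for illustration):

```python
# Invented 20-nt sequences standing in for two of Adleman's vertex strands
vertex = {
    1: "TATCGGATCGGTATATCCGA",
    2: "GCTATTCGAGCTTAAAGCTA",
}

def edge_strand(v_from, v_to):
    """Edge = last 10 nt of the 'from' vertex + first 10 nt of the 'to' vertex."""
    return vertex[v_from][10:] + vertex[v_to][:10]

e12 = edge_strand(1, 2)
print(e12, len(e12))  # a 20-nt edge strand overlapping both vertex strands
```

Because each edge strand overlaps half of each neighbouring vertex strand, complementary vertex strands can hybridize edge strands end to end, which is what lets random paths self-assemble in the test tube.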
1.3 Open Problem:
DNA Strand Engineering
Given a DNA strand, there are polynomial-time algorithms that predict the secondary structure of the strand.
1.3.1 Inverse Problem: Find an efficient algorithm that, given a desired secondary structure, generates a strand with that structure.
1.4 Applications:
–Information storage, retrieval for DNA computing
–Molecular bar codes for chemical libraries.
1.5 DNA Computing on Surfaces
DNA computers will be thousands of times smaller and more powerful than silicon-based computers. One pound of DNA has the ability to store more data than all electronic devices ever made to date. A water-droplet-sized DNA computer would have more computing power than today's most powerful supercomputers. Another advantage of DNA computing over silicon-based computers is the ability to do parallel calculations: silicon-based microprocessors can only do one calculation at a time, while a DNA computer will be able to do many simultaneous calculations. The creation of practical DNA computers will start a whole new computer revolution.
Conclusion
The idea of DNA-based computing is to subvert the mechanisms produced by evolution and use them to do the data processing we want to do.
Advantages
- Through DNA computing, people can obtain and analyze the information contained in biological material. We can find all the genes in a DNA sequence and develop tools for using this information in the study of fields such as biology, medical biology, and physics. The team from HP and U.C.L.A. has found a way to build circuits using chemical processes, making the switches as small as a molecule. Tim Gardner, a graduate student at Boston University, recently made a genetic system that can store a single bit of information: either a 1 or a 0.
- More parallel: for some problems too big to fit or run on a silicon machine, a DNA computer, with its massive parallelism and massive memory, will be able to do the computation more quickly than a powerful supercomputer.
- By creating DNA computing, several fields are combined to reach a desirable goal; along the way, this improves those fields, and new fields emerge.
Disadvantages of DNA Computing:
- Slow: algorithms proposed so far use really slow molecular-biological operations. Each primitive operation takes hours when you run them with a small test tube of DNA. Scale up to the vast amounts of DNA we're talking about, and they may slow down dramatically.
- Hydrolysis: DNA molecules can fracture. Over the six months you're computing, your DNA system is gradually turning to water. A DNA molecule that was part of your computer can break over time.
- Unreliable: every operation in a DNA computer is probabilistic. Because there are noisy components, the computation is sometimes unreliable. If a tiny subcircuit is supposed to give the answer "1," it may yield that answer 90 percent of the time and "0" the rest of the time. To make DNA computing work, we have to figure out how to build a reliable computer out of noisy components.
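A standard remedy for noisy components is redundancy with majority voting; the following quick simulation (parameters invented) shows how repeating a 90-percent-reliable operation and voting makes the overall answer far more reliable:

```python
import random

random.seed(1)  # reproducible demo

def noisy_bit(true_bit, p_correct=0.9):
    """A component that returns the right answer only 90% of the time."""
    return true_bit if random.random() < p_correct else 1 - true_bit

def majority(true_bit, copies=101):
    """Run many noisy copies of the component and take a majority vote."""
    ones = sum(noisy_bit(true_bit) for _ in range(copies))
    return 1 if ones > copies // 2 else 0

# A single noisy read is wrong ~10% of the time; the vote almost never is.
votes = [majority(1) for _ in range(1000)]
print(sum(votes) / len(votes))  # fraction of correct majority answers
```

The massive molecular redundancy in a DNA computer (trillions of copies of each strand) plays exactly this voting role.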
- Not transmittable: the DNA computer is conceived as a highly parallel computer, with each DNA molecule acting as a separate processor. In a standard multiprocessor, connection buses transmit information from one processor to the next, but the problem of transmitting information from one molecule to another in a DNA computer has yet to be solved. Current DNA algorithms compute successfully without passing any information, but this limits their flexibility.
- Not practical: DNA computing is not a here-and-now practical technology; it is still a pie-in-the-sky research project.
- No generality: Some concrete algorithms are just for solving some concrete problems. Every algorithm has some constraints on it.
- The process is extremely laborious and time-consuming
- Radioactive probes pose health and disposal risks (although chemiluminescent technology eliminated this risk)
- A relatively large amount of sample is required to perform the tests
- The method requires high molecular weight, un-degraded DNA
- The use of yield gels is an essential, but time consuming, step in the analysis not only to estimate the amount of DNA recovered but also to determine the suitability of the sample for analysis
Advantages of DNA microarray tests include high throughput (lots of information with one test), and good coverage of the genome with the chips that have larger numbers of test spots.
Disadvantages include incomplete coverage, which can lead to false normal results, and the ability to test only for unbalanced rearrangements (duplications and deletions), and not balanced translocations or inversions.
Advantages over “solution phase” chemistry
- Facile purification steps
- Reduced interference between strands
- Easily automated
1.6 Disadvantages:
- Loss of information density (2D)
- Lower surface hybridization efficiency
- Slower enzyme kinetics
The Future of Bioinformatics
We are currently witnessing a technological revolution. With the increase of sequencing projects, bioinformatics continues to make considerable progress in biology by providing scientists with access to genomic information. Much of this progress is attributable to the Human Genome Project. The information obtained with the help of bioinformatics tools furthers our understanding of various genetic and other diseases and helps identify new drug targets. With the technological development of the Internet, scientists are now able to freely access volumes of such biological information, which enables the advancement of scientific discoveries in biomedicine.
In spite of being young, the science of Bioinformatics exhibits tremendous potential for playing a major role in the future development of science and technology. This is evident from the fact that modern biology and related sciences are increasingly becoming dependent on this new technology. It is expected that Bioinformatics will especially contribute in the future as the leading edge in biomedicine to pharmaceutical companies by expediently yielding a greater quantity of lead drugs for therapy.