1. Motivation:

Suppose that you had to find a needle in a haystack. How would you go about doing it, and how fast could you do it? These are questions that many biologists have been trying to answer. Only, their haystack is DNA, the needle is a promoter, and the ‘thing’ that does the search is a protein molecule.

The protein molecule acts like a tiny machine capable of making decisions. Just like any physical machine, this molecular machine dissipates energy and is affected by the thermal noise surrounding it. Surprisingly, there is a correspondence between these molecular machines making their vital decisions and a receiver in a communication system. Many molecular biologists have been applying Shannon’s communication theory to biological systems. This has led to many interesting results, and we will take a look at some of them in this review.

2. What are Molecular Machines?

Let us first understand what exactly we mean by a molecular machine. There are millions of chemical and biological reactions that take place in our body. Only some of these processes can be considered as performed by molecular machines. To be called a molecular machine, the process should satisfy certain conditions.

1.  A molecular machine is a single macromolecule or a macromolecular complex, not a macroscopic reaction. Unlike a macroscopic chemical reaction, a molecular machine does not stop operating once equilibrium is reached.

2.  A molecular machine performs a specific function, and these functions are very important for the survival of the organism.

3.  A molecular machine is usually energized before performing its function. The source of this energy can be ATP, photons or any microscopic process such as thermal fluctuations.

4.  A molecular machine dissipates energy while performing its function. Therefore the energy of the machine before it acts, called the ‘Before’ state, is higher than the energy in the ‘After’ state.

5.  A molecular machine gains information by selecting from a set of possible states.

6.  Molecular machines are isothermal engines. The machines are usually embedded in a huge bath and have no way of insulating themselves.

Although the conditions that should be satisfied to make a process a molecular machine seem very stringent, there are many biological processes that can be considered molecular machines. Some examples are as follows:

·  When DNA is fragmented into 400 base-pair long units and heated, the fragments get into a high-energy state. This results in the separation of the double helix into single strands. When the solution is slowly cooled, the single strands bind to their complementary strands and reform the helix. The DNA acts like a molecular machine because of the precision with which this process occurs.

·  A ribosome is a collection of proteins and RNAs that plays a crucial part in protein synthesis. Synthesis always starts at a ribosome binding site on the gene, and the ribosome needs to find this binding site from among millions of other bases. For example, in E. coli there are approximately 2600 genes, each of which starts with a ribosome binding site. These 2600 binding sites have to be found among 4.7 million base pairs. The ribosome performs this daunting task with machine-like precision. To understand the process, an important question to ask is how many choices the ribosome must make before it can find the binding site. Information theory helps in answering some of these questions.

3.  The Power of the Logarithm:

To understand information theory we first need to understand the power of the logarithmic function, which is used extensively in information theory. To get an intuitive idea about the meaning of this function, let us consider a simple mathematical problem. You have eight boxes as shown in Fig.1, and one of the boxes has a candy in it. Your task is to find out which box has the candy by asking questions. One obvious question you might ask is ’does this box have a candy in it?’ Asked box by box, you would need a maximum of eight questions before you found the candy. The more efficient way would be to ask the following three questions:

1.  Is it in the left four boxes?

2.  Is it in the front four boxes?

3.  Is it in the bottom four boxes?

The answers to these three questions will always locate the candy for you.

Fig.1 Where is the candy?

If you double the number of boxes, the number of questions you need to ask increases by one.

A ribosome does not ask questions; it searches for patterns while undergoing random Brownian motion. To model this, we need a way of labeling our boxes. Suppose we decide to label the boxes using binary digits, writing 0 if the answer to a question is ‘No’ and 1 if the answer is ‘Yes’. The labels for the eight boxes would then be 000, 001, 010, 011, 100, 101, 110, 111. We need three digits to identify each of the boxes unambiguously. So if I hid the candy in the box labeled 011, the ribosome would randomly search until it found the pattern 011.

Suppose now that I put a candy in two of the boxes. Now I can drop the first digit and name the boxes as 00, 01, 10, 11, 00, 01, 10, and 11. We now have pairs of boxes with identical labels. I can now say that the candy is in both the boxes labeled 01. The ribosome would randomly search for the label 01.

The number of binary digits you need, or equivalently the number of questions you need to ask, determines how fast you can find the candy. If you have 1 candy in 8 (= 2^3) boxes you need log2(8) = 3 digits. If you have 2 (= 2^1) candies in 8 (= 2^3) boxes you need log2(8) - log2(2) = 3 - 1 = 2 digits. In general, if you have n boxes with m candies you need log2(n) - log2(m) questions. In the case of E. coli, the ribosome needs to find 2600 ‘candies’ among 4.7 million ‘boxes’. It needs log2(4.7 million) - log2(2600) ≈ 10.8 digits, or questions, to find them.
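To make this arithmetic concrete, here is a minimal Python sketch of the calculation (the 8-box and E. coli numbers come from the text above; the function name digits_needed is just an illustrative choice):

import math

def digits_needed(boxes, candies=1):
    """Binary digits (yes/no questions) needed to locate a candy
    when `candies` identical candies are hidden among `boxes` boxes."""
    return math.log2(boxes) - math.log2(candies)

print(digits_needed(8))              # 1 candy in 8 boxes   -> 3.0
print(digits_needed(8, 2))           # 2 candies in 8 boxes -> 2.0
print(digits_needed(4.7e6, 2600))    # E. coli ribosome     -> ~10.8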

The logarithm function gives us an intuitive way of thinking about information and we will use it extensively in this review.

4.  Components of a Communication System:

We will start our exploration of the correspondence between molecular machines and communication systems by first understanding the different components that make up a communication system.

The communication system essentially consists of five parts, as shown schematically in Fig.2.


Fig 2. Components of a communication system

An information source produces the message that needs to be communicated. The message may be discrete or continuous and can take many different forms, such as sound in telephony or letters in telegraphy. A transmitter converts this message into a signal that can be transmitted. A channel is the medium through which the signal travels from the transmitter to the receiver. The receiver performs the inverse operation of the transmitter and converts the signal back into its original form. The destination is the person or entity that finally receives the message. The communication system may be perturbed by noise at either terminal or in the channel.

In a communication system, the source selects one message out of a set of possible messages. The system must be designed to operate for every possible message and not just for the message that is actually sent. The measure of information when one of the messages is selected from a set (all choices being equally likely) is the logarithm of the number of possible messages. Communication systems can be discrete, continuous or mixed, based on the kind of messages they send. We first consider a discrete, noiseless communication system.

5. Discrete Noiseless System:

A discrete communication system is one in which a sequence of choices from a finite set of symbols can be transmitted. Telegraphy is an example: a sequence drawn from 32 symbols (26 letters and 6 punctuation marks) is transmitted. The first important question to ask is how we can represent an information source mathematically. In the case of telegraphy, we consider the information source to be an entity that produces a sequence by making choices out of the 32 available symbols. These symbols are not completely random, since they need to follow the rules of the English language. In general, the choice of a symbol depends on the symbols before it and on the probabilities of going from one symbol to another. A process that produces a sequence of symbols according to a set of probabilities is called a stochastic process, and we may hence represent a discrete information source by a stochastic process. Mathematically, such stochastic processes can be described by discrete Markov chains. Natural written languages such as English can be approximated by stochastic processes of this kind. For the purpose of understanding, suppose we have a language with the first five letters of the English alphabet: A, B, C, D and E.

A.  Suppose the successive choices are independent and each letter is equally probable with a probability of 0.2. A typical sequence generated by this language would be:

B D C B C E C C C A D C B D D A A E C E E A

A B B D A E E C A C E E B A E E C B C E A D.

B.  Now suppose the letters are not chosen independently, and the probability of a letter depends on the previous letter chosen. The structure can then be represented by a set of transition probabilities arranged in a two-dimensional matrix. Suppose there are three letters A, B, and C with a transition probability matrix as shown below:

A typical sequence produced by this system would be:

A B B A B A B A B A B A B A B B B A B B B B B A B A B A B A B A B B B A C A C A B

B A B B B B A B B A B A C B B B A B A.

We can get much more complicated stochastic processes by considering probabilities that depend not just on the previous letter but on the n letters before it. We would then need an (n+1)-dimensional array of transition probabilities to describe the process.
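The two kinds of sequences in examples A and B above can be reproduced with a short simulation. The sketch below generates text from a first-order Markov source; the uniform case of example A is simply the special case where every row of the transition table is identical. The numerical values in the markov dictionary are illustrative assumptions, since the transition matrix referred to in example B is not reproduced here.

import random

def generate(transitions, start, length):
    """Generate `length` letters from a first-order Markov source.
    `transitions` maps each letter to a dict of next-letter probabilities."""
    sequence, current = [start], start
    for _ in range(length - 1):
        nxt = random.choices(list(transitions[current]),
                             weights=list(transitions[current].values()))[0]
        sequence.append(nxt)
        current = nxt
    return " ".join(sequence)

# Example A: successive letters independent and equally probable
# (every row of the transition table is the same)
uniform = {c: {x: 0.2 for x in "ABCDE"} for c in "ABCDE"}
print(generate(uniform, "A", 40))

# Example B: three letters with first-order dependence.
# These transition values are illustrative assumptions only.
markov = {"A": {"A": 0.0, "B": 0.8, "C": 0.2},
          "B": {"A": 0.5, "B": 0.5, "C": 0.0},
          "C": {"A": 0.5, "B": 0.4, "C": 0.1}}
print(generate(markov, "A", 40))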

Among all possible Markov processes, there is one type that is especially important in communication theory: the ‘ergodic process’. In simple terms, an ergodic process is one in which every sufficiently long sequence produced by the process has the same statistical properties. All the examples that we will see in this review are ergodic.

Entropy of an Information source:

Now that we know how an information source can be represented mathematically, the next question to ask is whether we can measure how much information is produced by this source, and at what rate.

Suppose we have a set of events S1, S2, …, Sn with probabilities p1, p2, …, pn. Let H(p1, p2, …, pn) be a measure of the uncertainty of the outcome. We require H to have the following properties:

1.  H should be continuous in the pi .

2.  When all the events are equally likely (each pi = 1/n), uncertainty should increase with the number of possible events. Therefore H should be a monotonically increasing function of n, the total number of events.

3.  If a choice can be broken down into two successive choices, the original H should be a weighted sum of the individual values of H.

It turns out that the only possible H that satisfies these conditions is of the form,

H = -K Σi pi Log(pi)          (1)

where K is a constant that defines the unit we use to measure H.

This form of H can be recognized from statistical mechanics as the form of entropy. For convenience we will drop the constant K, which merely fixes the unit in which H is measured. The entropy in the case of a system with two possibilities, with probabilities p and q = 1 - p, is then,

H = -p Log(p) - q Log(q)          (2)
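Here is a minimal numerical check of these formulas, assuming base-2 logarithms (K = 1) so that H is measured in bits; the three-way split at the end is a standard illustration of property 3 above and is not taken from the text:

import math

def entropy(probs):
    """H = -sum p_i log2(p_i), i.e. equation (1) with K = 1 (base-2 logs, bits)."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# Binary case of equation (2): zero at p = 0 or 1, maximum of 1 bit at p = 0.5
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, entropy([p, 1 - p]))

# Property 3 (a choice broken into successive choices):
# H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) * H(2/3, 1/3)
print(entropy([1/2, 1/3, 1/6]))                          # ~1.459 bits
print(entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3]))   # same value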

A plot of H as a function of p is shown below in Fig.3:

Fig. 3 Plot of entropy against the probability p.

Using this plot we can observe some interesting properties of H:

·  H is zero if and only if all but one of the probabilities are zero. This means that when we are certain of one outcome, H vanishes.

·  H is never negative.

·  When all the events are equally probable, each of the probabilities is equal to 1/n and the value of H is Log(n). This is the maximum value of H, because this is the most uncertain situation.

·  Any change towards equalization of probabilities always increases H.

·  Suppose there are two events x and y with m possibilities for the first and n possibilities for the second. The uncertainty H(x,y) for the joint event should be less than or equal to the sum of the individual uncertainties. i.e.

H(x,y) H(x) + H(y) (3)

The equality holds only if the two events are independent.

·  In the above example, for any particular value i that x can take, there is a conditional probability pi(j) that y has the value j.

This is given by,

pi(j) = p(i, j) / Σj p(i, j)          (4)

where p(i, j) is the probability of the occurrence of i for x and j for y.

We can define the conditional entropy Hx(y) as the average of the entropy of y for each value of x, weighted by the probability of obtaining that value of x:

Hx(y) = -Σi,j p(i, j) Log(pi(j))          (5)

Using this definition, it can be shown that,

H(x, y) = H(x) + Hx(y) (6)

·  Combining equations 3 and 6 we get,

Hx(y) H(y) (7)

This means that the uncertainty of y is never increased by knowledge of x.
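These relations are easy to check numerically. The sketch below computes H(x), H(y), H(x,y) and Hx(y) directly from a small joint distribution (the distribution itself is an arbitrary illustrative choice) and confirms equations (3), (6) and (7):

import math

def H(probs):
    """Entropy in bits of a collection of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

# An arbitrary illustrative joint distribution p(i, j) for x, y in {0, 1}
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

px = {i: sum(v for (a, b), v in p.items() if a == i) for i in (0, 1)}  # marginal of x
py = {j: sum(v for (a, b), v in p.items() if b == j) for j in (0, 1)}  # marginal of y

H_x, H_y, H_xy = H(px.values()), H(py.values()), H(p.values())

# Equation (5): Hx(y) = -sum over i, j of p(i, j) * log2(pi(j)),
# with pi(j) = p(i, j) / p(i) as in equation (4)
Hx_y = -sum(v * math.log2(v / px[i]) for (i, j), v in p.items() if v > 0)

print(H_xy <= H_x + H_y)          # equation (3): True (equality only if independent)
print(abs(H_xy - (H_x + Hx_y)))   # equation (6): essentially zero
print(Hx_y <= H_y)                # equation (7): True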

For the discrete information source we have been discussing, the entropy of the source can be defined as the sum, over all possible states, of the entropy of each state multiplied by the probability of occurrence of that state:

H = Σi Pi Hi          (8)

where Pi is the probability of occurrence of state i and Hi is the entropy of the choices available in that state.

This is the entropy of the source per symbol of the text. The entropy per second is given by,

H′ = m H          (9)

where m is the average number of symbols produced by the source per second. If successive symbols are independent then,