ADDITIONAL MATERIAL

A model-based optimization framework for the inference of regulatory interactions using time-course DNA microarray expression data

Reuben Thomas1, Carlos J. Paredes 4,5, Sanjay Mehrotra2, Vassily Hatzimanikatis3,*

and Eleftherios T. Papoutsakis4,6*,

1Laboratory of Molecular Toxicology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, USA; 2Department of Industrial Engineering and Management Science, Northwestern University, Evanston, Illinois 60208-3120, USA; 3Laboratory of Computational Systems Biotechnology, EPFL, CH-1015 Lausanne, Switzerland; and 4Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208-3120, USA; 5 Current address: Gevo, Inc., 133 N. Altadena Dr. Suite 310, Pasadena, CA 91107, USA. 6 Current address: Dept. of Chemical Engineering and the Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711.

Email: Reuben Thomas – ; Carlos J. Paredes – ; Sanjay Mehrotra – , Vassily Hatzimanikatis – ; Eleftherios T. Papoutsakis –

*Corresponding authors

Contact Information:

Vassily Hatzimanikatis
Laboratory of Computational Systems Biotechnology,
EPFL, CH-1015 Lausanne, Switzerland
Email:
Tel: +41 (0)21 693 98 70
Fax: +41 (0)21 693 98 75 / Eleftherios T. Papoutsakis,
Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208-3120, USA.
Email:
Tel: 001-847-491-7455
Fax: 001-847-491-3728

Keywords: Gene Regulation, S-systems, DNA arrays, Time-varying, Optimization, Bacillus anthracis

Running Title: “Performance of model-based methods for network inference”

A1. Dynamic characteristics of the transcription/translation model

In this section we analyze the way the parameters of the S-system based regulation model (Equation 1, Main paper) affect the dynamic characteristics of the different mRNAs and proteins in the system. The dynamic characteristics considered include,

1)  The stability of the system around a steady state.

2)  The time necessary to reach steady state.

3)  The collinearity or the near-collinearity between the time profiles of the different mRNA and protein concentrations.

4)  Sensitivity of the dynamic behavior to the parameters of the model.

In order to analyze these four issues, the required analysis for the general model framework is very complex. If however, we make the assumption that the rates of synthesis of any of the mRNAs and proteins are never more than, for example, twice their corresponding degradation rates, then the required analysis becomes easy. At steady state, the rates of synthesis and degradation are equal. Hence, intuitively, we are trying to analyze the evolution of the system from a time point that is not too ‘far away’ from the steady state. Mathematically at any time t,

for the ith mRNA and protein,

(A.1)

From Equation (1) in the main paper,

Therefore Equation A.1 implies

(A.2)

This condition will convert the non-linear system of differential equations (Equation (1), main paper) to a linear system of differential equations in the logarithmic space. The steps to achieve this goal are given below.

From the mass balance requirement of the ith protein,

Using Equation A.2 and the fact that,

when (A.3)

(A.4)

The mass balance requirement of the ith mRNA implies,

Using Equation A.3,

Using Equations A.3 and A.4,

(A.5)

where,

Now, we introduce the following definitions:

·  A n´n matrix E whose (i,j)th element is given by .

·  A 2n´1 vector b whose first n components are zero and for the remaining n components the (n+i)th element is given by .

·  An n´1 vector bs whose ith component is given by .

·  A 2n´1 vector x whose first n components the logarithm of n protein concentrations and the next n components are the first derivatives (with respect to time) of the log of the n protein concentrations at a given time t.

·  I, the n´n identity matrix.

·  O, be the n´n zero matrix.

·  D a n´n diagonal matrix whose ith diagonal component is given by .

·  A 2n´2n matrix A given by,

Using the above definitions Equation (A.5) for all the i genes can be written in a compact matrix-vector notation as a set of linear differential equations,

(A.6)

At steady state, the protein concentrations do not change with time, i.e., their derivatives are zero. The log of the protein concentrations at steady state are given by a solution to the system of equations,

(A.7)

Note that the stability of the system requires that the above system of equations does have at least one solution or bs lies in the range space of the matrix operator E, i.e.,

Now, we derive the solution of Equation (A.6) in terms of the eigenvalues and eigenvectors of the matrix A. Unless the parameters are related in a definite manner (e.g. all or some of them are exactly equal), matrix A will not have repeated eigenvalues. Hence A will have 2n distinct (possibly complex) eigenvalues. Let the eigenvalues of A be given by, and the corresponding set of eigenvectors be given by . Then

where is a solution of Equation (A.7). s are constants whose values are determined by the initial conditions.

A1.1 Stability of the system

For stability the real part of the all the eigenvalues should be negative. Therefore, for the stability of the system two conditions need to be met,

(A.8)

A1.2 Collinearity between time profiles

Let denote a 2n´1 vector whose ith component is given by , X denote a 2n´Ns matrix whose column vectors are the vector x at Ns distinct time points, Q be the 2n´2n matrix whose column vectors are the eigenvectors of the matrix A. a 2n´ Ns matrix whose column vectors denote are the vector at the Ns time points. Then,

(A.9)

Then similarity between the time profiles of the different proteins is associated with the conditioning of the matrix X. The conditioning of X in turn depends on the conditioning of the matrices Q and . If the condition number of X is denoted by cond(X) then the following relation holds true ([1])

The condition number of the matrix would be high if all or some of the eigenvalues are close to each other. It would also be high if, over the chosen Ns time points, the vector does not change much, i.e., the system evolves at a slow rate. This would happen in the scenario when the real part of the different eigenvalues is close to zero and the Ns samples are chosen only for a relatively small time period.

The condition number of the matrix Q would be high if the all or some of its row vectors are close to being linearly dependent. An example of a situation when this occurs is the following. Suppose the network of interactions between the n genes in the system can be split into two disjoint sub-networks of interactions, each with n/2 genes. Also suppose that the two sub-networks are exactly similar in structure and parameter values, i.e., there is a one-to-one correspondence between the genes in the two sub-networks. Then, corresponding genes in the two sub-networks would behave in the exact same manner even though they don’t interact with each other. A detailed analysis of how the parameters and the structure of the network of interactions between the genes affect the linear dependencies between genes is left as future work.

A1.3 Sensitivity of the dynamic behavior to the parameters of the model

The method proposed in this paper requires estimates of the half-lives of the different mRNAs and proteins in the system. The issue here is how accurate do these estimates have to be in order that for the dynamic behavior to be not too different from what it really is. We have just seen that the dynamic behavior is governed by the eigenvalues and eigenvectors of the matrix A. So the thing to do would be to check how sensitive the eigenvalues and eigenvectors of A are to errors in its elements. Using a result from Horn and Johnson [1], we can claim that the perturbation of the eigenvalues of A due to errors in its elements depends on the condition number of the eigenvector matrix, Q. Let Z be a matrix representing the errors in the components of A. Let be an eigenvalue of A+Z. There exists an eigenvalue of A, such that,

where || || denotes the matrix norm.

A2 Error in prediction of protein concentrations

Let and be the true mRNA and protein concentrations as a function of time t, from time 0 to time T and let and denote their approximations. The mRNA

concentrations are approximated as splines while the protein concentrations are obtained by using the mRNA approximations in the protein mass-balance equations. At any time t,

Therefore,

Hence the error in approximating the protein concentrations is a function of the error in approximating in the mRNA concentrations and the error in the estimation of the initial protein concentration. Notice that as time increases the error due to the estimation of initial protein concentration decreases.

A3. Data for the synthetic networks

Define the interactions in the network (the elasticities, eij) by the matrix G. The notation of the other parameters is the same as used in the main paper.

A3.1 “Low” network

Table A.1: Interactions corresponding to the genes from the “Low” network whose time-course profiles are characterized by a relatively low degree of similarity. The strength of all interactions is 0.5. A positive sign corresponds to an activating interaction, a negative sign to an inhibiting interaction and a zero value to an absent interaction. A row corresponds to a regulated gene and a column corresponds to a regulator gene.

1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10
1 / -0.5 / -0.5 / 0 / 0 / 0.5 / 0 / 0 / 0 / 0 / 0
2 / 0 / 0 / 0 / -0.5 / 0 / -0.5 / 0 / 0 / -0.5 / 0
3 / -0.5 / 0.5 / 0 / 0 / 0 / 0 / 0 / 0 / 0 / -0.5
4 / 0.5 / 0 / 0 / 0 / 0 / 0 / -0.5 / 0 / 0 / 0.5
5 / 0 / 0 / 0 / 0.5 / 0 / 0 / 0 / -0.5 / 0 / -0.5
6 / 0 / 0.5 / 0 / 0 / 0 / 0.5 / 0.5 / 0 / 0 / 0
7 / 0 / 0 / -0.5 / 0 / 0.5 / -0.5 / 0 / 0 / 0 / 0
8 / 0 / 0 / 0 / -0.5 / 0 / -0.5 / 0 / -0.5 / 0 / 0
9 / 0 / 0.5 / 0 / 0 / 0 / 0 / -0.5 / -0.5 / 0 / 0
10 / 0 / 0 / 0 / 0 / 0 / -0.5 / 0 / 0 / -0.5 / -0.5

Table A.2: The values of the different parameters for the 10 genes of the “Low” network whose time-course profiles are characterized by a relatively low degree of similarity. m0 and p0 are given in concentration units. b and d are given in inverse time units while a and g are in units that are consistent with the utilized units of concentration and time.

a / b / g / d / m0 / p0
1 / 0.25 / 0.59 / 0.22 / 0.38 / 1.00 / 2.00
2 / 0.20 / 0.23 / 0.11 / 0.14 / 3.00 / 4.00
3 / 0.28 / 0.40 / 0.27 / 0.35 / 5.00 / 6.00
4 / 0.70 / 0.05 / 0.30 / 0.29 / 7.00 / 8.00
5 / 0.71 / 0.52 / 0.24 / 0.31 / 9.00 / 10.00
6 / 0.06 / 0.35 / 0.18 / 0.30 / 11.00 / 12.00
7 / 0.16 / 0.65 / 0.24 / 0.11 / 13.00 / 14.00
8 / 0.11 / 0.22 / 0.11 / 0.21 / 15.00 / 16.00
9 / 0.00 / 0.59 / 0.15 / 0.04 / 17.00 / 18.00
10 / 0.70 / 0.69 / 0.08 / 0.29 / 19.00 / 20.00

A3.2 “Medium” network

Table A.3: Interactions corresponding to the genes from the “Medium” network whose time-course profiles are characterized by a relatively medium degree of similarity. The strength of all interactions is 0.5. A positive sign corresponds to an activating interaction, a negative sign to an inhibiting interaction and a zero value to an absent interaction. A row corresponds to a regulated gene and a column corresponds to a regulator gene.

1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10
1 / -0.5 / 0 / 0 / 0 / 0 / 0 / 0 / -0.5 / -0.5 / 0
2 / 0.5 / -0.5 / 0 / 0 / 0 / 0 / 0 / -0.5 / 0 / 0
3 / 0.5 / 0.5 / 0 / 0 / 0.5 / 0 / 0 / 0 / 0 / 0
4 / 0 / 0.5 / 0 / -0.5 / 0 / -0.5 / 0 / 0 / 0 / 0
5 / 0 / 0 / 0 / -0.5 / -0.5 / 0 / 0 / 0 / 0 / -0.5
6 / 0 / 0 / 0.5 / 0 / 0 / 0 / -0.5 / 0 / 0 / -0.5
7 / 0 / 0 / -0.5 / 0 / 0 / 0 / 0 / 0.5 / 0 / -0.5
8 / 0 / 0 / 0.5 / 0 / 0 / 0 / -0.5 / 0 / 0 / -0.5
9 / 0 / 0 / -0.5 / 0 / 0 / 0.5 / 0 / 0.5 / 0 / 0
10 / 0 / -0.5 / 0 / 0 / -0.5 / 0.5 / 0 / 0 / 0 / 0

Table A.4: The values of the different parameters for the 10 genes of the “Medium” network whose time-course profiles are characterized by a relatively medium degree of similarity. m0 and p0 are given in concentration units. b and d are given in inverse time units while a and g are in units that are consistent with the utilized units of concentration and time.

a / b / g / d / m0 / p0
1 / 0.61 / 0.31 / 0.09 / 0.03 / 1.00 / 2.00
2 / 0.57 / 0.66 / 0.04 / 0.01 / 3.00 / 4.00
3 / 0.44 / 0.03 / 0.02 / 0.02 / 5.00 / 6.00
4 / 0.56 / 0.06 / 0.02 / 0.01 / 7.00 / 8.00
5 / 0.51 / 0.12 / 0.05 / 0.10 / 9.00 / 10.00
6 / 0.22 / 0.73 / 0.07 / 0.07 / 11.00 / 12.00
7 / 0.24 / 0.51 / 0.02 / 0.07 / 13.00 / 14.00
8 / 0.07 / 0.36 / 0.07 / 0.02 / 15.00 / 16.00
9 / 0.27 / 0.58 / 0.09 / 0.04 / 17.00 / 18.00
10 / 0.10 / 0.09 / 0.07 / 0.08 / 19.00 / 20.00

A3.3 “High” network