Project 1 Solutions

CS 321, Fall 2003

Tony Faradjian

The function pickCA, provided on the web page, extracts the C-alpha coordinates of the several chains and places them in cells:

> r = pickCA('1MHC.pdb')

r =

Columns 1 through 4

[276x3 double] [99x3 double] [9x3 double] [276x3 double]

Columns 5 through 6

[99x3 double] [9x3 double]

To plot the six chains, we use the function plot_chains (a disadvantage: the colors are hardcoded). This function takes an optional second argument that causes C-alphas to be plotted as points.

function plot_chains(r, pts)

% plot_chains(r, pts). Plot the chains of a protein.

% r: {1 m}: m chains, r{i} is [n_i 3], the coordinates of the i-th
% chain. NOTE: r can also be an N-by-3 matrix in case of a single
% chain.

% pts: [1]: OPTIONAL: boolean indicating whether residues should also
% be represented as points. DEFAULT is 0.

if nargin == 1

pts = 0;

end

% cell array in any case

if ~iscell(r)

r = {r};

end

figure

hold

colour = ['b' 'g' 'y' 'r' 'm' 'c']; % max of 6 colors
for i = 1:length(r)

ii = mod(i, length(colour)) + 1;

plot3(r{i}(:,1), r{i}(:,2), r{i}(:,3), [colour(ii) '-']);

if pts

plot3(r{i}(:,1), r{i}(:,2), r{i}(:,3), [colour(ii) '.']);

end

axis equal

rotate3d on

The last line activates MATLAB’s rotate3d facility (‘help rotate3d’). The plot below is produced by the following code:

plot_chains(r, 1)

title('MHC C-alphas')

xlabel('x (Angstroms)')

ylabel('y (Angstroms)')

zlabel('z (Angstroms)')

Internal coordinates are computed using the function interncoord from HW1 (code supplied on web page). The following function, interncoord_m, simply calls intercoord on each chain:

function [d, a, t] = interncoord_m(r)

% [d a t] = interncoord_m(r). Call interncoord on several chains.

% r: {1 m}: m chains

% d, a, t: each {1 m}: internal coordinates corresponding to r. (See
% interncoord for more detail.)

for i = 1:length(r)

[d{i} a{i} t{i}] = interncoord(r{i});

end

We call it as follows:

> [d a t] = interncoord_m(r)

d =

Columns 1 through 4

[275x1 double] [98x1 double] [8x1 double] [275x1 double]

Columns 5 through 6

[98x1 double] [8x1 double]

a =

Columns 1 through 4

[274x1 double] [97x1 double] [7x1 double] [274x1 double]

Columns 5 through 6

[97x1 double] [7x1 double]

t =

Columns 1 through 4

[273x1 double] [96x1 double] [6x1 double] [273x1 double]

Columns 5 through 6

[96x1 double] [6x1 double]

To build distance histograms, lump all the distances into one vector, and pass it to hist:

> dd = cat(2,d); % ('help cat')

> [n x] = hist(dd);

> n = n / length(dd); % normalize

Plot it with bar:

bar(x,n);

title('Probability distribution of MHC virtual bond lengths');

xlabel('b (Angstroms)');

ylabel('P(b)');

The unusual feature is the nonzero probability around 3.1 Angstroms. The following code finds these anomalous bonds:

> ff = [];

> for i = 1:length(d)

fin = find(d{i} < 3.5); % 3.5 is quite arbitrary

ff = [ff; [repmat(i, length(fin), 1), fin]];

end

> ff

ff =

1 209

2 31

4 209

531

There are four of them, one in each of chains A, B, D, and E. A glance at the pdb file for 1MHC shows that residues no. 210 of the A and D chains and residues no. 32 of the B and E chains are proline. These four bonds are unusually short because the prolines are in the cis isomer, instead of the more common trans isomer.

The function prot_spline fits a spline through a protein. It uses MATLAB’s csapi (similar to spline) then passes the interpolant to plot_chains:
function splin = prot_spline(r, d)

% splin = prot_spline(r, d). Fit cubic splines through the chains of a
% protein and plot.

% r: {1 m}: m chains, r{i} is [n_i 3], the coords of the i-th chain.
% NOTE: r can also be an N-by-3 matrix in case of a single chain.

% d: [1]: degree of interpolation, i.e., factor by which the number of % points increases.

% splin: {1 m}: corresponding splines

% cell array in any case

if ~iscell(r)

rr = r;

clear r

r{1} = rr;

clear rr

end

for i = 1:length(r)

splin{i} = csapi(1:length(r{i}), r{i}');

p{i} = ppval(splin{i}, 1:d:max(splin{i}.breaks))'; % note transpose

end

plot_chains(p);

Use it as follows:

> prot_spline(r);

Current plot held

> title('Cubic spline interpolation of 1MHC');

> xlabel('x (Angstroms)');

> ylabel('y (Angstroms)');

> zlabel('z (Angstroms)');

The PDB’s Motif Summary page for 1MHC (accessed via the Geometry link) supplies secondary structure annotation. Based on that information, the script sscorrMHC.m hardcodes a cell array ss of secondary structure vectors then uses it to make three correlation plots, one for each internal coordinate vs. the PDB annotation. This script is long but simple. It uses MATLAB’s subplot to organize the three plots into one figure.

% sscorrMHC: script to display correlation between internal coordinates

% and PDB secondary structure annotation of protein 1MHC. Assumes 1MHC
% is stored in variable r {1 6} with r{i} having size [n_i 3]. Also
% assumes that internal coordinates of 1MHC are stored in d, a, t (each
% {1 6}).

% ------

% hardcode helix indices

helix = cell(1,6);

% A chain helices

helix{1} = [50:52 58:84 138:150 153:161 163:174 176:179 254:256];

% D chain helices

helix{4} = [50:54 57:84 138:150 152:161 163:174 176:179 253:256]; % no other helices: remaining cells empty

% ------

% hardcode sheet indices

sheet = cell(1,6);

sheet{1} = [3:12 22:28 31:39 42:47 94:103 109:118 ...
121:126 133:135 186:193 199:208 229:230 ...
234:235 241:249 214:219 222:223 257:262 270:272];

sheet{2} = [6:11 21:30 50:51 55:56 62:70 ...

35:41 44:45 78:84 91:94];

% no sheets in the C chain: sheet{3} is empty

sheet{4} = [3:12 22:28 31:39 42:47 94:103 109:118 ...
121:126 133:135 186:193 199:208 229:230 234:235 ...
241:249 214:219 222:223 257:262 270:272];

sheet{5} = [6:11 21:30 50:51 55:56 62:70 35:41 44:45 78:84 91:94];

% no sheets in F chain: sheet{6} is empty

% ------

% build secondary structure vectors ss.

ss = cell(1,6);

for i = 1:length(ss)

ss{i} = 2 * ones(1, size(r{i}, 1)); % default is coil

ss{i}(helix{i}) = 0;

ss{i}(sheet{i}) = 1;

end

clear i

% ------

% PLOTS

colour = ['b' 'g' 'r' 'k' 'm' 'c'];

% bond length correlation plot

subplot(3,1,1)

hold

for i = 1:length(ss)

plot(ss{i}(1:end-1), d{i}, [colour(i) 'o']);

end

xlabel('ss');

ylabel('bond length (Angstroms)');

title('Correlation plot of bond length vs. secondary structure');

% bond angle correlation plot

subplot(3,1,2)

hold

for i = 1:length(ss)

plot(ss{i}(2:end-1), a{i}/pi, [colour(i) 'o']);

end

xlabel('ss');

ylabel('bond angle (in units of pi)');

title('Correlation plot of bond angle vs. secondary structure');

% torsion angle correlation plot

subplot(3,1,3)

hold

for i = 1:length(ss)

plot(ss{i}(2:end-2), t{i}/(2*pi), [colour(i) 'o']);

end

xlabel('ss');

ylabel('torsion angle (in units of 2 * pi)');

title('Correlation plot of torsion angle vs. secondary structure');

Here is the result:

Not surprisingly, the bond length does not correlate well with secondary structure: it’s almost always roughly 3.8 Angstroms. The torsion angle does better, clustering roughly around 0.2(2) for helices and tending to be larger than 0.4(2) for beta sheets. It is widely scattered for loops, though. (The terms “loop,” “coil,” and “turn,” all refer to secondary structure that is neither alpha nor beta.) The best correlator with secondary structure is the bond angle. It separates helices and beta sheets into two almost disjoint subsets (less than for helices, greater than  for sheets) with less scatter than the torsion angle does, and although it too is unpredictable for coil residues, it still shows less scatter there than the torsion angle does.

Finally, we construct a Cartesian representation of 1MHC from its internal coordinates. The procedure is to use the bond and torsion angles to rotate running backbone tangents and plane normals to append the next C-alpha coordinate. Thus we need to write code that computes a rotation matrix R about an arbitrary axis n and by an arbitrary angle . To rotate a (column) vector v, we simply compute R*v. To compute R, we first change basis so that n coincides with the z-axis. This change of basis is itself a rotation, given by matrix Tn. If Rz() is a rotation about the z-axis by angle  (which is easy for us to write down), then
R = Tn-1 * Rz() * Tn = Tt * Rz() * Tn.
The change of basis rotation Tn is computed by the function alignmat:

function T = alignmat(ax)

% T = alignmat(ax). Compute a matrix that rotates an axis to the
% z axis.

% ax: [1 3]: axis to be rotated

% T: [3 3]: rotation matrix

x = ax(1);

y = ax(2);

z = ax(3);

r = norm(ax);

if r == 0

error('ax is zero');

end

rho = norm(ax(1:2));

if rho == 0

T = eye(3);

if z < 0

T(1,1) = -1;

T(3,3) = -1;

end

return

end

ct = z / r; % cosine(theta)

st = rho / r; % sin(theta)

cp = x / rho;

sp = y / rho;

T = [ ct * cp ct * sp -st; ...

-sp cp 0; ...

st * cp st * sp ct ];

Now we can use it in rotmat, the function that computes R:

function R = rotmat(ax, angle)

% R = rotmat(ax, angle). Compute a 3-D rotation matrix.

% ax: [1 3]: vector giving rotation axis (not necessarily normalized).

% angle: [1]: angle of rotation.

% R: [3 3]: corresponding rotation matrix.

T = alignmat(ax);

c = cos(angle);

s = sin(angle);

zrot = [c -s 0; ...

s c 0; ...

0 0 1];

R = T' * zrot * T;

Now we can write intern2cart, the function that converts the internal representation of a chain to a Cartesian representation. As stipulated in the assignment, we resolve amibguities about the absolute position and orientation of the resulting structure by placing the first residue at the origin, the second on the positive x-axis, and the third in the xy-plane.

function r = intern2cart(d, a, t)

% r = intern2cart(d, a, t). Compute a Cartesian representation of a

% chain from its internal coordinates.

% d: [1, n-1]: bond distances

% a: [1, n-2]: bond angles

% t: [1, n-3]: torsion angles

% r: [n 3]: Cartesian representation with r(1,:) = 0 and r(2,2:3) = 0.

n = length(d) + 1;

r = zeros(3, n);

r(1,2) = d(1);

e = [0; 0; 1]; % running plane normal

g = [1; 0; 0]; % running chain tangent

t = [t; 0]; % simplify loop with dummy 0

for i = 3:n

R = rotmat(e, pi - a(i-2));

g = R * g;

r(:,i) = r(:,i-1) + d(i-1) * g;

R = rotmat(g, t(i-2));

e = R * e;

end

r = r';

The following line reconstructs the A chain of MHC:

> rA = intern2cart(d{1}, a{1}, t{1});

Does it work? The contact maps are easy enough to check out:

> cmb = triu(dist(r{1}') <= 7);

> image(cmb + 1)

> colormap([1 1 1; 0 0 0])

> title('Contact map of 1MHC_A')

> figure

> cma = triu(dist(rA') <= 7);

> image(cma + 1)

> colormap([1 1 1; 0 0 0])

> title('Contact map of 1MHC_A after reconstruction ...

from internal coordinates')

They look identical. A more rigorous test of similarity is to measure the difference in their distance matrices:

> max(max(abs(dist(r{1}') - dist(rA'))))

ans =

2.1227e-011

This number shows that corresponding elements of the two distance matrices are identical up to 10 decimal places. Since the C-alpha coordinates were given to only 3 decimal places, we conclude that the code is correct and that the reconstruction from internal coordinates is more than satisfactorily accurate.