Week 11+ IML Class Activities

Week 11+ “IML” Class Activities

File: week-11-12nov07-IML.doc

Directory (hp):

C:\baileraj\Classes\Fall 2007\sta402\handouts

Directory: \\Muserver2\USERS\B\\baileraj\Classes\sta402\handouts

SAS/IML Programming

* Basic matrix concepts: rows, columns, scalars

* matrix operators

* subscripting

* matrix functions

* creating matrices from data sets and vice versa

* sample applications

Acknowledgments:

Thanks to Jim Deddens (UC) for sharing handouts that served

as the starting place for a number of the illustrations contained herein (Deddens Handouts #17-22)

References:

SAS/IML User's Guide on SAS OnlineDoc (

Links of particular interest …

> Understanding the Language

> Working with Matrices

> Working with SAS Data Sets (bringing a SAS data set into IML)

> Programming Statements (program control/module definition)

Introductory remarks

* IML = Interactive Matrix Language – it is a programming

language for matrix manipulations (can control flow of

operations with commands (e.g. DO/END, START/FINISH, IF-THEN/

ELSE, etc.). Lots of built-in matrix manipulation functions

and subroutines are available.

* basic object = data matrix (so can perform operations on

entire matrices)

* can run interactively or store statements in a module

* can build functions/subroutines using these modules

Matrix Definitions and vocabulary

* Matrices have entries of the same type (either all numeric or character – note the difference

from a SAS data set which can include variables of either type).

* Matrix names are 1 to 8 characters (begin with letter or underscore and continue with letters,

numbers or underscores)

* Matrix possess property of DIMENSION (# rows and columns)

* ROW VECTOR = 1xn matrix

COLUMN vector = mx1 matrix

SCALAR = 1x1 matrix

Example 1: Basic matrix definition + subscripting

options nodate nocenter formdlim="-";

PROCIML;

*makes a 2x3 matrix;

C = {123,456};

print '2x3 example matrix C = {1 2 3,4 5 6} =' C;

*select 2nd row;

C_r2 = C[2,];

print '2nd row of C = C[2,] =' C_r2;

*select 3rd column;

C_c3 = C[,3];

print '3rd column of C = C[,3] =' C_c3;

*select the (2,3) element of C;

C23 = C[2,3];

print '(2,3) element of C = C[2,3] =' C23;

*makes a 1x3 matrix by summing over rows in each of the columns;

C_csum=C[+,];

print '1x3 column sums of C = C[+,] =' C_csum;

*makes a 2x1 matrix by summing over columns in each of the rows;

C_rsum=C[,+];

print '2x1 row sums of C = C[,+] =' C_rsum;

run;

2x3 example matrix C = {1 2 3,4 5 6} = 1 2 3

4 5 6

C_R2

2nd row of C = C[2,] = 4 5 6

C_C3

3rd column of C = C[,3] = 3

C23

(2,3) element of C = C[2,3] = 6

C_CSUM

1x3 column sums of C = C[+,] = 5 7 9

C_RSUM

2x1 row sums of C = C[,+] = 6

Example 2: Basic IML code and matrix definitions

options nodate nocenter formdlim="-";

PROCIML;

*makes a 2x3 matrix;

C = {123,456};

print '2x3 example matrix C = {1 2 3,4 5 6} =' C;

*makes a 1x3 matrix by summing over the rows in each of the columns;

C1=C[+,];

print '1x3 column sums of C = C[+,] =' C1;

*makes a 2x1 matrix by summing over the columns in each of the rows;

C2=C[,+];

print '2x1 row sums of C = C[,+] =' C2;

*makes a matrix out of second column of C;

F = C[,2];

print 'extract 2nd column of C into new vector (F) = C[,2] =' F;

*puts second column of c on diagonal;

D = DIAG( C[,2] );

print 'put 2nd column of C into a diagonal matrix (D) = DIAG(C[,2]) =' D;

*makes a vector out of the diagonal;

CC= VECDIAG(D);

print 'convert diagonal (of D) into vector (CC) = VECDIAG(D) =' CC;

*exponentiates each entry;

DD = EXP(D);

print 'exponentiate D yielding DD = EXP(D) =' DD;

*puts c next to itself - column binds C with itself;

E = C || C;

print 'Column bind C with itself yielding E = C||C =' E;

*puts a row of 2's below C - row bind ;

F = C // SHAPE(2,1,3);

print "Row bind C with vector of 2's (F) = C // SHAPE(2,1,3) =" F;

*creates a 3x2 matrix with matrix entry C;

K = REPEAT(C,3,2);

print '6x6 matrix = ' K;

*raises each entry of columns 2 & 3 of C to the third power then multiples by 3 and adds 3;

G = 3+3*(C[,2:3]##3);

print '3 + 3*(col2&3)^3 (G) = ' G;

*raises each entry of C to itself;

H = C ## C;

print 'raise C elements to itself (H) = C##C =' H;

*multiplies each entry of C by itself;

J = C # C;

print 'elementwise multiplication of C with itself (J) = C#C =' J;

quit;

SAS OUTPUT . . .

2x3 example matrix C = {1 2 3,4 5 6} = 1 2 3

4 5 6

1x3 column sums of C = C[+,] = 5 7 9

2x1 row sums of C = C[,+] = 6

extract 2nd column of C into new vector (F) = C[,2] = 2

put 2nd column of C into a diagonal matrix (D) = DIAG(C[,2]) = 2 0

0 5

convert diagonal (of D) into vector (CC) = VECDIAG(D) = 2

exponentiate D yielding DD = EXP(D) = 7.3890561 1

1 148.41316

Column bind C with itself yielding E = C||C =

1 2 3 1 2 3

4 5 6 4 5 6

Row bind C with vector of 2's (F) = C // SHAPE(2,1,3) = 1 2 3

4 5 6

2 2 2

6x6 matrix = 1 2 3 1 2 3

4 5 6 4 5 6

1 2 3 1 2 3

4 5 6 4 5 6

1 2 3 1 2 3

4 5 6 4 5 6

3 + 3*(col2&3)^3 (G) = 27 84

378 651

raise C elements to itself (H) = C##C = 1 4 27

256 3125 46656

elementwise multiplication of C with itself (J) = C#C = 1 4 9

16 25 36

Example 3 More IML Matrix definitions

prociml;

c1 = shape(1,5,1); * using shape function;

c2 = {1,1,1,1,1};

c3 = T({11111}); * transpose matrix;

c4 = T({[5] 1}) ; * using repetition factors;

print 'shape(1,5,1) = ' c1;

print '{1,1,1,1,1} = ' c2;

print 'T({1 1 1 1 1}) = ' c3;

print 'T({[5] 1}) = ' c4;

quit;

shape(1,5,1) = 1

{1,1,1,1,1} = 1

T({1 1 1 1 1}) = 1

T({[5] 1}) = 1

Example 4:creating matrices from data sets and vice versa

libname mydat 'D:\baileraj\Classes\Fall 2003\sta402\data';

prociml;

/* read SAS data in IML */

use mydat.nitrofen;

read all var { total conc } into nitro;

/* alternative coding */

use mydat.nitrofen var{ total conc };

read all into nitro2;

nitro = nitro || nitro[,2]##2; * adding column with conc^2;

nitro2 = nitro2 || (nitro2[,2]-nitro2[+,2]/nrow(nitro2)) ; * add column with centered concentration;

nitro2 = nitro2 || nitro2[,3]##2; * adding column with scaled conc^2;

show names; * show matrices constructed in IML;

*print nitro;

*print nitro2;

create n2 from nitro2; * creates SAS data set n2 from matrix nitro;

append from nitro2;

/* a little graphing in IML */

call pgraf(nitro[,2:1],'*','Nitrofen concentration', 'Number of young', 'Plot of #young vs. conc');

quit;

procprint data=n2;

title 'print of data constructed in IML';

run;

Plot of #young vs. conc

40 ˆ

‚

‚ * *

35 ˆ *

‚ *

‚ * *

‚ * * *

30 ˆ * *

‚ * *

N ‚

u ‚ * * * *

m ‚ * *

b 25 ˆ

e ‚ *

r ‚ * *

‚

o ‚ *

f 20 ˆ

‚

y ‚

o ‚ *

u ‚ *

n 15 ˆ * *

g ‚

‚ *

‚

10 ˆ

‚

‚ * *

‚ *

5 ˆ *

‚ *

‚

0 ˆ *

‚

Šƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒ

0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320

Nitrofen concentration

print of data constructed in IML

Obs COL1 COL2 COL3 COL4

1 27 0 -157 24649

2 32 0 -157 24649

3 34 0 -157 24649

4 33 0 -157 24649

5 36 0 -157 24649

6 34 0 -157 24649

7 33 0 -157 24649

8 30 0 -157 24649

9 24 0 -157 24649

10 31 0 -157 24649

11 33 80 -77 5929

12 33 80 -77 5929

13 35 80 -77 5929

14 33 80 -77 5929

15 36 80 -77 5929

16 26 80 -77 5929

17 27 80 -77 5929

18 31 80 -77 5929

19 32 80 -77 5929

20 29 80 -77 5929

21 29 160 3 9

22 29 160 3 9

23 23 160 3 9

24 27 160 3 9

25 30 160 3 9

26 31 160 3 9

27 30 160 3 9

28 26 160 3 9

29 29 160 3 9

30 29 160 3 9

31 23 235 78 6084

32 21 235 78 6084

33 7 235 78 6084

34 12 235 78 6084

35 27 235 78 6084

36 16 235 78 6084

37 13 235 78 6084

38 15 235 78 6084

39 21 235 78 6084

40 17 235 78 6084

41 6 310 153 23409

42 6 310 153 23409

43 7 310 153 23409

44 0 310 153 23409

45 15 310 153 23409

46 5 310 153 23409

47 6 310 153 23409

48 4 310 153 23409

49 6 310 153 23409

50 5 310 153 23409

Example 5: sample applications I: estimating PI

prociml;

nsim = 4000;

temp_mat = J(nsim,3,0);

/* col 1 = random x coordinate

col 2 = random y coordinate

col 3 = 1 if y <= sqrt(1-x^2)

do isim = 1 to nsim;

temp_mat[isim,1] = uniform(0); * generate X ~ unif(0,1);

temp_mat[isim,2] = uniform(0); * generate Y ~ unif(0,1);

end;

print ‘temp_mat = ‘ temp_mat;

junk1 = temp_mat[,1]#temp_mat[,1];

print ‘temp_mat[,1]#temp_mat[,1] = ‘ junk1;

junk2 = temp_mat[,2];

print ‘temp_mat[,2] = ‘ junk2;

temp_mat[,3] = (temp_mat[,2]<=

sqrt(J(nsim,1,1)-temp_mat[,1]#temp_mat[,1]));

pi_over4 = temp_mat[+,3]/nsim;

pi_est = 4*pi_over4;

se_est = 4*sqrt(pi_over4*(1-pi_over4)/nsim);

pi_LCL = pi_est – 2*se_est;

pi_UCL = pi_est + 2*se_est;

print ‘------‘;

print ‘Estimating PI using MC simulation methods with ‘ nsim ‘data points’;

print ‘PI-estimate = ‘ pi_est;

print ‘SE(PI-estimate) = ‘ se_est;

print ‘CI: [‘ pi_LCL ’ , ‘ pi_UCL ’]’;

print ‘------‘;

quit;

------

NSIM

Estimating PI using MC simulation methods with 4000 data points

PI_EST

PI-estimate = 3.126

SE_EST

SE(PI-estimate) = 0.0261349

PI_LCL PI_UCL

CI: [ 3.0737303 , 3.1782697 ]

------

Example 6: sample applications I: estimating PI using a MODULE

options nocenter nodate;

prociml;

/* MODULE TO ESTIMATE PI

- Monte Carlo simulation used

- Strategy:

Generate X~Unif(0,1) and Y~Unif(0,1)

Determine if Y <= sqrt(1-X*X)

PI/4 estimated by proportion of times condition above is true

- INPUT

nsim

- OUTPUT

estimate of PI along with SE and CI

start MC_PI(nsim);

temp_mat = J(nsim,3,0);

do isim = 1 to nsim;

temp_mat[isim,1] = uniform(0); * generate X ~ unif(0,1);

temp_mat[isim,2] = uniform(0); * generate Y ~ unif(0,1);

end;

temp_mat[,3] = (temp_mat[,2]<=

sqrt(J(nsim,1,1)-temp_mat[,1]#temp_mat[,1]));

pi_over4 = temp_mat[+,3]/nsim;

pi_est = 4*pi_over4;

se_est = 4*sqrt(pi_over4*(1-pi_over4)/nsim);

pi_LCL = pi_est - 2*se_est;

pi_UCL = pi_est + 2*se_est;

print '------';

print 'Estimating PI using MC simulation methods with ' nsim 'data points';

print 'PI-estimate = ' pi_est;

print 'SE(PI-estimate) = ' se_est;

print 'CI: [' pi_LCL ' , ' pi_UCL ']';

print '------';

finish MC_PI;

/**************************************************************************/

run MC_PI(400);

run MC_PI(1600);

run MC_PI(4000);

quit;

SAS OUTPUT . . . (edited)

------

NSIM

Estimating PI using MC simulation methods with 400 data points

PI_EST

PI-estimate = 3.08

SE_EST

SE(PI-estimate) = 0.0841665

PI_LCL PI_UCL

CI: [ 2.911667 , 3.248333 ]

------

NSIM

Estimating PI using MC simulation methods with 1600 data points

PI_EST

PI-estimate = 3.125

SE_EST

SE(PI-estimate) = 0.0413399

PI_LCL PI_UCL

CI: [ 3.0423203 , 3.2076797 ]

------

NSIM

Estimating PI using MC simulation methods with 4000 data points

PI_EST

PI-estimate = 3.198

SE_EST

SE(PI-estimate) = 0.0253219

PI_LCL PI_UCL

CI: [ 3.1473562 , 3.2486438 ]

------

Example 7: Using IML to construct a randomization test for testing equality of 2 populations (revisiting WEEK7 example)

/* RECALL – t-test previously conducted

procttest;

title NITROFEN: t-test of (0, 160) concentrations;

class conc;

var total;

run;

NITROFEN: t-test of (0, 160) concentrations

The TTEST Procedure

Statistics

Lower CL Upper CL Lower CL Upper CL

Variable conc N Mean Mean Mean Std Dev Std Dev Std Dev Std Err

total 0 10 28.827 31.4 33.973 2.4737 3.5963 6.5654 1.1372

total 160 10 26.612 28.3 29.988 1.6229 2.3594 4.3073 0.7461

total Diff (1-2) 0.2424 3.1 5.9576 2.2981 3.0414 4.4977 1.3601

T-Tests

Variable Method Variances DF t Value Pr > |t|

total Pooled Equal 18 2.28 0.0351

total Satterthwaite Unequal 15.5 2.28 0.0372

Equality of Variances

Variable Method Num DF Den DF F Value Pr > F

total Folded F 9 9 2.32 0.2252

/* ------

use PLAN to generate a set of indices for the randomization test

and then use TRANSPOSE to package the output

Do the calculations in PROC IML instead of DATA step programming

------*/

options nocenter nodate;

libname class 'D:\baileraj\Classes\Fall 2003\sta402\data';

title “Randomization test in IML – Nitrofen conc 0 vs. 160 compared”;

data test; set class.nitrofen;

if conc=0 | conc=160;

procplan;

factors test=4000 ordered in=20;

output out=d_permut;

run;

proctranspose data=d_permut prefix=in out=out_permut(keep=in1-in20); by test;

run;

prociml;

/* read SAS data in IML */

use class.nitrofen;

read all var { total conc } where (conc=0|conc=160) into nitro;

use out_permut;

read all into perm_index;

obs_vec = nitro[,1];

obs_diff = sum(obs_vec[1:10]) - sum(obs_vec[11:20]); * test statistic;

PERM_RESULTS = J(nrow(perm_index),2,0); * initialize results matrix;

do iperm = 1 to nrow(perm_index);

ind = perm_index[iperm,]; * extract permutation index;

perm_resp = obs_vec[ind]; * select corresponding obs;

perm_diff = sum(perm_resp[1:10]) - sum(perm_resp[11:20]);

PERM_RESULTS[iperm,1] = perm_diff; * store perm TS value/indicator;

PERM_RESULTS[iperm,2] = abs(perm_diff) >= abs(obs_diff);

end;

perm_Pvalue = PERM_RESULTS[+,2]/nrow(PERM_RESULTS);

print ‘Permutation P-value = ‘ perm_Pvalue;

from SAS OUTPUT . . .

PERM_PVALUE

Permutation P-value = 0.03575

/* code for testing components */

print nitro;

print perm_index;

obs_vec = nitro[,1];

print obs_vec;

ind = perm_index[1,];

print ind;

permdat = obs_vec[ind];

print permdat;

tranind = T(ind);

print obs_vec tranind permdat;

/* alternative coding */

obs_vec = shape(nitro[,1],1,20);

obs_diff = sum(obs_vec[1,1:10]) - sum(obs_vec[1,11:20]);

print obs_vec obs_diff;

PERM_RESULTS = J(nrow(perm_index),2,0); * initialize results matrix;

do iperm = 1 to nrow(perm_index);

ind = perm_index[iperm,];

perm_resp = shape(obs_vec[1,ind],1,20);

perm_diff = sum(perm_resp[1,1:10]) - sum(perm_resp[1,11:20]);

PERM_RESULTS[iperm,1] = perm_diff;

PERM_RESULTS[iperm,2] = abs(perm_diff) >= abs(obs_diff);

end;

perm_Pvalue = PERM_RESULTS[+,2]/nrow(PERM_RESULTS);

print'Permutation P-value = ' perm_Pvalue;

Noble IML example 1: bisection method

options ls=78 formdlim='-' nodate pageno=1;

/* find sqrt(x = 3) using bisection */

prociml;

x = 3;

hi = x;

lo = 0;

history = 0||lo||hi;

iteration = 1;

delta = hi - lo;

do while(delta > 1e-7);

mid = (hi + lo)/2;

check = mid**2 > x;

if check

then hi = mid;

else lo = mid;

delta = hi - lo;

history = history//(iteration||lo||hi);

iteration = iteration + 1;

end;

print mid;

create process var {iteration low high};

append from history;

------

MID

1.7320509

procprintdata=process;

run;

/* output from PROC PRINT

------

Obs ITERATION LOW HIGH

1 0 0.00000 3.00000

2 1 1.50000 3.00000

3 2 1.50000 2.25000

4 3 1.50000 1.87500

5 4 1.68750 1.87500

6 5 1.68750 1.78125

7 6 1.68750 1.73438

8 7 1.71094 1.73438

9 8 1.72266 1.73438

10 9 1.72852 1.73438

11 10 1.73145 1.73438

12 11 1.73145 1.73291

13 12 1.73145 1.73218

14 13 1.73181 1.73218

15 14 1.73199 1.73218

16 15 1.73199 1.73209

17 16 1.73204 1.73209

18 17 1.73204 1.73206

19 18 1.73204 1.73205

20 19 1.73205 1.73205

21 20 1.73205 1.73205

22 21 1.73205 1.73205

23 22 1.73205 1.73205

24 23 1.73205 1.73205

25 24 1.73205 1.73205

26 25 1.73205 1.73205

------

symbol interpol=join value=#;

axis1label=('Limits');

procgplotdata=process;

plot (low high)*iteration / overlayvaxis=axis1;

run;

quit;

Noble IML example 2: Ridge regression

Ridge regression

where for

1. Center and scale all variables

Note:

for

2. Let

(i.e. biased estimation)

OLS solution when c = 0

Example:

A. Suppose k = 3 and the indendepent variables are orthogonal to each other then

B. Suppose k = 3 and the indendepent variables and two of the variables are highly correlated with each other then

Let c = 0.01

Illustration:

Measurements were taken on 17 U.S. Navy hospitals

Independent variables =

Average daily patient load

Monthly x-ray exposures

Monthly occupied bed days

Eligible population in area (divided by 1000)

Average length of patients’ stay in days

Response = Monthly labor hours

options ls=78 formdlim='-' nodate pageno=1;

data hospital;

input patient_load x_ray bed_days population stay_length labor_hours;

cards;

15.57 2463 472.92 18.0 4.45 566.52

44.02 2048 1339.75 9.5 6.92 696.82

20.42 3940 620.25 12.8 4.28 1033.15

18.74 6505 568.33 36.7 3.90 1603.62

49.20 5723 1497.60 35.7 5.50 1611.37

44.92 11520 1365.83 24.0 4.60 1613.27

55.48 5779 1687.00 43.3 5.62 1854.17

59.28 5969 1639.92 46.7 5.15 2160.55

94.39 8461 2872.33 78.7 6.18 2305.58

128.02 20106 3655.08 180.5 6.15 3503.93

96.00 13313 2912.00 60.9 5.88 3571.89

131.42 10771 3912.00 103.7 4.88 3741.40

127.21 15543 3865.67 126.8 5.50 4026.52

252.90 36194 7684.10 157.7 7.00 10343.81

409.20 34703 12446.33 169.4 10.78 11732.17

463.70 39204 14098.40 331.4 7.05 15414.94

510.22 86533 15524.00 371.6 6.35 18854.45

;

procregdata=hospital;

model labor_hours = patient_load x_ray bed_days population stay_length / vif;

run;

------

The SAS System 1

The REG Procedure

Model: MODEL1

Dependent Variable: labor_hours

Analysis of Variance

Sum of Mean

Source DF Squares Square F Value Pr > F

Model 5 490195304 98039061 238.74 <.0001

Error 11 4517236 410658

Corrected Total 16 494712540

Root MSE 640.82590 R-Square 0.9909

Dependent Mean 4978.48000 Adj R-Sq 0.9867

Coeff Var 12.87192

Parameter Estimates

Parameter Standard Variance

Variable DF Estimate Error t Value Pr > |t| Inflation

Intercept 1 1957.65555 1062.65900 1.84 0.0925 0

patient_load 1 -19.08612 96.18624 -0.20 0.8463 9348.14904

x_ray 1 0.05574 0.02123 2.62 0.0236 7.95401

bed_days 1 1.69311 3.04722 0.56 0.5896 8710.16847

population 1 -4.07037 7.11731 -0.57 0.5789 23.00120

stay_length 1 -392.64933 207.75252 -1.89 0.0854 4.21971

%macro find_c(datain=,y=,x=,dataout=);

/*------+

| parameters |

| datain = dataset to be analyzed |

| y = response variable |

| x = list of independent variables |

| dataout = dataset containing ridge estimates |

+------*/

proc iml;

/* read in data */

use &datain;

read all var {&x} into data;

/* center */

n = nrow(data);

k = ncol(data);

centered = data -j(n,n,1)*data/n;

/* scale */

r = j(k,k,.);

do i = 1 to k;

do j = 1 to k;

r[i,j] = (centered[,i]`*centered[,j])

/sqrt(centered[,i]`*centered[,i]*centered[,j]`*centered[,j]);

end;

/* find c via bisection */

hi = 1;

lo = 0;

delta = hi - lo;

do while(delta > 1e-8);

c = (hi + lo)/2;

maxvif = max(diag(inv(r+c*i(k))*r*inv(r+c*i(k))));

if maxvif > 5

then lo = c;

else hi = c;

delta = hi - lo;

end;

mattrib c label='Biasing constant';

print c;

create temp var {c};

append from c;

data _null_;

set temp;

call symput('c',c);

run;

proc reg data=hospital outest=&dataout noprint;

model &y = &x / ridge=&c;

run;

%mend;

%find_c(datain=hospital,

y=labor_hours,

x=patient_load x_ray bed_days population stay_length,

dataout=ridge);

procprintdata = ridge;

run;

quit;

------

Biasing constant

0.0318075

------

Obs _MODEL_ _TYPE_ _DEPVAR_ _RIDGE_ _PCOMIT_ _RMSE_ Intercept

1 MODEL1 PARMS labor_hours . . 640.826 1957.66

2 MODEL1 RIDGE labor_hours 0.031808 . 724.538 798.25

patient_ stay_ labor_

Obs load x_ray bed_days population length hours

1 -19.0861 0.055738 1.69311 -4.07037 -392.649 -1

2 12.5864 0.063608 0.43146 2.24981 -171.973 -1

Example: Diagonalize a symmetric matrix:

(based on Deddens h.o.)

PROC IML;

A={1 2 3,4 5 6,5 7 9}; * this inputs a matrix;

print A;

G=GINV(A); * this is the generalized inverse;

print G;

B=A`*A;

print B;

CALL EIGEN(M,E,B); * this computes the eigenvalues of B;

print M; * M is a vector of eigenvalues;

print E; * E is a matrix whose columns are;

M=FUZZ(M);

print M;

BB=E*DIAG(M)*E`; * orthogonal eigenvectors;

print BB;

RUN;

1 2 3

4 5 6

5 7 9

-0.777778 0.6111111 -0.166667

-0.111111 0.1111111 7.027E-17

0.5555556 -0.388889 0.1666667

42 57 72

57 78 99

72 99 126

245.33969

0.660309

-1.14E-14

0.4115876 0.8148184 -0.408248

0.5638129 0.1242915 0.8164966

0.7160382 -0.566235 -0.408248

245.33969

0.660309

42 57 72

57 78 99

72 99 126

Example Generating Multivariate Normal random variables

(based on Deddens h.o.)

Generate a random sample of size 100 from a bivariate normal

distribution with mean (mu1,mu2)=(8,10), and variance covariance

matrix V=(4 3,3 9). X1~ N(8,4); X2 ~ N(10,9)

with Cov(X1, X2) = 3, so the correlation is 3/sqrt(4*9)=.5

PROC IML;

X=SHAPE(0,2,100); *X= 2x100 matrix of 0s;

MU=8*SHAPE(1,1,100)//10*SHAPE(1,1,100); *MU=2x100 matrix of 8s and 10s;

DO N=1 TO 100;

DO M=1 TO 2;

X[M,N]=RANNOR(0); * X = 2x100 matrix with indep. z-values;

END;

V={4 3,3 9};

CALL EIGEN (M,E,V);

T=E*DIAG(SQRT(M))*E`; * this is the square root of V;

Y=MU+T*X; *Y is a 2x100 matrix;

W=Y`; *W is a 100x2 matrix;

CREATE LAST FROM W; *creates a data set from the matrix W;

APPEND FROM W;

PROC CORR COV DATA=LAST; *computes covariances and correlations;

VAR COL1 COL2;

RUN;

The CORR Procedure

2 Variables: COL1 COL2

Covariance Matrix, DF = 99

COL1 COL2

COL1 4.42686739 3.57748316

COL2 3.57748316 10.77696443

Simple Statistics

Variable N Mean Std Dev Sum Minimum Maximum

COL1 100 8.00686 2.10401 800.68631 2.11136 13.18996

COL2 100 10.01085 3.28283 1001 2.90976 17.50832

Pearson Correlation Coefficients, N = 100

Prob > |r| under H0: Rho=0

COL1 COL2

COL1 1.00000 0.51794

<.0001

COL2 0.51794 1.00000

<.0001

This could also be done using the following code:

PROC IML;

MU={8 10};

COV={4 3, 3 9};

CALL VNORMAL(W,MU,COV,1000,0);

CREATE LAST FROM W;

APPEND FROM W;

PROC CORR COV DATA=LAST;

VAR COL1 COL2;

RUN;

The CORR Procedure

2 Variables: COL1 COL2

Covariance Matrix, DF = 999

COL1 COL2

COL1 4.252231110 3.409982609

COL2 3.409982609 9.263162292

Simple Statistics

Variable N Mean Std Dev Sum Minimum Maximum

COL1 1000 8.00001 2.06209 8000 1.44165 13.93833

COL2 1000 10.03342 3.04354 10033 -0.04513 19.35617

Pearson Correlation Coefficients, N = 1000

Prob > |r| under H0: Rho=0

COL1 COL2

COL1 1.00000 0.54333

<.0001

COL2 0.54333 1.00000

Example: Regression using matrix definition for LSE

(based on Deddens h.o.)

PROC IML is a interactive matrix language that contains lots of

matrix functions, SAS functions, and various programming statements.

This handout uses PROC IML to perform usual linear regression. (from

h.o. #22)

DATA DRAPER;

INPUT X1 X2 X3 X4 Y;

datalines;

7 26 6 60 78.5

1 29 15 52 74.3

11 56 8 20 104.3

11 31 8 47 87.6

7 52 6 33 95.9

11 55 9 22 109.2

3 71 17 6 102.7

1 31 22 44 72.5

2 54 18 22 93.1

21 47 4 26 115.9

1 40 23 34 83.8

11 66 9 12 113.3

10 68 8 12 109.4

PROC IML;

USE DRAPER;

READ ALL INTO XX; *this makes a matrix out of a data set;

Y=XX[,5]; N=NROW(XX); R=NCOL(XX);

ONE=SHAPE(1,N,1);

X=ONE||XX[,1:4];

BETA=INV(X`*X)*X`*Y;

SIGMA=SQRT((Y-X*BETA)`*(Y-X*BETA)/(N-R));

SE=SIGMA*SQRT(VECDIAG(INV(X`*X)));

TVALUE=BETA/SE;

PVALUE=(1-PROBT(ABS(TVALUE),N-R))*2;

FINAL=BETA||SE||TVALUE||PVALUE;

print FINAL (|colname={ESTIMATE SE TVALUE PVALUE}

rowname={INTERCPT X1 X2 X3 X4} FORMAT=8.4|);

CREATE LAST FROM FINAL; *this creates a SAS data set from a matrix;

APPEND FROM FINAL;

PROC print DATA=LAST;

RUN;

PROC REG DATA=DRAPER;

MODEL Y=X1 X2 X3 X4;

RUN;

Alternative coding: could read in the matrices XX, Y, and X using:

READ ALL VAR{ Y } INTO Y;

READ ALL VAR{ X1 X2 X3 X4 } INTO XX;

X=ONE||XX;

Example: MLE for Weibull Parameters using IML

/* ------

Find the maximium likelihood estimate for the Weibull distribution using the Newton Raphson method.

SUPPOSE

beta

beta-1 - alpha * x

f(x) = alpha * beta * x * e for x > 0

beta

log[f(x)] = log(alpha) + log(beta) + (beta-1) log(x) – alpha * x

In order to somewhat conform with PROC LIFEREG we will let:

mu=-log(alpha), i.e.

log[f(x)] = -mu + log(beta) + (beta-1)log(x) – exp(-mu)*x^beta

Recall that in order to find the mle, one forms the log-likelihood, takes the derivative F(theta) = LL'(theta). To find the maximium, one needs to find when the function F(theta) equals 0. One then uses the second derivative to find the standard error. Newton's method is an iterative procedure to find when a function is equal to 0.

-1

theta(n+1) = theta(n) - F'(theta(n)) * F(theta(n))

here theta=(alpha,beta) is the vector of unknown parameters. (based on Deddens h.o. #23)

revised: 2/26/2003

comments and elaboration: 11/09/2003

------*/

DATA D1;

DO I=1 TO 500;

X=RANEXP(123456789)/10; ONE=1; *we generate some data;

OUTPUT;

END;

/* obtain estimates from LIFEREG */

PROCLIFEREG data=D1;

MODEL X=ONE/D=WEIBULL; RUN;

Parameter DF Estimate Error Limits Square Pr > ChiSq

Intercept 1 -2.3355 0.0494 -2.4322 -2.2388 2239.54 <.0001

ONE 0 0.0000 0.0000 0.0000 0.0000 . .

Scale 1 1.0474 0.0366 0.9781 1.1216

Weibull Shape 1 0.9547 0.0333 0.8916 1.0224

/* repeat parameter estimation using IML */

DATA D2; SET D1; KEEP X ;

PROCIML;

USE D2;

READ ALL INTO X;

X=X`;

LL1= { 1, 1 };

NN=NCOL(X);

* compute an initial estimate of theta;

THETA={ 0 , .1 };

* do at most 20 iterations or until the deriv(of LL) is quite small;

DO JJ=1 TO 20 UNTIL ( SUM(LL1) < .0001 );

MU=THETA[1,];

BETA=THETA[2,];

SUMIT=SHAPE(1,NN,1); * vector of 1s;

* compute the Log-Likelihood;

LL = (MU +LOG(BETA) +LOG(X)*(BETA-1) -EXP(MU)*(X##BETA)) * SUMIT;

* compute the derivative of the log-likelihood;

LL1 = ( ( 1 - EXP(MU)*(X##BETA) ) * SUMIT ) //

( ( (1/BETA) + LOG(X) - EXP(MU)*LOG(X)#(X##BETA) )* SUMIT );

* compute the matrix of second derivatives;

LL2 = ( ( - EXP(MU)*(X##BETA) ) *SUMIT ||

( -(EXP(MU)*LOG(X)#(X##BETA) ) *SUMIT ) ) //

( ( -(EXP(MU)*LOG(X)#(X##BETA) ) * SUMIT ) ||

( (-1/(BETA*BETA) -EXP(MU)*LOG(X)#LOG(X)#(X##BETA) ) *SUMIT ) );

* iterate to find the mle;

THETA=THETA - INV(LL2)*LL1;

* compute the estimates corresponding to PROC LIFEREG;

SIGMA=1/BETA; INTERCEPT=-MU*SIGMA;

* print standard errors using the second derivative;

SE=SQRT(-VECDIAG(INV(LL2))); ZVALUE=THETA/SE;

* print iteration history;

print JJ THETA SE ;

END;

* print the final estimates;

print JJ THETA SE ZVALUE INTERCEPT SIGMA;

JJ THETA SE

1 0.6000362 0.0530356

0.1972324 0.0044464

2 1.0112191 0.0490821

0.3726667 0.0085816

3 1.6117082 0.0578591

0.6303589 0.0154567

4 2.0731078 0.0678525

0.8612001 0.0240567

5 2.2180598 0.0746581

0.9469739 0.030719

6 2.2296905 0.077051

0.9546784 0.0331178

7 2.2297713 0.0772825

0.9547346 0.0333382

8 2.2297713 0.0772842

0.9547346 0.0333398

JJ THETA SE ZVALUE INTERCEPT SIGMA

8 2.2297713 0.0772842 28.851564 -2.335488 1.0474115

0.9547346 0.0333398 28.63646

Example: Poisson Regression using IML and GENMOD

This is SAS MACRO which uses PROC IML to perform maximium likelihood estimation, using Newtons method to find the iterative solution (based on Deddens h.o. #26).

Incidence of nonmelanoma skin cancer among women in Minn./

St. Paul vs. women in Dallas/Ft. Worth;

Ref: Lunneborg (1994) ch. 19 problem 2;

Kleinbaum, Kupper and Muller Regression book as well

Input variables:

age = midpt of age categ. 15-24, 25-34, . . . 75-84, 85+

city = 0 (Minneapolis-St. Paul)

= 1 (Dallas-Ft. Worth)

Response variables:

Cases = # nonmelanoma skin cancers

PYRS = person years of exposure

Additional documentation added: 9 November 2003

DATA KKM;

INPUT CITY AGE CASES PYRS @@;

LAGE=LOG((AGE-15)/35);

LPYRS=LOG(PYRS);

CARDS;

0 20 1 172675 1 20 4 181343

0 30 16 123065 1 30 38 146207

0 40 30 96216 1 40 119 121374

0 50 71 92051 1 50 221 111353

0 60 102 72159 1 60 259 83004

0 70 130 54722 1 70 310 55932

0 80 133 32185 1 80 226 29007

0 90 40 8328 1 90 65 7538

;

%MACRO POISSON(DATASET,COUNT,NUMBER,XVAR);

%* &count = response/count of the number of cases;

%* &number = person years of exposure;

%* &xvar = predictor variables;

PROC IML;

/* select response variables, predictors and offset */

USE &DATASET;

READ ALL VAR{&XVAR} INTO XX; NR=NROW(XX); NC=NCOL(XX); NV=NC+1;

READ ALL VAR{&COUNT} INTO Y;

READ ALL VAR{&NUMBER} INTO P;

/* initialize BETA vector and set up design matrix */

BETA=J(NV,1,0);

ONE=J(NR,1,1);

XX=ONE||XX;

DO ITER=1 TO 20 UNTIL (ADLL<.00001);

EB=EXP(XX*BETA); * estimated RATE;

PEB=P#EB; * predicted number of cases;

* = person-years X RATE;

LL=ONE`*(Y#LOG(PEB)-PEB);

LL0=ONE`*(Y#LOG(Y)-Y);

DLL=XX`*(-PEB+Y);

ADLL=MAX(ABS(DLL));

TWO=J(1,NV,1);

DDLL=-XX`*((PEB*TWO)#(XX));

BETA=BETA-INV(DDLL)*DLL; * update estimate;

BETAP=BETA`;

print BETAP (|FORMAT=8.4|) ITER ADLL (|FORMAT=12.4|);

DEV=2*(LL0-LL);

VAR=-INV(DDLL);

SE=SQRT(VECDIAG(VAR));

ZVALUE=BETA/SE;

LOWER=BETA-1.96*SE;

UPPER=BETA+1.96*SE;

CHISQ=ZVALUE##2;

PVALUE=2*PROBNORM(-ABS(ZVALUE));

END;

FINAL=BETA||SE||LOWER||UPPER||CHISQ||PVALUE;

print FINAL (|colname={ESTIMATE SE LOWER UPPER CHISQUARE PVALUE}

rowname={intercept &xvar} FORMAT=12.4|);

print ITER ;

print DEV (| FORMAT=12.4 |) LL (| FORMAT=12.4 |) LL0 (| FORMAT=12.4 |);

%MEND;

%POISSON(KKM,CASES,PYRS,CITY LAGE)

BETAP ITER ADLL

-0.9983 0.0009 0.0014 1 1385394.0000

-1.9937 0.0032 0.0053 2 509192.4582

-2.9814 0.0096 0.0157 3 186857.4228

-3.9485 0.0266 0.0437 4 68281.0472

-4.8631 0.0707 0.1167 5 24670.1512

-5.6579 0.1763 0.2958 6 8655.4209

-6.2430 0.3814 0.6737 7 2835.8993

-6.6197 0.6290 1.2675 8 930.6167

-6.8890 0.7648 1.8734 9 355.1142

-7.0416 0.7992 2.2156 10 107.8144

-7.0748 0.8030 2.2859 11 21.2470

-7.0760 0.8031 2.2883 12 0.7735

-7.0760 0.8031 2.2883 13 0.0009

-7.0760 0.8031 2.2883 14 0.0000

FINAL

ESTIMATE SE LOWER UPPER CHISQUARE PVALUE

INTERCEPT -7.0760 0.0476 -7.1694 -6.9826 22070.6964 0.0000

CITY 0.8031 0.0522 0.7008 0.9054 236.9187 0.0000

LAGE 2.2883 0.0627 2.1654 2.4112 1331.6379 0.0000

ITER