Partial Least Squares:
Consider a multiple regression of a column Y (n x 1) on a matrix X (n x p), where Y'Y = 1 and X'X = I (p x p). If Y is a matrix, then I center and scale so that Y'Y = I as well.
I want to find a vector V (p x 1), with V'V = 1, such that the n x 1 column vector XV is as highly correlated with Y as possible. If the correlation is high enough, then I can regress Y on the vector XV instead of on the matrix X, getting a single scalar coefficient c. There would then be a p x 1 vector b of "regression coefficients" given by b = cV.
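Spelling out the step implied here: with the scaling above, the correlation reduces to a plain cross product, since Y'Y = 1, X'X = I, and V'V = 1,

$$
\operatorname{corr}(Y, XV) \;=\; \frac{Y'XV}{\sqrt{Y'Y}\,\sqrt{V'X'XV}} \;=\; Y'XV .
$$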
If Y (n x m) is a matrix, I want to find W (m x 1) with W'W = 1 such that the linear combinations XV and YW (of the columns of X and the columns of Y, respectively) are as highly correlated as possible. This differs from principal component regression, which does the singular value decomposition on X alone. Because of the centering and scaling above, XV and YW each have unit variance, so maximizing the correlation means maximizing (XV)'(YW), i.e. V'X'YW. By definition, then, I want to find the first singular value and the corresponding left and right singular vectors of X'Y.
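Here is a minimal IML sketch of this matrix-Y case. The 5 x 2 matrices below are made up purely to illustrate the mechanics; they are not the example data used in the program later in this post.

proc iml;
X = {1 2, 3 1, 2 4, 5 3, 4 5}; ** made-up predictors;
Y = {2 1, 1 3, 4 2, 3 5, 5 4}; ** made-up responses;
n = nrow(X);
X = X - j(n,1)*mean(X); X = X/(j(n,1)*sqrt(X[##,])); ** center and scale so X`X has unit diagonal;
Y = Y - j(n,1)*mean(Y); Y = Y/(j(n,1)*sqrt(Y[##,])); ** same for Y;
call svd(L, D, R, X`*Y); ** SVD of the cross-product matrix;
V = L[,1]; W = R[,1]; ** first left and right singular vectors;
maxcorr = V`*X`*Y*W; ** (XV)`(YW) equals the first singular value;
print V W maxcorr (D[1])[label="D1"];
quit;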
Now back to the case in which Y is a column vector with Y'Y = 1. The singular value decomposition gives X'Y = LDR', where, because X'Y is itself just a column vector, R' is the scalar 1 and D is the largest (and only) singular value. (Things are a little different if Y is a matrix; in that case D is a diagonal matrix.) With Y a vector and with the centering and scaling, X'Y is the vector of correlations between the columns of X and the vector Y. The length of the vector X'Y is D = sqrt(Y'XX'Y), so to normalize we divide by this length, getting L = X'Y/sqrt(Y'XX'Y), and clearly X'Y = (X'Y/sqrt(Y'XX'Y)) * sqrt(Y'XX'Y) * 1, which is our desired L*D*R'. This means that L = X'Y/sqrt(Y'XX'Y) is the left singular vector of X'Y, and so XL is the linear combination XV we sought at the start. Regressing Y on XV = XL, we get the coefficient c; the predicted Y is then XLc = X(Lc) = Xb, so multiplying L by c gives the vector b of coefficients relating Y to X: predicted Y = Xb = X(cL).
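Collecting the algebra of this paragraph in one display (the same quantities, nothing new):

$$
X'Y = L\,D\,R', \qquad R' = 1, \qquad D = \sqrt{Y'XX'Y}, \qquad L = \frac{X'Y}{\sqrt{Y'XX'Y}},
$$
$$
c = \big[(XL)'(XL)\big]^{-1}(XL)'Y, \qquad b = cL, \qquad \widehat{Y} = XLc = Xb .
$$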
In the SAS program below, I modify the example in the PROC PLS documentation to center and scale the data first; then, with IML, I carry out the calculations described above to verify that PROC PLS uses these results to compute its solution.
ods html close; ods listing; ods listing gpath="%sysfunc(pathname(work))";
data data;
input x1 x2 y;
x1 = (x1-0.65627)/2.12473; ** center and scale x1;
x2 = (x2-0.21468)/1.78008; ** center and scale x2;
y = (y-0.31225)/0.76742; ** center and scale y;
datalines;
3.37651 2.30716 0.75615
0.74193 -0.88845 1.15285
4.18747 2.17373 1.42392
0.96097 0.57301 0.27433
-1.11161 -0.75225 -0.25410
-1.38029 -1.31343 -0.04728
1.28153 -0.13751 1.00341
-1.39242 -2.03615 0.45518
0.63741 0.06183 0.40699
-2.52533 -1.23726 -0.91080
2.44277 3.61077 -0.82590
;
proc corr cov; ** check the centering and scaling;
run;
proc reg; model Y = X1 X2; ** ordinary least squares fit for comparison;
run;
proc pls data=data nfac=1 method=rrr; ** reduced rank regression;
model y = x1 x2/solution;
run;
proc pls data=data nfac=1 method=pcr; ** principal components regression;
model y = x1 x2/solution;
run;
proc pls data=data nfac=1 method=pls; ** compare to IML below;
model y = x1 x2/solution;
run;
proc iml; use data;
read all var{X1 X2} into X;
read all var{Y} into Y;
print X Y;
XPY=X`*Y; XPX=X`*X; YPY=Y`*Y;
print XPX XPY YPY; ** check centering and scaling, compute XPY ;
call svd(L, D, R, XPY); print L D R; ** get left singular vector L;
guess = sqrt(Y`*X*X`*Y); print guess; ** check that D equals sqrt(XPY`*XPY);
delta = inv((X*L)`*(X*L))*(X*L)`*Y;
print delta; ** regress Y on X*L to get the scalar coefficient delta;
Ldelta=L*delta; print Ldelta; ** b = delta*L converts back to betas;
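As a final sanity check (my addition, not part of the original program), the same IML session can confirm that L has unit length and print the fitted values implied by b = delta*L; the printed Ldelta should match the coefficients from the METHOD=PLS run above with nfac=1.

check = L`*L; print check; ** L is normalized so L`L = 1;
yhat = X*Ldelta; print yhat; ** predicted Y from b = delta*L;
quit;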