This assignment asks you to match a set of points P=(p1,…,pn) to another set of points Q=(q1,…,qn) by finding an optimal transformation T. Similar problems are commonly faced in image processing, image retrieval, computer vision, and robotics.
1. Consider the version of this problem in two dimensions, and consider finding a linear transformation T(x)=Ax, where A is a 2x2 matrix
A=a11a12a21a22
Write out the expression f(A) that computes the sum of squared distances between pi and Aqi in terms of the variable matrix A. Then, compute the partial derivatives of this expression with respect to a11, a12, a21, and a22. You may perform the partial derivatives individually, or if you are feeling up to working with matrix derivatives, use the facts that ∇A(uTAv)= uvT and ∇AuTATA u=2AuuT for any vectors u, v.
2. Assemble the partial derivatives that you obtained in question 1 into matrix form:
∂f∂a11∂f∂a12∂f∂a21∂f∂a22(A)
and use matrix algebra to solve for the value(s) of A that are critical points of f(A). What matrices do you have to assume are invertible?
3. If you wished to solve this problem using Newton’s method, how fast would it converge? Think about the form of the gradient, and give the tightest answer that you can for this particular problem.
4. Now consider the constraint that the matrix must be symmetric (i.e., A=AT). Write down the condition involving Lagrange multipliers that must be satisfied by a solution to this problem (hint: there should be four variables, one constraint, and one Lagrange multiplier). Rearrange the expression so that A is on the left hand side and an expression involving l (and no other optimization variables) is on the right hand side. Consider how this expression differs from your answer for question 2. In light of this difference and the symmetricity constraint, what is the role of l? (You are not asked to solve for the optimal symmetric matrix)
5. Now consider the constraint that the matrix must be orthogonal (i.e., ATA=I). Write down the condition involving Lagrange multipliers that must be satisfied by a solution to this problem (hint: there should be four variables, four constraints, and four Lagrange multipliers).
6. Challenge question: solve the condition you wrote in question 5 to produce the optimal orthogonal transformation matrix(ces). Hint: write out the derivatives of the constraints as products of A with constant matrices. What is the geometric interpretation of the solution(s)?
1. f(A) = sumi ||pi – Aqi||2= sumi (piTpi + qiTATAqi – 2 piTAqi )
2. Ñf(A) = sumi [2 A qiqiT – 2 piqiT ] = 0 implies A = (sumi piqiT) (sumj qjqjT)-1. Here we have to have the matrix sumj qjqjT be non-singular, which essentially means that all of the qi’s need to span a 2-dimensional subspace of R2.
3. Newton’s method would converge in one step. Notice that the gradient is a linear function of each element of A, and so the Hessian matrix is constant. Also recall that Newton’s method, when applied to minimization, constructs a best-fit quadratic form to a function and solves for the optimal step for the quadratic form. Since the quadratic form is exact in this case, the optimum is reached in a single step.
4. The constraint A=AT can be written as g(A) = A12-A21=0 (because the constraints on A11 and A22 are vacuous). Taking the matrix derivative of g with respect to A, we have
Ñg(A) = [0 -1]
[1 0]
So the Lagrange equation that needs to be solved is Ñf(A*) + l* Ñg = 0.
We need to find A* = (sum piqiT ) (sum qiqiT)-1 + ½ l* Ñg (sum qiqiT)-1
so that A*12 = A*21.
To solve this equation, let Let B = (sum qiqiT)-1 and C= sum piqiT.
A* = C B + ½ l* [0 -1;1 0] B
Look at off-diagonal elements of CB: C11 B12 + C12 B22 and C21 B11 + C22 B21
Look at off diagonal elements of [0 -1;1 0]B: -B22 and B11.
Solve for l* so that the off diagonal elements are equal: C11 B12 + C12 B22 – ½ lambda*B22 = C21 B11 + C22 B21 + ½ l*B11
=>
2(C11 B12 + C12 B22 - C21 B11 - C22 B21 ))/ (B22+B11) = l*
A21*= A12* = [B11(C11 B12 + C12 B22)– B22(C21 B11 + C22 B21 ))]/ (B22+B11)
The role of the Lagrange multiplier lambda is to “resist” the breaking of symmetry by adding in an anti-symmetric component to A, namely dg/dA B.
5. The orthogonality constraint ATA=I gives the constraints
g1(A) = A112 + A212-1 = 0
g2(A) = A222 + A122-1 = 0
g3(A) = A11A12 + A21A22 = 0
g4(A) = A12A11 + A22A21 = 0
(note that constraints 3 and 4 are identical)
Ñg1(A) = [2 A11 0] = A [ 2 0; 0 0]
[2 A21 0]
Ñg2(A) = [0 2A12] = A [ 0 0; 0 2]
[0 2A22]
Ñg3(A) = Ñg4(A) = [A12 A11] = A [0 1; 1 0]
[A22 A21]
So we need to solve for Lagrange multipliers l1, l2, l3, l4 such that the condition
Ñf(A*) + l1 Ñg1 (A*)+ l2 Ñg2(A*) + l3 Ñg3(A*) + l4 Ñg4 (A*) = 0
and all of the gi(A*) =0 are satisfied.
6. A* (2 sumi qiqiT + [2l1, l3+l4; l3+l4, 2l2]) = 2 sumi piqiT
Since it doesn’t really matter what l3 and l4 are individually, let’s just say l3=l4.
A* (sumi qiqiT + [l1, l3; l3, l2]) = sumi piqiT
Let B = sumi qiqiT and C = sumi piqiT
Since A* is invertible and orthogonal, we can multiply both sides by A*T to get
B + [l1, l3; l3, l2] = A*T C.
So [l1, l3; l3, l2] = A*T C - B. Notice that the diagonal of this matrix can be any arbitrary value and the condition is satisfied. The only curious component is the off-diagonal, which can be any arbitrary value but needs to be equal on both the upper left and lower right. So,
l3 = A*11 C12 + A*21 C22 - B12 = A*12 C11 + A*22 C21 – B21.
Note that B is symmetric, so the B term drops out.
l3-B12 = A*11 C12 + A*21 C22 = A*12 C11 + A*22 C21.
Now combine this constraint with the three equalities
A*112 + A*212 = 1
A*222 + A*122 = 1
A*11A*12 + A*21A*22 = 0
Rearrange equality 3 to obtain A*11 = - A*21A*22 / A*12 and replace it in equality 1:
A*212 (A*222 + A*122) = A*122
Since A*222 + A*122 = 1, we get
A*212 = A*122
And hence A*112 = A*222 , which means either (a) A*21 = A*12 and A*11 = - A*22, or (b) A*21 = -A*12 and A*11 = A*22.
Let’s take case (a). The l3-B12 equality states that -A*22 C12 + A*12 C22 = A*12 C11 + A*22 C21, which implies that 0=A*12 (C11- C22) + A*22 (C21+ C12), which means that A*22/ A*12 = (C21+ C12)/ (C22- C11). This gives us a ratio, for which we should choose a scale such that the magnitude of the vector (A*22, A*12) is 1. The remaining elements of A* can then be determined using the case’s conditions. A similar equation is given for case b). In fact, the resulting matrix A* in case (a) is a mirror+ rotation and in case (b) it is a pure rotation.