Matrix Differential Calculus

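Proof. The proof is similar to (and simpler than) that of Lemma 2.3.

Example 2.5. If all the entries of the matrix $X$ are distinct, then

$$\frac{\partial\,\mathrm{Tr}(X^T A)}{\partial X} = \frac{\partial\,\mathrm{Tr}(A X^T)}{\partial X} = A. \tag{2.8}$$

Proof. The proof is similar to (and simpler than) that of Lemma 2.3.

Example 2.6. If all the entries of the matrix $X$ are distinct, then

$$\frac{\partial\,\mathrm{Tr}(X^T A X)}{\partial X} = \frac{\partial\,\mathrm{Tr}(X X^T A)}{\partial X} = (A + A^T) X. \tag{2.9}$$

Proof. The method of proof is again as in the above lemmas, but let us go through it to convince ourselves. We know that

$$\mathrm{Tr}(X^T A X) = \sum_{i,j} x_{ji} (AX)_{ji} = \sum_{i,j} x_{ji} \sum_k a_{jk} x_{ki}.$$

Now we calculate $\partial/\partial x_{pq}$ element-wise to obtain

$$\begin{aligned}
\frac{\partial}{\partial x_{pq}} \sum_{i,j,k} x_{ji} a_{jk} x_{ki}
&= \sum_{i,j,k} \frac{\partial x_{ji}}{\partial x_{pq}} a_{jk} x_{ki} + \sum_{i,j,k} x_{ji} a_{jk} \frac{\partial x_{ki}}{\partial x_{pq}} \\
&= \sum_k a_{pk} x_{kq} + \sum_j x_{jq} a_{jp} \\
&= (AX)_{pq} + (A^T X)_{pq}.
\end{aligned}$$

Example 2.7. If all the entries of the matrix $X$ are distinct, then

$$\frac{\partial\,\mathrm{Tr}(X^T X A)}{\partial X} = \frac{\partial\,\mathrm{Tr}(X A X^T)}{\partial X} = X (A + A^T). \tag{2.10}$$

Proof. We note that

$$\mathrm{Tr}(X A X^T) = \sum_{i,j} x_{ij} (A X^T)_{ji} = \sum_{i,j,k} x_{ij} a_{jk} x_{ik}.$$

Now we calculate $\partial/\partial x_{pq}$ element-wise to obtain

$$\begin{aligned}
\frac{\partial}{\partial x_{pq}} \sum_{i,j,k} x_{ij} a_{jk} x_{ik}
&= \sum_{i,j,k} \frac{\partial x_{ij}}{\partial x_{pq}} a_{jk} x_{ik} + \sum_{i,j,k} x_{ij} a_{jk} \frac{\partial x_{ik}}{\partial x_{pq}} \\
&= \sum_k a_{qk} x_{pk} + \sum_j x_{pj} a_{jq} \\
&= (X A^T)_{pq} + (X A)_{pq}.
\end{aligned}$$

Example 2.8. If all the entries of the matrix $X$ are distinct, then

$$\frac{\partial\,\mathrm{Tr}(X^{-1} A)}{\partial X} = -\left[ X^{-1} A X^{-1} \right]^T. \tag{2.11}$$

Proof. The proof can be found in the next section, after appropriate simplifying theory has been introduced.

Example 2.9. If all entries of $X$ are distinct and $A$ and $Y$ are constant, then

$$\frac{\partial\,\mathrm{Tr}(X^T A X Y)}{\partial X} = A^T X Y^T + A X Y.$$

Proof. The proof is very similar to the ones above, but I have written it out here for concreteness. We note that

$$\mathrm{Tr}(X^T A X Y) = \sum_{i,j} x_{ji} (AXY)_{ji} = \sum_{i,j,k,l} x_{ji} a_{jk} x_{kl} y_{li}.$$

Now we calculate $\partial/\partial x_{pq}$ element-wise to obtain

$$\begin{aligned}
\frac{\partial}{\partial x_{pq}} \sum_{i,j,k,l} x_{ji} a_{jk} x_{kl} y_{li}
&= \sum_{i,j,k,l} \frac{\partial x_{ji}}{\partial x_{pq}} a_{jk} x_{kl} y_{li} + \sum_{i,j,k,l} x_{ji} a_{jk} \frac{\partial x_{kl}}{\partial x_{pq}} y_{li} \\
&= \sum_{k,l} a_{pk} x_{kl} y_{lq} + \sum_{i,j} x_{ji} a_{jp} y_{qi} \\
&= (AXY)_{pq} + \sum_i (A^T X)_{pi} (Y^T)_{iq} \\
&= (AXY)_{pq} + (A^T X Y^T)_{pq}.
\end{aligned}$$

Example 2.10. If all the entries of $X$ and $Y$ are distinct and $A$ and $W$ are given matrices, then

$$\frac{\partial\,\mathrm{Tr}((XY \odot W) A)}{\partial X} = (A^T \odot W)\, Y^T, \qquad \text{and} \qquad \frac{\partial\,\mathrm{Tr}((XY \odot W) A)}{\partial Y} = X^T (A^T \odot W).$$

Proof. Once again the proof is just like the ones above, and I have included it for the sake of concreteness. We observe that

$$\mathrm{Tr}((XY \odot W) A) = \sum_{i,j} (XY)_{ij} w_{ij} a_{ji} = \sum_{i,j,k} x_{ik} y_{kj} w_{ij} a_{ji}.$$

Now we calculate $\partial/\partial x_{pq}$ element-wise to obtain

$$\sum_j y_{qj} w_{pj} a_{jp} = \left[ (A^T \odot W)\, Y^T \right]_{pq},$$

which completes the proof of the first part. The proof of the second part is similar.

Example 2.11.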
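If we define $f(X,Y) = \mathrm{Tr}\big((XY \odot W)^T (XY \odot W)\big) = \|XY \odot W\|_F^2$, then

$$\frac{\partial f(X,Y)}{\partial X} = 2 (XY \odot W \odot W)\, Y^T, \qquad \text{and} \qquad \frac{\partial f(X,Y)}{\partial Y} = 2 X^T (XY \odot W \odot W).$$

Proof. Once again we trudge laboriously through the proof. By this time we suspect that there should be a more intelligent symbolic method for these matrix calculus proofs than such labor. Soon we shall employ the symbolic methods entirely.

$$\begin{aligned}
\frac{\partial f(X,Y)}{\partial x_{pq}}
&= \frac{\partial}{\partial x_{pq}} \sum_{i,j,k,l} x_{jk} y_{ki} x_{jl} y_{li} w_{ji}^2 \\
&= \sum_{i,j,k,l} \frac{\partial x_{jk}}{\partial x_{pq}} y_{ki} x_{jl} y_{li} w_{ji}^2 + \sum_{i,j,k,l} x_{jk} y_{ki} \frac{\partial x_{jl}}{\partial x_{pq}} y_{li} w_{ji}^2 \\
&= \sum_{i,l} y_{qi} x_{pl} y_{li} w_{pi}^2 + \sum_{i,k} x_{pk} y_{ki} y_{qi} w_{pi}^2 \\
&= 2 \sum_i (XY \odot W \odot W)_{pi} (Y^T)_{iq} \\
&= \left[ 2 (XY \odot W \odot W)\, Y^T \right]_{pq},
\end{aligned}$$

which completes the claim for the first case. Similarly we have

$$\begin{aligned}
\frac{\partial f(X,Y)}{\partial y_{pq}}
&= \frac{\partial}{\partial y_{pq}} \sum_{i,j,k,l} x_{jk} y_{ki} x_{jl} y_{li} w_{ji}^2 \\
&= 2 \sum_{j,l} x_{jp} x_{jl} y_{lq} w_{jq}^2 \\
&= \left[ 2 X^T (XY \odot W \odot W) \right]_{pq},
\end{aligned}$$

which completes the proof.

Example 2.12. If $f(X) = \mathbf{1}^T A X \mathbf{1}$, where $A$ is $m \times n$, $X$ is $n \times r$ and the other terms are of appropriate dimensions, then

$$\frac{\partial f(X)}{\partial X} = A^T \mathbf{1} \mathbf{1}^T.$$

Proof. The proof is almost trivial. We have

$$\frac{\partial f(X)}{\partial x_{pq}} = \frac{\partial}{\partial x_{pq}} \sum_{i,j} (AX)_{ij} = \frac{\partial}{\partial x_{pq}} \sum_{i,j,k} a_{ik} x_{kj} = \sum_i a_{ip} \quad \text{for all } q.$$

From the formulas above, we see that the matrix of derivatives of $f(X)$ assumes the form

$$\frac{\partial f(X)}{\partial X} =
\begin{bmatrix}
\mathbf{1}^T a_1 & \cdots & \mathbf{1}^T a_1 \\
\mathbf{1}^T a_2 & \cdots & \mathbf{1}^T a_2 \\
\vdots & & \vdots \\
\mathbf{1}^T a_n & \cdots & \mathbf{1}^T a_n
\end{bmatrix},$$

where $a_p$ denotes the $p$-th column of $A$, from which we immediately conclude our proof.

Example 2.13. If $f(X) = \mathrm{Tr}(X^T \log X)$ (where the log is elementwise, and not the matrix logarithm function), then

$$\frac{\partial f(X)}{\partial X} = \frac{X}{X} + \log X = \mathbf{1}\mathbf{1}^T + \log X,$$

where the division is elementwise.

Proof.

$$\begin{aligned}
\frac{\partial f(X)}{\partial x_{pq}}
&= \frac{\partial}{\partial x_{pq}} \sum_{i,j} x_{ij} \log x_{ij} \\
&= \sum_{i,j} x_{ij} \frac{\partial \log x_{ij}}{\partial x_{pq}} + \sum_{i,j} \frac{\partial x_{ij}}{\partial x_{pq}} \log x_{ij} \\
&= \frac{x_{pq}}{x_{pq}} + \log x_{pq}.
\end{aligned}$$

The claim follows immediately.

Now a trivial observation is furnished by the following example.

Example 2.14. If $f(X) = \mathbf{1}^T g(X) \mathbf{1}$, where $g(X) = [g(x_{ij})]$, then

$$\frac{\partial f(X)}{\partial X} = \left[ \frac{\partial g(x_{ij})}{\partial x_{ij}} \right].$$

Proof. The proof is trivial and depends only on the assumptions that the elements of the matrix $X$ are independent of each other, and that $g(X)$ is an elementwise function of $X$.

Now we have a useful and interesting lemma.

Example 2.15. If $f(X) = \mathrm{Tr}(A \log(BX))$, then

$$\frac{\partial f(X)}{\partial X} = B^T \big( A^T \oslash (BX) \big),$$

where $\oslash$ denotes elementwise division.

Proof. We begin as usual by writing out the problem in elementwise form:

$$\begin{aligned}
\frac{\partial f(X)}{\partial x_{pq}}
&= \frac{\partial}{\partial x_{pq}} \sum_{i,j} a_{ij} \log(BX)_{ji} \\
&= \sum_{i,j} a_{ij} \frac{\partial \log\big( \sum_k b_{jk} x_{ki} \big)}{\partial x_{pq}} \\
&= \sum_j \frac{a_{qj}\, b_{jp}}{(BX)_{jq}} \qquad (\text{derivative nonzero only for } k = p,\ i = q) \\
&= \sum_j b_{jp} \big( A^T \oslash (BX) \big)_{jq}
= \left[ B^T \big( A^T \oslash (BX) \big) \right]_{pq}.
\end{aligned}$$

Hence the proof is done.

Another interesting lemma is given below.

Example 2.16. If $f(X) = \mathbf{1}^T A \log(BX) \mathbf{1}$, then

$$\frac{\partial f(X)}{\partial X} = B^T\, \frac{A^T \mathbf{1}\mathbf{1}^T}{BX},$$

where the fraction denotes elementwise division.

Proof. This time again the proof is routine.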
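We shall go through it for the sake of completeness:

$$\begin{aligned}
\frac{\partial f(X)}{\partial x_{pq}}
&= \frac{\partial}{\partial x_{pq}} \sum_{i,j} (A \log(BX))_{ij}
= \sum_{i,j,k} a_{ik} \frac{\partial \log(BX)_{kj}}{\partial x_{pq}} \\
&= \sum_{i,j,k} a_{ik} \frac{\partial \log\big( \sum_l b_{kl} x_{lj} \big)}{\partial x_{pq}} \\
&= \sum_{i,k} a_{ik} \frac{b_{kp}}{(BX)_{kq}} \qquad (\text{derivative nonzero only for } l = p,\ j = q) \\
&= \sum_k b_{kp} \frac{(A^T \mathbf{1})_k}{(BX)_{kq}}
= \sum_k b_{kp} \frac{(A^T \mathbf{1}\mathbf{1}^T)_{kq}}{(BX)_{kq}} \\
&= \left[ B^T\, \frac{A^T \mathbf{1}\mathbf{1}^T}{BX} \right]_{pq}.
\end{aligned}$$

A more general logarithmic derivative now follows.

Example 2.17. If $f(X) = \mathrm{Tr}(X^T A \log(BX))$, then, with the division taken elementwise,

$$\frac{\partial f(X)}{\partial X} = B^T\, \frac{A^T X}{BX} + A \log(BX).$$

Proof. We can apply the product rule [to be developed clearly]. We have

$$\begin{aligned}
\frac{\partial f(X)}{\partial x_{ab}}
&= \frac{\partial}{\partial x_{ab}} \sum_{i,j} (X^T A)_{ij} \log(BX)_{ji} \\
&= \sum_{i,j} (X^T A)_{ij} \frac{\partial \log(BX)_{ji}}{\partial x_{ab}} + \sum_{i,j} \log(BX)_{ji} \frac{\partial (X^T A)_{ij}}{\partial x_{ab}} \\
&= \sum_j b_{ja} \frac{(A^T X)_{jb}}{(BX)_{jb}} + \sum_j a_{aj} \log(BX)_{jb} \\
&= \left[ B^T\, \frac{A^T X}{BX} \right]_{ab} + \left[ A \log(BX) \right]_{ab}.
\end{aligned}$$

That completes our proof.

Example 2.18. If $f(X) = \mathrm{Tr}(A X \log(BX))$, then

$$\frac{\partial f(X)}{\partial X} = B^T\, \frac{X^T A^T}{BX} + A^T \log(X^T B^T).$$

Proof. The proof is just like that for Lemma 2.17.

| Lemma | Function $f(X)$ | Derivative $\partial f(X)/\partial X$ |
|---|---|---|
| … | … (symmetric $X$) | $x_{ii}$ if $i = j$; $2x_{ij}$ if $i \neq j$ |
| 2.3 | $\mathrm{Tr}(XA)$ (symmetric $X$) | $A + A^T - \mathrm{diag}(A)$ |
| 2.4 | $\mathrm{Tr}(XA)$, $\mathrm{Tr}(AX)$ | $A^T$ |
| 2.5 | $\mathrm{Tr}(X^T A)$, $\mathrm{Tr}(A X^T)$ | $A$ |
| 2.6 | $\mathrm{Tr}(X^T A X)$, $\mathrm{Tr}(X X^T A)$ | $(A + A^T) X$ |
| 2.7 | $\mathrm{Tr}(X^T X A)$, $\mathrm{Tr}(X A X^T)$ | $X (A + A^T)$ |
| 2.8 | $\mathrm{Tr}(X^{-1} A)$ | $-(X^{-1} A X^{-1})^T$ |
| 2.9 | $\mathrm{Tr}(X^T A X Y)$ | $A^T X Y^T + A X Y$ |
| 2.10-i | $\mathrm{Tr}((XY \odot W) A)$ | $(A^T \odot W)\, Y^T$ |
| 2.10-ii | With respect to $Y$ | $X^T (A^T \odot W)$ |
| 2.11-i | $\mathrm{Tr}((XY \odot W)^T (XY \odot W))$ | $2 (XY \odot W \odot W)\, Y^T$ |
| 2.11-ii | With respect to $Y$ | $2 X^T (XY \odot W \odot W)$ |
| 2.12 | $\mathbf{1}^T A X \mathbf{1}$ | $A^T \mathbf{1}\mathbf{1}^T$ |
| 2.13 | $\mathrm{Tr}(X^T \log X)$ | $\mathbf{1}\mathbf{1}^T + \log X$ |
| 2.14 | $\mathbf{1}^T g(X) \mathbf{1}$, where $g(X) = [g(x_{ij})]$ | $[\partial g(x_{ij}) / \partial x_{ij}]$ |
| 2.15 | $\mathrm{Tr}(A \log(BX))$ | $B^T (A^T \oslash (BX))$ |
| 2.16 | $\mathbf{1}^T A \log(BX) \mathbf{1}$ | $B^T ((A^T \mathbf{1}\mathbf{1}^T) \oslash (BX))$ |
| 2.17 | $\mathrm{Tr}(X^T A \log(BX))$ | $B^T ((A^T X) \oslash (BX)) + A \log(BX)$ |
| 2.18 | $\mathrm{Tr}(A X \log(BX))$ | $B^T ((X^T A^T) \oslash (BX)) + A^T \log(X^T B^T)$ |

Table 1: Summary of matrix calculus lemmas. Here $\odot$ is the elementwise product and $\oslash$ is elementwise division.

| Function $f(X)$ | Derivative $\partial f(X)/\partial X$ |
|---|---|
| $\mathrm{Tr}(XA)$, $\mathrm{Tr}(AX)$, $\mathrm{Tr}(X^T A)$, $\mathrm{Tr}(A X^T)$ | $\mathrm{diag}(A)$ |
| $\mathrm{Tr}(X^T A X)$, $\mathrm{Tr}(X X^T A)$, $\mathrm{Tr}(X^T X A)$, $\mathrm{Tr}(X A X^T)$ | $2\,\mathrm{diag}(AX) = 2 (A \odot X)$ |
| $\mathrm{Tr}(X A X Y)$ | $\mathrm{diag}(A^T X Y^T + A X Y)$ |
| $\mathrm{Tr}((XY \odot W) A)$ | $\mathrm{diag}((A^T \odot W)\, Y^T) = \mathrm{diag}((Y \odot W) A)$ |

Table 2: Matrix calculus lemmas with diagonal $X$.

Table 2 shows some sample derivatives when the matrix $X$ is diagonal. We observe that these derivatives are simply the diagonals of the general matrices above.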
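The observation about Table 2 can be checked numerically. The sketch below is only an illustration, not part of the original notes: it uses plain Python lists, central finite differences, and arbitrary test matrices (all helper names and values are assumptions made here). It compares the Lemma 2.6 formula $(A + A^T)X$ with a numerical gradient of $\mathrm{Tr}(X^T A X)$, and then repeats the comparison for a diagonal $X$, where the expected derivative with respect to the diagonal entries is $2 a_{pp} x_{pp}$.

```python
def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def f(X, A):
    """f(X) = Tr(X^T A X)."""
    return trace(matmul(transpose(X), matmul(A, X)))

def numerical_gradient(func, X, h=1e-5):
    """Central-difference estimate of the matrix [d func / d x_pq]."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for p in range(len(X)):
        for q in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[p][q] += h
            minus[p][q] -= h
            grad[p][q] = (func(plus) - func(minus)) / (2 * h)
    return grad

# Arbitrary test data (hypothetical values, chosen only for illustration).
A = [[1.0, 2.0, 0.5], [-1.0, 0.0, 3.0], [4.0, 1.0, 2.0]]
X = [[0.3, -1.2], [2.0, 0.7], [1.5, 0.1]]
n = len(A)

# General case: closed form (A + A^T) X versus finite differences.
A_sym = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
closed_form = matmul(A_sym, X)
numeric = numerical_gradient(lambda M: f(M, A), X)
max_err = max(abs(closed_form[p][q] - numeric[p][q])
              for p in range(n) for q in range(len(X[0])))

# Diagonal case: X = diag(d), and only the diagonal entries vary.
d = [0.5, -2.0, 1.5]

def f_diag(d_vec):
    D = [[d_vec[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return f(D, A)

h = 1e-5
diag_numeric = []
for p in range(n):
    plus, minus = d[:], d[:]
    plus[p] += h
    minus[p] -= h
    diag_numeric.append((f_diag(plus) - f_diag(minus)) / (2 * h))

# Table 2 entry: the diagonal of 2 (A Hadamard X), i.e. 2 * a_pp * x_pp.
diag_closed = [2 * A[p][p] * d[p] for p in range(n)]
diag_err = max(abs(a - b) for a, b in zip(diag_numeric, diag_closed))

print(max_err, diag_err)
```

Both printed errors should be at the level of floating-point rounding, since the function is quadratic in the entries of $X$ and central differences are then exact up to rounding.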
One may arrive at these results easily by setting $x_{ij} = 0$ for $i \neq j$ in the general derivations.

After that slew of laborious derivations, it should be quite evident that we require a better formalism. The aim of the next section is to summarize the rudiments of such a formalism, and I hope it will help the reader obtain a quick start into the area.

3 Matrix differential calculus

The material in this section is based on the useful book Matrix Differential Calculus by Magnus and Neudecker [1999].

Let $f$ be a scalar function of an $n \times 1$ vector $x$. The derivative of $f$ is denoted by $Df(x)$, and we write

$$Df(x) = (D_1 f(x), \ldots, D_n f(x)) = \frac{\partial f(x)}{\partial x^T}.$$

If $f : \mathbb{R}^n \to \mathbb{R}^m$ (i.e., $f$ is an $m \times 1$ vector function of the vector $x$), then the derivative of $f$ (also called the Jacobian matrix) is the $m \times n$ matrix

$$Df(x) = \frac{\partial f(x)}{\partial x^T}.$$

These ideas can be generalized to matrix functions of matrices or to tensor functions of tensors, but we shall restrict our attention to scalar functions of matrices and vector functions of vectors.

To find the Jacobian matrix of a function, we could either proceed elementwise as we did in the previous section or use a more principled approach. The previous section makes it evident how laborious and error prone the low-level approach is. We will therefore not evaluate each of the partial derivatives elementwise, but instead find the differential, from which we can read off the derivative directly.

From the first identification theorem [see Magnus and Neudecker, 1999, Theorem 5.6, p. 123] we know that there is a one-to-one correspondence between the differential of $f$ and its Jacobian. The theorem tells us that

$$df(x) = A(x)\, dx \iff Df(x) = A(x),$$

whereby knowing the differential immediately yields the Jacobian.
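As a small illustration of reading the Jacobian off the differential, consider the hypothetical function $f(x) = (x_0 x_1,\; x_0^2 + x_2)^T$, which is not from the original notes. Its differential is $df = (x_1\,dx_0 + x_0\,dx_1,\; 2x_0\,dx_0 + dx_2)^T$, so the identification theorem gives the $2 \times 3$ matrix $A(x)$ with rows $(x_1, x_0, 0)$ and $(2x_0, 0, 1)$. The sketch below compares that matrix with a central-difference estimate; the helper `numerical_jacobian` and the test point are assumptions made for this example.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Central-difference estimate of the m x n Jacobian [D_j f_i(x)]."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def f(x):
    """Hypothetical vector function R^3 -> R^2 used only for illustration."""
    return [x[0] * x[1], x[0] ** 2 + x[2]]

def jacobian_from_differential(x):
    """A(x) read off from df(x) = A(x) dx for the function f above."""
    return [[x[1], x[0], 0.0],
            [2 * x[0], 0.0, 1.0]]

x0 = [0.5, -1.0, 2.0]
J_numeric = numerical_jacobian(f, x0)
J_exact = jacobian_from_differential(x0)
jac_err = max(abs(J_numeric[i][j] - J_exact[i][j])
              for i in range(2) for j in range(3))
print(jac_err)
```

The Jacobian has one row per component of $f$ and one column per component of $x$, matching the $m \times n$ convention stated above.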
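As an aside, we mention here that if $F(X)$ is a matrix function, then

$$d\,\mathrm{vec}(F(X)) = A(X)\, d\,\mathrm{vec}(X) \iff DF(X) = A(X).$$

For more details, please also see [Magnus and Neudecker, 1999, Table 2, p. 176], from which we know that if $f$ is a scalar function of a matrix variable $X$, then

$$df(X) = \mathrm{Tr}(A^T dX) = \mathrm{vec}(A)^T\, d\,\mathrm{vec}(X) \iff \frac{\partial f(X)}{\partial X} = A.$$

Lemma 3.1 (Product rule). If $X$ and $Y$ are matrices of conformable dimensions, then

$$d(XY) = (dX)\,Y + X\,(dY).$$

Proof. We have

$$\begin{aligned}
[d(XY)]_{pq} = d\,(XY)_{pq}
&= d \Big( \sum_k x_{pk} y_{kq} \Big) \\
&= \sum_k \big( (dx_{pk})\, y_{kq} + x_{pk}\, (dy_{kq}) \big) \\
&= [(dX)\,Y]_{pq} + [X\,(dY)]_{pq}.
\end{aligned}$$

Hence the proof is complete.

Example 3.2. If $F(X) = AXY$, where $A$ and $Y$ are constants, then

$$dF(X) = A\,(dX)\,Y, \qquad d\,\mathrm{vec}(F(X)) = (Y^T \otimes A)\, d\,\mathrm{vec}(X), \qquad DF(X) = Y^T \otimes A.$$

Example 3.3. Let $f(x) = a^T x$, where $a$ is a vector of constants. Then $df(x) = a^T dx$, thus $Df(x) = a^T$.

Example 3.4. Let $f(x) = x^T A x$, where $A$ is square and constant. Then

$$\begin{aligned}
df(x) = d(x^T A x) &= (dx)^T A x + x^T A\, dx \\
&= x^T A^T dx + x^T A\, dx \\
&= x^T (A + A^T)\, dx \\
\implies Df(x) &= x^T (A + A^T).
\end{aligned}$$

Example 3.5. If $f(x) = a^T g(x)$, then

$$df(x) = a^T dg(x) = a^T Dg(x)\, dx \implies Df(x) = a^T Dg(x).$$

Example 3.6. If $f(x) = g(x)^T h(x)$, then

$$\begin{aligned}
df(x) = d\big(g(x)^T h(x)\big) &= (dg(x))^T h(x) + g(x)^T (dh(x)) \\
&= h(x)^T dg(x) + g(x)^T Dh(x)\, dx \\
&= h(x)^T Dg(x)\, dx + g(x)^T Dh(x)\, dx \\
\implies Df(x) &= h(x)^T Dg(x) + g(x)^T Dh(x).
\end{aligned}$$

Example 3.7. If $f(x) = x^T A g(x)$, then

$$\begin{aligned}
df(x) = d\big(x^T A g(x)\big) &= (dx)^T A g(x) + x^T A\, dg(x) \\
&= g(x)^T A^T dx + x^T A\, Dg(x)\, dx \\
\implies Df(x) &= g(x)^T A^T + x^T A\, Dg(x).
\end{aligned}$$

Example 3.8. If $f(x) = g(x)^T A g(x)$, then

$$\begin{aligned}
df(x) = d\big(g(x)^T A g(x)\big) &= (dg(x))^T A g(x) + g(x)^T A\, dg(x) \\
&= g(x)^T A^T Dg(x)\, dx + g(x)^T A\, Dg(x)\, dx \\
\implies Df(x) &= g(x)^T (A + A^T)\, Dg(x).
\end{aligned}$$

From the previous section, and generalizing the examples above, we note that $d\,\mathrm{Tr}(X) = \mathrm{Tr}(dX)$, which implies that $\partial\,\mathrm{Tr}(X)/\partial X = I$. We now derive some of the results of the previous section using this technique and observe that the procedure becomes more elegant.

Example 3.9. If $f(X) = \mathrm{Tr}(X^T A)$, then

$$\begin{aligned}
df(X) = d\,\mathrm{Tr}(X^T A) = d\,\mathrm{Tr}(A X^T) &= \mathrm{Tr}\big(A\,(dX)^T\big) \\
&= \mathrm{Tr}\big((dX)\, A^T\big) = \mathrm{Tr}(A^T dX) \\
\implies \frac{\partial f(X)}{\partial X} &= A.
\end{aligned}$$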
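The same steps appear to recover Example 2.6 with far less effort than the elementwise proof. The derivation below is a sketch written for this purpose rather than taken from the original notes. It assumes only the product rule of Lemma 3.1, the identities $d(X^T) = (dX)^T$, $d\,\mathrm{Tr}(\cdot) = \mathrm{Tr}(d\,\cdot)$ and $\mathrm{Tr}(M) = \mathrm{Tr}(M^T)$, and the correspondence $df(X) = \mathrm{Tr}(G^T dX) \iff \partial f(X)/\partial X = G$.

```latex
\begin{aligned}
d\,\mathrm{Tr}(X^T A X)
  &= \mathrm{Tr}\big((dX)^T A X\big) + \mathrm{Tr}\big(X^T A\, dX\big) \\
  &= \mathrm{Tr}\big(X^T A^T dX\big) + \mathrm{Tr}\big(X^T A\, dX\big) \\
  &= \mathrm{Tr}\big(\big[(A + A^T) X\big]^T dX\big) \\
\implies \frac{\partial\,\mathrm{Tr}(X^T A X)}{\partial X} &= (A + A^T)\, X .
\end{aligned}
```

The second line uses $\mathrm{Tr}(M) = \mathrm{Tr}(M^T)$ on the first term, and the result agrees with equation (2.9).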

