Matrix calculus
From too much study, and from extreme passion, cometh madnesse.
− Isaac Newton [168, §5]
while the second-order gradient of the twice differentiable real function with respect to its vector argument is traditionally called the Hessian;
$$
\nabla^2 f(x) \triangleq
\begin{bmatrix}
\frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1\,\partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1\,\partial x_K}\\
\frac{\partial^2 f(x)}{\partial x_2\,\partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2\,\partial x_K}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial^2 f(x)}{\partial x_K\,\partial x_1} & \frac{\partial^2 f(x)}{\partial x_K\,\partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_K^2}
\end{bmatrix} \in \mathbb{S}^K \qquad (1861)
$$
where the gradient of each real entry is with respect to vector x as in (1860).
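Numerically, (1861) can be checked entrywise by central differences. A minimal numpy sketch (the helper name and test data are illustrative, not from the text), using $f(x) = x^T A x$ whose Hessian is $A + A^T$:

```python
import numpy as np

def numeric_hessian(f, x, h=1e-4):
    # central-difference approximation of the Hessian (1861)
    K = x.size
    H = np.zeros((K, K))
    I = np.eye(K)
    for i in range(K):
        for j in range(K):
            H[i, j] = (f(x + h*I[i] + h*I[j]) - f(x + h*I[i] - h*I[j])
                       - f(x - h*I[i] + h*I[j]) + f(x - h*I[i] - h*I[j])) / (4*h*h)
    return H

# f(x) = x'Ax has Hessian A + A', symmetric as membership in S^K requires
A = np.random.randn(4, 4)
x = np.random.randn(4)
f = lambda z: z @ A @ z
assert np.allclose(numeric_hessian(f, x), A + A.T, atol=1e-5)
```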
The gradient of real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ on matrix domain is
$$
\nabla g(X) \triangleq
\begin{bmatrix}
\frac{\partial g(X)}{\partial X_{11}} & \frac{\partial g(X)}{\partial X_{12}} & \cdots & \frac{\partial g(X)}{\partial X_{1L}}\\
\frac{\partial g(X)}{\partial X_{21}} & \frac{\partial g(X)}{\partial X_{22}} & \cdots & \frac{\partial g(X)}{\partial X_{2L}}\\
\vdots & \vdots & & \vdots\\
\frac{\partial g(X)}{\partial X_{K1}} & \frac{\partial g(X)}{\partial X_{K2}} & \cdots & \frac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L} \qquad (1866)
$$
$$
= \begin{bmatrix}
\nabla_{X(:,1)}\, g(X) & \nabla_{X(:,2)}\, g(X) & \cdots & \nabla_{X(:,L)}\, g(X)
\end{bmatrix} \in \mathbb{R}^{K\times 1\times L}
$$
where gradient $\nabla_{X(:,i)}$ is with respect to the $i$-th column of $X$. The strange appearance of (1866) in $\mathbb{R}^{K\times 1\times L}$ is meant to suggest a third dimension perpendicular to the page (not a diminished third matrix dimension).
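As a concrete numerical counterpart to (1866), a minimal numpy sketch computing the matrix gradient of a real function by entrywise central differences (helper name and test data illustrative only):

```python
import numpy as np

def numeric_matrix_gradient(g, X, h=1e-6):
    # entrywise central differences realizing (1866) for real g(X)
    G = np.zeros_like(X)
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, l] = h
            G[k, l] = (g(X + E) - g(X - E)) / (2*h)
    return G

# g(X) = tr(AX) has gradient A' under these conventions
A = np.random.randn(3, 5)
X = np.random.randn(5, 3)
g = lambda Z: np.trace(A @ Z)
assert np.allclose(numeric_matrix_gradient(g, X), A.T, atol=1e-8)
```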
D.1 The word matrix comes from the Latin for womb; related to the prefix matri- derived from mater meaning mother.
Because gradient of the product (1874) requires total change with respect to change in
each entry of matrix X , the Xb vector must make an inner product with each vector in
that second dimension of the cubix indicated by dotted line segments;
$$
\nabla_X(X^T a)\;Xb \;=\;
\begin{bmatrix}
\begin{bmatrix} a_1 & 0\\ 0 & a_1 \end{bmatrix}\\[6pt]
\begin{bmatrix} a_2 & 0\\ 0 & a_2 \end{bmatrix}
\end{bmatrix}
\begin{bmatrix} b_1 X_{11} + b_2 X_{12}\\ b_1 X_{21} + b_2 X_{22} \end{bmatrix}
\;\in\; \mathbb{R}^{2\times 1\times 2} \qquad (1878)
$$
$$
=\;
\begin{bmatrix}
a_1(b_1 X_{11} + b_2 X_{12}) & a_1(b_1 X_{21} + b_2 X_{22})\\
a_2(b_1 X_{11} + b_2 X_{12}) & a_2(b_1 X_{21} + b_2 X_{22})
\end{bmatrix}
\;\in\; \mathbb{R}^{2\times 2}
\;=\; ab^T X^T
$$
where the cubix appears as a complete $2\times 2\times 2$ matrix. In like manner for the second term $\nabla_X(g)\,f$
$$
\nabla_X(Xb)\;X^T a \;=\;
\begin{bmatrix}
\begin{bmatrix} b_1 & 0\\ b_2 & 0 \end{bmatrix}\\[6pt]
\begin{bmatrix} 0 & b_1\\ 0 & b_2 \end{bmatrix}
\end{bmatrix}
\begin{bmatrix} X_{11} a_1 + X_{21} a_2\\ X_{12} a_1 + X_{22} a_2 \end{bmatrix}
\;\in\; \mathbb{R}^{2\times 1\times 2} \qquad (1879)
$$
$$
=\; X^T ab^T \;\in\; \mathbb{R}^{2\times 2}
$$
The solution
$$
\nabla_X\, a^T X^2 b = ab^T X^T + X^T ab^T \qquad (1880)
$$
can be found from Table D.2.1 or verified using (1873). ∎
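A minimal finite-difference check of (1880); the test data are arbitrary and numpy is assumed available:

```python
import numpy as np

# finite-difference check of (1880): grad_X a'X²b = ab'X' + X'ab'
n = 4
a, b = np.random.randn(n), np.random.randn(n)
X = np.random.randn(n, n)
g = lambda Z: a @ (Z @ Z) @ b          # g(X) = a'X²b

h = 1e-6
G = np.zeros((n, n))
for k in range(n):
    for l in range(n):
        E = np.zeros((n, n)); E[k, l] = h
        G[k, l] = (g(X + E) - g(X - E)) / (2*h)

assert np.allclose(G, np.outer(a, b) @ X.T + X.T @ np.outer(a, b), atol=1e-7)
```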
A disadvantage is the large new but known set of algebraic rules (§A.1.1) and the fact that mere use of the Kronecker product does not generally guarantee a two-dimensional matrix representation of gradients.
Another application of the Kronecker product is to reverse order of appearance in a matrix product: Suppose we wish to weight the columns of a matrix $S\in\mathbb{R}^{M\times N}$, for example, by respective entries $w_i$ from the main diagonal in
$$
W \triangleq \begin{bmatrix} w_1 & & 0\\ & \ddots & \\ 0^T & & w_N \end{bmatrix} \in \mathbb{S}^N \qquad (1883)
$$
A conventional means for accomplishing column weighting is to multiply $S$ by diagonal matrix $W$ on the right-hand side:
$$
S\,W = S \begin{bmatrix} w_1 & & 0\\ & \ddots & \\ 0^T & & w_N \end{bmatrix}
= \begin{bmatrix} S(:,1)\,w_1 & \cdots & S(:,N)\,w_N \end{bmatrix} \in \mathbb{R}^{M\times N} \qquad (1884)
$$
To reverse product order such that diagonal matrix $W$ instead appears to the left of $S$: for $I\in\mathbb{S}^M$ (Law)
$$
S\,W = \big(\delta(W)^T \otimes I\big)
\begin{bmatrix}
S(:,1) & 0 & \cdots & 0\\
0 & S(:,2) & \ddots & \vdots\\
\vdots & \ddots & \ddots & 0\\
0 & \cdots & 0 & S(:,N)
\end{bmatrix} \in \mathbb{R}^{M\times N} \qquad (1885)
$$
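A numerical sketch of the reversal identity (1885); scipy.linalg.block_diag is assumed available to build the block-diagonal factor, and the test data are arbitrary:

```python
import numpy as np
from scipy.linalg import block_diag

# numeric check of (1885): W moved to the left of S via the Kronecker product
M, N = 4, 3
S = np.random.randn(M, N)
w = np.random.randn(N)
W = np.diag(w)                                       # W = δ(w) ∈ S^N

blocks = block_diag(*[S[:, [i]] for i in range(N)])  # MN × N, columns S(:,i) on the diagonal
lhs = np.kron(w[None, :], np.eye(M)) @ blocks        # (δ(W)' ⊗ I) · block diagonal
assert np.allclose(lhs, S @ W)
```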
$$
s \circ y = \delta(s)\,y \qquad (1888)
$$
$$
s^T \otimes y = y\,s^T\,, \qquad s \otimes y^T = s\,y^T \qquad (1889)
$$
$$
\nabla_X\, g\big(f(X)^T,\, h(X)^T\big) = \nabla_X f^T\, \nabla_f\, g \;+\; \nabla_X h^T\, \nabla_h\, g \qquad (1892)
$$
$$
\nabla_x\, g\big(f(x)^T,\, h(x)^T\big)
= \begin{bmatrix} 1 & 0\\ 0 & \varepsilon \end{bmatrix}^{T} (A + A^T)(f + h)
\;+\; \begin{bmatrix} \varepsilon & 0\\ 0 & 1 \end{bmatrix}^{T} (A + A^T)(f + h) \qquad (1895)
$$
$$
\nabla_x\, g\big(f(x)^T,\, h(x)^T\big)
= \begin{bmatrix} 1+\varepsilon & 0\\ 0 & 1+\varepsilon \end{bmatrix}^{T} (A + A^T)
\left( \begin{bmatrix} x_1\\ \varepsilon x_2 \end{bmatrix} + \begin{bmatrix} \varepsilon x_1\\ x_2 \end{bmatrix} \right) \qquad (1896)
$$
where $e_k$ is the $k$-th standard basis vector in $\mathbb{R}^K$ while $e_l$ is the $l$-th standard basis vector in $\mathbb{R}^L$. The total number of partial derivatives equals $KLMN$ while the gradient is defined in their terms; the $mn$-th entry of the gradient is
$$
\nabla g_{mn}(X) =
\begin{bmatrix}
\frac{\partial g_{mn}(X)}{\partial X_{11}} & \frac{\partial g_{mn}(X)}{\partial X_{12}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{1L}}\\
\frac{\partial g_{mn}(X)}{\partial X_{21}} & \frac{\partial g_{mn}(X)}{\partial X_{22}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{2L}}\\
\vdots & \vdots & & \vdots\\
\frac{\partial g_{mn}(X)}{\partial X_{K1}} & \frac{\partial g_{mn}(X)}{\partial X_{K2}} & \cdots & \frac{\partial g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L} \qquad (1899)
$$
Yet for all $X\in\operatorname{dom} g$, any $Y\in\mathbb{R}^{K\times L}$, and some open interval of $t\in\mathbb{R}$
$$
g(X + t\,Y) = g(X) + t\,\overset{\rightarrow Y}{dg}(X) + o(t^2) \qquad (1909)
$$
which is the first-order Taylor series expansion about $X$. [235, §18.4] [166, §2.3.4]
Differentiation with respect to $t$ and subsequent $t$-zeroing isolates the second term of the expansion. Thus differentiating and zeroing $g(X + t\,Y)$ in $t$ is an operation equivalent to individually differentiating and zeroing every entry $g_{mn}(X + t\,Y)$ as in (1906). So the directional derivative of $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ in any direction $Y\in\mathbb{R}^{K\times L}$ evaluated at $X\in\operatorname{dom} g$ becomes
$$
\overset{\rightarrow Y}{dg}(X) = \left.\frac{d}{dt}\right|_{t=0} g(X + t\,Y) \;\in\; \mathbb{R}^{M\times N} \qquad (1910)
$$
[294, §2.1, §5.4.5] [35, §6.3.1] which is simplest. In case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$
$$
\overset{\rightarrow Y}{dg}(X) = \operatorname{tr}\!\big(\nabla g(X)^T\, Y\big) \qquad (1932)
$$
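The two expressions (1910) and (1932) can be compared numerically. A sketch for $g(X) = \operatorname{tr}(X^T X)$, whose gradient is $2X$ (confer §D.2); test data arbitrary:

```python
import numpy as np

# (1910) vs (1932) for the real function g(X) = tr(X'X), gradient 2X
K, L = 3, 4
X, Y = np.random.randn(K, L), np.random.randn(K, L)
g = lambda Z: np.trace(Z.T @ Z)

h = 1e-6
dd = (g(X + h*Y) - g(X - h*Y)) / (2*h)     # d/dt g(X + tY) at t = 0
assert np.isclose(dd, np.trace((2*X).T @ Y), atol=1e-7)
```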
D.2 Although Y is a matrix, we may regard it as a vector in RKL .
[Figure: graph of $f(x)$ with the tangent line $f(\alpha + t\,y)$ through the point $(\alpha, f(\alpha))$; labeled are the gradient $\nabla_x f(\alpha)$, the vector $\upsilon \triangleq \big[\,\nabla_x f(\alpha)\,;\; \tfrac{1}{2}\overset{\rightarrow \nabla_x f(\alpha)}{df}(\alpha)\,\big]$ (confer (1914)), and the hyperplane $\partial\mathcal{H}$.]
In case $g(X) : \mathbb{R}^K \to \mathbb{R}$
$$
\overset{\rightarrow Y}{dg}(X) = \nabla g(X)^T\, Y \qquad (1935)
$$
Unlike gradient, directional derivative does not expand dimension; directional derivative (1910) retains the dimensions of $g$. The derivative with respect to $t$ makes the directional derivative resemble ordinary calculus (§D.2); e.g., when $g(X)$ is linear, $\overset{\rightarrow Y}{dg}(X) = g(Y)$. [266, §7.2]
The directional derivative taken in the normalized direction of its gradient is the gradient magnitude (1935). For a real function of real variable, the directional derivative evaluated at any point in the function domain is just the slope of that function there scaled by the real direction. (confer §3.6)
Directional derivative generalizes our one-dimensional notion of derivative to a multidimensional domain. When direction $Y$ coincides with a member of the standard Cartesian basis $e_k e_l^T$ (60), then a single partial derivative $\partial g(X)/\partial X_{kl}$ is obtained from directional derivative (1908); such is each entry of gradient $\nabla g(X)$ in equalities (1932) and (1935), for example.
$$
\overset{\rightarrow X - X^\star}{df}(X) \;\geq\; 0 \qquad (1911)
$$
⋄
Such a vector is
$$
\upsilon = \begin{bmatrix} \nabla_x f(x)\\[4pt] \tfrac{1}{2}\,\overset{\rightarrow \nabla_x f(x)}{df}(x) \end{bmatrix} \qquad (1914)
$$
$$
\nabla^2 g_{mn}(X) =
\begin{bmatrix}
\nabla\frac{\partial g_{mn}(X)}{\partial X_{11}} & \nabla\frac{\partial g_{mn}(X)}{\partial X_{12}} & \cdots & \nabla\frac{\partial g_{mn}(X)}{\partial X_{1L}}\\
\nabla\frac{\partial g_{mn}(X)}{\partial X_{21}} & \nabla\frac{\partial g_{mn}(X)}{\partial X_{22}} & \cdots & \nabla\frac{\partial g_{mn}(X)}{\partial X_{2L}}\\
\vdots & \vdots & & \vdots\\
\nabla\frac{\partial g_{mn}(X)}{\partial X_{K1}} & \nabla\frac{\partial g_{mn}(X)}{\partial X_{K2}} & \cdots & \nabla\frac{\partial g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times K\times L} \qquad (1918)
$$
$$
=
\begin{bmatrix}
\frac{\partial \nabla g_{mn}(X)}{\partial X_{11}} & \frac{\partial \nabla g_{mn}(X)}{\partial X_{12}} & \cdots & \frac{\partial \nabla g_{mn}(X)}{\partial X_{1L}}\\
\frac{\partial \nabla g_{mn}(X)}{\partial X_{21}} & \frac{\partial \nabla g_{mn}(X)}{\partial X_{22}} & \cdots & \frac{\partial \nabla g_{mn}(X)}{\partial X_{2L}}\\
\vdots & \vdots & & \vdots\\
\frac{\partial \nabla g_{mn}(X)}{\partial X_{K1}} & \frac{\partial \nabla g_{mn}(X)}{\partial X_{K2}} & \cdots & \frac{\partial \nabla g_{mn}(X)}{\partial X_{KL}}
\end{bmatrix}
$$
$$
\nabla^2 g(X) =
\begin{bmatrix}
\nabla\frac{\partial g(X)}{\partial X_{11}} & \nabla\frac{\partial g(X)}{\partial X_{12}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{1L}}\\
\nabla\frac{\partial g(X)}{\partial X_{21}} & \nabla\frac{\partial g(X)}{\partial X_{22}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{2L}}\\
\vdots & \vdots & & \vdots\\
\nabla\frac{\partial g(X)}{\partial X_{K1}} & \nabla\frac{\partial g(X)}{\partial X_{K2}} & \cdots & \nabla\frac{\partial g(X)}{\partial X_{KL}}
\end{bmatrix} \in \mathbb{R}^{K\times L\times M\times N\times K\times L} \qquad (1920)
$$
Assuming the limits exist, we may state the partial derivative of the $mn$-th entry of $g$ with respect to the $kl$-th and $ij$-th entries of $X$;
$$
=
\begin{bmatrix}
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{11}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij} &
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{12}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij} & \cdots &
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{1N}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij}\\
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{21}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij} &
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{22}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij} & \cdots &
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{2N}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij}\\
\vdots & \vdots & & \vdots\\
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{M1}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij} &
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{M2}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij} & \cdots &
\sum\limits_{i,j}\sum\limits_{k,l} \frac{\partial^2 g_{MN}(X)}{\partial X_{kl}\,\partial X_{ij}}\, Y_{kl} Y_{ij}
\end{bmatrix} \qquad (1928)
$$
Yet for all $X\in\operatorname{dom} g$, any $Y\in\mathbb{R}^{K\times L}$, and some open interval of $t\in\mathbb{R}$
$$
g(X + t\,Y) = g(X) + t\,\overset{\rightarrow Y}{dg}(X) + \frac{1}{2!}\,t^2\,\overset{\rightarrow Y}{dg^{\,2}}(X) + o(t^3) \qquad (1930)
$$
which is the second-order Taylor series expansion about $X$. [235, §18.4] [166, §2.3.4]
Differentiating twice with respect to t and subsequent t-zeroing isolates the third term of
the expansion. Thus differentiating and zeroing g(X + t Y ) in t is an operation equivalent
to individually differentiating and zeroing every entry gmn (X + t Y ) as in (1927). So
the second directional derivative of g(X) : RK×L → RM ×N becomes [294, §2.1, §5.4.5]
[35, §6.3.1]
$$
\overset{\rightarrow Y}{dg^{\,2}}(X) = \left.\frac{d^2}{dt^2}\right|_{t=0} g(X + t\,Y) \;\in\; \mathbb{R}^{M\times N} \qquad (1931)
$$
which is again simplest. (confer (1910)) Directional derivative retains the dimensions of g .
$$
\overset{\rightarrow Y}{dg}(X) = \operatorname{tr}\!\big(\nabla g(X)^T\, Y\big) \qquad (1932)
$$
$$
\overset{\rightarrow Y}{dg^{\,2}}(X) = \operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla g(X)^T\, Y\big)^T\, Y\Big)
= \operatorname{tr}\!\Big(\nabla_X\, \overset{\rightarrow Y}{dg}(X)^T\, Y\Big) \qquad (1933)
$$
$$
\overset{\rightarrow Y}{dg^{\,3}}(X) = \operatorname{tr}\!\bigg(\nabla_X \operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla g(X)^T\, Y\big)^T\, Y\Big)^{\!T}\, Y\bigg)
= \operatorname{tr}\!\Big(\nabla_X\, \overset{\rightarrow Y}{dg^{\,2}}(X)^T\, Y\Big) \qquad (1934)
$$
$$
\overset{\rightarrow Y}{dg^{\,2}}(X) = Y^T\, \nabla^2 g(X)\, Y \qquad (1936)
$$
$$
\overset{\rightarrow Y}{dg^{\,3}}(X) = \nabla_X \big(Y^T\, \nabla^2 g(X)\, Y\big)^T\, Y \qquad (1937)
$$
and so on.
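For a twice differentiable real function with vector argument, (1936) equates the second directional derivative with a quadratic form in the Hessian. A sketch for $g(x) = x^T A x$, whose Hessian is $A + A^T$ (test data arbitrary):

```python
import numpy as np

# (1936): second directional derivative of g(x) = x'Ax equals y'(A + A')y
K = 5
A = np.random.randn(K, K)
x, y = np.random.randn(K), np.random.randn(K)
g = lambda z: z @ A @ z

h = 1e-4
d2 = (g(x + h*y) - 2*g(x) + g(x - h*y)) / h**2   # d²/dt² g(x + ty) at t = 0
assert np.isclose(d2, y @ (A + A.T) @ y, atol=1e-5)
```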
$$
g(X + \mu\,Y) = g(X) + \mu\,\overset{\rightarrow Y}{dg}(X) + \frac{1}{2!}\,\mu^2\,\overset{\rightarrow Y}{dg^{\,2}}(X) + \frac{1}{3!}\,\mu^3\,\overset{\rightarrow Y}{dg^{\,3}}(X) + o(\mu^4) \qquad (1938)
$$
or on some open interval of $\|Y\|_2$
$$
g(Y) = g(X) + \overset{\rightarrow Y-X}{dg}(X) + \frac{1}{2!}\,\overset{\rightarrow Y-X}{dg^{\,2}}(X) + \frac{1}{3!}\,\overset{\rightarrow Y-X}{dg^{\,3}}(X) + o(\|Y\|^4) \qquad (1939)
$$
which are third-order expansions about X . The mean value theorem from calculus is what
insures finite order of the series. [235] [43, §1.1] [42, App.A.5] [215, §0.4] These somewhat
unbelievable formulae imply that a function can be determined over the whole of its domain
by knowing its value and all its directional derivatives at a single point X .
$$
\overset{\rightarrow Y}{dg^{\,2}}(X) = \left.\frac{d^2}{dt^2}\right|_{t=0} g(X + t\,Y) = 2X^{-1} Y X^{-1} Y X^{-1} \qquad (1941)
$$
$$
\overset{\rightarrow Y}{dg^{\,3}}(X) = \left.\frac{d^3}{dt^3}\right|_{t=0} g(X + t\,Y) = -6\,X^{-1} Y X^{-1} Y X^{-1} Y X^{-1} \qquad (1942)
$$
Let's find the Taylor series expansion of $g$ about $X = I$: Since $g(I) = I$, for $\|Y\|_2 < 1$ ($\mu = 1$ in (1938))
If $Y$ is small, $(X + Y)^{-1} \approx X^{-1} - X^{-1} Y X^{-1}$. ∎
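A quick numerical illustration of this first-order inverse approximation (test data arbitrary; the residual is second order in the perturbation):

```python
import numpy as np

# first-order expansion of the inverse: error should be O(‖Y‖²)
n = 4
X = np.eye(n) + 0.1 * np.random.randn(n, n)   # comfortably invertible
Y = 1e-4 * np.random.randn(n, n)              # small perturbation
Xi = np.linalg.inv(X)
exact = np.linalg.inv(X + Y)
approx = Xi - Xi @ Y @ Xi
assert np.linalg.norm(exact - approx) < 1e-6  # residual ≈ ‖Y‖² ≈ 1e-8
```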
D.1.8.1 first-order
Removing evaluation at t = 0 from (1910),D.4 we find an expression for the directional
derivative of g(X) in direction Y evaluated anywhere along a line {X + t Y | t ∈ R}
intersecting dom g
$$
\overset{\rightarrow Y}{dg}(X + t\,Y) = \frac{d}{dt}\, g(X + t\,Y) \qquad (1945)
$$
In the general case $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$, from (1903) and (1906) we find
$$
\operatorname{tr}\!\big(\nabla_X\, g_{mn}(X + t\,Y)^T\, Y\big) = \frac{d}{dt}\, g_{mn}(X + t\,Y) \qquad (1946)
$$
which is valid at $t = 0$, of course, when $X\in\operatorname{dom} g$. In the important case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$, from (1932) we have simply
$$
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = \frac{d}{dt}\, g(X + t\,Y) \qquad (1947)
$$
$$
\nabla_X\, g(X + t\,Y)^T\, Y = \frac{d}{dt}\, g(X + t\,Y) \qquad (1948)
$$
$$
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = \operatorname{tr}\!\big(2ww^T (X^T + t\,Y^T)\, Y\big) \qquad (1949)
$$
$$
= 2w^T (X^T Y + t\,Y^T Y)\, w \qquad (1950)
$$
$$
\frac{d}{dt}\, g(X + t\,Y) = \frac{d}{dt}\, w^T (X + t\,Y)^T (X + t\,Y)\, w \qquad (1951)
$$
$$
= w^T \big(X^T Y + Y^T X + 2t\,Y^T Y\big)\, w \qquad (1952)
$$
$$
= 2w^T (X^T Y + t\,Y^T Y)\, w \qquad (1953)
$$
$$
\operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big) = 2w^T (X^T Y + t\,Y^T Y)\, w = 2\operatorname{tr}\!\big(ww^T (X^T + t\,Y^T)\, Y\big)
$$
$$
\operatorname{tr}\!\big(\nabla_X\, g(X)^T\, Y\big) = 2\operatorname{tr}\!\big(ww^T X^T Y\big) \qquad (1954)
$$
$$
\Leftrightarrow \qquad \nabla_X\, g(X) = 2Xww^T \qquad ∎
$$
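A finite-difference check of this result for $g(X) = w^T X^T X w$ with gradient $2Xww^T$ (test data arbitrary):

```python
import numpy as np

# finite-difference check of ∇_X g(X) = 2Xww' for g(X) = w'X'Xw
K, L = 4, 3
X = np.random.randn(K, L)
w = np.random.randn(L)
g = lambda Z: (Z @ w) @ (Z @ w)

h = 1e-6
G = np.zeros((K, L))
for k in range(K):
    for l in range(L):
        E = np.zeros((K, L)); E[k, l] = h
        G[k, l] = (g(X + E) - g(X - E)) / (2*h)

assert np.allclose(G, 2 * X @ np.outer(w, w), atol=1e-7)
```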
D.1.8.2 second-order
Likewise removing the evaluation at $t = 0$ from (1931),
$$
\overset{\rightarrow Y}{dg^{\,2}}(X + t\,Y) = \frac{d^2}{dt^2}\, g(X + t\,Y) \qquad (1955)
$$
we can find a similar relationship between second-order gradient and second derivative: In the general case $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}^{M\times N}$ from (1924) and (1927),
$$
\operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla_X\, g_{mn}(X + t\,Y)^T\, Y\big)^T\, Y\Big) = \frac{d^2}{dt^2}\, g_{mn}(X + t\,Y) \qquad (1956)
$$
In the case of a real function $g(X) : \mathbb{R}^{K\times L} \to \mathbb{R}$ we have, of course,
$$
\operatorname{tr}\!\Big(\nabla_X \operatorname{tr}\!\big(\nabla_X\, g(X + t\,Y)^T\, Y\big)^T\, Y\Big) = \frac{d^2}{dt^2}\, g(X + t\,Y) \qquad (1957)
$$
From (1936), the simpler case, where real function $g(X) : \mathbb{R}^K \to \mathbb{R}$ has vector argument,
$$
Y^T\, \nabla_X^2\, g(X + t\,Y)\, Y = \frac{d^2}{dt^2}\, g(X + t\,Y) \qquad (1958)
$$
Setting $Y$ to a member of $\{\,e_k e_l^T \in \mathbb{R}^{K\times K} \mid k, l = 1\ldots K\,\}$, and employing a property (39) of the trace function, we find
$$
\nabla^2 g(X)_{kl} = \nabla h(X)_{kl} = -\big(X^{-1} e_k e_l^T\, X^{-1}\big) \;\in\; \mathbb{R}^{K\times K} \qquad (1965)
$$
∎
From all these first- and second-order expressions, we may generate new ones
by evaluating both sides at arbitrary t (in some open interval) but only after the
differentiation.
$$
\frac{d}{dx}\, x^{-1} \,\triangleq\, \nabla_x\big(\mathbf{1}^T \delta(x)^{-1} \mathbf{1}\big) \qquad (1966)
$$
For A a scalar or square matrix, we have the Taylor series [80, §3.6]
$$
e^A \,\triangleq\, \sum_{k=0}^{\infty} \frac{1}{k!}\, A^k \qquad (1967)
$$
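Partial sums of (1967) converge quickly for modest $\|A\|$; a sketch comparing them against scipy.linalg.expm (assumed available; test matrix arbitrary):

```python
import numpy as np
from scipy.linalg import expm

# partial sums of the Taylor series (1967) against scipy's expm
A = 0.5 * np.random.randn(4, 4)
S, term = np.eye(4), np.eye(4)
for k in range(1, 30):
    term = term @ A / k        # running term A^k / k!
    S = S + term
assert np.allclose(S, expm(A), atol=1e-10)
```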
D.2.1 algebraic
$\nabla_x (Ax - b) = A^T$

$\nabla_x \big(x^T A - b^T\big) = A$

$\nabla_x \big(x^T A x + 2x^T B y + y^T C y\big) = \big(A + A^T\big)x + 2By$

$\nabla_x\, (x + y)^T A (x + y) = \big(A + A^T\big)(x + y)$

$\nabla_x^2 \big(x^T A x + 2x^T B y + y^T C y\big) = A + A^T$

$\nabla_X \big(a^T X^{-1} b\big) = -X^{-T} ab^T X^{-T}$, confer
$$
\nabla_X \big(X^{-1}\big)_{kl} = \frac{\partial X^{-1}}{\partial X_{kl}} = -X^{-1} e_k e_l^T\, X^{-1} \qquad (1901)\,(1965)
$$

$\nabla_x \big(a^T x^T x\, b\big) = 2x\,a^T b \qquad\qquad \nabla_X \big(a^T X^T X b\big) = X\big(ab^T + ba^T\big)$
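A spot check of one table entry, the gradient of the general quadratic, by central differences (test data arbitrary, numpy assumed); the other entries can be checked the same way:

```python
import numpy as np

# spot check: ∇_x(x'Ax + 2x'By + y'Cy) = (A + A')x + 2By
n, m = 5, 3
A, B, C = np.random.randn(n, n), np.random.randn(n, m), np.random.randn(m, m)
x, y = np.random.randn(n), np.random.randn(m)
f = lambda z: z @ A @ z + 2 * z @ B @ y + y @ C @ y

h = 1e-6
I = np.eye(n)
grad = np.array([(f(x + h*I[i]) - f(x - h*I[i])) / (2*h) for i in range(n)])
assert np.allclose(grad, (A + A.T) @ x + 2 * B @ y, atol=1e-7)
```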
algebraic continued
$\frac{d}{dt}\, (X + t\,Y) = Y$

$\frac{d}{dt}\, B^T (X + t\,Y)^{-1} A = -B^T (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1} A$

$\frac{d}{dt}\, B^T (X + t\,Y)^{-T} A = -B^T (X + t\,Y)^{-T}\, Y^T\, (X + t\,Y)^{-T} A$

$\frac{d}{dt}\, B^T (X + t\,Y)^{\mu} A = \ldots\,, \quad -1 \le \mu \le 1\,,\; X, Y \in \mathbb{S}_+^M$

$\frac{d^2}{dt^2}\, B^T (X + t\,Y)^{-1} A = 2\,B^T (X + t\,Y)^{-1} Y (X + t\,Y)^{-1} Y (X + t\,Y)^{-1} A$

$\frac{d^3}{dt^3}\, B^T (X + t\,Y)^{-1} A = -6\,B^T (X + t\,Y)^{-1} Y (X + t\,Y)^{-1} Y (X + t\,Y)^{-1} Y (X + t\,Y)^{-1} A$

$\frac{d}{dt} \big((X + t\,Y)^T A (X + t\,Y)\big) = Y^T A X + X^T A Y + 2t\, Y^T A Y$

$\frac{d^2}{dt^2} \big((X + t\,Y)^T A (X + t\,Y)\big) = 2\, Y^T A Y$

$\frac{d}{dt} \big((X + t\,Y)^T A (X + t\,Y)\big)^{-1} = -\big((X + t\,Y)^T A (X + t\,Y)\big)^{-1} \big(Y^T A X + X^T A Y + 2t\, Y^T A Y\big) \big((X + t\,Y)^T A (X + t\,Y)\big)^{-1}$

$\frac{d}{dt} \big((X + t\,Y) A (X + t\,Y)\big) = Y A X + X A Y + 2t\, Y A Y$

$\frac{d^2}{dt^2} \big((X + t\,Y) A (X + t\,Y)\big) = 2\, Y A Y$

$\nabla_{\operatorname{vec} X}^2 \operatorname{tr}\big(A X B X^T\big) = \nabla_{\operatorname{vec} X}^2\, \operatorname{vec}(X)^T \big(B^T \otimes A\big) \operatorname{vec} X = B^T \otimes A + B \otimes A^T$
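These derivative entries hold at arbitrary $t$, not only at $t = 0$. A numerical sketch of the $B^T(X + t\,Y)^{-1}A$ entry, with perturbations kept small so $X + t\,Y$ stays invertible (test data arbitrary):

```python
import numpy as np

# d/dt B'(X+tY)⁻¹A at arbitrary t, against the table entry
n = 4
A, B = np.random.randn(n, n), np.random.randn(n, n)
X = np.eye(n) + 0.1 * np.random.randn(n, n)
Y = 0.1 * np.random.randn(n, n)
t, h = 0.3, 1e-6

F = lambda t: B.T @ np.linalg.inv(X + t*Y) @ A
Z = np.linalg.inv(X + t*Y)
num = (F(t + h) - F(t - h)) / (2*h)       # central difference in t
assert np.allclose(num, -B.T @ Z @ Y @ Z @ A, atol=1e-6)
```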
D.2.3 trace
$\nabla_x\, \mu\,x = \mu I \qquad\qquad \nabla_X \operatorname{tr} \mu X = \nabla_X\, \mu \operatorname{tr} X = \mu I$

$\nabla_x\, \mathbf{1}^T \delta(x)^{-1} \mathbf{1} = \frac{d}{dx}\, x^{-1} = -x^{-2} \qquad\qquad \nabla_X \operatorname{tr} X^{-1} = -X^{-2T}$

$\nabla_x\, \mathbf{1}^T \delta(x)^{-1} y = -\delta(x)^{-2} y \qquad\qquad \nabla_X \operatorname{tr}\big(X^{-1} Y\big) = \nabla_X \operatorname{tr}\big(Y X^{-1}\big) = -X^{-T} Y^T X^{-T}$

$\frac{d}{dx}\, x^{\mu} = \mu x^{\mu-1} \qquad\qquad \nabla_X \operatorname{tr} X^{\mu} = \mu X^{\mu-1}\,, \; X \in \mathbb{S}^M$

$\nabla_X \operatorname{tr} X^{j} = j X^{(j-1)T}$

$\nabla_x\, (b - a^T x)^{-1} = (b - a^T x)^{-2}\, a \qquad\qquad \nabla_X \operatorname{tr}\big((B - AX)^{-1}\big) = \big((B - AX)^{-2} A\big)^T$

$\nabla_X \operatorname{tr}\big(Y X^{k}\big) = \nabla_X \operatorname{tr}\big(X^{k} Y\big) = \Big(\sum_{i=0}^{k-1} X^{i}\, Y\, X^{k-1-i}\Big)^{\!T}$

$\nabla_X \operatorname{tr}\big((X + Y)^T (X + Y)\big) = 2(X + Y) = \nabla_X \|X + Y\|_F^2$
trace continued
$\frac{d}{dt} \operatorname{tr} g(X + t\,Y) = \operatorname{tr} \frac{d}{dt}\, g(X + t\,Y)$ \quad [219, p.491]

$\frac{d}{dt} \operatorname{tr}(X + t\,Y) = \operatorname{tr} Y$

$\frac{d}{dt} \operatorname{tr}^{\,j}(X + t\,Y) = j \operatorname{tr}^{\,j-1}(X + t\,Y)\, \operatorname{tr} Y$

$\frac{d}{dt} \operatorname{tr}\big((X + t\,Y)^{j}\big) = j \operatorname{tr}\big((X + t\,Y)^{j-1}\, Y\big) \quad (\forall\, j)$

$\frac{d}{dt} \operatorname{tr}\big((X + t\,Y)\,Y\big) = \operatorname{tr} Y^2$

$\frac{d}{dt} \operatorname{tr}\big((X + t\,Y)^{k}\, Y\big) = \frac{d}{dt} \operatorname{tr}\big(Y (X + t\,Y)^{k}\big) = k \operatorname{tr}\big((X + t\,Y)^{k-1}\, Y^2\big)\,, \quad k \in \{0, 1, 2\}$

$\frac{d}{dt} \operatorname{tr}\big((X + t\,Y)^{k}\, Y\big) = \frac{d}{dt} \operatorname{tr}\big(Y (X + t\,Y)^{k}\big) = \sum_{i=0}^{k-1} \operatorname{tr}\big((X + t\,Y)^{i}\, Y\, (X + t\,Y)^{k-1-i}\, Y\big)$

$\frac{d}{dt} \operatorname{tr}\big((X + t\,Y)^{-1}\, Y\big) = -\operatorname{tr}\big((X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1}\, Y\big)$

$\frac{d}{dt} \operatorname{tr}\big(B^T (X + t\,Y)^{-1} A\big) = -\operatorname{tr}\big(B^T (X + t\,Y)^{-1}\, Y\, (X + t\,Y)^{-1} A\big)$

$\frac{d}{dt} \operatorname{tr}\big(B^T (X + t\,Y)^{-T} A\big) = -\operatorname{tr}\big(B^T (X + t\,Y)^{-T}\, Y^T\, (X + t\,Y)^{-T} A\big)$

$\frac{d}{dt} \operatorname{tr}\big(B^T (X + t\,Y)^{-k} A\big) = \ldots\,, \quad k > 0$

$\frac{d}{dt} \operatorname{tr}\big(B^T (X + t\,Y)^{\mu} A\big) = \ldots\,, \quad -1 \le \mu \le 1\,, \; X, Y \in \mathbb{S}_+^M$

$\frac{d^2}{dt^2} \operatorname{tr}\big(B^T (X + t\,Y)^{-1} A\big) = 2 \operatorname{tr}\big(B^T (X + t\,Y)^{-1} Y (X + t\,Y)^{-1} Y (X + t\,Y)^{-1} A\big)$

$\frac{d}{dt} \operatorname{tr}\big((X + t\,Y)^T A (X + t\,Y)\big) = \operatorname{tr}\big(Y^T A X + X^T A Y + 2t\, Y^T A Y\big)$

$\frac{d^2}{dt^2} \operatorname{tr}\big((X + t\,Y)^T A (X + t\,Y)\big) = 2 \operatorname{tr}\big(Y^T A Y\big)$

$\frac{d}{dt} \operatorname{tr}\Big(\big((X + t\,Y)^T A (X + t\,Y)\big)^{-1}\Big) = -\operatorname{tr}\Big(\big((X + t\,Y)^T A (X + t\,Y)\big)^{-1} \big(Y^T A X + X^T A Y + 2t\, Y^T A Y\big) \big((X + t\,Y)^T A (X + t\,Y)\big)^{-1}\Big)$

$\frac{d}{dt} \operatorname{tr}\big((X + t\,Y) A (X + t\,Y)\big) = \operatorname{tr}\big(Y A X + X A Y + 2t\, Y A Y\big)$

$\frac{d^2}{dt^2} \operatorname{tr}\big((X + t\,Y) A (X + t\,Y)\big) = 2 \operatorname{tr}(Y A Y)$
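A sketch of the $k$-term sum rule for $\frac{d}{dt}\operatorname{tr}\big((X+t\,Y)^k Y\big)$ with $k = 3$ (test data arbitrary):

```python
import numpy as np

# d/dt tr((X+tY)^k Y) against the k-term sum from the table, k = 3
n, k = 4, 3
X, Y = np.random.randn(n, n), np.random.randn(n, n)
t, h = 0.2, 1e-6
mp = np.linalg.matrix_power

f = lambda t: np.trace(mp(X + t*Y, k) @ Y)
Z = X + t*Y
lhs = (f(t + h) - f(t - h)) / (2*h)
rhs = sum(np.trace(mp(Z, i) @ Y @ mp(Z, k-1-i) @ Y) for i in range(k))
assert np.isclose(lhs, rhs, atol=1e-5)
```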
D.2.4 log determinant

$\frac{d}{dx} \log x = x^{-1} \qquad\qquad \nabla_X \log\det X = X^{-T}$

$\nabla_X^2 \log\det(X)_{kl} = \dfrac{\partial X^{-T}}{\partial X_{kl}} = -\big(X^{-1} e_k e_l^T\, X^{-1}\big)^T$, confer (1918) (1965)

$\frac{d}{dx} \log x^{-1} = -x^{-1} \qquad\qquad \nabla_X \log\det X^{-1} = -X^{-T}$

$\frac{d}{dx} \log x^{\mu} = \mu x^{-1} \qquad\qquad \nabla_X \log{\det}^{\mu} X = \mu X^{-T}$

$\nabla_X \log\det X^{\mu} = \mu X^{-T}$

$\nabla_x \log(a^T x + b) = \dfrac{1}{a^T x + b}\, a \qquad\qquad \nabla_X \log\det(AX + B) = A^T (AX + B)^{-T}$

$\frac{d}{dt} \log\det(X + t\,Y) = \operatorname{tr}\big((X + t\,Y)^{-1} Y\big)$

$\frac{d^2}{dt^2} \log\det(X + t\,Y) = -\operatorname{tr}\big((X + t\,Y)^{-1} Y (X + t\,Y)^{-1} Y\big)$

$\frac{d}{dt} \log\det(X + t\,Y)^{-1} = -\operatorname{tr}\big((X + t\,Y)^{-1} Y\big)$

$\frac{d^2}{dt^2} \log\det(X + t\,Y)^{-1} = \operatorname{tr}\big((X + t\,Y)^{-1} Y (X + t\,Y)^{-1} Y\big)$

$\frac{d}{dt} \log\det\big(\delta(A(x + t\,y) + a)^2 + \mu I\big) = \operatorname{tr}\Big(\big(\delta(A(x + t\,y) + a)^2 + \mu I\big)^{-1}\, 2\,\delta(A(x + t\,y) + a)\,\delta(A y)\Big)$
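A sketch of $\frac{d}{dt}\log\det(X+t\,Y) = \operatorname{tr}\big((X+t\,Y)^{-1}Y\big)$ on the positive definite cone, using slogdet for numerical stability (test data arbitrary):

```python
import numpy as np

# d/dt log det(X+tY) = tr((X+tY)⁻¹ Y), with X positive definite
n = 4
R = np.random.randn(n, n)
X = R @ R.T + n * np.eye(n)              # positive definite
Y = np.random.randn(n, n); Y = Y + Y.T   # symmetric direction
t, h = 0.1, 1e-6

f = lambda t: np.linalg.slogdet(X + t*Y)[1]   # log|det|, stable
lhs = (f(t + h) - f(t - h)) / (2*h)
assert np.isclose(lhs, np.trace(np.linalg.inv(X + t*Y) @ Y), atol=1e-6)
```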
D.2.5 determinant
$\frac{d}{dt} \det(X + t\,Y) = \det(X + t\,Y)\, \operatorname{tr}\big((X + t\,Y)^{-1} Y\big)$

$\frac{d^2}{dt^2} \det(X + t\,Y) = \det(X + t\,Y)\Big(\operatorname{tr}^2\big((X + t\,Y)^{-1} Y\big) - \operatorname{tr}\big((X + t\,Y)^{-1} Y (X + t\,Y)^{-1} Y\big)\Big)$

$\frac{d}{dt} \det(X + t\,Y)^{-1} = -\det(X + t\,Y)^{-1}\, \operatorname{tr}\big((X + t\,Y)^{-1} Y\big)$

$\frac{d^2}{dt^2} \det(X + t\,Y)^{-1} = \det(X + t\,Y)^{-1}\Big(\operatorname{tr}^2\big((X + t\,Y)^{-1} Y\big) + \operatorname{tr}\big((X + t\,Y)^{-1} Y (X + t\,Y)^{-1} Y\big)\Big)$

$\frac{d}{dt} {\det}^{\mu}(X + t\,Y) = \mu\, {\det}^{\mu}(X + t\,Y)\, \operatorname{tr}\big((X + t\,Y)^{-1} Y\big)$
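The first entry here is Jacobi's formula; a numerical sketch (test data arbitrary):

```python
import numpy as np

# Jacobi's formula: d/dt det(X+tY) = det(X+tY) tr((X+tY)⁻¹ Y)
n = 4
X = np.eye(n) + 0.1 * np.random.randn(n, n)
Y = np.random.randn(n, n)
t, h = 0.05, 1e-6

f = lambda t: np.linalg.det(X + t*Y)
Z = X + t*Y
lhs = (f(t + h) - f(t - h)) / (2*h)
assert np.isclose(lhs, np.linalg.det(Z) * np.trace(np.linalg.inv(Z) @ Y), atol=1e-6)
```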
D.2.6 logarithmic
Matrix logarithm.
$\frac{d}{dt} \log(X + t\,Y)^{\mu} = \mu\, Y (X + t\,Y)^{-1} = \mu\, (X + t\,Y)^{-1} Y\,, \quad XY = YX$

$\frac{d}{dt} \log(I - t\,Y)^{\mu} = -\mu\, Y (I - t\,Y)^{-1} = -\mu\, (I - t\,Y)^{-1} Y$ \quad [219, p.493]
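A sketch of the second entry with $\mu = 1$; scipy.linalg.logm is assumed available, and here $X = I$ commutes with any $Y$ so the commutativity proviso holds automatically:

```python
import numpy as np
from scipy.linalg import logm

# d/dt log(I − tY) = −(I − tY)⁻¹ Y  (μ = 1)
n = 3
Y = 0.3 * np.random.randn(n, n)
t, h = 0.2, 1e-5

f = lambda t: logm(np.eye(n) - t*Y)
lhs = (f(t + h) - f(t - h)) / (2*h)
assert np.allclose(lhs, -np.linalg.inv(np.eye(n) - t*Y) @ Y, atol=1e-5)
```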
D.2.7 exponential
Matrix exponential. [80, §3.6, §4.5] [348, §5.4]
$\nabla_X\, e^{\operatorname{tr}(Y^T X)} = \nabla_X \det e^{Y^T X} = e^{\operatorname{tr}(Y^T X)}\, Y \quad (\forall\, X, Y)$

$\nabla_X \operatorname{tr} e^{Y X} = e^{Y^T X^T}\, Y^T = Y^T e^{X^T Y^T}$

$\nabla_x\, \mathbf{1}^T e^{Ax} = A^T e^{Ax}$

$\nabla_x \log\big(\mathbf{1}^T e^{x}\big) = \dfrac{1}{\mathbf{1}^T e^{x}}\, e^{x}$

$\nabla_x^2 \log\big(\mathbf{1}^T e^{x}\big) = \dfrac{1}{\mathbf{1}^T e^{x}} \left(\delta(e^{x}) - \dfrac{1}{\mathbf{1}^T e^{x}}\, e^{x} e^{x\,T}\right)$

$\nabla_x \prod\limits_{i=1}^{k} x_i^{1/k} = \dfrac{1}{k} \left(\prod\limits_{i=1}^{k} x_i^{1/k}\right) (1/x)$

$\nabla_x^2 \prod\limits_{i=1}^{k} x_i^{1/k} = -\dfrac{1}{k} \left(\prod\limits_{i=1}^{k} x_i^{1/k}\right) \left(\delta(x)^{-2} - \dfrac{1}{k}\,(1/x)(1/x)^T\right)$

$\frac{d}{dt}\, e^{t\,Y} = e^{t\,Y}\, Y = Y e^{t\,Y}$

$\frac{d}{dt}\, e^{X + t\,Y} = e^{X + t\,Y}\, Y = Y e^{X + t\,Y}\,, \quad XY = YX$

$\frac{d^2}{dt^2}\, e^{X + t\,Y} = e^{X + t\,Y}\, Y^2 = Y e^{X + t\,Y}\, Y = Y^2 e^{X + t\,Y}\,, \quad XY = YX$

$\frac{d^{\,j}}{dt^{\,j}}\, e^{\operatorname{tr}(X + t\,Y)} = e^{\operatorname{tr}(X + t\,Y)}\, \operatorname{tr}^{\,j}(Y)$