# PRML Exercise Solutions

*Pattern Recognition and Machine Learning: Solutions to the Exercises (Tutors' Edition)*. Markus Svensén and Christopher M. Bishop. Copyright © 2002–2008. This is the solutions manual (Tutors' Edition) for the book *Pattern Recognition and Machine Learning* (PRML; published by Springer in 2006). This release was created March 20, 2008. Any future releases (e.g. with corrections to errors) will be announced on the PRML web site (see below) and published via Springer.

PLEASE DO NOT DISTRIBUTE. Most of the solutions in this manual are intended as a resource for tutors teaching courses based on PRML, and the value of this resource would be greatly diminished if it were to become generally available. All tutors who want a copy should contact Springer directly.

The authors would like to express their gratitude to the various people who have provided feedback on pre-releases of this document. The authors welcome all comments, questions and suggestions about the solutions, as well as reports on (potential) errors in text or formulae in this document; please send any such feedback to prml-fb@microsoft.com. Further information about PRML is available from http://research.microsoft.com/~cmbishop/PRML

## Contents

- Chapter 1: Pattern Recognition
- Chapter 2: Density Estimation
- Chapter 3: Linear Models for Regression
- Chapter 4: Linear Models for Classification
- Chapter 5: Neural Networks
- Chapter 6: Kernel Methods
- Chapter 7: Sparse Kernel Machines
- Chapter 8: Probabilistic Graphical Models
- Chapter 9: Mixture Models
- Chapter 10: Variational Inference and EM
- Chapter 11: Sampling Methods
- Chapter 12: Latent Variables
- Chapter 13: Sequential Data
- Chapter 14: Combining Models

## Chapter 1: Pattern Recognition

**1.1** Substituting (1.1) into (1.2) and then differentiating with respect to $w_i$ we obtain

$$\sum_{n=1}^{N} \left( \sum_{j=0}^{M} w_j x_n^j - t_n \right) x_n^i = 0.$$
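The normal equations from Solution 1.1 can be checked numerically. The sketch below (not part of the original solution) builds $A_{ij} = \sum_n x_n^{i+j}$ and $T_i = \sum_n t_n x_n^i$ for hypothetical noisy samples of $\sin(2\pi x)$ and confirms that solving $\sum_j A_{ij} w_j = T_i$ reproduces an ordinary least-squares polynomial fit; the data, seed, and polynomial order are invented for illustration.

```python
import numpy as np

# Hypothetical data: noisy samples of sin(2*pi*x), echoing PRML's curve-fitting example.
rng = np.random.default_rng(0)
N, M = 10, 3                          # N data points, polynomial order M (arbitrary choices)
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

# A_ij = sum_n x_n^(i+j) and T_i = sum_n t_n x_n^i, as derived in Solution 1.1.
i = np.arange(M + 1)
A = np.array([[np.sum(x ** (ii + jj)) for jj in i] for ii in i])
T = np.array([np.sum(t * x ** ii) for ii in i])

w = np.linalg.solve(A, T)             # coefficients minimising the sum-of-squares error

# The same coefficients fall out of a direct least-squares fit on the design matrix.
Phi = x[:, None] ** i
w_ls, *_ = np.linalg.lstsq(Phi, t, rcond=None)
assert np.allclose(w, w_ls)
```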
Re-arranging terms then gives the required result.

**1.2** For the regularized sum-of-squares error function given by (1.4) the corresponding linear equations are again obtained by differentiation, and take the same form as (1.122), but with $A_{ij}$ replaced by $\widetilde{A}_{ij}$, given by

$$\widetilde{A}_{ij} = A_{ij} + \lambda I_{ij}.$$

**1.3** Let us denote apples, oranges and limes by a, o and l respectively. The marginal probability of selecting an apple is given by

$$p(a) = p(a|r)p(r) + p(a|b)p(b) + p(a|g)p(g) = \frac{3}{10} \times 0.2 + \frac{1}{2} \times 0.2 + \frac{3}{10} \times 0.6 = 0.34,$$

where the conditional probabilities are obtained from the proportions of apples in each box.

To find the probability that the box was green, given that the fruit we selected was an orange, we can use Bayes' theorem,

$$p(g|o) = \frac{p(o|g)\,p(g)}{p(o)}.$$

The denominator is given by

$$p(o) = p(o|r)p(r) + p(o|b)p(b) + p(o|g)p(g) = \frac{4}{10} \times 0.2 + \frac{1}{2} \times 0.2 + \frac{3}{10} \times 0.6 = 0.36,$$

from which we obtain

$$p(g|o) = \frac{3}{10} \times \frac{0.6}{0.36} = 0.5.$$

**1.4** We are often interested in finding the most probable value for some quantity. In the case of probability distributions over discrete variables this poses little problem. However, for continuous variables there is a subtlety arising from the nature of probability densities and the way they transform under non-linear changes of variable.

Consider first the way a function $f(x)$ behaves when we change to a new variable $y$, where the two variables are related by $x = g(y)$. This defines a new function of $y$ given by

$$\widetilde{f}(y) = f(g(y)).$$

Suppose $f(x)$ has a mode (i.e. a maximum) at $\widehat{x}$, so that $f'(\widehat{x}) = 0$. The corresponding mode of $\widetilde{f}(y)$ will occur for a value $\widehat{y}$ obtained by differentiating both sides with respect to $y$:

$$\widetilde{f}'(\widehat{y}) = f'(g(\widehat{y}))\, g'(\widehat{y}) = 0.$$

Assuming $g'(\widehat{y}) \neq 0$ at the mode, then $f'(g(\widehat{y})) = 0$. However, we know that $f'(\widehat{x}) = 0$, and so we see that the locations of the mode expressed in terms of each of the variables $x$ and $y$ are related by $\widehat{x} = g(\widehat{y})$, as one would expect.
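The arithmetic in Solution 1.3 can be verified in a few lines. The box contents below are taken from the statement of Exercise 1.3 (red: 3 apples, 4 oranges, 3 limes; blue: 1, 1, 0; green: 3, 3, 4); the dictionary encoding is just one possible layout.

```python
# Boxes hold (apples, oranges, limes); boxes are picked with p(r)=0.2, p(b)=0.2, p(g)=0.6.
boxes = {"r": (3, 4, 3), "b": (1, 1, 0), "g": (3, 3, 4)}
p_box = {"r": 0.2, "b": 0.2, "g": 0.6}

def p_fruit(fruit_idx):
    """Marginal probability of drawing a given fruit (sum rule)."""
    return sum(p_box[b] * counts[fruit_idx] / sum(counts)
               for b, counts in boxes.items())

p_apple = p_fruit(0)      # ~0.34, as in the solution
p_orange = p_fruit(1)     # ~0.36

# Bayes' theorem: p(g | orange) = p(orange | g) p(g) / p(orange)
p_g_given_o = (boxes["g"][1] / sum(boxes["g"])) * p_box["g"] / p_orange
print(p_apple, p_orange, p_g_given_o)
```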
Thus, finding a mode with respect to the variable $x$ is completely equivalent to first transforming to the variable $y$, then finding a mode with respect to $y$, and then transforming back to $x$.

Now consider the behaviour of a probability density $p_x(x)$ under the change of variables $x = g(y)$, where the density with respect to the new variable is $p_y(y)$ and is given by (1.27). Let us write $g'(y) = s\,|g'(y)|$ where $s \in \{-1, +1\}$. Then (1.27) can be written

$$p_y(y) = p_x(g(y))\, s\, g'(y).$$

Differentiating both sides with respect to $y$ then gives

$$p_y'(y) = s\, p_x'(g(y))\,\{g'(y)\}^2 + s\, p_x(g(y))\, g''(y). \tag{9}$$

Due to the presence of the second term on the right hand side of (9), the relationship $\widehat{x} = g(\widehat{y})$ no longer holds. Thus the value of $\widehat{x}$ obtained by maximizing $p_x(x)$ will not be the value obtained by transforming to $p_y(y)$, then maximizing with respect to $y$, and then transforming back to $x$. This causes modes of densities to be dependent on the choice of variables. In the case of a linear transformation, the second term on the right hand side of (9) vanishes, and so the location of the maximum transforms according to $\widehat{x} = g(\widehat{y})$.

This effect can be illustrated with a simple example, as shown in Figure 1. We begin by considering a Gaussian distribution $p_x(x)$ over $x$ with mean $\mu = 6$ and standard deviation $\sigma = 1$, shown by the red curve in Figure 1. Next we draw a sample of $N = 50{,}000$ points from this distribution and plot a histogram of their values, which as expected agrees with the distribution $p_x(x)$.

Now consider a non-linear change of variables from $x$ to $y$ given by

$$x = g(y) = \ln(y) - \ln(1 - y) + 5. \tag{10}$$

The inverse of this function is given by

$$y = \frac{1}{1 + \exp(-x + 5)}, \tag{11}$$

which is a logistic sigmoid function, and is shown in Figure 1 by the blue curve.

[Figure 1: Example of the transformation of the mode of a density under a non-linear change of variables, illustrating the different behaviour compared to a simple function. See the text for details.]
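The Figure 1 experiment can be reproduced numerically. This is a sketch under the settings stated in the text ($\mu = 6$, $\sigma = 1$, 50,000 samples, the sigmoid (11)); the random seed and histogram binning are arbitrary choices, not from the manual.

```python
import numpy as np

# Sample x ~ N(6, 1) and map through the inverse relation y = 1 / (1 + exp(-x + 5)).
rng = np.random.default_rng(1)
x = rng.normal(loc=6.0, scale=1.0, size=50_000)
y = 1.0 / (1.0 + np.exp(-x + 5.0))

# Empirical mode of p_y(y) from a histogram (bin width 0.01, an arbitrary choice).
counts, edges = np.histogram(y, bins=100, range=(0.0, 1.0))
y_mode = 0.5 * (edges[:-1] + edges[1:])[np.argmax(counts)]

# Naively pushing the x-mode through the sigmoid gives a different value:
naive_mode = 1.0 / (1.0 + np.exp(-6.0 + 5.0))   # sigmoid applied to x-hat = 6
print(y_mode, naive_mode)                        # the two disagree: the mode has shifted
```

Running this shows the histogram mode of $y$ sitting well above the sigmoid image of the $x$-mode, which is exactly the discrepancy between the magenta and green curves described in the text.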
If we simply transform $p_x(x)$ as a function of $x$ we obtain the green curve $p_x(g(y))$ shown in Figure 1, and we see that the mode of the density $p_x(x)$ is transformed via the sigmoid function to the mode of this curve. However, the density over $y$ transforms instead according to (1.27) and is shown by the magenta curve on the left side of the diagram. Note that this has its mode shifted relative to the mode of the green curve.

To confirm this result we take our sample of 50,000 values of $x$, evaluate the corresponding values of $y$ using (11), and then plot a histogram of their values. We see that this histogram matches the magenta curve in Figure 1 and not the green curve.

**1.5** Expanding the square we have

$$\mathbb{E}\big[(f(x) - \mathbb{E}[f(x)])^2\big] = \mathbb{E}\big[f(x)^2 - 2f(x)\,\mathbb{E}[f(x)] + \mathbb{E}[f(x)]^2\big] = \mathbb{E}[f(x)^2] - 2\,\mathbb{E}[f(x)]\,\mathbb{E}[f(x)] + \mathbb{E}[f(x)]^2 = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2,$$

as required.

**1.6** The definition of covariance is given by (1.41) as

$$\operatorname{cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y].$$

Using (1.33) and the fact that $p(x, y) = p(x)p(y)$ when $x$ and $y$ are independent, we obtain

$$\mathbb{E}[xy] = \sum_x \sum_y p(x, y)\, xy = \sum_x p(x)\, x \sum_y p(y)\, y = \mathbb{E}[x]\,\mathbb{E}[y],$$

and hence $\operatorname{cov}[x, y] = 0$. The case where $x$ and $y$ are continuous variables is analogous, with (1.33) replaced by (1.34) and the sums replaced by integrals.

**1.7** The transformation from Cartesian to polar coordinates is defined by

$$x = r\cos\theta, \qquad y = r\sin\theta,$$

and hence we have $x^2 + y^2 = r^2$, where we have used the well-known trigonometric result (2.177). Also the Jacobian of the change of variables is easily seen to be

$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r,$$

where again we have used (2.177). Thus the double integral in (1.125) becomes

$$I^2 = \int_0^{2\pi} \int_0^{\infty} \exp\left(-\frac{r^2}{2\sigma^2}\right) r \,\mathrm{d}r\,\mathrm{d}\theta = 2\pi \int_0^{\infty} \exp\left(-\frac{u}{2\sigma^2}\right) \frac{1}{2}\,\mathrm{d}u = \pi\left[-2\sigma^2 \exp\left(-\frac{u}{2\sigma^2}\right)\right]_0^{\infty} = 2\pi\sigma^2,$$

where we have used the change of variables $u = r^2$. Thus $I = \left(2\pi\sigma^2\right)^{1/2}$.

Finally, using the transformation $y = x - \mu$, the integral of the Gaussian distribution becomes

$$\int_{-\infty}^{\infty} \mathcal{N}\!\left(x \,|\, \mu, \sigma^2\right) \mathrm{d}x = \frac{1}{(2\pi\sigma^2)^{1/2}} \int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2\sigma^2}\right) \mathrm{d}y = \frac{I}{(2\pi\sigma^2)^{1/2}} = 1,$$

as required.

**1.8** From the definition (1.46) of the univariate Gaussian distribution, we have

$$\mathbb{E}[x] = \int_{-\infty}^{\infty} \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right) x \,\mathrm{d}x.$$
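Solutions 1.5 and 1.7 both lend themselves to quick numerical spot-checks. The sketch below (not from the manual) verifies the variance identity by Monte Carlo for an arbitrary choice $f(x) = \sin x$, and the Gaussian normalisation by a simple Riemann sum; the parameters, sample size, grid, and tolerances are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.5, 0.7                           # arbitrary Gaussian parameters
x = rng.normal(mu, sigma, size=200_000)

# Solution 1.5: E[(f - E[f])^2] = E[f^2] - E[f]^2 for any f(x).
f = np.sin(x)                                  # any function f(x) will do
lhs = np.mean((f - f.mean()) ** 2)             # E[(f - E[f])^2]
rhs = np.mean(f ** 2) - np.mean(f) ** 2        # E[f^2] - E[f]^2
assert abs(lhs - rhs) < 1e-9                   # identical up to rounding

# Solution 1.7: N(x | mu, sigma^2) integrates to 1 (Riemann sum on a wide grid).
grid = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 100_001)
pdf = np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)
total = np.sum(pdf) * (grid[1] - grid[0])
assert abs(total - 1.0) < 1e-6
```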
