SIGGRAPH 2011 paper: Realtime Performance-Based Facial Animation


Uses the Kinect to compute realtime facial expression animation with blendshape blending. The Kinect captures 2D texture and 3D depth maps of the face in real time, and expression priors and blendshape weights are then used on the PC to drive the animation in real time.
Figure 3: Acquisition of user expressions for offline model building. Aggregating multiple scans under slight head rotation reduces noise and fills in missing data.

We can use existing blendshape animations, which are ubiquitous in movie and game production, to define the dynamic expression priors. The underlying hypothesis here is that the blendshape weights of a human facial animation sequence provide a sufficient level of abstraction to enable expression transfer between different characters. Finally, the output generated by our algorithm, a temporal sequence of blendshape weights, can be directly imported into commercial animation tools, thus facilitating integration into existing production workflows.

Acquisition Hardware. All input data is acquired using the Kinect system, i.e. no other hardware such as laser scanners is required for user-specific model building. The Kinect supports simultaneous capture of a 2D color image and a 3D depth map at 30 frames per second, based on invisible infrared projection (Figure 4). Essential benefits of this low-cost acquisition device include ease of deployment and sustained operability in a natural environment. The user is neither required to wear any physical markers or specialized makeup, nor is the performance adversely affected by intrusive light projections or clumsy hardware contraptions. However, these key advantages come at the price of a substantial degradation in data quality compared to state-of-the-art performance capture systems based on markers and/or active lighting. Ensuring robust processing given the low resolution and high noise levels of the input data is the primary challenge that we address in this paper.

Figure 4: The Kinect simultaneously captures a 640 x 480 color image and a corresponding depth map at 30 Hertz, computed via triangulation of an infrared projector and camera.

2 Facial Expression Model

The central component of our tracking algorithm is a facial expression model that provides a low-dimensional representation of the user's expression space. We build this model in an offline preprocessing step by adapting a generic blendshape model with a small set of expressions performed by the user. These expressions are captured with the Kinect prior to online tracking and reconstructed using a morphable model combined with non-rigid alignment methods. Figure 5 summarizes the different steps of our algorithm for building the facial expression model. We omit a detailed description of previous methods that are integrated into our algorithm; please refer to the cited papers for parameter settings and implementation details.

Figure 5: Offline pre-processing for building the user-specific expression model. Pre-defined example poses of the user with known blendshape weights are scanned and registered to a template mesh to yield a set of user-specific expressions. An optimization solves for the user-specific blendshapes that maintain the semantics of a generic blendshape model. The inset shows how manually selected feature correspondences guide the reconstruction of user-specific expressions.

Data Capture. To customize the generic blendshape rig, we record a pre-defined sequence of example expressions performed by the user. Since single depth maps acquired with the Kinect exhibit high noise levels, we aggregate multiple scans over time using the method described in [Weise et al. 2008] (see Figure 3). The user is asked to perform a slight head rotation while keeping the expression fixed. Besides exposing more of the face to the sensor, this rotational motion has the additional benefit of alleviating the reconstruction bias introduced by the spatially fixed infrared dot pattern projected by the Kinect. We use the method of [Viola and Jones 2001] to detect the face in the first frame of the acquisition and accumulate the acquired color images to obtain the skin texture using Poisson reconstruction [Perez et al. 2003].
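As a concrete illustration of the face-detection step, the following is a minimal sketch (my own, not the authors' code) of running a Viola-Jones cascade on the first color frame with OpenCV; the cascade file, the helper name, and the frame source are assumptions not taken from the paper.

```python
import cv2  # OpenCV ships Viola-Jones style Haar cascade classifiers

# Assumed: the standard frontal-face cascade bundled with opencv-python
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(first_frame_bgr):
    """Return the largest detected face rectangle (x, y, w, h), or None."""
    gray = cv2.cvtColor(first_frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detection; it bounds the region used to accumulate the skin texture
    return max(faces, key=lambda r: r[2] * r[3])
```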
Expression Reconstruction. We use the morphable model of Blanz and Vetter [1999] to represent the variations of different human faces in neutral expression. This linear PCA model is first registered towards the recorded neutral pose to obtain a high-quality template mesh that roughly matches the geometry of the user's face. We then warp this template to each of the recorded expressions using the non-rigid registration approach of [Li et al. 2009]. To improve registration accuracy, we incorporate additional texture constraints in the mouth and eye regions. For this purpose, we manually mark features as illustrated in Figure 5. The integration of these features as constraints is straightforward and easily extends the framework of [Li et al. 2009] with positional constraints.

Blendshape Reconstruction. We represent the dynamics of facial expressions using a generic blendshape rig based on Ekman's Facial Action Coding System (FACS) [1978]. To generate the full set of blendshapes of the user we employ example-based facial rigging as proposed by Li et al. [2010]. This method takes as input a generic blendshape model, the reconstructed example expressions, and approximate blendshape weights that specify the appropriate linear combination of blendshapes for each expression. Since the user is asked to perform a fixed set of expressions, these weights are manually determined once and kept constant for all users. Given this data, example-based facial rigging performs a gradient-space optimization to reconstruct the set of user-specific blendshapes that best reproduce the example expressions (Figure 5). We use the same generic blendshape model with $m = 39$ blendshapes in all our examples.

3 Realtime Tracking

The user-specific blendshape model defines a compact parameter space suitable for realtime tracking. We decouple the rigid from the non-rigid motion and directly estimate the rigid transform of the user's face before performing the optimization of blendshape weights. We found that this decoupling not only simplifies the formulation of the optimization, but also leads to improved robustness of the tracking.

Rigid Tracking. We align the reconstructed mesh of the previous frame with the acquired depth map of the current frame using ICP with point-plane constraints. To stabilize the alignment, we exclude the chin region from the registration, as this part of the face typically exhibits the strongest deformations. As illustrated in Figure 7, this results in robust tracking even for large occlusions and extreme facial expressions. We also incorporate a temporal filter to account for the high-frequency flickering of the Kinect depth maps. The filter is based on a sliding window that dynamically adapts the smoothing coefficients in the spirit of the exponentially weighted moving average method [Roberts 1959] to reduce high-frequency noise while avoiding disturbing temporal lags. We independently filter the translation vector and the quaternion representation of the rotation. For a translation or quaternion vector $t_i$ at the current time frame $i$, we compute the smoothed vector as a weighted average in a window of size $k$ as

$$\bar{t}_i = \frac{\sum_{j=0}^{k-1} w_j \, t_{i-j}}{\sum_{j=0}^{k-1} w_j},$$

where $t_{i-j}$ denotes the vector at frame $i-j$. The weights $w_j$ are defined as

$$w_j = e^{-j \cdot H \cdot \max_{l \in [1,k]} \| t_i - t_{i-l} \|},$$

with a constant $H$ that we empirically determine independently for rotation and translation based on the noise level of a static pose. We use a window size of $k = 5$ for all our experiments. Scaling the weights with the maximum variation in the temporal window ensures that less averaging occurs for fast motion, while high-frequency jitter is effectively removed from the estimated rigid pose (Figure 6, right). As shown in the video, this leads to a stable reconstruction when the user is perfectly still, while fast and jerky motion can still be recovered accurately.

Figure 6: The colored region on the left indicates the portion of the face used for rigid tracking. The graph on the right illustrates how temporal filtering adapts to the speed of motion.

Figure 7: Robustly tracking the rigid motion of the face is crucial for expression reconstruction. Even with large occlusions and fast motion, we can reliably track the user's global pose.
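To make the adaptive filter concrete, here is a minimal numerical sketch of the weighted average above (my own illustration, not the authors' code); the constant H and the window size are the free parameters described in the paper, and the maximum is taken over the frames available in the window.

```python
import numpy as np

def smooth_pose_vector(history, H, k=5):
    """Adaptive temporally weighted average of the last k pose vectors.

    history: list of translation (or quaternion) vectors, newest last.
    H: empirically chosen constant (the paper tunes separate values for
       rotation and translation on the noise level of a static pose).
    """
    window = np.asarray(history[-k:], dtype=float)   # t_{i-k+1}, ..., t_i
    if len(window) < 2:
        return window[-1]
    t_i = window[-1]
    # Largest deviation within the window: fast motion -> weights decay fast -> less averaging
    max_var = max(np.linalg.norm(t_i - window[-(l + 1)])
                  for l in range(1, len(window)))
    # j = 0 for the current frame, increasing into the past
    j = np.arange(len(window) - 1, -1, -1)
    w = np.exp(-j * H * max_var)
    return (w[:, None] * window).sum(axis=0) / w.sum()
```

For quaternions, one would additionally keep the window on a consistent hemisphere and renormalize the averaged result; the sketch omits this.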
Non-rigid Tracking. Given the rigid pose, we now need to estimate the blendshape weights that capture the dynamics of the facial expression of the recorded user. Our goal is to reproduce the user's performance as closely as possible, while ensuring that the reconstructed animation lies in the space of realistic human facial expressions. Since blendshape parameters are agnostic to realism and can easily produce nonsensical shapes, parameter fitting using geometry and texture constraints alone will typically not produce satisfactory results, in particular if the input data is corrupted by noise (see Figure 8). Since human visual interpretation of facial imagery is highly sophisticated, even small tracking errors can quickly lead to visually disturbing artifacts.

3.1 Statistical Model

We prevent unrealistic face poses by regularizing the blendshape weights with a dynamic expression prior computed from a set of existing blendshape animations $A = \{A_1, \ldots, A_L\}$. Each animation $A_j$ is a sequence of blendshape weight vectors $\mathbf{a}_i \in \mathbb{R}^m$ that sample a continuous path in the $m$-dimensional blendshape space. We exploit the temporal coherence of these paths by considering a window of $n$ consecutive frames, yielding an effective prior for both the geometry and the motion of the tracked user.

MAP Estimation. Let $D_i = (G_i, I_i)$ be the input data at the current frame $i$, consisting of a depth map $G_i$ and a color image $I_i$. We want to infer from $D_i$ the most probable blendshape weights $\mathbf{x}_i \in \mathbb{R}^m$ for the current frame, given the sequence $X_n = \{\mathbf{x}_{i-1}, \ldots, \mathbf{x}_{i-n}\}$ of the $n$ previously reconstructed blendshape vectors. Dropping the index $i$ for notational brevity, we formulate this inference problem as a maximum a posteriori (MAP) estimation

$$\mathbf{x}^* = \arg\max_{\mathbf{x}} \; p(\mathbf{x} \mid D, X_n), \qquad (4)$$

where $p(\cdot \mid \cdot)$ denotes the conditional probability.
Using Bayes' rule we obtain

$$\mathbf{x}^* = \arg\max_{\mathbf{x}} \; p(D \mid \mathbf{x}, X_n) \, p(\mathbf{x}, X_n).$$

Assuming that $D$ is conditionally independent of $X_n$ given $\mathbf{x}$, we can write

$$\mathbf{x}^* \approx \arg\max_{\mathbf{x}} \; \underbrace{p(D \mid \mathbf{x})}_{\text{likelihood}} \; \underbrace{p(\mathbf{x}, X_n)}_{\text{prior}}. \qquad (5)$$

Prior Distribution. To adequately capture the nonlinear structure of the dynamic expression space while still enabling realtime performance, we represent the prior term $p(\mathbf{x}, X_n)$ as a Mixture of Probabilistic Principal Component Analyzers (MPPCA) [Tipping and Bishop 1999b]. Probabilistic principal component analysis (PPCA) (see [Tipping and Bishop 1999a]) defines the probability density function of some observed data $\mathbf{x} \in \mathbb{R}^s$ by assuming that $\mathbf{x}$ is a linear function of a latent variable $\mathbf{z} \in \mathbb{R}^t$ with $s > t$, i.e.,

$$\mathbf{x} = C\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon}, \qquad (6)$$

where $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, I)$ is distributed according to a unit Gaussian, $C \in \mathbb{R}^{s \times t}$ is the matrix of principal components, $\boldsymbol{\mu}$ is the mean vector, and $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)$ is a Gaussian-distributed noise variable. The probability density of $\mathbf{x}$ can then be written as

$$p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu},\; CC^T + \sigma^2 I). \qquad (7)$$

Using this formulation, we define the prior in Equation 5 as a weighted combination of $K$ Gaussians

$$p(\mathbf{x}, X_n) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}, X_n \mid \boldsymbol{\mu}_k,\; C_k C_k^T + \sigma_k^2 I), \qquad (8)$$

with weights $\pi_k$. This representation can be interpreted as a reduced-dimension Gaussian mixture model that attempts to model the high-dimensional animation data with locally linear manifolds modeled with PPCA.

Figure 8: Without the animation prior, tracking inaccuracies lead to visually disturbing self-intersections. Our solution significantly reduces these artifacts. Even when tracking is not fully accurate, as in the bottom row, a plausible pose is reconstructed.

Learning the Prior. The unknown parameters in Equation 8 are the means $\boldsymbol{\mu}_k$, the covariance matrices $C_k C_k^T$, the noise parameters $\sigma_k$, and the relative weights $\pi_k$ of each PPCA in the mixture model. We learn these parameters using the expectation maximization (EM) algorithm based on the given blendshape animation sequences $A$. To increase the robustness of these computations, we estimate the MPPCA in a latent space of the animation sequences obtained using principal component analysis. By keeping 99% of the total variance we can reduce the dimensionality of the training data by two-thirds, allowing a more stable learning phase with the EM algorithm. Equation 8 can thus be rewritten as

$$p(\mathbf{x}, X_n) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}, X_n \mid P\boldsymbol{\mu}_k + \boldsymbol{\mu},\; P M_k P^T),$$

where $M_k = C_k C_k^T + \sigma_k^2 I$ is the covariance matrix in the latent space, $P$ is the principal component matrix, and $\boldsymbol{\mu}$ the mean vector. Since the EM algorithm converges to local minima, we run the algorithm 50 times with random initialization to improve the learning accuracy. We use 20 Gaussians to model the prior distribution, and we use one-third of the latent space dimension for the PPCA dimension. More details on the implementation of the EM algorithm can be found in [McLachlan and Krishnan 1996].
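To illustrate how such a prior can be evaluated at runtime, the sketch below (my own illustration, not the paper's implementation) computes the negative log of a Gaussian-mixture prior over the stacked vector (x, X_n). It assumes the mixture parameters were already fitted offline and works in the full space rather than the paper's PCA-reduced latent space; in a real system the inverse covariances and determinants would be precomputed, as the paper notes.

```python
import numpy as np
from scipy.stats import multivariate_normal

def neg_log_prior(x, X_n, weights, means, covariances):
    """Negative log of a Gaussian-mixture prior over the stacked vector (x, X_n).

    x           : (m,) current blendshape weights
    X_n         : (n*m,) concatenation of the n previous weight vectors
    weights     : (K,) mixture weights pi_k
    means       : (K, (n+1)*m) mixture means mu_k
    covariances : (K, (n+1)*m, (n+1)*m) full covariances C_k C_k^T + sigma_k^2 I
    """
    y = np.concatenate([x, X_n])
    # log-sum-exp over mixture components for numerical stability
    log_terms = np.array([
        np.log(w) + multivariate_normal(mean=mu, cov=cov).logpdf(y)
        for w, mu, cov in zip(weights, means, covariances)
    ])
    m_max = log_terms.max()
    return -(m_max + np.log(np.exp(log_terms - m_max).sum()))
```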
Likelihood Distribution. By assuming conditional independence, we can model the likelihood distribution in Equation 5 as the product $p(D \mid \mathbf{x}) = p(G \mid \mathbf{x}) \, p(I \mid \mathbf{x})$. The two factors capture the alignment of the blendshape model with the acquired depth map and texture image, respectively. We represent the distribution of each likelihood term as a product of Gaussians, treating each vertex of the blendshape model independently.

Let $V$ be the number of vertices in the template mesh and $B \in \mathbb{R}^{3V \times m}$ the blendshape matrix. Each column of $B$ defines a blendshape base mesh such that $B\mathbf{x}$ generates the blendshape representation of the current pose. We denote with $\mathbf{v}_i = (B\mathbf{x})_i$ the $i$-th vertex of the reconstructed mesh. The likelihood term $p(G \mid \mathbf{x})$ models a geometric registration in the spirit of non-rigid ICP by assuming a Gaussian distribution of the per-vertex point-plane distances:

$$p(G \mid \mathbf{x}) = \prod_{i=1}^{V} \frac{1}{(2\pi\sigma_{\mathrm{geo}}^2)^{\frac{1}{2}}} \exp\!\left( -\frac{\| \mathbf{n}_i^T (\mathbf{v}_i - \mathbf{v}_i^*) \|^2}{2\sigma_{\mathrm{geo}}^2} \right),$$

where $\mathbf{n}_i$ is the surface normal at $\mathbf{v}_i$, and $\mathbf{v}_i^*$ is the corresponding closest point in the depth map $G$.

The likelihood term $p(I \mid \mathbf{x})$ models texture registration. Since we acquire the user's face texture when building the facial expression model (Figure 3), we can integrate model-based optical flow constraints [DeCarlo and Metaxas 2000] by formulating the likelihood function using per-vertex Gaussian distributions as

$$p(I \mid \mathbf{x}) = \prod_{i=1}^{V} \frac{1}{(2\pi\sigma_{\mathrm{im}}^2)^{\frac{1}{2}}} \exp\!\left( -\frac{\| \nabla I_{\mathbf{p}_i}^T (\mathbf{p}_i - \mathbf{p}_i^*) \|^2}{2\sigma_{\mathrm{im}}^2} \right), \qquad (11)$$

where $\mathbf{p}_i$ is the projection of $\mathbf{v}_i$ into the image $I$, $\nabla I_{\mathbf{p}_i}$ is the gradient of $I$ at $\mathbf{p}_i$, and $\mathbf{p}_i^*$ is the corresponding point in the rendered texture image.

3.2 Optimization

In order to solve the MAP problem as defined by Equation 5, we minimize the negative logarithm, i.e.,

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \; -\ln p(G \mid \mathbf{x}) - \ln p(I \mid \mathbf{x}) - \ln p(\mathbf{x}, X_n). \qquad (12)$$

Discarding constants, we write

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \; E_{\mathrm{geo}} + E_{\mathrm{im}} + E_{\mathrm{prior}}, \qquad (13)$$

with

$$E_{\mathrm{prior}} = -\ln p(\mathbf{x}, X_n), \qquad (14)$$
$$E_{\mathrm{geo}} = \frac{1}{2\sigma_{\mathrm{geo}}^2} \sum_{i=1}^{V} \| \mathbf{n}_i^T (\mathbf{v}_i - \mathbf{v}_i^*) \|^2, \quad \text{and} \qquad (15)$$
$$E_{\mathrm{im}} = \frac{1}{2\sigma_{\mathrm{im}}^2} \sum_{i=1}^{V} \| \nabla I_{\mathbf{p}_i}^T (\mathbf{p}_i - \mathbf{p}_i^*) \|^2. \qquad (16)$$

The parameters $\sigma_{\mathrm{geo}}$ and $\sigma_{\mathrm{im}}$ model the noise level of the data and control the emphasis of the geometry and image likelihood terms relative to the prior term. Since our system provides realtime feedback, we can experimentally determine suitable values that achieve stable tracking performance. For all our results we use the same settings $\sigma_{\mathrm{geo}} = 1$ and $\sigma_{\mathrm{im}} = 0.45$.

The optimization of Equation 13 can be performed efficiently using an iterative gradient solver, since the gradients can be computed analytically (see the derivations in the Appendix). In addition, we precompute the inverse covariance matrices and the determinants of the MPPCA during the offline learning phase. We use a gradient projection algorithm based on the limited-memory BFGS solver [Lu et al. 1994] in order to enforce that the blendshape weights are between 0 and 1. The algorithm converges in less than 6 iterations, as we can use an efficient warm start with the previous solution. We then update the closest point correspondences in $E_{\mathrm{geo}}$ and $E_{\mathrm{im}}$ and re-compute the MAP estimation. We found that 3 iterations of this outer loop are sufficient for convergence.
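As an illustration of the per-frame solve, the following sketch (my own, not the authors' implementation) minimizes a sum of geometric, image, and prior energies with box-constrained L-BFGS-B and a warm start from the previous frame. The build_energy callable and its returned objective (energy plus gradient) are assumed to be supplied, e.g. assembled from the terms above.

```python
import numpy as np
from scipy.optimize import minimize

def fit_blendshape_weights(x_prev, build_energy, m=39, outer_iters=3):
    """Estimate blendshape weights for one frame.

    x_prev       : (m,) solution of the previous frame, used as warm start
    build_energy : build_energy(x) returns a callable f(x) -> (energy, gradient)
                   with the closest-point correspondences of E_geo and E_im
                   fixed at the linearization point x
    """
    x = np.clip(np.asarray(x_prev, dtype=float), 0.0, 1.0)
    bounds = [(0.0, 1.0)] * m              # blendshape weights live in [0, 1]
    for _ in range(outer_iters):           # outer loop: refresh correspondences
        objective = build_energy(x)
        res = minimize(objective, x, jac=True, method="L-BFGS-B",
                       bounds=bounds, options={"maxiter": 6})
        x = res.x
    return x
```

The warm start and the small inner iteration budget mirror the behaviour described in the text; the exact solver configuration is an assumption.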
4 Results

We present results of our realtime performance capture and animation system and illustrate potential applications. The output of the tracking optimization is a continuous stream of blendshape weight vectors $\mathbf{x}_i$ that drive the digital character. Please refer to the accompanying video to better appreciate the facial dynamics of the animated characters and the robustness of the tracking. Figures 1 and 9 illustrate how our system can be applied in interactive applications where the user controls a digital avatar in realtime. Blendshape weights can be transmitted in realtime to enable virtual encounters in cyberspace. Since the blendshape representation facilitates animation transfer, the avatar can either be a digital representation of the user himself or a different humanoid character, assuming a compatible expression space.

Figure 9: The user's facial expressions are reconstructed and mapped to different target characters in realtime, enabling interactive animations and virtual conversations controlled by the performance of the tracked user. The smile on the green character's base mesh gives it a happy countenance for the entire animation.

While we build the user-specific blendshape model primarily for realtime tracking, our technique offers a simple way to create personalized blendshape rigs that can be used in traditional animation tools. Since the Kinect is the only acquisition device required, generating facial rigs becomes accessible for non-professional users.

Statistics. We use 15 user-specific expressions to reconstruct 39 blendshapes for the facial expression model. Manual markup of texture constraints for the initial offline model building requires approximately 2 minutes per expression. Computing the expression model given the user input takes less than 10 minutes. We precompute the Gaussian mixture model that defines the dynamic expression prior from a total of 9,500 animation frames generated on the generic template model by an animation artist. Depending on the size of the temporal window, these computations take between 10 and 20 minutes.

Our online system achieves sustained framerates of 20 Hertz with a latency below 150 ms. Data acquisition, preprocessing, rigid registration, and display take less than 5 ms. Non-rigid registration, including constraint setup and gradient optimization, requires 45 ms per frame. All timing measurements have been done on an Intel Core i7 2.8 GHz with 8 GBytes of main memory and an ATI Radeon HD 4850 graphics card.

5 Evaluation

We focus our evaluation on the integration of 2D and 3D input data and the effect of animation training data. We also comment on limitations and drawbacks of our approach.

Geometry and Texture. Figure 10 evaluates the interplay between the geometry and texture information acquired with the Kinect. Tracking purely based on geometry as proposed in [Weise et al. 2009] is not successful due to the high noise level of the Kinect data. Integrating model-based optical flow constraints reduces temporal jitter and stabilizes the reconstruction. In our experiments, only the combination of both modalities yielded satisfactory results. Compared to purely image-based tracking as e.g. in [Chai et al. 2003], direct access to 3D geometry offers two main benefits: We can significantly improve the robustness of the rigid pose estimation, in particular for non-frontal views (see also Figure 7). In addition, the expression template mesh generated during preprocessing much more closely matches the geometry of the user, which further improves tracking accuracy. Figure 11 shows difficult tracking configurations and provides an indication of the limits of our algorithm.

Figure 10: The combination of geometric and texture-based registration is essential for realtime tracking. To isolate the effects of the individual components, no animation prior is used in this example.

Figure 11: Difficult tracking configurations. Right: despite the occlusions by the hands, our algorithm successfully tracks the rigid motion and the expression of the user. Left: with more occlusion or very fast motion, tracking can fail.

Animation Prior. Figure 12 studies the effectiveness of our probabilistic tracking algorithm when varying the amount of training data used for the reconstruction. The figure illustrates that if the training data does not contain any sequences that are sufficiently close to the captured performance, the reconstruction can differ substantially from the acquired data. With more training data, the tracked model more closely matches the performing user. What the prior achieves in any case is that the reconstructed pose is plausible, even if not necessarily close to the input geometrically (see also Figure 8). We argue that this is typically much more tolerable than generating unnatural or even physically impossible poses that could severely degrade the visual perception of the avatar. In addition, our approach is scalable in the sense that if the reconstructed animation does not represent certain expressions of the user well, we can manually correct the sequence using standard blendshape animation tools and add the corrected sequence to the training data set. This allows us to successively improve the animation prior in a bootstrapping manner. For the temporal window $X_n$ used in the animation prior, we found a window size of $3 \le n \le 5$ to yield good results in general. Longer temporal spans raise the dimensionality and lead to increased temporal smoothing. If the window is too small, temporal coherence is reduced and discontinuities in the tracking data can lead to artifacts.

Figure 12: Effect of different amounts of training data on the performance of the tracking algorithm. We successively delete blendshapes from the input animation sequences, which removes entire portions of the expression space. With only 25% of the blendshapes in the training data the expressions are not reconstructed correctly.
Limitations. The resolution of the acquisition system limits the amount of geometric and motion detail that can be tracked for each user, hence slight differences in expressions will not be captured adequately. This limitation is aggravated by the wide-angle lens of the Kinect, installed to enable full-body capture, which confines the face region to about 160 x 160 pixels, or less than 10% of the total image area. As a result, our system cannot recover small-scale wrinkles or very subtle movements. We also currently do not model eyes, teeth, tongue, or hair.

In our current implementation, we require user support during preprocessing in the form of manual markup of lip and eye features to register the generic template with the recorded training poses (see Figure 5). In future work, we want to explore the potential of generic active appearance models similar to [Cootes et al. 2001] to automate this step of the offline processing pipeline as well.

While offering many advantages as discussed in Section 1.2, the blendshape representation also has an inherent limitation: the number of blendshapes is a tradeoff between expressiveness of the model and suitability for tracking. Too few blendshapes may result in user expressions that cannot be represented adequately by the pose space of the model. Introducing additional blendshapes to the rig can circumvent this problem, but too many blendshapes may result in a different issue: since blendshapes may become approximately linearly dependent, there might not be a unique set of blendshape weights for a given expression. This can potentially result in unstable tracking due to overfitting of the noisy data. While the prior prevents this instability, a larger number of blendshapes requires a larger training database and negatively affects performance.
6 Conclusion

We have demonstrated that high-quality performance-driven facial animation in realtime is possible even with a low-cost, non-intrusive, markerless acquisition system. We show the potential of our system for applications in human interaction, live virtual TV shows, and computer gaming.

Robust realtime tracking is achieved by building suitable user-specific blendshape models and exploiting the different characteristics of the acquired 2D image and 3D depth map data for registration. We found that learning the dynamic expression space from existing animations is essential. Combining these animation priors with effective geometry and texture registration in a single MAP estimation is our key contribution to achieve robust tracking even for highly noisy input data. While foreseeable technical advances in acquisition hardware will certainly improve data quality in coming years, numerous future applications, e.g. in multi-people tracking, acquisition with mobile devices, or performance capture in difficult lighting conditions, will produce even worse data and will thus put even higher demands on robustness. Our algorithm provides a systematic framework for addressing these challenging problems.

We believe that our system enables a variety of new applications and can be the basis for substantial follow-up research. We currently focus on facial acquisition and ignore other important aspects of human communication, such as hand gestures, which pose interesting technical challenges due to complex occlusion patterns. Enhancing the tracking performance using realtime speech analysis, or integrating secondary effects such as the simulation of hair, are further areas of future research that could help increase the realism of the generated virtual performances. More fundamentally, being able to deploy our system at a massive scale can enable interesting new research in human communication and paves the way for new interaction metaphors in performance-based game play.

Acknowledgements. We are grateful to Lee Perry-Smith for providing the face template, Dan Burke for sculpting the CG characters, and Cesar Bravo, Steven McLellan, David Rodrigues, and Volker Helzle for the animations. We thank Gabriele Fanelli for valuable discussions, Duygu Ceylan and Mario Deuss for being actors, and Yuliy Schwarzburg for proof-reading the paper. This research is supported by Swiss National Science Foundation grant 20PA21L-129607.

Appendix

We derive the gradients for the optimization of Equation 13. The energy terms for geometry registration $E_{\mathrm{geo}}$ and optical flow $E_{\mathrm{im}}$ can both be written in the form

$$f(\mathbf{x}) = \| A\mathbf{x} - \mathbf{b} \|^2, \qquad (17)$$

hence the gradients can easily be computed analytically as

$$\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = 2 A^T (A\mathbf{x} - \mathbf{b}). \qquad (18)$$

The prior term is of the form

$$E_{\mathrm{prior}} = -\ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}, X_n \mid \boldsymbol{\mu}_k, \Sigma_k), \qquad (19)$$

where $\Sigma_k$ is the covariance matrix. The Gaussians $\mathcal{N}(\mathbf{x}, X_n \mid \boldsymbol{\mu}_k, \Sigma_k)$ model the combined distribution of the current blendshape vector $\mathbf{x} \in \mathbb{R}^m$ and the $n$ previous vectors $X_n$, hence the $\Sigma_k$ are matrices of dimension $(n+1)m \times (n+1)m$. Since we are only interested in the gradient with respect to $\mathbf{x}$, we can discard all components that do not depend on this variable. We split the mean vectors as $\boldsymbol{\mu}_k = (\boldsymbol{\mu}_k^{\mathbf{x}}, \boldsymbol{\mu}_k^{X_n})$, corresponding to $\mathbf{x}$ and $X_n$, respectively. We can write the inverse of $\Sigma_k$ as

$$\Sigma_k^{-1} = \begin{pmatrix} A_k & B_k \\ C_k & D_k \end{pmatrix} \quad \begin{matrix} (m \times m) & (m \times nm) \\ (nm \times m) & (nm \times nm) \end{matrix}, \qquad (20)$$

with $B_k = C_k^T$. We then obtain for the gradient of the prior energy

$$\frac{\partial E_{\mathrm{prior}}}{\partial \mathbf{x}} = \frac{\sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}, X_n \mid \boldsymbol{\mu}_k, \Sigma_k) \left( (\mathbf{x} - \boldsymbol{\mu}_k^{\mathbf{x}})^T A_k + (X_n - \boldsymbol{\mu}_k^{X_n})^T C_k \right)}{\sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}, X_n \mid \boldsymbol{\mu}_k, \Sigma_k)}. \qquad (21)$$

The complete gradient is the sum of the three energy gradients derived above:

$$\frac{\partial E}{\partial \mathbf{x}} = \frac{\partial E_{\mathrm{geo}}}{\partial \mathbf{x}} + \frac{\partial E_{\mathrm{im}}}{\partial \mathbf{x}} + \frac{\partial E_{\mathrm{prior}}}{\partial \mathbf{x}}. \qquad (22)$$
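The quadratic-term gradient in Equation 18 is easy to sanity-check numerically; the snippet below (my own illustration, with arbitrary random data standing in for the stacked linearized constraints) compares the analytic expression against central finite differences.

```python
import numpy as np

def f(A, x, b):
    r = A @ x - b
    return float(r @ r)                  # f(x) = ||Ax - b||^2

def grad_f(A, x, b):
    return 2.0 * A.T @ (A @ x - b)       # analytic gradient, Equation 18

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 39))        # e.g. 50 stacked constraints, m = 39 weights
b = rng.standard_normal(50)
x = rng.random(39)

# Central finite differences as a reference
eps = 1e-6
numeric = np.array([(f(A, x + eps * e, b) - f(A, x - eps * e, b)) / (2 * eps)
                    for e in np.eye(39)])
assert np.allclose(numeric, grad_f(A, x, b), atol=1e-4)
```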
References

ALEXANDER, O., ROGERS, M., LAMBETH, W., CHANG, M., AND DEBEVEC, P. 2009. The Digital Emily project: photoreal facial modeling and animation. In ACM SIGGRAPH 2009 Courses.

BEELER, T., BICKEL, B., BEARDSLEY, P., SUMNER, B., AND GROSS, M. 2010. High-quality single-shot capture of facial geometry. ACM Trans. Graph. 29, 40:1–40:9.

BLACK, M. J., AND YACOOB, Y. 1995. Tracking and recognizing rigid and non-rigid facial motions using local parametric models of image motion. In ICCV, 374–381.

BLANZ, V., AND VETTER, T. 1999. A morphable model for the synthesis of 3D faces. In Proc. SIGGRAPH 99.

BORSHUKOV, G., PIPONI, D., LARSEN, O., LEWIS, J. P., AND TEMPELAAR-LIETZ, C. 2005. Universal capture – image-based facial animation for "The Matrix Reloaded". In ACM SIGGRAPH 2005 Courses.

BRADLEY, D., HEIDRICH, W., POPA, T., AND SHEFFER, A. 2010. High resolution passive facial performance capture. ACM Trans. Graph. 29, 41:1–41:10.

CHAI, J. X., XIAO, J., AND HODGINS, J. 2003. Vision-based control of 3D facial animation. In SCA.

CHUANG, E., AND BREGLER, C. 2002. Performance driven facial animation using blendshape interpolation. Tech. rep., Stanford University.

COOTES, T., EDWARDS, G., AND TAYLOR, C. 2001. Active appearance models. PAMI 23, 681–685.

COVELL, M. 1996. Eigen-points: Control-point location using principal component analyses. In FG '96.

DECARLO, D., AND METAXAS, D. 1996. The integration of optical flow and deformable models with applications to human face shape and motion estimation. In CVPR.

DECARLO, D., AND METAXAS, D. 2000. Optical flow constraints on deformable models with applications to face tracking. IJCV 38, 99–127.

EKMAN, P., AND FRIESEN, W. 1978. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press.

ESSA, I., BASU, S., DARRELL, T., AND PENTLAND, A. 1996. Modeling, tracking and interactive animation of faces and heads using input from video. In Proc. Computer Animation.

FURUKAWA, Y., AND PONCE, J. 2009. Dense 3D motion capture for human faces. In CVPR.

GROCHOW, K., MARTIN, S. L., HERTZMANN, A., AND POPOVIC, Z. 2004. Style-based inverse kinematics. ACM Trans. Graph. 23, 522–531.

GUENTER, B., GRIMM, C., WOOD, D., MALVAR, H., AND PIGHIN, F. 1998. Making faces. In Proc. SIGGRAPH 98.

IKEMOTO, L., ARIKAN, O., AND FORSYTH, D. 2009. Generalizing motion edits with Gaussian processes. ACM Trans. Graph. 28, 1:1–1:12.

LAU, M., CHAI, J., XU, Y.-Q., AND SHUM, H.-Y. 2007. Face poser: interactive modeling of 3D facial expressions using model priors. In SCA.

LI, H., ROIVAINEN, P., AND FORCHHEIMER, R. 1993. 3-D motion estimation in model-based facial image coding. PAMI 15, 545–555.

LI, H., ADAMS, B., GUIBAS, L. J., AND PAULY, M. 2009. Robust single-view geometry and motion reconstruction. ACM Trans. Graph. 28, 175:1–175:10.

LI, H., WEISE, T., AND PAULY, M. 2010. Example-based facial rigging. ACM Trans. Graph. 29, 32:1–32:6.

LIN, I.-C., AND OUHYOUNG, M. 2005. Mirror MoCap: Automatic and efficient capture of dense 3D facial motion parameters from video. The Visual Computer 21, 6, 355–372.

LOU, H., AND CHAI, J. 2010. Example-based human motion denoising. IEEE Trans. on Visualization and Computer Graphics 16, 870–879.

LU, P., NOCEDAL, J., ZHU, C., AND BYRD, R. H. 1994. A limited-memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing.

MA, W.-C., HAWKINS, T., PEERS, P., CHABERT, C.-F., WEISS, M., AND DEBEVEC, P. 2007. Rapid acquisition of specular and diffuse normal maps from polarized spherical gradient illumination. In EUROGRAPHICS Symposium on Rendering.

MCLACHLAN, G. J., AND KRISHNAN, T. 1996. The EM Algorithm and Extensions. Wiley-Interscience.

PEREZ, P., GANGNET, M., AND BLAKE, A. 2003. Poisson image editing. ACM Trans. Graph. 22, 313–318.

PIGHIN, F., AND LEWIS, J. P. 2006. Performance-driven facial animation. In ACM SIGGRAPH 2006 Courses.

PIGHIN, F., SZELISKI, R., AND SALESIN, D. 1999. Resynthesizing facial animation through 3D model-based tracking. ICCV 1, 143–150.

ROBERTS, S. 1959. Control chart tests based on geometric moving averages. Technometrics, 239–250.

TIPPING, M. E., AND BISHOP, C. M. 1999. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B.

TIPPING, M. E., AND BISHOP, C. M. 1999. Mixtures of probabilistic principal component analyzers. Neural Computation 11.
VIOLA, P., AND JONES, M. 2001. Rapid object detection using a boosted cascade of simple features. In CVPR.

WEISE, T., LEIBE, B., AND GOOL, L. V. 2008. Accurate and robust registration for in-hand modeling. In CVPR.

WEISE, T., GOOL, L. V., AND PAULY, M. 2009. Face/Off: Live facial puppetry. In SCA.

WILLIAMS, L. 1990. Performance-driven facial animation. In Comp. Graph. (Proc. SIGGRAPH 90).

WILSON, C. A., GHOSH, A., PEERS, P., CHIANG, C.-Y., BUSCH, J., AND DEBEVEC, P. 2010. Temporal upsampling of performance geometry using photometric alignment. ACM Trans. Graph. 29, 17:1–17:11.

ZHANG, L., SNAVELY, N., CURLESS, B., AND SEITZ, S. M. 2004. Spacetime faces: high resolution capture for modeling and animation. ACM Trans. Graph. 23, 548–558.

ZHANG, S., AND HUANG, P. 2004. High-resolution, real-time 3D shape acquisition. In CVPR Workshop.

Comments:

pengshupan (2013-12-31): Has some reference value.

sunyaqqhl (2013-04-19): English version. The main content is there, but the references and such at the end are missing. Don't download this if you need it for translating the paper; it's fine for a casual read.

ddfsfa (2012-12-01): The SIGGRAPH paper, but it's incomplete.