
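This research paper is titled "A local region based approach to lip tracking". Its authors are Yiu-ming Cheung, Xin Liu and Xinge You, and it was published in the journal Pattern Recognition in 2012. Lip tracking is a key technology in lip reading systems and has broad application prospects in areas such as audio-visual speech recognition, lip reading, facial expression analysis and human-computer interfaces.

The paper proposes a local region based approach to lip tracking that consists of two phases: lip contour extraction for the first lip frame, followed by tracking in the subsequent frames using the extracted contour. The method first constructs a localized color active contour model, assuming that the foreground and background of the object are locally different in color space. In the first phase, the authors initialize the contour by finding a semi-elliptical curve around the lip and compute the localized energies for curve evolution, so that the lip image is segmented into lip and non-lip regions. A 16-point deformable model with geometric constraint (Wang et al., 2004) is then used to complete the lip contour extraction. In the second phase, the radius of the local regions is selected dynamically from the lip contour extracted in the previous frame. The proposed method adapts to lip movement and is robust against the appearance of teeth, tongue and black hole. Extensive experiments show that the proposed lip tracking algorithm is efficient in comparison with existing methods.

The main points of the paper are as follows:

1. Application areas of lip reading: the paper lists audio-visual speech recognition, facial expression analysis and human-computer interfaces as potential applications, which suggests that lip reading can improve human-computer interaction and help computers understand human communication.
2. Two phases of the local region based method: the tracking method is organized around two key phases, lip contour extraction and the subsequent tracking.
3. Localized color active contour model: this is the key component for the initial lip contour extraction. It is built on locally distinguishing foreground and background colors, which helps improve extraction accuracy.
4. Use of a semi-elliptical curve: a semi-elliptical curve serves as the initial evolving curve, so that it better fits the shapes the lip may take in the image.
5. Use of a deformable model: a deformable model with geometric constraint completes the lip contour extraction. The model adapts to different lip shapes and improves the extraction result.
6. Dynamic selection of the local region radius: this is a novel element of the paper. The radius is chosen dynamically from the lip contour extracted in the previous frame, so that lip movement is tracked more precisely.
7. Robustness to interference: the method is robust against teeth, tongue and black hole, so it can better handle the complex surroundings of the lips in practice.
8. Experimental validation: extensive experiments show that the proposed method is more efficient than existing methods, which lends the work credibility and provides a reference for related research and applications.

In summary, the paper contributes a new lip tracking algorithm that improves tracking accuracy while strengthening robustness.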


Author's personal copy

A local region based approach to lip tracking

Yiu-ming Cheung, Xin Liu, Xinge You

Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China

Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan, China

article info

Article history:

Received 11 May 2011

Received in revised form 9 February 2012

Accepted 20 February 2012

Available online 5 March 2012

Keywords:

Lip tracking

Localized color active contour model

Semi-ellipse

Local region

Deformable model

abstract

Lip tracking plays a significant role in a lip reading system. In this paper, we present a local region based approach to lip tracking, which consists of two phases: (i) lip contour extraction for the first lip frame, followed by (ii) lip tracking in the subsequent lip frames. Initially, we construct a localized color active contour model, provided that the foreground and background regions around the object are locally different in color space. In the first phase, we find a combined semi-ellipse around the lip as the initial evolving curve and compute the localized energies for curve evolution such that the lip image is separated into lip and non-lip regions. Then, we utilize a 16-point deformable model (Wang et al., 2004 [20]) with geometric constraint to achieve lip contour extraction. In the second phase, we present a dynamic selection of the radius of local regions associated with the extracted lip contour of the previous frame to realize lip tracking. The proposed approach not only adapts to the lip movement, but is also robust against the appearance of teeth, tongue and black hole. Extensive experiments show the efficiency of the proposed lip tracking algorithm in comparison with the existing methods.

1. Introduction

Lip contour tracking (simply called lip tracking hereinafter) has received wide attention in recent years because of its potential applications in a variety of areas such as audio–visual speech recognition (AVSR) [1], lip reading [2,3], facial expression analysis [4], human computer interfaces [5] and so forth. Although various visual tracking methods have been developed in the literature, e.g., see [6], these methods are usually utilized to track object positions, which may not be suitable for determining the variations of the lip contours. In fact, it is a non-trivial task to track lip movements accurately because of the elastic shape and non-rigid motion of the lips, the large variations caused by different speakers and lighting conditions, the low contrast between the lip and skin, the teeth or tongue effect, and so forth.

In the past years, a few techniques have been proposed towards lip tracking with the focus on segmentation of lip regions or extraction of lip contours, which can be roughly classified into two categories: the edge-based approaches and the region-based approaches. The former basically utilize low-level spatial cues such as edge and color information to track the lip movement. For instance, Zhang et al. [7] applied hue and edge information to achieve mouth localization and segmentation. Eveno et al. [8] detected six key points, through which the fitting shapes connecting these points were obtained according to the edge information and color cues. In general, these two techniques work well under a desired environment, but their performance may deteriorate if the lips are glossy or there exists image noise. Moreover, Kass et al. [9], Delmas et al. [10] and Freedman et al. [5] introduced applications of the active contour model (ACM, i.e., snake) to detect the edge of the lip boundary via a gradient descent technique. Unfortunately, this type of active contour often converges to the wrong result when the lip edges are indistinct or the lip is very similar to the skin region. Subsequently, Barnard et al. [11] integrated the edge-based ACM with a 2D pattern matching technique to drive the energy minimizing spline onto the expected lip contours. However, such a method just employs a combination of two semi-elliptical shapes to model the lip shape, which may not fit the actual lip boundary quite well.
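To make the idea of modelling the lip with two semi-elliptical shapes concrete, the following is a minimal sketch of a closed curve built from an upper and a lower semi-ellipse that share one centre and one semi-major axis. The function names, the parameterization and the numeric values are assumptions made for illustration; they are not taken from [11] or from the method proposed in this paper.

```python
import math


def combined_semi_ellipse(cx, cy, a, b_upper, b_lower, n=32):
    """Sample n points on a closed curve built from two semi-ellipses.

    Both halves share the centre (cx, cy) and the semi-major axis a.
    The upper half has semi-minor axis b_upper, the lower half b_lower.
    Image coordinates are assumed, so y grows downwards.
    """
    points = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        # sin(t) >= 0 selects the upper half of the curve.
        b = b_upper if math.sin(t) >= 0 else b_lower
        points.append((cx + a * math.cos(t), cy - b * math.sin(t)))
    return points


def inside(x, y, cx, cy, a, b_upper, b_lower):
    """True if pixel (x, y) lies inside the combined semi-ellipse."""
    b = b_upper if y <= cy else b_lower
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0


# Hypothetical mouth: 60 px wide, upper half 10 px high, lower half 18 px high.
curve = combined_semi_ellipse(50, 40, 30, 10, 18, n=8)
top, bottom = curve[2], curve[6]
```

Because each half has only one free height, a shape of this kind cannot follow details such as the dip in the middle of the upper lip, which is one way to read the remark that it may not fit the actual lip boundary well.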

In contrast, the region-based approaches mainly utilize regional statistical characteristics to realize lip tracking. Typical examples include the deformable template (DT) [12–14], region-based ACM [15,16], the active shape model (ASM) [17–19], and the active appearance model (AAM) [2]. The DT algorithm utilizes a regional cost function to partition a lip image into lip and non-lip regions via a parametric template that represents the lip shape properly. The pioneering work introduced by Yuille [12] shows a lip template specified by a set of parameters, and these parameters are altered via an energy minimizing process so that the lip template gradually matches the lip boundary. Later, Liew et al. [13] addressed a different lip template and extended Yuille's work by introducing a new cost function to realize lip contour extraction in color images, while Tian et al. [14] utilized a symmetrical DT to model the lip shape and formulated the color distribution inside the closed mouth region as a Gaussian mixture to regularize the DT. In general, the tracking performance of this kind of method degrades if the lip shape is evidently irregular or when the mouth opens widely. The region-based ACM algorithm, which minimizes a regional energy function, always outperforms the edge-based ACM for lip images with weak edges or without edges. For instance, Chiou et al. [15] modified the original ACM by adding eight radial vectors within the lip region to regularize the active contours as they are driven to the lip boundary. Wakasugi et al. [16] applied the separability of regional color intensity distributions with ACM to achieve lip contour extraction. Nevertheless, it has been found that these methods often suffer from the complex components in the oral cavity and are highly dependent on the parameter initialization. The ASM approach adopts a set of landmark points to describe the lip shape, and these points are controlled within a few modes derived from a training data set. For example, Luettin et al. [17] applied a set of manually labeled points with ASM to train the possible lip shapes. Sum et al. [18] presented an optimization procedure from a point-based model using ASM for extracting the lip contours. Nguyen et al. [19] integrated multiple features of lip regions with ASM to learn lip shapes. The AAM algorithm proposed by Matthews et al. [2] is an extension of the ASM algorithm that incorporates eigenanalysis in the gray-level case. Both ASM and AAM require the laborious establishment of a training data set with careful manual calibration, followed by a training process to determine the lip shapes. Meanwhile, these methods may not be able to provide a good match to lip shapes that are quite distinct from the training data. They are therefore unsuitable for robust lip tracking applications from a practical viewpoint.

Contents lists available at SciVerse ScienceDirect

Pattern Recognition

journal homepage: www.elsevier.com/locate/pr

doi:10.1016/j.patcog.2012.02.024

Corresponding author. Tel.: +852 34115155.

E-mail addresses: ymc@comp.hkbu.edu.hk (Y.-m. Cheung), xliu@comp.hkbu.edu.hk (X. Liu), youxg@hust.edu.cn (X. You).

Pattern Recognition 45 (2012) 3336–3347

In recent years, lip image analysis in color space, e.g., CIELAB, CIELUV and HSV, has received much attention, as color can provide additional significant information that is not available in gray-level cases. Wang et al. [20] generated a probability map of the lip region in color space via a fuzzy clustering method incorporating a shape function (FCMS) and developed an iterative point-driven optimization scheme to fit the lip boundary based on the pre-generated probability map. Subsequently, Leung et al. [21] further extended the above work with an elliptic shape function to segment the lip region in color space. Similar and related works can be found in [22,23]. It is found that this kind of method can significantly simplify the detection and location of the lip regions. Nevertheless, as the color distributions of skin, tongue and lip may overlap and vary among different speakers, such a method may be inaccurate and unstable for lip segmentation or lip contour extraction, particularly when the mouth opens widely. Meanwhile, the implementation of these methods often suffers from the appearance of the tongue or black hole as shown in Fig. 1, although multiple pre-processing procedures can reduce the teeth effect.
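The remark that color carries information unavailable in gray level can be illustrated with a small sketch. It compares two pixels that have nearly the same gray level but clearly different hues. The RGB values are hypothetical and chosen only for illustration (they are not measured lip or skin data), and the helper names are assumptions; the conversion uses Python's standard `colorsys` module and the common ITU-R BT.601 luma weights.

```python
import colorsys


def gray_level(r, g, b):
    """Luma of an 8-bit RGB pixel using the ITU-R BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def hue_degrees(r, g, b):
    """Hue of an 8-bit RGB pixel in degrees, via the standard library."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return 360.0 * h


def hue_distance(h1, h2):
    """Shortest angular distance between two hues on the colour wheel."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


# Hypothetical pixel values chosen for illustration, not measured data.
lip_pixel = (200, 80, 90)
skin_pixel = (150, 110, 70)

gray_gap = abs(gray_level(*lip_pixel) - gray_level(*skin_pixel))
hue_gap = hue_distance(hue_degrees(*lip_pixel), hue_degrees(*skin_pixel))
```

For these two values the gray levels differ by less than one unit while the hues are roughly 35 degrees apart, so a gray-level image would barely separate them whereas a hue channel would. Real lip and skin distributions overlap far more than this toy pair, as the paragraph above points out.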

More recently, Eveno et al. [24] attempted to combine the merits of the above-stated approaches and proposed a jumping snake with a parametric model composed of four cubic curves to achieve lip tracking. It is effective in most cases, but it is highly dependent on pre- and post-processing techniques and an adjustment process to make the model match the lip shape appropriately. Differing from the above region-based approaches, Jian et al. [25] addressed a modified attractor-guided particle filtering framework to track the lip contours. Unfortunately, such a method needs a set of representative lip contours to be segmented manually in advance as the shape priors. Furthermore, Ong et al. [26] proposed a learnt data-driven approach via linear predictors to track the lip movements, but it needs a data set composed of different types of lip shapes in advance. Further, this method, as well as the one in [25], involves complicated iterative learning to match the lip shape, whose computation is time-consuming.

Thus far, almost all the region-based approaches involve global statistical characteristics. Consequently, their performance may deteriorate upon the appearance of teeth, tongue or black hole. Very recently, it has been found that, when the object in an image has heterogeneous statistics or complex components, the localized active contour model (LACM) [27], which utilizes local statistical characteristics, can generally achieve a better segmentation result as shown in Figs. 2 and 3(d). Nevertheless, this model depends highly on the appropriate selection of the correlative parameters. Often, improper parameters, e.g., an ulterior evolving curve with a small local radius or a proper evolving curve with a large local radius, could lead to erroneous extractions as shown in Fig. 3(c). In addition, Ref. [27] does not consider prior knowledge about color information, which actually provides more information to improve the extraction performance, especially when the images are shadowed, shaded and highlighted [28,29].
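The local statistics that a localized model relies on, and their sensitivity to the local radius, can be sketched as follows. This is a minimal illustration under stated assumptions, not the formulation of [27] or of the model proposed in this paper: the function name `local_means`, the toy image and its intensities are all hypothetical, and a plain mean over a disc stands in for whatever statistic the actual energy uses.

```python
def local_means(image, mask, cy, cx, radius):
    """Mean intensity of the local interior and local exterior.

    Only pixels within `radius` of the contour point (cy, cx) are used.
    `mask[y][x]` is True for pixels inside the current evolving curve.
    """
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                key = bool(mask[y][x])
                sums[key] += value
                counts[key] += 1
    mean_in = sums[True] / counts[True] if counts[True] else None
    mean_out = sums[False] / counts[False] if counts[False] else None
    return mean_in, mean_out


# Toy 9 x 12 image (hypothetical intensities): skin | lip | dark cavity.
row = [200] * 4 + [100] * 4 + [10] * 4
image = [list(row) for _ in range(9)]
# Assume the current curve encloses the lip and the cavity (columns 4..11).
mask = [[x >= 4 for x in range(12)] for _ in range(9)]

# Contour point on the skin/lip boundary, evaluated with two radii.
small = local_means(image, mask, cy=4, cx=4, radius=2)
large = local_means(image, mask, cy=4, cx=4, radius=6)
```

With the small radius the local interior contains only lip pixels, so the interior and exterior means are cleanly separated. With the large radius the disc reaches the dark cavity, which pulls the interior mean below the lip intensity even though the contour point has not moved. This is one way a proper curve combined with a large local radius can mislead the evolution.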

In this paper, we present a local region based approach to lip tracking with two phases: (i) lip contour extraction for the first lip frame, followed by (ii) lip tracking in the subsequent lip frames. Initially, we introduce a new kind of active contour model, namely the localized color active contour model (LCACM), provided that the foreground and background regions around the object are locally different in color space. In the first phase, we find a combined semi-ellipse around the first lip image as the initial evolving curve and compute the localized energies for curve evolution such that the lip image is separated into lip and non-lip regions. Then, we utilize a 16-point deformable model [20] with geometric constraint to achieve lip contour extraction. In the second phase, we present a dynamic selection of the radius of local regions associated with the extracted lip contour of the previous frame to realize lip tracking. The proposed approach is adaptive to lip movement, and robust against the appearance of

Fig. 1. A lip region showing the appearance of teeth, tongue and black hole in the oral cavity. (Labels: upper lip, lower lip, tongue, teeth, black hole, skin.)

Fig. 2. Graphical representation of the active contour model: (a) evolving curve with diverging directions along the arrow; (b) the description of the local interior and local exterior regions. (Labels: evolving curve, local interior, local exterior.)

