
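In an era of rapid progress in information technology, face recognition, an important branch of computer vision, has been widely applied in security verification, identity recognition, and human-computer interaction. Face recognition nevertheless faces many challenges, among which spatial misalignment is especially prominent: factors such as shooting angle and pose changes can locally distort the face image and seriously degrade recognition accuracy. The research paper "Supervised sparse patch coding towards misalignment-robust face recognition" proposes a new method to address this problem.

The paper first analyzes the limitations of conventional face recognition methods: when both training and test data may be contaminated by spatial misalignment, their recognition performance drops sharply. To address this, the paper proposes a supervised sparse patch coding framework aimed at face recognition that is robust to spatial misalignment. In this framework, each gallery face image is represented as a set of patches at the original positions and scales together with patches at misaligned positions and scales, and each given probe face image is uniformly divided into a set of local patches. Each probe image patch is then sparsely reconstructed from the patches of all gallery images, and the reconstructions of all probe patches are regularized jointly to enforce sparsity over the subjects of the selected patches. The reconstruction coefficients are obtained by ℓ1-norm minimization and are used to fuse the subject information of the patches in order to identify the probe face. The resulting supervised sparse coding framework offers a distinctive solution to misalignment-robust face recognition.

The main characteristics of this work are as follows:

1. Model-free. The framework requires no model learning process and can be applied directly to face recognition scenarios that are sensitive to misalignment.
2. Robust to spatial misalignment. Because it relies on sparse coding over patches, the algorithm continues to work when spatial misalignment occurs at test time.
3. Robust to image occlusion. Even when the face image is partly occluded, sparse reconstruction suppresses the negative effect of the occlusion.
4. Effective even when the gallery images are misaligned. Training images collected under non-ideal conditions can still be used without losing recognition accuracy.

To validate the method, the authors ran extensive face recognition experiments on three benchmark face datasets. The results show that the proposed supervised sparse coding framework outperforms holistic sparse coding and conventional subspace learning based algorithms in robustness to spatial misalignment and image occlusion. The work advances face recognition and may also inform solutions to similar pattern recognition problems. By combining sparse coding with supervised learning, it improves the robustness of face recognition systems and broadens their prospects for real-world use.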


Supervised sparse patch coding towards misalignment-robust face recognition

Congyan Lang ⇑, Songhe Feng, Bin Chen, Xiaotong Yuan

Department of Computer Science and Engineering, Beijing Jiaotong University, Beijing, China

Department of Electrical and Computer Engineering, National University of Singapore, Singapore

article info

Article history:

Available online 19 June 2012

Keywords:

Face recognition

Spatial misalignment

Image occlusions

Sparse coding

Misalignment robust

Supervised sparse coding

Dual sparsity

Collective sparse reconstructions

abstract

We address the challenging problem of face recognition in scenarios where both training and test data may be contaminated with spatial misalignments. A supervised sparse coding framework is developed in this paper as a practical solution to misalignment-robust face recognition. Each gallery face image is represented as a set of patches, in both original and misaligned positions and scales, and each given probe face image is then uniformly divided into a set of local patches. We propose to sparsely reconstruct each probe image patch from the patches of all gallery images, while the reconstructions for all patches of the probe image are jointly regularized by a term that enforces sparsity on the subjects of the selected patches. The reconstruction coefficients derived by ℓ1-norm minimization are then used to fuse the subject information of the patches for identifying the probe face. Such a supervised sparse coding framework provides a unique solution to face recognition with all of the following four characteristics (we emphasize "all" because some conventional face recognition algorithms possess only some of them): (1) the solution is model-free, without a model learning process, (2) the solution is robust to spatial misalignments, (3) the solution is robust to image occlusions, and (4) the solution is effective even when the gallery images themselves are spatially misaligned. Extensive face recognition experiments on three benchmark face datasets demonstrate the advantages of the proposed framework over holistic sparse coding and conventional subspace learning based algorithms in terms of robustness to spatial misalignments and image occlusions.
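The patch pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the helper names (`crop`, `probe_patches`, `gallery_patches`, `fuse_subjects`), the one-pixel shifts, the omission of rescaled patches, and the fusion rule (summing absolute coefficients per subject) are not taken from the paper, and the reconstruction coefficients are simply given rather than solved for.

```python
def crop(image, r, c, size):
    """Return the size x size patch with top-left corner (r, c), flattened."""
    return [v for row in image[r:r + size] for v in row[c:c + size]]

def probe_patches(image, size):
    """Uniformly divide a probe image into non-overlapping patches."""
    rows, cols = len(image), len(image[0])
    return [crop(image, r, c, size)
            for r in range(0, rows - size + 1, size)
            for c in range(0, cols - size + 1, size)]

def gallery_patches(image, size, shifts=((0, 0), (0, 1), (1, 0))):
    """Collect grid patches plus shifted copies that stay inside the image.

    Only translations are sketched here; rescaled patches are left out.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            for dr, dc in shifts:
                rr, cc = r + dr, c + dc
                if 0 <= rr <= rows - size and 0 <= cc <= cols - size:
                    patches.append(crop(image, rr, cc, size))
    return patches

def fuse_subjects(coefficients, atom_subjects):
    """Sum absolute coefficients per subject over all probe patches.

    This fusion rule is an assumed stand-in, not the paper's exact rule.
    """
    scores = {}
    for patch_coeffs in coefficients:
        for weight, subject in zip(patch_coeffs, atom_subjects):
            scores[subject] = scores.get(subject, 0.0) + abs(weight)
    return max(scores, key=scores.get), scores

# Toy 4x4 image whose pixel value encodes its position.
image = [[4 * r + c for c in range(4)] for r in range(4)]
probe = probe_patches(image, 2)
gallery = gallery_patches(image, 2)

# Hypothetical coefficients: two probe patches over four gallery patches.
coefficients = [[0.9, 0.0, 0.1, 0.0], [0.0, 0.7, 0.0, -0.2]]
subject, scores = fuse_subjects(coefficients, ["A", "A", "B", "B"])
```

In this toy run most of the coefficient mass falls on gallery patches of subject "A", so the fused decision is "A" even though individual patches also draw on subject "B".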

1. Introduction

Face recognition has been motivated by both its scientific values and potential applications in the practice of computer vision and machine learning. This problem has been extensively studied and much progress has been achieved during the past decades. As a standard preprocessing step for face recognition, face alignment and cropping are generally applied in automatic face recognition systems, and face images are typically aligned according to the positions of corresponding eyes [10–12]. The main purpose of face alignment is to build the semantic correspondences between the pixels of different images and eventually to classify by matching the pixels with identical semantic meaning.

Unfortunately, the images may not be accurately aligned, and the pixels for the same facial landmarks may not be strictly matched. Practical systems, or even manual face cropping, may introduce considerable image misalignments, including translations, scaling, and rotation. As a result, two pixels at the same position in different images may no longer share the same semantics. This discrepancy may adversely affect image similarity measurement, and consequently degrade face recognition performance. Recognizing faces under spatial misalignments is therefore a challenging problem, since the margins between subjects tend to become more ambiguous.
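The effect on similarity measurement can be seen in a toy example. This is a minimal sketch on a synthetic 4x6 image, not data from the paper; `shift_right` and `euclidean` are illustrative helper names.

```python
import math

def shift_right(image, pixels):
    """Translate each row right by `pixels`, padding with zeros on the left."""
    return [[0] * pixels + row[:len(row) - pixels] for row in image]

def euclidean(a, b):
    """Euclidean distance between two images compared pixel by pixel."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

# Toy image: a bright vertical stripe on a dark background.
image = [[0, 0, 1, 0, 0, 0] for _ in range(4)]
shifted = shift_right(image, 1)      # same content, one pixel to the right
blank = [[0] * 6 for _ in range(4)]  # an image with no stripe at all

d_shifted = euclidean(image, shifted)
d_blank = euclidean(image, blank)
```

Here the one-pixel shift moves the stripe entirely off its original column, so the pixel-wise distance to the shifted copy exceeds the distance to an empty image, although the shifted copy shows the same content.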

In the literature, there have been several attempts to analyze and tackle this type of problem. Shan et al. [13] showed that the effect of spatial misalignments can be alleviated to some extent by adding virtual gallery samples with artificial spatial misalignments. Yang et al. [20] proposed a solution to improve algorithmic robustness to image misalignments with ubiquitously supervised subspace learning. Xu et al. [19] proposed a solution based on the so-called Spatially constrained Earth Mover's Distance (SEMD), which is more robust against spatial misalignments than the traditional distance measures (e.g., Euclidean distance). Recently, Wang et al. [16] provided a novel and efficient algorithm for face recognition under scenarios with spatial misalignments by solving a constrained ℓ1-norm optimization problem, which minimizes the error between the misalignment-amended image and the image reconstructed from the given subspace along with its principal complementary subspace. However, the spatial misalignment problem is still far from being solved, since: (1) most of these methods focus on the global features of face images, yet global features are typically much more sensitive to spatial misalignments than local features; and (2) although the patch-based method [19] was proposed for misalignment-robust face recognition, it is not robust to image occlusions.

⇑ Corresponding author.

E-mail addresses: cylang@bjtu.edu.cn (C. Lang), shfeng@bjtu.edu.cn (S. Feng), g0800415@nus.edu.sg (B. Chen), eleyuanx@nus.edu.sg (X. Yuan).

http://dx.doi.org/10.1016/j.jvcir.2012.06.002

J. Vis. Commun. Image R. 24 (2013) 103–110

For the face recognition task, face images generally first need to be aligned and cropped out of the original images, which may contain background objects; one naive way is to fix the locations of the two eyes in a fixed-size image rectangle. For practical systems, however, the positions of the two eyes may need to be located automatically by face alignment algorithms [5] or eye detectors [17], so localization errors, namely spatial misalignments, are inevitable. These spatial misalignments include four components: translations in the horizontal and vertical directions (T_x, T_y), scaling (S), and rotation (θ). When spatial misalignment occurs, the use of global features typically leads to a substantially different data distribution compared with data without such spatial misalignments. Fig. 1 illustrates this: the five nearest neighbors of a misaligned face image are considerably different from those of the well-aligned face image when measured by Euclidean distance on global features. This observation motivates us to use an orderless, local patch based image representation, which is generally more robust to spatial misalignments than global features.
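The four misalignment components can be combined into a single similarity transform of pixel coordinates. The sketch below is illustrative only: the function name `misalign`, the choice to rotate and scale about the crop center, the order of operations, and the landmark position are all assumptions rather than details from the paper.

```python
import math

def misalign(point, tx, ty, scale, theta_deg, center):
    """Map a pixel coordinate through rotation, scaling, then translation.

    Rotation and scaling are taken about `center` (an assumption).
    """
    theta = math.radians(theta_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    rx = math.cos(theta) * dx - math.sin(theta) * dy
    ry = math.sin(theta) * dx + math.cos(theta) * dy
    return (center[0] + scale * rx + tx, center[1] + scale * ry + ty)

center = (32.0, 32.0)     # center of a hypothetical 64x64 face crop
left_eye = (20.0, 24.0)   # hypothetical landmark position

# No misalignment: the landmark stays where it is.
identity = misalign(left_eye, 0, 0, 1.0, 0, center)

# A mild misalignment in all four components moves the landmark.
moved = misalign(left_eye, tx=2, ty=-1, scale=1.1, theta_deg=5, center=center)
offset = math.hypot(moved[0] - left_eye[0], moved[1] - left_eye[1])
```

After such a transform, the pixel at the landmark's original position no longer shows the eye, which is why pixel-wise comparison of global features degrades while a patch can still be matched at its displaced location.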

Recent research shows that sparse coding appears to be biologically plausible as well as empirically effective for image processing and pattern classification [15,14]. In particular, Wright et al. [18] exploited the classification potential of sparse representation/coding in the face recognition problem. In [18], each probe image is sparsely reconstructed from an over-complete dictionary, whose bases are the gallery samples and bases for noise, by solving a general ℓ1-norm optimization problem. This solution is learning-free and robust to image occlusions; however, it is intuitively sensitive to spatial misalignments [18].
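The idea of classifying by sparse reconstruction can be sketched as follows. This is not the solver or the exact formulation of [18]: the Lagrangian form of the ℓ1 problem, the iterative soft-thresholding (ISTA) solver, the omission of noise bases, the function names, and the toy dictionary are all assumptions chosen to keep the example self-contained.

```python
def ista(D, y, lam=0.01, steps=500):
    """Minimise 0.5*||y - sum_j x[j]*D[j]||^2 + lam*||x||_1 with ISTA.

    D is a list of atoms (each a list of floats); y is the probe vector.
    """
    n, m = len(D), len(y)
    # Step size 1/L; the squared Frobenius norm upper-bounds the Lipschitz constant.
    L = sum(v * v for atom in D for v in atom)
    x = [0.0] * n
    for _ in range(steps):
        recon = [sum(x[j] * D[j][i] for j in range(n)) for i in range(m)]
        resid = [r - t for r, t in zip(recon, y)]
        for j in range(n):
            grad = sum(D[j][i] * resid[i] for i in range(m))
            z = x[j] - grad / L
            # Soft-thresholding: shrink towards zero by lam / L.
            x[j] = max(abs(z) - lam / L, 0.0) * (1 if z > 0 else -1)
    return x

def classify(D, labels, y, lam=0.01):
    """Pick the subject whose atoms alone best reconstruct the probe."""
    x = ista(D, y, lam)
    best, best_err = None, float("inf")
    for subject in sorted(set(labels)):
        recon = [sum(x[j] * D[j][i] for j in range(len(D)) if labels[j] == subject)
                 for i in range(len(y))]
        err = sum((a - b) ** 2 for a, b in zip(recon, y))
        if err < best_err:
            best, best_err = subject, err
    return best, x

# Toy dictionary: two gallery samples for each of two hypothetical subjects.
D = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1]]
labels = ["A", "A", "B", "B"]
y = [0.95, 0.05, 0.0, 0.0]   # probe that resembles subject A

subject, x = classify(D, labels, y)
```

Because the probe is compared with whole gallery vectors, shifting its entries would change which atoms reconstruct it, which is the sensitivity to misalignment noted above.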

Motivated by the above observations, we present a supervised sparse coding framework for face recognition under scenarios with possible spatial misalignments for both gallery and probe images. As spatial misalignments often lead to large divergence among images of the same subject, global features, e.g., a vector concatenating the gray-level values of all pixels, may lack sufficient discriminating power for recognition. Instead, if an image is considered as a set of orderless local patches, then this bag-of-patch representation should be less sensitive to spatial misalignments than global features. In this work, each gallery image is partitioned into local patches at both original and misaligned positions as well as scales. To mitigate the effect of noise in extracting patches at misaligned positions and scales, we throw

Fig. 1. The neighboring samples comparison between the well-aligned and misaligned face images. It is observed that the neighboring samples may change substantially