inference/prediction problem. At test time, an alignment al-
gorithm has to estimate the shape, a high dimensional vec-
tor, that best agrees with the image data and our model of
shape. The problem is non-convex with many local optima.
Successful algorithms [4, 9] handle this problem by assum-
ing the estimated shape must lie in a linear subspace, which
can be discovered, for example, by finding the principal
components of the training shapes. This assumption greatly
reduces the number of potential shapes considered during
inference and can help to avoid local optima. Recent work
[8, 11, 2] uses the fact that a certain class of regressors is
guaranteed to produce predictions that lie in a linear subspace
defined by the training shapes, so there is no need for
additional constraints. Crucially, our regression functions
have both of these elements.
Allied to these two factors is our efficient regression
function learning. We optimize an appropriate loss func-
tion and perform feature selection in a data-driven manner.
In particular, we learn each regressor via gradient boosting
[10] with a squared error loss function, the same loss func-
tion we want to minimize at test time. The sparse pixel set,
used as the regressor’s input, is selected via a combination
of the gradient boosting algorithm and a prior probability on
the distance between pairs of input pixels. The prior distri-
bution allows the boosting algorithm to efficiently explore
a large number of relevant features. The result is a cascade
of regressors that can localize the facial landmarks when
initialized with the mean face pose.
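The details of this feature selection are given later in the paper; as a rough sketch of the idea, the snippet below samples a pair of candidate pixels with probability decaying in their separation. The exponential form exp(-lam * ||u - v||), the parameter lam, and the function name are illustrative assumptions of ours, not the exact procedure.

```python
import numpy as np

def sample_pixel_pair(candidate_pixels, lam=0.1, rng=None):
    """Sample a pair of candidate pixel locations (u, v), favoring nearby
    pairs via an assumed exponential prior exp(-lam * ||u - v||).

    candidate_pixels: (m, 2) array of pixel coordinates in the mean-shape frame.
    """
    rng = rng or np.random.default_rng()
    m = len(candidate_pixels)
    i = rng.integers(m)                 # first pixel: chosen uniformly
    u = candidate_pixels[i]
    d = np.linalg.norm(candidate_pixels - u, axis=1)
    w = np.exp(-lam * d)                # distance prior (assumed form)
    w[i] = 0.0                          # exclude pairing u with itself
    j = rng.choice(m, p=w / w.sum())    # second pixel: weighted by prior
    return u, candidate_pixels[j]
```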
The major contributions of this paper are:
1. A novel alignment method based on an ensemble of regression trees that performs shape-invariant feature selection while minimizing the same loss function during training as we minimize at test time.
2. A natural extension of our method that handles missing or uncertain labels.
3. Quantitative and qualitative results confirming that our method produces high quality predictions while being much more efficient than the best previous method (Figure 1).
4. An analysis of the effect of the quantity of training data, and of the use of partially labeled and synthesized data, on the quality of predictions.
2. Method
This paper presents an algorithm to precisely estimate
the position of facial landmarks in a computationally effi-
cient way. Similar to previous works [8, 2], our proposed
method utilizes a cascade of regressors. In the rest of this
section we describe the details of the form of the individual
components of the cascade and how we perform training.
2.1. The cascade of regressors
To begin we introduce some notation. Let $x_i \in \mathbb{R}^2$ be the $x, y$-coordinates of the $i$th facial landmark in an image $I$. Then the vector $S = (x_1^T, x_2^T, \ldots, x_p^T)^T \in \mathbb{R}^{2p}$ denotes the coordinates of all the $p$ facial landmarks in $I$. Frequently, in this paper we refer to the vector $S$ as the shape. We use $\hat{S}^{(t)}$ to denote our current estimate of $S$. Each regressor, $r_t(\cdot, \cdot)$, in the cascade predicts an update vector from the image and $\hat{S}^{(t)}$ that is added to the current shape estimate $\hat{S}^{(t)}$ to improve the estimate:
$$\hat{S}^{(t+1)} = \hat{S}^{(t)} + r_t(I, \hat{S}^{(t)}) \qquad (1)$$
The critical point of the cascade is that the regressor $r_t$ makes its predictions based on features, such as pixel intensity values, computed from $I$ and indexed relative to the current shape estimate $\hat{S}^{(t)}$. This introduces some form of geometric invariance into the process, and as the cascade proceeds one can be more certain that a precise semantic location on the face is being indexed. Later we describe how this indexing is performed.
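To make the cascade concrete, the sketch below implements the test-time loop implied by equation (1). The regressor interface and the feature extraction callable are hypothetical placeholders of ours; only the additive update structure comes from the method itself.

```python
import numpy as np

def run_cascade(image, regressors, extract_features, S0):
    """Refine a shape estimate with a cascade of regressors (Eq. 1).

    regressors:       sequence of learned regressors r_t, each mapping
                      shape-indexed features to an update in R^{2p}.
    extract_features: callable (image, shape) -> feature vector; features
                      are indexed relative to the current shape estimate.
    S0:               initial shape estimate (length-2p vector).
    """
    S = np.asarray(S0, dtype=float).copy()
    for r_t in regressors:
        features = extract_features(image, S)  # shape-indexed features
        S = S + r_t(features)                  # additive update, Eq. (1)
    return S
```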
Note that the range of outputs spanned by the ensemble is guaranteed to lie in a linear subspace of the training data if the initial estimate $\hat{S}^{(0)}$ belongs to this space. We therefore do not need to enforce additional constraints on the predictions, which greatly simplifies our method. The initial shape can simply be chosen as the mean shape of the training data, centered and scaled according to the bounding box output of a generic face detector.
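As a rough illustration of this initialization, the sketch below places a mean shape inside a face detector's bounding box. The convention that the mean shape is stored in the unit square is our assumption, as are the helper's name and signature.

```python
import numpy as np

def initial_shape(mean_shape, bbox):
    """Center and scale the mean shape to a face detector's bounding box.

    mean_shape: (p, 2) array of landmark coordinates, assumed normalized
                to the unit square [0, 1] x [0, 1] (our convention).
    bbox:       (x, y, w, h) rectangle from a generic face detector.
    """
    x, y, w, h = bbox
    S0 = mean_shape * np.array([w, h]) + np.array([x, y])
    return S0.reshape(-1)  # flatten to a 2p-vector, matching S
```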
To train each $r_t$ we use the gradient tree boosting algorithm with a sum of square error loss as described in [10].
We now give the explicit details of this process.
2.2. Learning each regressor in the cascade
Assume we have training data $(I_1, S_1), \ldots, (I_n, S_n)$ where each $I_i$ is a face image and $S_i$ its shape vector.
To learn the first regression function $r_0$ in the cascade we create from our training data triplets of a face image, an initial shape estimate and the target update step, that is, $(I_{\pi_i}, \hat{S}_i^{(0)}, \Delta S_i^{(0)})$ where
$$\pi_i \in \{1, \ldots, n\} \qquad (2)$$
$$\hat{S}_i^{(0)} \in \{S_1, \ldots, S_n\} \setminus \{S_{\pi_i}\} \quad \text{and} \qquad (3)$$
$$\Delta S_i^{(0)} = S_{\pi_i} - \hat{S}_i^{(0)} \qquad (4)$$
for $i = 1, \ldots, N$. We set the total number of these triplets to $N = nR$ where $R$ is the number of initializations used per image $I_i$. Each initial shape estimate for an image is sampled uniformly from $\{S_1, \ldots, S_n\}$ without replacement.
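A minimal sketch of this triplet generation follows. The variable names and the list-of-tuples representation are ours; the sampling without replacement from the other training shapes follows the text above.

```python
import numpy as np

def make_triplets(shapes, R, rng=None):
    """Build N = nR training triplets (image index, initial shape, target update).

    shapes: list of n ground-truth shape vectors S_1, ..., S_n.
    R:      number of initializations per image (requires R <= n - 1).
    """
    rng = rng or np.random.default_rng()
    n = len(shapes)
    triplets = []
    for pi in range(n):
        # Initial estimates are drawn uniformly, without replacement,
        # from the other training shapes (Eq. 3).
        others = [j for j in range(n) if j != pi]
        for j in rng.choice(others, size=R, replace=False):
            S0 = shapes[j]
            delta = shapes[pi] - S0          # target update step (Eq. 4)
            triplets.append((pi, S0.copy(), delta))
    return triplets
```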
From this data we learn the regression function $r_0$ (see
algorithm 1), using gradient tree boosting with a sum of
square error loss. The set of training triplets is then updated