
One Millisecond Face Alignment with an Ensemble of Regression Trees

Vahid Kazemi and Josephine Sullivan

KTH, Royal Institute of Technology

Computer Vision and Active Perception Lab

Teknikringen 14, Stockholm, Sweden

{vahidk,sullivan}@csc.kth.se

Abstract

This paper addresses the problem of face alignment for a single image. We show how an ensemble of regression trees can be used to estimate the face's landmark positions directly from a sparse subset of pixel intensities, achieving super-realtime performance with high quality predictions. We present a general framework based on gradient boosting for learning an ensemble of regression trees that optimizes the sum of square error loss and naturally handles missing or partially labelled data. We show how using appropriate priors exploiting the structure of image data helps with efficient feature selection. Different regularization strategies and their importance in combating overfitting are also investigated. In addition, we analyse the effect of the quantity of training data on the accuracy of the predictions and explore the effect of data augmentation using synthesized data.

1. Introduction

In this paper we present a new algorithm that performs face alignment in milliseconds and achieves accuracy superior or comparable to state-of-the-art methods on standard datasets. The speed gains over previous methods are a consequence of identifying the essential components of prior face alignment algorithms and then incorporating them in a streamlined formulation into a cascade of high capacity regression functions learnt via gradient boosting.

We show, as others have [8, 2], that face alignment can be solved with a cascade of regression functions. In our case each regression function in the cascade efficiently estimates the shape from an initial estimate and the intensities of a sparse set of pixels indexed relative to this initial estimate. Our work builds on the large amount of research over the last decade that has resulted in significant progress for face alignment [9, 4, 13, 7, 15, 1, 16, 18, 3, 6, 19]. In particular, we incorporate into our learnt regression functions two key elements that are present in several of the successful algorithms cited and we detail these elements now.

Figure 1. Selected results on the HELEN dataset. An ensemble of randomized regression trees is used to detect 194 landmarks on a face from a single image in a millisecond.

The first revolves around the indexing of pixel intensities relative to the current estimate of the shape. The extracted features in the vector representation of a face image can greatly vary due to both shape deformation and nuisance factors such as changes in illumination conditions. This makes accurate shape estimation using these features difficult. The dilemma is that we need reliable features to accurately predict the shape, and on the other hand we need an accurate estimate of the shape to extract reliable features. Previous work [4, 9, 5, 8], as well as this work, uses an iterative approach (the cascade) to deal with this problem. Instead of regressing the shape parameters based on features extracted in the global coordinate system of the image, the image is transformed to a normalized coordinate system based on a current estimate of the shape, and then the features are extracted to predict an update vector for the shape parameters. This process is usually repeated several times until convergence.

The second considers how to combat the difficulty of the inference/prediction problem. At test time, an alignment algorithm has to estimate the shape, a high dimensional vector, that best agrees with the image data and our model of shape. The problem is non-convex with many local optima. Successful algorithms [4, 9] handle this problem by assuming the estimated shape must lie in a linear subspace, which can be discovered, for example, by finding the principal components of the training shapes. This assumption greatly reduces the number of potential shapes considered during inference and can help to avoid local optima. Recent work [8, 11, 2] uses the fact that a certain class of regressors is guaranteed to produce predictions that lie in a linear subspace defined by the training shapes, so there is no need for additional constraints. Crucially, our regression functions have these two elements.

Allied to these two factors is our efficient regression function learning. We optimize an appropriate loss function and perform feature selection in a data-driven manner. In particular, we learn each regressor via gradient boosting [10] with a squared error loss function, the same loss function we want to minimize at test time. The sparse pixel set, used as the regressor's input, is selected via a combination of the gradient boosting algorithm and a prior probability on the distance between pairs of input pixels. The prior distribution allows the boosting algorithm to efficiently explore a large number of relevant features. The result is a cascade of regressors that can localize the facial landmarks when initialized with the mean face pose.

The major contributions of this paper are:

1. A novel method for alignment based on an ensemble of regression trees that performs shape invariant feature selection while minimizing the same loss function during training time as we want to minimize at test time.

2. We present a natural extension of our method that handles missing or uncertain labels.

3. Quantitative and qualitative results are presented that confirm that our method produces high quality predictions while being much more efficient than the best previous method (Figure 1).

4. The effects of the quantity of training data, and of the use of partially labeled and synthesized data, on the quality of predictions are analyzed.

2. Method

This paper presents an algorithm to precisely estimate the position of facial landmarks in a computationally efficient way. Similar to previous works [8, 2] our proposed method utilizes a cascade of regressors. In the rest of this section we describe the details of the form of the individual components of the cascade and how we perform training.

2.1. The cascade of regressors

To begin we introduce some notation. Let $x_i \in \mathbb{R}^2$ be the x, y-coordinates of the ith facial landmark in an image $I$. Then the vector $S = (x_1^T, x_2^T, \ldots, x_p^T)^T \in \mathbb{R}^{2p}$ denotes the coordinates of all the $p$ facial landmarks in $I$. Frequently, in this paper we refer to the vector $S$ as the shape. We use $\hat{S}^{(t)}$ to denote our current estimate of $S$. Each regressor, $r_t(\cdot, \cdot)$, in the cascade predicts an update vector from the image and $\hat{S}^{(t)}$ that is added to the current shape estimate $\hat{S}^{(t)}$ to improve the estimate:

$$\hat{S}^{(t+1)} = \hat{S}^{(t)} + r_t(I, \hat{S}^{(t)}) \tag{1}$$
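As a minimal sketch of the additive update in equation (1), the cascade can be written as a loop that repeatedly adds each regressor's output to the current estimate. The names `run_cascade` and `halfway` are hypothetical, and the toy regressor below ignores the image entirely; it only illustrates the control flow, not the learnt regressors of the paper.

```python
from typing import Callable, List, Sequence

Shape = List[float]                            # flattened (x1, y1, ..., xp, yp)
Regressor = Callable[[object, Shape], Shape]   # assumed signature: (image, shape) -> update


def run_cascade(image: object, initial_shape: Sequence[float],
                regressors: Sequence[Regressor]) -> Shape:
    """Apply S^(t+1) = S^(t) + r_t(I, S^(t)) once per regressor in the cascade."""
    shape = list(initial_shape)
    for r_t in regressors:
        update = r_t(image, shape)
        shape = [s + d for s, d in zip(shape, update)]
    return shape


# Toy stand-in for a learnt regressor: move halfway towards a fixed target.
TARGET = [4.0, 8.0]


def halfway(image: object, shape: Shape) -> Shape:
    return [0.5 * (t - s) for t, s in zip(TARGET, shape)]


final_shape = run_cascade(None, [0.0, 0.0], [halfway] * 3)
print(final_shape)  # [3.5, 7.0]
```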

The critical point of the cascade is that the regressor $r_t$ makes its predictions based on features, such as pixel intensity values, computed from $I$ and indexed relative to the current shape estimate $\hat{S}^{(t)}$. This introduces some form of geometric invariance into the process and as the cascade proceeds one can be more certain that a precise semantic location on the face is being indexed. Later we describe how this indexing is performed.

Note that the range of outputs spanned by the ensemble is guaranteed to lie in a linear subspace of the training data if the initial estimate $\hat{S}^{(0)}$ belongs to this space. We therefore do not need to enforce additional constraints on the predictions, which greatly simplifies our method. The initial shape can simply be chosen as the mean shape of the training data centered and scaled according to the bounding box output of a generic face detector.
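One way the initialization could look in code is sketched below. It assumes, purely for illustration, that the mean shape is stored in coordinates normalized to the unit square and that the detector returns a box as `(left, top, width, height)`; neither convention is stated in the paper, and `place_mean_shape` is a hypothetical helper.

```python
def place_mean_shape(mean_shape, box):
    """Scale and translate a normalized mean shape into a detector bounding box.

    mean_shape: list of (x, y) pairs in [0, 1] x [0, 1] (assumed convention).
    box: (left, top, width, height) in pixels (assumed convention).
    """
    left, top, width, height = box
    return [(left + x * width, top + y * height) for x, y in mean_shape]


# Three illustrative landmarks (two eyes and a mouth point).
mean_shape = [(0.25, 0.5), (0.75, 0.5), (0.5, 0.75)]
initial = place_mean_shape(mean_shape, (100.0, 50.0, 200.0, 200.0))
print(initial)  # [(150.0, 150.0), (250.0, 150.0), (200.0, 200.0)]
```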

To train each $r_t$ we use the gradient tree boosting algorithm with a sum of square error loss as described in [10]. We now give the explicit details of this process.

2.2. Learning each regressor in the cascade

Assume we have training data $(I_1, S_1), \ldots, (I_n, S_n)$ where each $I_i$ is a face image and $S_i$ its shape vector. To learn the first regression function $r_0$ in the cascade we create from our training data triplets of a face image, an initial shape estimate and the target update step, that is, $(I_{\pi_i}, \hat{S}_i^{(0)}, \Delta S_i^{(0)})$ where

$$\pi_i \in \{1, \ldots, n\} \tag{2}$$

$$\hat{S}_i^{(0)} \in \{S_1, \ldots, S_n\} \setminus S_{\pi_i} \quad \text{and} \tag{3}$$

$$\Delta S_i^{(0)} = S_{\pi_i} - \hat{S}_i^{(0)} \tag{4}$$

for $i = 1, \ldots, N$. We set the total number of these triplets to $N = nR$ where $R$ is the number of initializations used per image $I_i$. Each initial shape estimate for an image is sampled uniformly from $\{S_1, \ldots, S_n\}$ without replacement.
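The construction of the training triplets in equations (2) to (4) can be sketched as follows. This is an illustrative reading, not the authors' code: `make_triplets` is a hypothetical name, each image is represented only by its index, and the sketch assumes $R \le n - 1$ so that sampling without replacement from the other shapes is possible.

```python
import random


def make_triplets(shapes, R, seed=0):
    """Build N = n * R triplets (pi_i, S_hat_i^(0), delta_S_i^(0)).

    shapes: list of n shape vectors (lists of floats); image pi_i is
    referenced by its index. Requires R <= n - 1.
    """
    rng = random.Random(seed)
    n = len(shapes)
    triplets = []
    for pi in range(n):
        # Equation (3): candidates are all training shapes except S_pi.
        others = [j for j in range(n) if j != pi]
        for j in rng.sample(others, R):        # uniform, without replacement
            s_hat = list(shapes[j])
            # Equation (4): target update is S_pi - S_hat.
            delta = [a - b for a, b in zip(shapes[pi], s_hat)]
            triplets.append((pi, s_hat, delta))
    return triplets


shapes = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [2.0, 2.0]]
triplets = make_triplets(shapes, R=2)
print(len(triplets))  # 8, i.e. n * R
```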

From this data we learn the regression function $r_0$ (see Algorithm 1), using gradient tree boosting with a sum of square error loss. The set of training triplets is then updated to provide the training data, $(I_{\pi_i}, \hat{S}_i^{(1)}, \Delta S_i^{(1)})$, for the next regressor $r_1$ in the cascade by setting (with $t = 0$)

$$\hat{S}_i^{(t+1)} = \hat{S}_i^{(t)} + r_t(I_{\pi_i}, \hat{S}_i^{(t)}) \tag{5}$$

$$\Delta S_i^{(t+1)} = S_{\pi_i} - \hat{S}_i^{(t+1)} \tag{6}$$
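A minimal sketch of the triplet update in equations (5) and (6) follows. The function name `update_triplets` and the constant toy regressor are assumptions made for illustration; the ground-truth shape is recovered as the current estimate plus the current target, which follows from equation (4).

```python
def update_triplets(triplets, r_t, images):
    """Apply equations (5) and (6) to every triplet (pi, s_hat, delta)."""
    updated = []
    for pi, s_hat, delta in triplets:
        target = [a + b for a, b in zip(s_hat, delta)]        # S_pi = S_hat + delta
        step = r_t(images[pi], s_hat)
        new_hat = [a + b for a, b in zip(s_hat, step)]        # equation (5)
        new_delta = [a - b for a, b in zip(target, new_hat)]  # equation (6)
        updated.append((pi, new_hat, new_delta))
    return updated


# Toy regressor that always proposes the same small step (illustration only).
def constant_step(image, shape):
    return [0.5, 0.5]


old = [(0, [1.0, 1.0], [2.0, -1.0])]
new = update_triplets(old, constant_step, images=[None])
print(new)  # [(0, [1.5, 1.5], [1.5, -1.5])]
```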

This process is iterated until a cascade of $T$ regressors $r_0, r_1, \ldots, r_{T-1}$ is learnt which, when combined, give a sufficient level of accuracy.

As stated, each regressor $r_t$ is learned using the gradient boosting tree algorithm. It should be remembered that a square error loss is used and the residuals computed in the innermost loop correspond to the gradient of this loss function evaluated at each training sample. Included in the statement of the algorithm is a learning rate parameter $0 < \nu \le 1$ also known as the shrinkage factor. Setting $\nu < 1$ helps combat over-fitting and usually results in regressors which generalize much better than those learnt with $\nu = 1$ [10].

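Algorithm 1 Learning $r_t$ in the cascade

Have training data $\{(I_{\pi_i}, \hat{S}_i^{(t)}, \Delta S_i^{(t)})\}_{i=1}^{N}$ and the learning rate (shrinkage factor) $0 < \nu < 1$

1. Initialise

$$f_0(I, \hat{S}^{(t)}) = \arg\min_{\gamma \in \mathbb{R}^{2p}} \sum_{i=1}^{N} \|\Delta S_i^{(t)} - \gamma\|^2$$

2. for $k = 1, \ldots, K$:

(a) Set for $i = 1, \ldots, N$

$$r_{ik} = \Delta S_i^{(t)} - f_{k-1}(I_{\pi_i}, \hat{S}_i^{(t)})$$

(b) Fit a regression tree to the targets $r_{ik}$ giving a weak regression function $g_k(I, \hat{S}^{(t)})$.

(c) Update

$$f_k(I, \hat{S}^{(t)}) = f_{k-1}(I, \hat{S}^{(t)}) + \nu \, g_k(I, \hat{S}^{(t)})$$

3. Output $r_t(I, \hat{S}^{(t)}) = f_K(I, \hat{S}^{(t)})$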
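The boosting loop of Algorithm 1 can be sketched in a few lines of pure Python. To keep the example self-contained it makes two simplifying assumptions that differ from the paper: each training sample is represented by a precomputed feature vector rather than an image and shape, and the weak learner is a depth-one stump (`fit_stump`, a hypothetical helper) rather than a full regression tree with pixel-difference splits.

```python
def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_stump(features, targets):
    """Toy weak learner: one threshold split with mean-of-targets leaves."""
    best = None
    for j in range(len(features[0])):
        for tau in sorted({f[j] for f in features}):
            left = [t for f, t in zip(features, targets) if f[j] <= tau]
            right = [t for f, t in zip(features, targets) if f[j] > tau]
            if not left or not right:
                continue
            ml, mr = mean_vec(left), mean_vec(right)
            err = (sum(sum((a - b) ** 2 for a, b in zip(t, ml)) for t in left)
                   + sum(sum((a - b) ** 2 for a, b in zip(t, mr)) for t in right))
            if best is None or err < best[0]:
                best = (err, j, tau, ml, mr)
    if best is None:                      # no valid split: constant predictor
        m = mean_vec(targets)
        return lambda f: m
    _, j, tau, ml, mr = best
    return lambda f: ml if f[j] <= tau else mr


def boost(features, deltas, K=50, nu=0.1):
    f0 = mean_vec(deltas)                 # step 1: the mean minimizes the squared error
    trees = []
    preds = [list(f0) for _ in deltas]
    for _ in range(K):                    # step 2
        residuals = [[d - p for d, p in zip(dv, pv)]
                     for dv, pv in zip(deltas, preds)]            # (a)
        g = fit_stump(features, residuals)                        # (b)
        trees.append(g)
        preds = [[p + nu * gi for p, gi in zip(pv, g(f))]
                 for pv, f in zip(preds, features)]               # (c)

    def r_t(f):                           # step 3: f_K
        out = list(f0)
        for g in trees:
            out = [o + nu * gi for o, gi in zip(out, g(f))]
        return out
    return r_t


features = [[0.0], [1.0], [2.0], [3.0]]
deltas = [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0]]
r_t = boost(features, deltas, K=50, nu=0.1)
prediction = r_t([3.0])   # approaches [1.0, 2.0] as K grows
```

With shrinkage $\nu = 0.1$ each round removes only a tenth of the remaining residual, so the fit approaches the targets gradually rather than in a single step.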

2.3. Tree based regressor

The core of each regression function $r_t$ is the tree based regressors fit to the residual targets during the gradient boosting algorithm. We now review the most important implementation details for training each regression tree.

2.3.1 Shape invariant split tests

At each split node in the regression tree we make a decision based on thresholding the difference between the intensities of two pixels. The pixels used in the test are at positions $u$ and $v$ when defined in the coordinate system of the mean shape. For a face image with an arbitrary shape, we would like to index the points that have the same position relative to its shape as $u$ and $v$ have to the mean shape. To achieve this, the image can be warped to the mean shape based on the current shape estimate before extracting the features. Since we only use a very sparse representation of the image, it is much more efficient to warp the location of points as opposed to the whole image. Furthermore, a crude approximation of warping can be done using only a global similarity transform in addition to local translations as suggested by [2].

The precise details are as follows. Let $k_u$ be the index of the facial landmark in the mean shape that is closest to $u$ and define its offset from $u$ as

$$\delta x_u = u - \bar{x}_{k_u}$$

Then for a shape $S_i$ defined in image $I_i$, the position in $I_i$ that is qualitatively similar to $u$ in the mean shape image is given by

$$u' = x_{i,k_u} + \frac{1}{s_i} R_i^T \, \delta x_u \tag{7}$$

where $s_i$ and $R_i$ are the scale and rotation matrix of the similarity transform which transforms $S_i$ to $\bar{S}$, the mean shape. The scale and rotation are found to minimize

$$\sum_{j=1}^{p} \| \bar{x}_j - (s_i R_i x_{i,j} + t_i) \|^2 \tag{8}$$

the sum of squares between the mean shape's facial landmark points, $\bar{x}_j$'s, and those of the warped shape. $v'$ is similarly defined. Formally each split is a decision involving 3 parameters $\theta = (\tau, u, v)$ and is applied to each training and test example as

$$h(I_{\pi_i}, \hat{S}_i^{(t)}, \theta) = \begin{cases} 1 & I_{\pi_i}(u') - I_{\pi_i}(v') > \tau \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

where $u'$ and $v'$ are defined using the scale and rotation matrix which best warp $\hat{S}_i^{(t)}$ to $\bar{S}$ according to equation (7).
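The point warp of equation (7) and the split test of equation (9) can be sketched as below, taking the scale and rotation as given. The helper names, the nested-list image representation and the nearest-pixel lookup with clamping at the border are all assumptions for illustration; the paper does not specify how sub-pixel positions are sampled.

```python
def warp_point(delta, anchor, s, R):
    """Equation (7): u' = x_{i,k_u} + (1/s) R^T delta_u.

    delta:  offset of u from its nearest mean-shape landmark.
    anchor: that same landmark's position in the current shape estimate.
    s, R:   scale and 2x2 rotation mapping the current shape to the mean shape.
    """
    dx, dy = delta
    rx = R[0][0] * dx + R[1][0] * dy   # first component of R^T delta
    ry = R[0][1] * dx + R[1][1] * dy   # second component of R^T delta
    return (anchor[0] + rx / s, anchor[1] + ry / s)


def split_test(image, u_prime, v_prime, tau):
    """Equation (9): 1 if I(u') - I(v') > tau, else 0."""
    def intensity(p):
        # Nearest-pixel lookup, clamped to the image (an assumption).
        col = min(max(int(round(p[0])), 0), len(image[0]) - 1)
        row = min(max(int(round(p[1])), 0), len(image) - 1)
        return image[row][col]
    return 1 if intensity(u_prime) - intensity(v_prime) > tau else 0


image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
R = [[0.0, -1.0], [1.0, 0.0]]          # 90 degree rotation
u_prime = warp_point((2.0, 0.0), (1.0, 2.0), 2.0, R)
v_prime = warp_point((0.0, 2.0), (1.0, 2.0), 2.0, R)
print(u_prime, v_prime)                # (1.0, 1.0) (2.0, 2.0)
print(split_test(image, u_prime, v_prime, tau=-50))  # 1, since 50 - 90 > -50
```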

In practice the assignments and local translations are determined during the training phase. Calculating the similarity transform, at test time the most computationally expensive part of this process, is only done once at each level of the cascade.
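For two-dimensional points, a least-squares criterion of the form in equation (8) has a well-known closed-form solution, which suggests why computing the transform once per cascade level is affordable. The sketch below uses complex arithmetic to obtain it; this is one standard way to solve the problem and is not claimed to be the authors' implementation. The function name `fit_similarity` is hypothetical, and the solution covers rotation, uniform scale and translation without reflection.

```python
import cmath
import math


def fit_similarity(shape, mean_shape):
    """Return (s, R, t) minimizing sum_j |mean_j - (s R shape_j + t)|^2."""
    z = [complex(x, y) for x, y in shape]
    w = [complex(x, y) for x, y in mean_shape]
    zc = sum(z) / len(z)
    wc = sum(w) / len(w)
    # Complex least squares: w ~ a * z + b with a = s * exp(i * theta).
    num = sum((wj - wc) * (zj - zc).conjugate() for zj, wj in zip(z, w))
    den = sum(abs(zj - zc) ** 2 for zj in z)
    a = num / den
    b = wc - a * zc
    s = abs(a)
    theta = cmath.phase(a)
    R = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    return s, R, (b.real, b.imag)


# The current shape is the mean shape rotated by 90 degrees, scaled by 2
# and shifted by (3, 4), so the fitted scale should be 0.5.
mean_shape = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shape = [(3.0, 4.0), (3.0, 6.0), (1.0, 4.0)]
s, R, t = fit_similarity(shape, mean_shape)
```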
