A Discriminatively Trained, Multiscale, Deformable Part Model

Pedro Felzenszwalb

University of Chicago

pff@cs.uchicago.edu

David McAllester

Toyota Technological Institute at Chicago

mcallester@tti-c.org

Deva Ramanan

TTI-C and UC Irvine

dramanan@ics.uci.edu

Abstract

This paper describes a discriminatively trained, multiscale, deformable part model for object detection. Our system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge. It also outperforms the best results in the 2007 challenge in ten out of twenty categories. The system relies heavily on deformable parts. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL challenge. Our system also relies heavily on new methods for discriminative training. We combine a margin-sensitive approach for data mining hard negative examples with a formalism we call latent SVM. A latent SVM, like a hidden CRF, leads to a non-convex training problem. However, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive examples. We believe that our training methods will eventually make possible the effective use of more latent information such as hierarchical (grammar) models and models involving latent three-dimensional pose.

1. Introduction

We consider the problem of detecting and localizing objects of a generic category, such as people or cars, in static images. We have developed a new multiscale deformable part model for solving this problem. The models are trained using a discriminative procedure that only requires bounding box labels for the positive examples. Using these models we implemented a detection system that is both highly efficient and accurate, processing an image in about 2 seconds and achieving recognition rates that are significantly better than previous systems.

Our system achieves a two-fold improvement in average precision over the winning system [5] in the 2006 PASCAL person detection challenge. The system also outperforms the best results in the 2007 challenge in ten out of twenty object categories. Figure 1 shows an example detection obtained with our person model.

Figure 1. Example detection obtained with the person model. The model is defined by a coarse template, several higher resolution part templates and a spatial model for the location of each part.

The notion that objects can be modeled by parts in a deformable configuration provides an elegant framework for representing object categories [1–3, 6, 10, 12, 13, 15, 16, 22]. While these models are appealing from a conceptual point of view, it has been difficult to establish their value in practice. On difficult datasets, deformable models are often outperformed by “conceptually weaker” models such as rigid templates [5] or bag-of-features [23]. One of our main goals is to address this performance gap.

Our models include both a coarse global template covering an entire object and higher resolution part templates. The templates represent histogram of gradient features [5]. As in [14, 19, 21], we train models discriminatively. However, our system is semi-supervised, trained with a max-margin framework, and does not rely on feature detection. We also describe a simple and effective strategy for learning parts from weakly-labeled data. In contrast to computationally demanding approaches such as [4], we can learn a model in 3 hours on a single CPU.

Another contribution of our work is a new methodology for discriminative training. We generalize SVMs for handling latent variables such as part positions, and introduce a new method for data mining “hard negative” examples during training. We believe that handling partially labeled data is a significant issue in machine learning for computer vision. For example, the PASCAL dataset only specifies a bounding box for each positive example of an object. We treat the position of each object part as a latent variable. We also treat the exact location of the object as a latent variable, requiring only that our classifier select a window that has large overlap with the labeled bounding box.

A latent SVM, like a hidden CRF [19], leads to a non-convex training problem. However, unlike a hidden CRF, a latent SVM is semi-convex and the training problem becomes convex once latent information is specified for the positive training examples. This leads to a general coordinate descent algorithm for latent SVMs.

System Overview. Our system uses a scanning window approach. A model for an object consists of a global “root” filter and several part models. Each part model specifies a spatial model and a part filter. The spatial model defines a set of allowed placements for a part relative to a detection window, and a deformation cost for each placement.

The score of a detection window is the score of the root filter on the window plus the sum over parts, of the maximum over placements of that part, of the part filter score on the resulting subwindow minus the deformation cost. This is similar to classical part-based models [10, 13]. Both root and part filters are scored by computing the dot product between a set of weights and histogram of gradient (HOG) features within a window. The root filter is equivalent to a Dalal-Triggs model [5]. The features for the part filters are computed at twice the spatial resolution of the root filter. Our model is defined at a fixed scale, and we detect objects by searching over an image pyramid.
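
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `window_score` and its arguments are hypothetical, and the filter responses and deformation costs are assumed to be precomputed for a fixed list of allowed placements per part.

```python
import numpy as np

def window_score(root_score, part_scores, deformation_costs):
    """Score one detection window (illustrative sketch).

    root_score: response of the root filter on the window.
    part_scores: list of 1-D arrays; part_scores[i][k] is the response of
        part filter i at its k-th allowed placement.
    deformation_costs: list of 1-D arrays of matching shapes holding the
        deformation cost of each placement.
    Returns the total score and the chosen placement index for each part.
    """
    total = float(root_score)
    placements = []
    for scores, costs in zip(part_scores, deformation_costs):
        # Net value of each placement: filter response minus deformation cost.
        net = np.asarray(scores, dtype=float) - np.asarray(costs, dtype=float)
        best = int(np.argmax(net))
        placements.append(best)
        total += float(net[best])
    return total, placements
```

Because each part is maximized independently given the window, the cost of scoring grows linearly with the number of parts rather than with the number of joint configurations.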

In training we are given a set of images annotated with bounding boxes around each instance of an object. We reduce the detection problem to a binary classification problem. Each example $x$ is scored by a function of the form $f_\beta(x) = \max_z \beta \cdot \Phi(x, z)$. Here $\beta$ is a vector of model parameters and $z$ are latent values (e.g. the part placements). To learn a model we define a generalization of SVMs that we call latent variable SVM (LSVM). An important property of LSVMs is that the training problem becomes convex if we fix the latent values for positive examples. This can be used in a coordinate descent algorithm.
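
As a small illustration of this scoring function, the sketch below evaluates the maximum of β · Φ(x, z) over z for one example whose candidate feature vectors Φ(x, z) have already been enumerated as the rows of a matrix. The name `latent_score` and the explicit enumeration are assumptions made for the example; listing every latent value like this is only practical for tiny latent spaces.

```python
import numpy as np

def latent_score(beta, candidate_features):
    """Return (max over z of beta . Phi(x, z), maximizing index z).

    candidate_features: 2-D array whose row z holds Phi(x, z).
    """
    responses = np.asarray(candidate_features, dtype=float) @ np.asarray(beta, dtype=float)
    z_best = int(np.argmax(responses))
    return float(responses[z_best]), z_best
```

The returned index is the latent value that a training procedure would record for the example when the latent values are held fixed.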

In practice we iteratively apply classical SVM training to triples $(\langle x_1, z_1, y_1 \rangle, \ldots, \langle x_n, z_n, y_n \rangle)$ where $z_i$ is selected to be the best scoring latent label for $x_i$ under the model learned in the previous iteration. An initial root filter is generated from the bounding boxes in the PASCAL dataset. The parts are initialized from this root filter.
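
The alternation described here can be sketched on toy data as follows. Everything in the block is an assumption for illustration: the names `train_linear_svm` and `train_latent_svm` are hypothetical, each example is given as a small matrix of candidate feature vectors (one row per latent value), and plain subgradient descent on the regularized hinge loss stands in for "classical SVM training". It is not the training code used in the paper.

```python
import numpy as np

def train_linear_svm(X, y, reg=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss (stand-in solver)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1.0  # examples that currently violate the margin
        grad = reg * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
        w -= lr * grad
    return w

def train_latent_svm(examples, labels, w_init, rounds=5):
    """Alternate between choosing latent values and ordinary SVM training.

    examples[i] is a 2-D array whose rows are Phi(x_i, z) for each z.
    Returns the final weights and the latent choices used in the last round.
    """
    examples = [np.asarray(c, dtype=float) for c in examples]
    y = np.asarray(labels, dtype=float)
    w = np.asarray(w_init, dtype=float)
    z = []
    for _ in range(rounds):
        # Step 1: best-scoring latent value per example under the current model.
        z = [int(np.argmax(cands @ w)) for cands in examples]
        X = np.stack([cands[zi] for cands, zi in zip(examples, z)])
        # Step 2: with the latent values fixed, this is a convex SVM problem.
        w = train_linear_svm(X, y)
    return w, z
```

The initial weights matter because the first round of latent choices depends on them, which mirrors the role of the initial root filter in the text.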

2. Model

The underlying building blocks for our models are the Histogram of Oriented Gradient (HOG) features from [5]. We represent HOG features at two different scales. Coarse features are captured by a rigid template covering an entire detection window. Finer scale features are captured by part templates that can be moved with respect to the detection window. The spatial model for the part locations is equivalent to a star graph or 1-fan [3] where the coarse template serves as a reference position.

Figure 2. The HOG feature pyramid and an object hypothesis defined in terms of a placement of the root filter (near the top of the pyramid) and the part filters (near the bottom of the pyramid). Panel labels: image pyramid; HOG feature pyramid.

2.1. HOG Representation

We follow the construction in [5] to define a dense representation of an image at a particular resolution. The image is first divided into 8x8 non-overlapping pixel regions, or cells. For each cell we accumulate a 1D histogram of gradient orientations over pixels in that cell. These histograms capture local shape properties but are also somewhat invariant to small deformations.

The gradient at each pixel is discretized into one of nine orientation bins, and each pixel “votes” for the orientation of its gradient, with a strength that depends on the gradient magnitude at that pixel. For color images, we compute the gradient of each color channel and pick the channel with highest gradient magnitude at each pixel. Finally, the histogram of each cell is normalized with respect to the gradient energy in a neighborhood around it. We look at the four 2 × 2 blocks of cells that contain a particular cell and normalize the histogram of the given cell with respect to the total energy in each of these blocks. This leads to a 9 × 4 dimensional vector representing the local gradient information inside a cell.
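
The construction can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact feature code of [5] or of the authors: the input is a single-channel image, gradients come from `np.gradient`, each pixel is hard-assigned to one of nine unsigned orientation bins, normalization divides by the square root of the block energy, and only interior cells (those with all four surrounding 2 × 2 blocks) are returned. The name `hog_cells` is hypothetical.

```python
import numpy as np

def hog_cells(image, cell=8, bins=9, eps=1e-6):
    """Simplified HOG for a 2-D grayscale image with at least 3x3 cells.

    Returns an array of shape (rows - 2, cols - 2, 4 * bins): one vector
    per interior cell.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), hard-assigned to one of `bins` bins.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    bin_idx = np.minimum((angle / np.pi * bins).astype(int), bins - 1)

    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            # Each pixel votes for its bin with its gradient magnitude.
            hist[r, c] = np.bincount(bin_idx[sl].ravel(),
                                     weights=magnitude[sl].ravel(),
                                     minlength=bins)

    # Energy of every 2x2 block of cells; block (i, j) covers
    # cells (i..i+1, j..j+1).
    energy = (hist ** 2).sum(axis=2)
    block = (energy[:-1, :-1] + energy[1:, :-1]
             + energy[:-1, 1:] + energy[1:, 1:])

    feats = np.zeros((rows - 2, cols - 2, 4 * bins))
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # The four blocks that contain cell (r, c).
            norms = [block[r - 1, c - 1], block[r - 1, c],
                     block[r, c - 1], block[r, c]]
            feats[r - 1, c - 1] = np.concatenate(
                [hist[r, c] / np.sqrt(n + eps) for n in norms])
    return feats
```

With nine bins and four normalizations, each cell vector has 36 entries, matching the 9 × 4 dimensional description above.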

We define a HOG feature pyramid by computing HOG features of each level of a standard image pyramid (see Figure 2). Features at the top of this pyramid capture coarse gradients histogrammed over fairly large areas of the input image while features at the bottom of the pyramid capture finer gradients histogrammed over small areas.
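
A feature pyramid of this kind can be sketched as below. The block is illustrative only: the names `downsample2` and `feature_pyramid` are hypothetical, levels are a factor of two apart and produced by 2 × 2 averaging (an assumption, since this section does not state the scale step), and `feature_fn` stands for any per-image feature extractor such as a HOG routine.

```python
import numpy as np

def downsample2(img):
    """Halve resolution by averaging non-overlapping 2x2 pixel blocks."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def feature_pyramid(image, feature_fn, min_size=24):
    """Apply feature_fn to each pyramid level, finest level first."""
    level = np.asarray(image, dtype=float)
    pyramid = []
    while min(level.shape) >= min_size:
        pyramid.append(feature_fn(level))
        level = downsample2(level)
    return pyramid
```

Since the cell size is fixed in pixels, a cell at a coarser level covers a larger area of the original image, which is why the top of the pyramid captures coarse structure.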
