End-To-End_People_Detection_CVPR_2016


End-to-end people detection in crowded scenes

Russell Stewart¹, Mykhaylo Andriluka¹,², and Andrew Y. Ng¹

¹Stanford University, USA
²Max Planck Institute for Informatics, Germany

Abstract

Current people detectors operate either by scanning an image in a sliding window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.

1. Introduction

In this paper we propose a new architecture for detecting objects in images. We strive for an end-to-end approach that accepts images as input and directly generates a set of object bounding boxes as output. This task is challenging because it demands both distinguishing objects from the background and correctly estimating the number of distinct objects and their locations. Such an end-to-end approach capable of directly outputting predictions would be advantageous over methods that first generate a set of bounding boxes, evaluate them with a classifier, and then perform some form of merging or non-maximum suppression on an overcomplete set of detections.

Sequentially generating a set of detections has an important advantage in that multiple detections on the same object can be avoided by remembering the previously generated output. To control this generation process, we use a recurrent neural network with LSTM units. To produce intermediate representations, we use expressive image features from GoogLeNet that are further fine-tuned as part of our system. Our architecture can thus be seen as a “decoding” process that converts an intermediate representation of an image into a set of predicted objects. The LSTM can be seen as a “controller” that propagates information between decoding steps and controls the location of the next output (see Fig. 2 for an overview). Importantly, our trainable end-to-end system allows joint tuning of all components via back-propagation.

The implementation is publicly available at https://github.com/Russell91/ReInspect
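The sequential decoding idea can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the authors' implementation: `decode`, `toy_step`, `CANDIDATES`, the tuple-based memory, and the `stop_threshold` value are all hypothetical stand-ins. In particular, `toy_step` replaces the LSTM controller with a hand-written rule that looks up what has already been emitted, which is enough to show how carrying state between steps avoids a second detection on the same object.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Memory = Tuple[Box, ...]                  # hypothetical controller state: boxes emitted so far
Step = Callable[[Memory], Tuple[Box, float, Memory]]


def decode(step: Step, max_steps: int = 5,
           stop_threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Emit detections one at a time, carrying state between steps.

    Stops when the step function's confidence drops below stop_threshold
    (an assumed stopping rule) or after max_steps outputs.
    """
    memory: Memory = ()
    detections: List[Tuple[Box, float]] = []
    for _ in range(max_steps):
        box, confidence, memory = step(memory)
        if confidence < stop_threshold:
            break
        detections.append((box, confidence))
    return detections


# Hypothetical per-image evidence: the first person appears twice.
CANDIDATES: List[Tuple[Box, float]] = [
    ((0.0, 0.0, 10.0, 20.0), 0.9),
    ((0.0, 0.0, 10.0, 20.0), 0.85),   # duplicate of the first box
    ((3.0, 0.0, 13.0, 20.0), 0.8),    # a second, heavily overlapping person
    ((40.0, 0.0, 50.0, 20.0), 0.3),   # weak evidence, below the threshold
]


def toy_step(memory: Memory) -> Tuple[Box, float, Memory]:
    """Stand-in for the learned controller: pick the best box not yet emitted."""
    remaining = [(b, c) for b, c in CANDIDATES if b not in memory]
    if not remaining:
        return (0.0, 0.0, 0.0, 0.0), 0.0, memory
    box, confidence = max(remaining, key=lambda bc: bc[1])
    return box, confidence, memory + (box,)


detections = decode(toy_step)
for box, confidence in detections:
    print(box, confidence)
```

In this toy run the duplicate is never emitted because the memory already contains it, and decoding stops once the best remaining confidence falls below the threshold. A learned controller would have to acquire this behaviour from data rather than from an explicit lookup.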

One of the key limitations of merging and non-maximum suppression utilized in [6, 17] is that these methods typically don’t have access to image information, and instead must perform inference solely based on properties of bounding boxes (e.g. distance and overlap). This usually works for isolated objects, but often fails when object instances overlap. In the case of overlapping instances, image information is necessary to decide where to place boxes and how many of them to output. As a workaround, several approaches proposed specialized solutions that specifically address pre-defined constellations of objects (e.g. pairs of pedestrians) [5, 23]. Here, we propose a generic architecture that does not require a specialized definition of object constellations, is not limited to pairs of objects, and is fully trainable.
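The limitation described above can be made concrete with a minimal sketch of greedy non-maximum suppression. This is a generic textbook variant, not necessarily the exact procedure used in [6, 17]; the function names, the box coordinates, and the 0.5 overlap threshold are assumptions chosen for illustration. The function sees only boxes and scores, never pixels.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, visiting boxes in descending score order.

    A box is dropped if it overlaps an already kept box by more than
    iou_threshold. Only box geometry and scores are consulted.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical scene: two people standing close together, one isolated person.
boxes = [(0.0, 0.0, 10.0, 20.0), (3.0, 0.0, 13.0, 20.0), (40.0, 0.0, 50.0, 20.0)]
scores = [0.9, 0.8, 0.7]

kept = greedy_nms(boxes, scores, iou_threshold=0.5)
print(kept)
```

Here the two nearby boxes overlap with an IoU of 140/260, roughly 0.54, so the second person is suppressed even though it is a correct detection. Raising the threshold to 0.6 keeps all three boxes, but the same setting would then also let through genuine duplicates with similar overlap; box geometry alone cannot tell the two cases apart.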

We specifically focus on the task of people detection as an important example of this problem. In crowded scenes such as the one shown in Fig. 1, multiple people often occur in close proximity, making it particularly challenging to distinguish between nearby individuals.

The key contribution of this paper is a trainable, end-to-end approach that jointly predicts the objects in an image. This lies in contrast to existing methods that treat prediction or classification of each bounding box as an independent problem and require post-processing on the set of detections. We demonstrate that our approach is superior to existing architectures on a challenging dataset of crowded scenes with large numbers of people. A technical contribution of this paper is a novel loss function for sets of objects that combines elements of localization and detection. An-
