b) Prevent the mapping algorithm from including
moving objects as part of the 3D map.
2) How to complete the part of the 3D map that is temporarily occluded by a moving object.
Many applications would greatly benefit from progress along these lines, among others augmented reality, autonomous vehicles, and medical imaging. All of them could, for instance, safely reuse maps from previous runs. Detecting and dealing with dynamic objects is a prerequisite for estimating stable maps, useful for long-term applications. If the dynamic content is not detected, it becomes part of the 3D map, complicating its usability for tracking or relocalization purposes.
In this work we propose an online algorithm to deal with dynamic objects in RGB-D, stereo, and monocular SLAM. This is done by adding a front-end stage to the state-of-the-art ORB-SLAM2 system [1], with the purpose of achieving more accurate tracking and a reusable map of the scene. In the monocular and stereo cases, our proposal is to use a CNN to segment, pixel-wise, the a priori dynamic objects in the frames (e.g., people and cars), so that the SLAM algorithm does not extract features on them. In the RGB-D case, we propose to combine multi-view geometry models and deep-learning-based algorithms for detecting dynamic objects and, after removing them from the images, inpainting the occluded background with the correct information of the scene (Fig. 1).
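To make this front end concrete, the following minimal sketch (Python with OpenCV and NumPy; an illustration of the idea, not the actual DynaSLAM implementation, which builds on ORB-SLAM2) shows how a pixel-wise mask of a priori dynamic content can keep features off people and cars. The dynamic_mask input is assumed to come from the segmentation CNN, with nonzero values on dynamic pixels.

import cv2
import numpy as np

def extract_static_orb_features(image, dynamic_mask, n_features=2000):
    # OpenCV detectors accept a mask and search for features only where
    # it is nonzero, so we invert the CNN's dynamic mask to restrict
    # detection to the static parts of the image.
    static_mask = np.where(dynamic_mask > 0, 0, 255).astype(np.uint8)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(image, static_mask)
    return keypoints, descriptors

Since no features are ever extracted on the masked regions, matches on dynamic objects never reach the tracking and mapping back end.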
The rest of the paper is structured as follows: Section II discusses related work, Section III gives the details of our proposal, Section IV presents the experimental results, and Section V draws the conclusions and outlines future work.
II. RELATED WORK
Dynamic objects are, in most SLAM systems, classified as
spurious data and therefore neither included in the map nor
used for camera tracking. The most typical outlier rejection
algorithms are RANSAC (e.g., in ORB-SLAM [3], [1]) and
robust cost functions (e.g., in PTAM [2]).
There are several SLAM systems that specifically address dynamic scene content. Within feature-based SLAM methods, some of the most relevant are:
• Tan et al. [9] detect changes that take place in the scene
by projecting the map features into the current frame for
appearance and structure validation.
• Wangsiripitak and Murray [10] track known 3D dynamic objects in the scene. Similarly, Riazuelo et al. [11] deal with human activity by detecting and tracking people.
• More recently, the work of Li and Lee [12] uses depth edge points, which have an associated weight indicating their probability of belonging to a dynamic object.
Direct methods are, in general, more sensitive to dynamic
objects in the scene. The most relevant works specifically
designed for dynamic scenes are:
• Alcantarilla et al. [13] detect moving objects by means
of a scene flow representation with stereo cameras.
• Wang and Huang [14] segment the dynamic objects in
the scene using RGB optical flow.
• Kim et al. [15] propose to obtain the static parts of the
scene by computing the difference between consecutive
depth images projected over the same plane.
• Sun et al. [16] calculate the difference in intensity between consecutive RGB images; pixel classification is then done via segmentation of the quantized depth image.
All the methods, both feature-based and direct ones, that map the static scene parts only from the information contained in the sequence [1], [3], [9], [12], [13], [14], [15], [16], [17] fail to estimate lifelong models when an a priori dynamic object remains static, e.g., parked cars or people sitting. On the other hand, Wangsiripitak and Murray [10], and Riazuelo et al. [11] would detect those a priori dynamic objects, but would fail to detect changes produced by static objects, e.g., a chair a person is pushing, or a ball that someone has thrown. That is, the former approaches succeed in detecting moving objects, and the latter in detecting certain movable objects. Our proposal, DynaSLAM, combines multi-view geometry and deep learning in order to address both situations. Similarly, Ambrus et al. [18] segment dynamic objects by combining a dynamic classifier and multi-view geometry.
III. SYSTEM DESCRIPTION
Fig. 2 shows an overview of our system. First of all, the RGB channels pass through a CNN that segments out, pixel-wise, all the a priori dynamic content, e.g., people or vehicles.
In the RGB-D case, we use multi-view geometry to improve the dynamic content segmentation in two ways. First, we refine the segmentation of the dynamic objects previously obtained by the CNN. Second, we label as dynamic new object instances that are static most of the time, i.e., we detect moving objects that were not set to movable in the CNN stage.
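As an illustration of this geometric test, the sketch below (a simplified version of the idea, not the system's exact criterion) reprojects the 3D points of an overlapping keyframe into the current frame and flags pixels whose measured depth disagrees with the predicted one. The tolerance tol is a hypothetical value; K and T_curr_kf denote the camera intrinsics and the keyframe-to-current relative pose.

import numpy as np

def flag_dynamic_points(points_kf, T_curr_kf, K, depth_curr, tol=0.4):
    # Transform the keyframe's 3D points (N x 3) into the current camera.
    pts = T_curr_kf[:3, :3] @ points_kf.T + T_curr_kf[:3, 3:4]
    pts = pts[:, pts[2] > 0]  # keep points in front of the camera
    # Project with the pinhole model.
    u = np.round(K[0, 0] * pts[0] / pts[2] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts[1] / pts[2] + K[1, 2]).astype(int)
    h, w = depth_curr.shape
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z_proj = u[ok], v[ok], pts[2][ok]
    z_meas = depth_curr[v, u]
    # A large depth residual means the pixel is now occupied by content
    # that was not there when the keyframe was taken, i.e., it moved.
    moved = (z_meas > 0) & (np.abs(z_meas - z_proj) > tol)
    return u[moved], v[moved]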
For this geometric check it is necessary to know the camera pose, for which a low-cost tracking module has been implemented to localize the camera within the already created scene map.
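As a rough stand-in for such low-cost tracking, camera localization from 2D-3D matches between the segmented frame's static keypoints and the map points can be written with OpenCV's RANSAC-based PnP solver, where RANSAC additionally rejects residual dynamic matches. The function below is an illustrative sketch, and the reprojection threshold is an assumed value, not the system's actual setting.

import cv2
import numpy as np

def track_camera(map_points_3d, matched_pixels_2d, K):
    # Estimate the world-to-camera pose from 2D-3D correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(map_points_3d, dtype=np.float64),
        np.asarray(matched_pixels_2d, dtype=np.float64),
        K, distCoeffs=None, reprojectionError=3.0)
    if not ok:
        return None  # tracking lost for this frame
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T, inliers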
These segmented frames are the ones used to obtain the camera trajectory and the map of the scene. Notice that if the moving objects in the scene are not within the CNN classes, the multi-view geometry stage would still detect the dynamic content, but the accuracy might decrease.
Once this full dynamic object detection and the localization of the camera have been done, we aim to reconstruct the occluded background of the current frame with static information from previous views. These synthetic frames are relevant for applications like augmented and virtual reality, and for place recognition in lifelong mapping.
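Assuming registered RGB-D keyframes with known poses, a basic version of this reconstruction forward-warps the color of an earlier static view into the current camera and writes it only into the masked (previously dynamic) pixels. The sketch below ignores occlusion ordering (no z-buffer) and interpolation, which a complete implementation would need.

import numpy as np

def inpaint_from_keyframe(rgb_kf, depth_kf, T_curr_kf, K, frame_curr, hole_mask):
    h, w = depth_kf.shape
    # Back-project every valid depth pixel of the keyframe to 3D.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_kf.ravel()
    good = z > 0
    pts = np.stack([(u.ravel() - K[0, 2]) * z / K[0, 0],
                    (v.ravel() - K[1, 2]) * z / K[1, 1],
                    z])[:, good]
    colors = rgb_kf.reshape(-1, 3)[good]
    # Move the points into the current camera and project them.
    pts = T_curr_kf[:3, :3] @ pts + T_curr_kf[:3, 3:4]
    front = pts[2] > 0
    pts, colors = pts[:, front], colors[front]
    u_c = np.round(K[0, 0] * pts[0] / pts[2] + K[0, 2]).astype(int)
    v_c = np.round(K[1, 1] * pts[1] / pts[2] + K[1, 2]).astype(int)
    inside = (u_c >= 0) & (u_c < w) & (v_c >= 0) & (v_c < h)
    u_c, v_c, colors = u_c[inside], v_c[inside], colors[inside]
    # Only fill pixels that the dynamic-object mask left empty.
    fill = hole_mask[v_c, u_c]
    out = frame_curr.copy()
    out[v_c[fill], u_c[fill]] = colors[fill]
    return out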
In the monocular and stereo cases, the images are segmented by the CNN so that keypoints belonging to the a priori dynamic objects are neither tracked nor mapped.
All the different stages are described in depth in the next
subsections (III-A to III-E).