• Achieving an order-of-magnitude speed-up compared to state-of-the-art 3D recognition algorithms.
• Removing all requirements for prior object segmentation or detector training needed by other algorithms.
2. Related Work
Existing 3D recognition methods usually require segmentation or detector training, and they are slow due to the complexity of searching in 3D space. Methods for object recognition in urban street data often require segmentation of objects from the ground [10, 12, 13, 15]. A set of object types is then defined to train either a global detector or a set of local descriptors.
Golovinskiy et al. [12] extend the targets to over twenty types of street objects, using classifiers trained with global features, but require the scene to be pre-processed based on ground estimation so that candidate objects are segmented before recognition is applied. Pang et al. [14] employ AdaBoost to train detectors from a combination of weighted 3D Haar-like features and exhaustively search for objects in 3D space, thus avoiding the requirement for segmentation. However, this method only handles limited rotation changes. Song et al. [17] use depth maps for object detection, scanning a 3D detector through 3D space. This shares some similarity with our depth-based projections, but their method focuses on RGB-D data rather than point clouds, and it is very time-consuming due to the extensive cost of detector training.
Local 3D shape descriptors are frequently used by existing methods. Most popular are spin images (SI) [1], which encode surface properties in a local object-oriented coordinate system, as well as others such as the 3D shape context [9], fast point feature histogram [16], signature of histograms of orientations [2], and unique shape context [3]. Several surveys
compare these 3D descriptors in more detail [11, 18, 19].
A few others focus on improving descriptor matching [6, 7,
8, 15]. However, 3D descriptor-based recognition methods require prior segmentation of background points, as well as descriptor computation and matching in 3D space; these time-consuming processes make such methods inefficient.
The strategy of reducing a 3D problem to 2D space has also been employed for 3D object retrieval. Chen et al. [20] use 2D shapes and silhouettes to retrieve 3D object mesh models. Ohbuchi et al. [21] extract 2D multi-scale local features from range images to aid 3D object model retrieval. Shang and Greenspan [22] use view-sphere sampling to extract features from the minima of the error surfaces for 3D recognition. Aubry et al. [23] also apply
the idea of 3D-to-2D to align 3D CAD chair models to 2D
images with trained mid-level visual elements. However,
these methods focus on matching individual, clean object mesh models, a much simpler task than handling the unsegmented, noisy, large-scale 3D point clouds in our case, which have vastly greater complexity.
Figure 2. Flow of the proposed algorithm: first project the 3D point clouds into 2D images from multiple views, then detect objects in each view separately, and finally re-project all 2D results back into 3D for a fused 3D object location estimate.
3. Algorithm Introduction
3.1. Multi-View Projection
The core idea of our recognition algorithm is to transform a 3D detection problem into a series of 2D detection problems, thereby reducing the complexity of an exhaustive 3D search to a fixed number of 2D searches. This is achieved by projecting the 3D point cloud from multiple viewpoints to decompose it into a series of 2D images, which works like the reverse of multi-view stereo reconstruction [24], where 2D images from multiple viewpoints are fused to reconstruct 3D information. To ensure that the original 3D information is not lost, the 3D-to-2D projection is performed at multiple viewing angles (evenly chosen on a sphere). Depth information is utilized when projecting the 2D image for each view, and is kept for later re-projection back into 3D to fuse the 2D results. As shown in the algorithm flow in fig. 2, after the input 3D point cloud is projected into 2D images from multiple views, each view is used to locate the target object. Lastly, all 2D detection results are re-projected back into 3D space for a fused 3D object location estimate.
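To make the projection step concrete, the following minimal sketch (Python with NumPy) illustrates how viewing directions can be evenly sampled on a sphere and how a point cloud can be rendered into a per-view depth image. The function names (sphere_viewpoints, view_basis, project_depth), the Fibonacci-spiral sampling, and the orthographic camera model are our own illustrative assumptions and not the paper's exact implementation.

import numpy as np

def sphere_viewpoints(n):
    # Evenly distribute n viewing directions on a unit sphere (Fibonacci spiral).
    i = np.arange(n)
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z ** 2)
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def view_basis(view_dir):
    # Orthonormal basis whose third row is the viewing direction.
    z_axis = view_dir / np.linalg.norm(view_dir)
    up = np.array([0.0, 0.0, 1.0]) if abs(z_axis[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    x_axis = np.cross(up, z_axis)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    return np.stack([x_axis, y_axis, z_axis])

def project_depth(points, view_dir, res=256):
    # Orthographic projection of an (N, 3) point cloud along view_dir,
    # keeping the nearest depth per pixel for later re-projection into 3D.
    R = view_basis(view_dir)
    pv = points @ R.T                      # points in view coordinates
    uv, depth = pv[:, :2], pv[:, 2]
    uv_min = uv.min(axis=0)
    uv_scale = np.ptp(uv, axis=0).max() + 1e-9
    px = np.clip(((uv - uv_min) / uv_scale * (res - 1)).astype(int), 0, res - 1)
    img = np.full((res, res), np.inf)
    for (u, v), d in zip(px, depth):       # keep the nearest point per pixel
        if d < img[v, u]:
            img[v, u] = d
    return img, uv_min, uv_scale

views = sphere_viewpoints(16)
# depth_images = [project_depth(cloud, v) for v in views]  # one 2D image per view

A perspective camera model, a different number of viewpoints, or a finer image resolution could be substituted without changing the overall flow of project-detect-fuse.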
The benefits of this multi-view projection are three-fold. Firstly, each view can compensate for the missing information of the others, which is equivalent to a pseudo-3D recognition process with reduced complexity. Secondly, target objects are also projected from multiple views and detected in all projected scene views, making the recognition process invariant to rotation changes. Thirdly, multiple independent 2D detection processes stabilize the final fused 3D object locations, filtering out the discrete location offsets common in 2D detection.
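As a rough illustration of this fusion step, continuing the sketch above (again an assumption-laden simplification rather than the paper's exact procedure), each 2D detection center can be re-projected into 3D using its stored depth and the view basis, and the per-view estimates combined with a robust statistic such as the median:

def backproject(u, v, d, view_dir, uv_min, uv_scale, res=256):
    # Inverse of project_depth above: map a 2D detection (pixel (u, v) plus
    # its stored depth d) back to a 3D point in world coordinates.
    R = view_basis(view_dir)
    xy = np.array([u, v], dtype=float) / (res - 1) * uv_scale + uv_min
    return np.array([xy[0], xy[1], d]) @ R   # view -> world coordinates

def fuse_detections(points_3d):
    # Combine the per-view 3D estimates; a robust statistic such as the median
    # suppresses the discrete pixel-offset errors of individual 2D detections.
    return np.median(np.asarray(points_3d), axis=0)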