Template Matching: the Linemod Paper


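### Template Matching: the Linemod Paper, "Gradient Response Maps for Real-Time Detection of Textureless Objects"

#### Introduction

The paper presents a real-time 3D object instance detection method that needs no time-consuming training stage and can handle textureless objects. At its core is a novel image representation for template matching, designed to be robust to small image transformations. This robustness comes from spreading image gradient orientations. It makes it possible to test only a small subset of all possible pixel locations when parsing the image and to represent a 3D object with a limited number of templates. The authors also show that, when a dense depth sensor is available, the approach can be extended to take 3D surface normal orientations into account for better performance. Finally, they describe how to exploit the architecture of modern computers to build an efficient yet highly discriminative representation of the input image, so that thousands of templates can be considered in real time.

#### Main contributions

1. **Gradient response maps**: a new image representation that makes template matching more robust by working with the orientations of image gradients.
2. **Fast detection of textureless objects**: the method is particularly suited to quickly detecting objects with little or no texture.
3. **Efficient template matching**: efficiency improves because fewer pixel locations need to be tested.
4. **Use of depth information**: when a dense depth sensor is present, 3D surface normal orientations can additionally be used to improve detection.
5. **Optimization for parallel hardware**: the method exploits modern computer architecture for efficient parallel processing, so that a large number of templates can be handled in real time.
6. **Experimental validation**: many experiments on real data show that the method is much faster and more robust to background clutter than existing techniques.

#### Method overview

- **Gradient response maps**: the core of the method is the spreading of image gradient orientations. Local gradient orientations serve as features, and these features are robust to small image transformations, so the matching rate stays high even when the object is slightly deformed.
- **Template representation**: extracting the key gradient features of an object yields a compact set of templates that represents the 3D object. This lowers storage requirements and speeds up detection.
- **Real-time performance**: by reducing the number of pixels that must be examined and by using the parallel processing capabilities of modern computers, the method can handle a large number of templates in real time.

#### Experimental results

- **Comparative study**: compared with state-of-the-art methods, the approach is faster and more robust.
- **Application scenarios**: the method is promising for robotics and real-time vision applications such as automated assembly and object recognition.

#### Conclusion

The paper offers a new solution for real-time detection of textureless objects. The key idea is to use gradient response maps to improve the robustness and efficiency of template matching. The experiments show that the method is fast and robust when detecting objects against complex backgrounds, and that adding depth information improves performance further. The method provides a powerful and flexible tool for real-time object detection and learning tasks in computer vision.

#### Outlook

Future work could explore more kinds of objects and more complex environmental conditions in order to improve the generalization and adaptability of the algorithm. As hardware advances, the method can be expected to play a role in a growing number of practical applications.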


Gradient Response Maps for Real-Time

Detection of Textureless Objects

Stefan Hinterstoisser, Member, IEEE, Cedric Cagniart, Slobodan Ilic, Member, IEEE,

Peter Sturm, Member, IEEE, Nassir Navab, Member, IEEE,

Pascal Fua, Fellow, IEEE, and Vincent Lepetit

Abstract—We present a method for real-time 3D object instance detection that does not require a time-consuming training stage, and

can handle untextured objects. At its core, our approach is a novel image representation for template matching designed to be robust

to small image transformations. This robustness is based on spread image gradient orientations and allows us to test only a small

subset of all possible pixel locations when parsing the image, and to represent a 3D object with a limited set of templates. In addition,

we demonstrate that if a dense depth sensor is available we can extend our approach for an even better performance also taking 3D

surface normal orientations into account. We show how to take advantage of the architecture of modern computers to build an efficient

but very discriminant representation of the input images that can be used to consider thousands of templates in real time. We

demonstrate in many experiments on real data that our method is much faster and more robust with respect to background clutter than

current state-of-the-art methods.

Index Terms—Computer vision, real-time detection and object recognition, tracking, multimodality template matching.

1 INTRODUCTION

REAL-TIME object instance detection and learning are two

important and challenging tasks in computer vision.

Among the application fields that drive development in this

area, robotics especially has a strong need for computation-

ally efficient approaches as autonomous systems continu-

ously have to adapt to a changing and unknown

environment and to learn and recognize new objects.

For such time-critical applications, real-time template

matching is an attractive solution because new objects can

be easily learned and matched online, in contrast to

statistical-learning techniques that require many training

samples and are often too computationally intensive for

real-time performance [1], [2], [3], [4], [5]. The reason for

this inefficiency is that those learning approaches aim at

detecting unseen objects from certain object classes instead

of detecting a priori known object instances from multiple

viewpoints. Classical template matching aims at the latter:

generalization is performed not over the object class but over

the viewpoint sampling. While this is considered an easier task, it does

not make the problem trivial, as the data still exhibit

significant changes in viewpoint, in illumination, and in

occlusion between the training and the runtime sequence.

When the object is textured enough for keypoints to be

found and recognized on the basis of their appearance, this

difficulty has been successfully addressed by defining patch

descriptors that can be computed quickly and used to

characterize the object [6]. However, this kind of approach

will fail on textureless objects such as those of Fig. 1, whose

appearance is often dominated by their projected contours.

To overcome this problem, we propose a novel approach

based on real-time template recognition for rigid 3D object

instances, where the templates can be both built and

matched very quickly. We will show that this makes it

very easy and virtually instantaneous to learn new

incoming objects by simply adding new templates to the

database while maintaining reliable real-time recognition.

However, we also wish to keep the efficiency and

robustness of statistical methods, as they learn how to reject

unpromising image locations very quickly and tend to be

very robust because they can generalize well from the

training set. We therefore propose a new image representa-

tion that holds local image statistics and is fast to compute. It

is designed to be invariant to small translations and

deformations of the templates, which has been shown to

be a key factor to generalization to different viewpoints of the

same object [6]. In addition, it allows us to quickly parse the

image by skipping many locations without loss of reliability.

Our approach is related to recent and efficient template

matching methods [7], [8] which consider only images and

their gradients to detect objects. As such, they work even

when the object is not textured enough to use feature point

techniques, and learn new objects virtually instantaneously.

In addition, they can directly provide a coarse estimation of

the object pose, which is especially important for robots

which have to interact with their environment. However,

similarly to previous template matching approaches [9],

[10], [11], [12], they suffer severe degradation of performance

or even failure in the presence of strong background

clutter such as the one displayed in Fig. 1.

876 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 34, NO. 5, MAY 2012

• S. Hinterstoisser, C. Cagniart, S. Ilic, and N. Navab are with the Department of Computer Aided Medical Procedures (CAMP), Technische Universität München, Garching bei München 85478, Germany. E-mail: {hinterst, cagniart, Slobodan.Ilic, navab}@in.tum.de.

• P. Sturm is with the STEEP Team, INRIA Grenoble-Rhône-Alpes, Saint-Ismier Cedex 38334, France. E-mail: Peter.Sturm@inrialpes.fr.

• P. Fua and V. Lepetit are with the Computer Vision Lab (CVLAB), Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland. E-mail: {pascal.fua, vincent.lepetit}@epfl.ch.

Manuscript received 29 Sept. 2010; revised 15 Sept. 2011; accepted 17 Sept. 2011; published online 8 Oct. 2011.

Recommended for acceptance by S. Sclaroff.

For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-2010-09-0748.

Digital Object Identifier no. 10.1109/TPAMI.2011.206.

0162-8828/12/$31.00 © 2012 IEEE Published by the IEEE Computer Society

We therefore propose a new approach that addresses this

issue while being much faster for larger templates. Instead

of making the templates invariant to small deformations

and translations by considering dominant orientations only

as in [7], we build a representation of the input images

which has similar invariance properties but considers all

gradient orientations in local image neighborhoods. Together

with a novel similarity measure, this prevents

problems due to too strong gradients in the background,

as illustrated by Fig. 1.
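The idea of keeping all gradient orientations found in a local neighborhood can be sketched as follows. This is only an illustration under stated assumptions, not the implementation from the paper: the number of orientation bins (`N_BINS = 8`), the spreading radius, the bitmask encoding, and the binary match test in `matches` are all choices made here for the example, and the function names are hypothetical.

```python
import math

N_BINS = 8  # assumed number of orientation bins (illustrative choice)


def quantize_orientation(gx, gy, n_bins=N_BINS):
    """Map a gradient vector to an orientation bin, ignoring its sign.

    Orientations are taken modulo 180 degrees, so a gradient and its
    opposite fall into the same bin.
    """
    angle = math.atan2(gy, gx) % math.pi  # folded into [0, pi]
    return int(angle / math.pi * n_bins) % n_bins


def spread_orientations(bins, radius=1):
    """Spread each quantized orientation over a square neighborhood.

    bins: 2D list holding a bin index, or None where there is no
    significant gradient. Returns a 2D list of bitmasks; bit b is set
    at a pixel if orientation b occurs within `radius` of that pixel.
    """
    h, w = len(bins), len(bins[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            b = bins[y][x]
            if b is None:
                continue
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    out[yy][xx] |= 1 << b
    return out


def matches(spread, x, y, template_bin):
    """True if the template orientation occurs near pixel (x, y)."""
    return bool(spread[y][x] & (1 << template_bin))
```

Because every pixel of the spread image records all orientations seen nearby, a template feature still matches when the object shifts by up to `radius` pixels, which is the kind of invariance to small translations described above.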

To avoid slowing down detection when using this finer

method, we have to carefully consider how modern CPUs

work. A naive implementation would result in many

“memory cache misses,” which slow down the computa-

tions, and we thus show how to structure our image

representation in memory to prevent these and to addition-

ally exploit heavy SSE parallelization. We consider this as

an important contribution: Because of the nature of the

hardware improvements, it is no longer guaranteed that

legacy code will run faster on the new versions of CPUs

[13]. This is particularly true for computer vision, where

algorithms are often computationally expensive. It is now

required to take the CPU architecture into account, which is

not an easy task.

For the case where a dense depth sensor is available, we

describe an extension of our method where additional depth

data are used to further increase the robustness by simulta-

neously leveraging the information of the 2D image gradients

and 3D surface normals. We propose a method that robustly

computes 3D surface normals from dense depth maps in real-

time, making sure to preserve depth discontinuities on

occluding contours and to smooth out discretization noise

of the sensor. The 3D normals are then used together with the

image gradients and in a similar way.
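To make the notion of surface normals from a dense depth map concrete, here is a deliberately simplified sketch. It is not the robust estimator the paper proposes: it uses plain central differences, treats the pixel spacing as one unit, ignores camera intrinsics, and does no smoothing of sensor noise. The threshold `max_jump` and the function name `normals_from_depth` are hypothetical and introduced only for this example.

```python
import math


def normals_from_depth(depth, max_jump=0.05):
    """Estimate unit surface normals from a 2D list of depth values.

    Pixels where the depth changes by more than `max_jump` per pixel
    are left as None, so that normals are not averaged across a depth
    discontinuity such as an occluding contour. Border pixels are also
    left as None.
    """
    h, w = len(depth), len(depth[0])
    normals = [[None] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences along the image axes.
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            if abs(dzdx) > max_jump or abs(dzdy) > max_jump:
                continue  # likely a depth discontinuity
            length = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[y][x] = (-dzdx / length, -dzdy / length, 1.0 / length)
    return normals
```

The resulting normals could then be quantized into orientation bins in the same spirit as the image gradients, which is what using them "in a similar way" suggests.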

In the remainder of the paper, we first discuss related

work before we explain our approach. We then discuss

the theoretical complexity of our approach. We finally

present experiments and quantitative evaluations for

challenging scenes.

2 RELATED WORK

Template matching has played an important role in

tracking-by-detection applications for many years. This is

due to its simplicity and its capability of handling different

types of objects. It neither needs a large training set nor a

time-consuming training stage, and can handle low-

textured or textureless objects, which are, for example,

difficult to detect with feature points-based methods [6],

[14]. Unfortunately, this increased robustness often comes

at the cost of an increased computational load that makes

naive template matching inappropriate for real-time

applications. So far, several works have attempted to

reduce this complexity.

An early approach to Template Matching [12] and its

extension [11] include the use of the Chamfer distance

between the template and the input image contours as a

dissimilarity measure. For instance, Gavrila and Philomin

[11] introduced a coarse-to-fine approach in shape and

parameter space using Chamfer Matching [9] on the

Distance Transform (DT) of a binary edge image. The

Chamfer Matching minimizes a generalized distance be-

tween two sets of edge points. Although fast when using the

Distance Transform, the disadvantage of the Chamfer

Transform is its sensitivity to outliers, which often result

from occlusions.
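As a small illustration of the Chamfer distance and of its sensitivity to outliers, consider the sketch below. It assumes edge points are given as `(x, y)` tuples and replaces the Distance Transform lookup with a brute-force nearest-neighbor search, which is slower but easier to read; the function name is hypothetical.

```python
import math


def chamfer_distance(template_pts, image_pts):
    """Mean distance from each template edge point to its nearest
    image edge point (brute force, no Distance Transform)."""
    if not template_pts or not image_pts:
        raise ValueError("both point sets must be non-empty")
    total = 0.0
    for tx, ty in template_pts:
        total += min(math.hypot(tx - ix, ty - iy) for ix, iy in image_pts)
    return total / len(template_pts)
```

If part of the object is occluded, the template points in that region find their nearest image edge far away, and these few large distances inflate the mean even when the visible part fits perfectly.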

Another common measure on binary edge images is the

Hausdorff distance [15]. It measures the maximum of all

distances from each edge point in the image to its nearest

neighbor in the template. However, it is sensitive to

occlusions and clutter. Huttenlocher et al. [10] tried to

avoid that shortcoming by introducing a generalized

Hausdorff distance which only computes the maximum

of the kth largest distances between the image and the

model edges and the lth largest distances between the

model and the image edges. This makes the method robust

against a certain percentage of occlusions and clutter.

Unfortunately, a prior estimate of the background clutter in

the image is required but not always available. Addition-

ally, computing the Hausdorff distance is computationally

expensive and prevents its real-time application when

many templates are used.
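The difference between the plain Hausdorff distance and a rank-based variant can be sketched as follows. This is a simplified illustration, not the exact formulation of [10]: the rank is expressed here as a `fraction` of the points, distances are found by brute force, and all function names are hypothetical.

```python
import math


def nearest_distances(src, dst):
    """Sorted distances from each point in src to its nearest point in dst."""
    return sorted(
        min(math.hypot(sx - dx, sy - dy) for dx, dy in dst) for sx, sy in src
    )


def directed_hausdorff(src, dst):
    """Largest nearest-neighbor distance from src to dst."""
    return nearest_distances(src, dst)[-1]


def partial_directed_hausdorff(src, dst, fraction):
    """Rank-based variant: the k-th smallest nearest-neighbor distance,
    with k = ceil(fraction * len(src)). The remaining points are
    treated as clutter or occlusion and ignored."""
    d = nearest_distances(src, dst)
    k = max(1, math.ceil(fraction * len(d)))
    return d[k - 1]
```

With a single clutter point far from the model, `directed_hausdorff(image, model)` is dominated by that one point, whereas the rank-based variant ignores it as long as `fraction` is chosen below the share of inliers. Choosing that fraction is exactly where the prior estimate of clutter mentioned above is needed.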

Both Chamfer Matching and the Hausdorff distance can

easily be modified to take the orientation of edge points into

account. This drastically reduces the number of false

positives as shown in [12], but unfortunately also increases

the computational load.

The method of [16] is also based on the Distance

Transform; however, it is invariant to scale changes and

robust enough against planar perspective distortions to do

real-time matching. Unfortunately, it is restricted to objects

with closed contours, which are not always available.

All these methods use binary edge images obtained with a

contour extraction algorithm, using the Canny detector [17],

for example, and they are very sensitive to illumination

changes, noise, and blur. For instance, if the image contrast is

lowered, the number of extracted edge pixels progressively


Fig. 1. Our method can detect textureless 3D objects in real time under

different poses over heavily cluttered background using gradient

orientation.
