Feature Tracking for Robust Structure-from-Motion - CSDN Library

Points required: 9 | 131 views | Uploaded 2018-09-13 14:43:55 | 3.88 MB PDF

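Structure-from-Motion (SfM) is an important 3D reconstruction technique with applications in many areas, such as Google Earth and Microsoft Virtual Earth. SfM algorithms automatically estimate 3D features from input image or video collections, and the quality of feature tracking is critical to how well they work. In an image sequence, disjointed feature tracks can severely degrade the corresponding SfM if they are not handled well. Such tracks are caused by objects moving in and out of view, occasional occlusion, or image noise. The problem is more serious for accurate SfM of large-scale scenes, because multiple sequences usually have to be captured to cover the whole scene. In this work, Guofeng Zhang et al. propose an Efficient Non-Consecutive Feature Tracking (ENFT) framework that matches interrupted tracks distributed across different subsequences and even different videos. The framework has two groups of steps. One solves the feature "dropout" problem, particularly when structures are indistinctive, noise is present, or images are severely distorted. The other rapidly recognizes and joins common features located in different subsequences. The authors also contribute a segment-based coarse-to-fine SfM estimation algorithm for handling large datasets efficiently and robustly. Experimental results on several challenging, large video datasets demonstrate the effectiveness of the proposed system. The keywords of the work are non-consecutive feature tracking, track matching, Structure-from-Motion, and bundle adjustment.

Video SfM commonly relies on feature point tracking to handle the temporal relationship between frames. Feature tracking is also a key tool for other computer vision problems such as camera tracking, video matching, and object recognition. Large-scale 3D reconstruction is widely used in practice, for example in the film and commercial industries. There, videos carry dense geometric and structural information and are an important source of SfM input. To reconstruct a large-scale scene, it is common to divide the scene into several parts and reconstruct them separately. This partitioning improves efficiency and, to some extent, also accuracy. The ENFT framework completes the reconstruction by recognizing and matching the common features shared by the different parts.

The robustness and efficiency of SfM are critical for large-scale 3D reconstruction. Large scenes are often captured from many viewpoints and under different lighting conditions, which produces large and complex datasets. On such datasets, an algorithm must process many feature points quickly while still estimating the scene's 3D structure accurately. Efficient non-consecutive feature tracking is designed for exactly this challenge. The article also discusses bundle adjustment, a key step of SfM. It is an optimization that adjusts camera parameters and 3D point positions so that the reconstructed model agrees as closely as possible with the observed points in all input images. It achieves this by minimizing the reprojection error, and it is one of the key techniques for accurate 3D reconstruction.

Overall, ENFT offers a new way to handle non-consecutive feature tracking in image sequences, which matters for the robustness and efficiency of large-scale 3D reconstruction. It recognizes and matches interrupted tracks across sequences or videos and estimates SfM with a segment-based coarse-to-fine algorithm. Together these handle large datasets effectively and provide strong technical support for large-scale 3D reconstruction applications.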
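The bundle adjustment objective described above can be made concrete. Below is a minimal Python sketch that assumes a simple pinhole camera with intrinsics (fx, fy, cx, cy), a rotation matrix R, and a translation t. Under this model a world point X maps to camera coordinates R·X + t. The function names are hypothetical and do not come from the paper. The sketch only evaluates the summed squared reprojection error that bundle adjustment minimizes. The optimizer itself (commonly Levenberg-Marquardt) is not shown.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), assumed pinhole model


def project(K: Intrinsics, R: Mat3, t: Vec3, X: Vec3) -> Tuple[float, float]:
    """Project world point X into pixel coordinates using x_cam = R @ X + t."""
    fx, fy, cx, cy = K
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    return fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy


def reprojection_cost(
    cameras: Sequence[Tuple[Intrinsics, Mat3, Vec3]],
    points: Sequence[Vec3],
    observations: List[Tuple[int, int, Tuple[float, float]]],
) -> float:
    """Sum of squared pixel residuals over all (camera, point, measurement) triples.

    This is the cost bundle adjustment minimizes by jointly adjusting the
    cameras and points; no optimization is performed here.
    """
    total = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        K, R, t = cameras[cam_idx]
        pu, pv = project(K, R, t, points[pt_idx])
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

As a quick check, take a camera at the origin with identity rotation and K = (500, 500, 320, 240). It projects (0, 0, 5) to the principal point (320, 240), so an observation at (321, 240) contributes a cost of 1.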


ENFT: Efficient Non-Consecutive Feature Tracking for Robust Structure-from-Motion

Guofeng Zhang, Haomin Liu, Zilong Dong, Jiaya Jia, Tien-Tsin Wong, and Hujun Bao

Abstract—Structure-from-motion (SfM) largely relies on the quality of feature tracking. In image sequences, if disjointed tracks caused by objects moving in and out of the view, occasional occlusion, or image noise are not handled well, the corresponding SfM could be significantly affected. This problem becomes more serious for accurate SfM of large-scale scenes, which typically requires capturing multiple sequences to cover the whole scene. In this paper, we propose an efficient non-consecutive feature tracking (ENFT) framework to match the interrupted tracks distributed in different subsequences or even in different videos. Our framework consists of steps for solving the feature ‘dropout’ problem when indistinctive structures, noise or even large image distortion exist, and for rapidly recognizing and joining common features located in different subsequences. In addition, we contribute an effective segment-based coarse-to-fine SfM estimation algorithm for efficiently and robustly handling large datasets. Experimental results on several challenging and large video datasets demonstrate the effectiveness of the proposed system.

Index Terms—Non-Consecutive Feature Tracking, Track Matching, Structure-from-Motion, Bundle Adjustment.

I. INTRODUCTION

Large-scale 3D reconstruction [34], [23], [14], [13], [8] finds many practical applications in, for example, Google Earth and Microsoft Virtual Earth. Recent work primarily relies on the SfM algorithms [16], [52], [2], [1], [48] to automatically estimate 3D features given the input of images or video collections.

Compared to images, videos contain denser geometrical and structural information, and are the main source of SfM in the movie and commercial industry. A common strategy for video SfM estimation is to employ feature point tracking [27], [38], [26], which takes care of the temporal relationship among frames. It is also a basic tool for solving a variety of computer vision problems, such as camera tracking, video matching, and object recognition.

In this paper, we discuss two critical problems for feature point tracking, which could seriously handicap SfM, especially for large-scale scene modeling. We propose new methods to address them. One problem is the vulnerability of feature tracking to object occlusions, illumination change, noise, and large motion, which easily causes occasional feature drop-out and distraction. This problem makes robust feature tracking from long sequences challenging.

G. Zhang, H. Liu, Z. Dong and H. Bao are with the State Key Lab of CAD&CG, Zhejiang University. G. Zhang is also affiliated with Innovation Joint Research Center for Cyber-Physical-Society System, Zhejiang University. Email: {zhangguofeng, zldong, bao}@cad.zju.edu.cn, 172753015@qq.com.

J. Jia and T.-T. Wong are with The Chinese University of Hong Kong. Email: {leojia,ttwong}@cse.cuhk.edu.hk

Fig. 1. A large-scale “Garden” example. (a) Snapshots of the input videos. (b) With the matched feature tracks, we register 3D points and camera trajectories in a large-scale unified 3D system. Camera trajectories are differently color-coded.

The other problem is the inability of sequential feature tracking to cope with feature matching over non-consecutive subsequences. A typical scenario is that the tracked object moves out and then re-enters the field-of-view, which yields two discontinuous subsequences containing the same object. Although there are common features in the two subsequences, they cannot be matched/included in a single track using conventional tracking methods. Addressing this issue can alleviate the drift problem of SfM, which in turn benefits high-quality 3D reconstruction as demonstrated in our experimental results. A naïve solution is to exhaustively search all features, which could consume much computation since many temporally far away frames simply share no content.

We propose an efficient non-consecutive feature tracking (ENFT) framework which can effectively address the above problems in two phases, that is, consecutive point tracking and non-consecutive track matching. We demonstrate their significance for SfM using a few challenging videos.

Consecutive point tracking detects and matches invariant features in consecutive frames. A new matching strategy is proposed to greatly increase the matching rate and extend the lifetime of the tracks. Then, in non-consecutive track matching, a set of disjoint subsequences with overlapping content can be detected by rapidly computing a match matrix. Common feature tracks scattered over these subsequences can also be reliably matched.

arXiv:1510.08012v1 [cs.CV] 27 Oct 2015
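To illustrate what a match matrix encodes, here is a minimal sketch. It assumes each frame has already been summarized as a set of visual-word IDs, which is a hypothetical input format. It scores every frame pair by Jaccard overlap and keeps pairs at least `min_gap` frames apart, since nearby frames are already handled by consecutive tracking. It compares all pairs exhaustively, so it shows the data structure rather than the rapid computation described here. The names and thresholds are illustrative assumptions, not the paper's.

```python
from itertools import combinations
from typing import List, Set, Tuple


def match_matrix(frames: List[Set[int]]) -> List[List[float]]:
    """Symmetric matrix of Jaccard overlaps between frames' visual-word sets.

    The diagonal is left at 0 because a frame matching itself is uninformative.
    """
    n = len(frames)
    M = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        union = frames[i] | frames[j]
        score = len(frames[i] & frames[j]) / len(union) if union else 0.0
        M[i][j] = M[j][i] = score
    return M


def non_consecutive_candidates(
    M: List[List[float]], min_gap: int, threshold: float
) -> List[Tuple[int, int]]:
    """Frame pairs at least min_gap apart in time whose overlap reaches threshold."""
    n = len(M)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_gap, n)
        if M[i][j] >= threshold
    ]
```

Consider the frames {1, 2, 3}, {2, 3, 4}, {7, 8, 9}, {8, 9, 10} and {1, 2, 3, 5} with `min_gap=2` and `threshold=0.5`. Only the pair (0, 4) is flagged, because the content of frame 0 reappears after an interruption.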

Our ENFT method can help reduce estimation errors for long loopback sequences. Given limited memory, it is generally intractable to use global bundle adjustment to refine camera poses and 3D points for very long sequences. Iteratively applying local bundle adjustment, on the other hand, can hardly distribute estimation errors effectively to all frames. We address this issue by adopting a segment-based coarse-to-fine SfM estimation algorithm, which globally optimizes structure and motion while requiring only limited memory.
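The idea of distributing loop-closure error at a coarse level first can be shown with a toy one-dimensional example. This is not the paper's algorithm, which optimizes full camera poses and 3D points. It is a sketch under strong simplifying assumptions:

- positions are scalars;
- every frame-to-frame motion is equally uncertain;
- the last frame is known to revisit the first.

The coarse level computes one correction per segment boundary, and the fine level interpolates those corrections inside each segment. With equal uncertainties, the result equals spreading the closure error evenly over all frame-to-frame steps, which is the least-squares answer for this chain. The function name is hypothetical.

```python
from typing import Dict, List


def close_loop_coarse_to_fine(positions: List[float], seg_len: int) -> List[float]:
    """Distribute a loop-closure error over a 1D trajectory, segment by segment.

    Toy assumptions: positions[-1] should coincide with positions[0], and every
    frame-to-frame step is equally uncertain. seg_len must be positive.
    """
    n = len(positions)
    if n < 2:
        return list(positions)
    error = positions[-1] - positions[0]
    steps = n - 1
    # Segment boundaries: every seg_len frames, plus the final frame.
    bounds = list(range(0, steps, seg_len)) + [steps]
    # Coarse level: one correction per boundary. Each segment absorbs a share of
    # the error proportional to the number of frame-to-frame steps it spans.
    corr: Dict[int, float] = {0: 0.0}
    for a, b in zip(bounds, bounds[1:]):
        corr[b] = corr[a] - error * (b - a) / steps
    # Fine level: interpolate the boundary corrections inside each segment.
    out = list(positions)
    for a, b in zip(bounds, bounds[1:]):
        for k in range(a, b + 1):
            w = (k - a) / (b - a)
            out[k] = positions[k] + (1 - w) * corr[a] + w * corr[b]
    return out
```

Take the out-and-back trajectory [0, 1, 2, 3, 2, 1, 0.6], whose return is off by 0.6. With `seg_len=2` it yields [0, 0.9, 1.8, 2.7, 1.6, 0.5, 0] up to floating-point rounding, and other segment lengths give the same answer. Only the number of coarse variables changes.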

Based on our ENFT algorithm and segment-based coarse-to-fine estimation scheme, we present a novel SfM system called ENFT-SFM, which can effectively handle long loopback sequences and even multiple sequences.

Fig. 1 shows a challenging example containing 6 sequences with about 95,476 frames in total in a large-scale scene. Our system first splits them into 37 shorter sequences, then quickly computes many long and accurate feature tracks, efficiently estimates camera trajectories in different sequences, and accurately registers them in a unified 3D system, as shown in Fig. 1(b). The whole process only takes about 90 minutes (excluding I/O) on a desktop PC, i.e., 17.7 FPS on average. Our supplementary video contains the complete result.
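The reported average throughput follows directly from the frame count and run time quoted above. This worked check treats "about 90 minutes" as exactly 90 × 60 s:

```latex
\[
\text{average rate} \approx \frac{95{,}476\ \text{frames}}{90 \times 60\ \text{s}}
= \frac{95{,}476}{5{,}400}\ \text{frames/s} \approx 17.68\ \text{FPS} \approx 17.7\ \text{FPS}
\]
```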

A preliminary conference version of this paper was presented in [51]. In this manuscript, we have made a number of major modifications to improve robustness and efficiency. In particular, we altered the second-pass matching by formulating it as minimizing an energy function incorporating two geometric constraints. We have developed an enhanced non-consecutive track matching algorithm, which can significantly reduce the matching time and robustly eliminate outliers. Finally, we proposed a novel segment-based coarse-to-fine SfM method, which can handle large sequence datasets with only limited memory.

II. RELATED WORK

We review feature tracking and large-scale SfM methods in this section.

A. Feature Matching and Tracking

For video tracking, sequential matchers are used for establishing correspondences between consecutive frames. The Kanade-Lucas-Tomasi (KLT) tracker [27], [38], [50] is widely used for small-baseline matching. Other methods detect image features and match them considering local image patches [32], [35] or advanced descriptors [26], [29], [28].
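The core step of a KLT-style tracker can be sketched as follows. This is a minimal, single-level Lucas-Kanade estimate of a pure translation for one window. It assumes grayscale images stored as nested Python lists and a window far enough from the image border. A practical tracker adds image pyramids, feature selection, and boundary handling, none of which are shown. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale image indexed as img[y][x]


def bilinear(img: Image, x: float, y: float) -> float:
    """Sample img at a sub-pixel location; the caller keeps (x, y) inside the image."""
    x0, y0 = math.floor(x), math.floor(y)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bottom = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bottom


def lk_translation(
    I: Image, J: Image, cx: int, cy: int, half: int = 6, iters: int = 10
) -> Tuple[float, float]:
    """Estimate the shift (dx, dy) with J(p + d) ~= I(p) for a window centred on (cx, cy)."""
    dx = dy = 0.0
    for _ in range(iters):
        gxx = gxy = gyy = bx = by = 0.0
        for y in range(cy - half, cy + half + 1):
            for x in range(cx - half, cx + half + 1):
                # Central-difference gradients of the reference window in I.
                ix = (I[y][x + 1] - I[y][x - 1]) / 2.0
                iy = (I[y + 1][x] - I[y - 1][x]) / 2.0
                # Residual between I and J warped by the current estimate.
                err = I[y][x] - bilinear(J, x + dx, y + dy)
                gxx += ix * ix
                gxy += ix * iy
                gyy += iy * iy
                bx += ix * err
                by += iy * err
        det = gxx * gyy - gxy * gxy
        if abs(det) < 1e-12:
            break  # untextured window: the 2x2 normal equations are singular
        # Solve [[gxx, gxy], [gxy, gyy]] @ (ux, uy) = (bx, by) in closed form.
        ux = (gyy * bx - gxy * by) / det
        uy = (gxx * by - gxy * bx) / det
        dx += ux
        dy += uy
        if ux * ux + uy * uy < 1e-10:
            break  # converged
    return dx, dy
```

The singular-determinant check reflects why KLT is limited to textured, small-baseline windows. When a window has no gradient structure, the 2x2 system carries no information about the motion.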

Both the KLT tracker and invariant feature algorithms depend on modeling feature appearance, and can be distracted by occlusion, similar structures, and noise. Generally, sequential matchers cannot match non-consecutive frames under image transformation. Scale-invariant feature detection and matching algorithms [26], [3] are effective in matching images with large transformation, but they generally produce many short tracks in consecutive point tracking, due primarily to the global indistinctiveness and feature dropout problems. In addition, invariant features are relatively sensitive to image distortion. Although variants such as ASIFT [30] can improve matching performance under substantial viewpoint change, the computational overhead increases significantly owing to exhaustive viewpoint simulation.

Supplementary video: http://www.cad.zju.edu.cn/home/gfzhang/projects/tracking/featuretracking/featuretracking.wmv
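A standard way to reject matches made ambiguous by indistinctive or repeated structures is Lowe's nearest-neighbor ratio test, commonly used with SIFT-style descriptors. The sketch below shows that generic test, not the two-pass matching proposed in this paper. The 0.8 ratio is a common default assumed here, and the function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def ratio_test_match(
    query: Sequence[float], candidates: List[Sequence[float]], ratio: float = 0.8
) -> Optional[int]:
    """Index of the nearest candidate descriptor, or None if the match is ambiguous."""
    if len(candidates) < 2:
        return None  # without a second neighbour the ambiguity cannot be measured
    # Sort (distance, index) pairs so the two closest candidates come first.
    dists = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    (d1, best), (d2, _) = dists[0], dists[1]
    # Accept only if the best match is clearly closer than the runner-up.
    return best if d1 < ratio * d2 else None
```

Returning `None` for a single candidate is a conservative design choice. In repetitive scenes the test discards many correct but indistinctive matches, which is one reason invariant-feature tracking tends to yield short tracks.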

In this paper, we propose a novel two-pass matching method to address this problem. In [7], a memory-based tracking method was used to extend feature trajectories by matching each frame to its neighbors. However, if an object re-enters the field-of-view after a long period of time, the size of the neighboring windows has to be very large and the computation becomes expensive. Besides, this method cannot cope with multiple videos. Our method does not have this limitation, and its computational complexity is approximately linear in the number of processed frames.
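The cost argument can be made concrete by counting frame-pair comparisons. The sketch below compares exhaustive matching against a sliding window of the previous `w` frames. This is a simplified model of neighbor-window matching assumed for illustration, and it ignores the per-pair matching cost. When an object re-enters after a long gap, `w` must grow toward the sequence length and the count approaches the quadratic exhaustive total.

```python
def exhaustive_pairs(n: int) -> int:
    """Frame pairs compared when every frame is matched against every other frame."""
    return n * (n - 1) // 2


def windowed_pairs(n: int, w: int) -> int:
    """Frame pairs compared when each frame is matched only to its previous w frames."""
    # Frame i has min(i, w) earlier frames inside its window.
    return sum(min(i, w) for i in range(n))
```

For n = 1000, a 10-frame window needs 9,945 comparisons against 499,500 for exhaustive matching. Stretching the window to 999 frames gives back the full 499,500.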

There are methods using invariant features for object and location recognition in images/videos [41], [36], [18], [37], [19]. These methods typically use the bag-of-words technique to perform global localization and loop-closure detection in an image classification framework. For location recognition, Nistér and Stewénius [33] proposed using the feature descriptors to construct a vocabulary tree and computing an appearance vector for each input image. Exhaustively comparing all image pairs is still time-consuming for a long sequence. Cummins and Newman [9] proposed clustering similar images as a location, such that the computation can be reduced to only comparing the input image with previously visited locations. This method reduces the number of frames to be compared, but could perform less satisfactorily if consecutive frames have large overlaps.
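An appearance vector of the kind described above can be sketched as a TF-IDF-weighted histogram of visual words. TF-IDF weighting follows standard bag-of-words retrieval practice. The following are simplifying assumptions for illustration:

- scoring uses cosine similarity;
- flat word IDs stand in for a vocabulary tree;
- the function names are not taken from [33].

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(images: List[List[int]]) -> List[Dict[int, float]]:
    """Turn each image's list of visual-word IDs into an L2-normalised TF-IDF vector."""
    n = len(images)
    # Document frequency: in how many images each word appears.
    df = Counter(w for words in images for w in set(words))
    vectors = []
    for words in images:
        tf = Counter(words)
        total = len(words)
        # Words present in every image get idf = log(1) = 0 and carry no weight.
        vec = {w: (c / total) * math.log(n / df[w]) for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({w: v / norm for w, v in vec.items()} if norm > 0 else {})
    return vectors


def similarity(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two sparse vectors that are already L2-normalised."""
    return sum(v * b[w] for w, v in a.items() if w in b)
```

Scoring a new image still means comparing it against every stored vector. That exhaustive comparison is the cost that becomes significant for long sequences.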

In addition, these methods divide location recognition and non-consecutive feature matching into two separate phases [24], [6], [10], [17]. Because the match matrix computed by bag-of-words only roughly reflects the match confidence, completely trusting it may lose many common features. In this paper, we introduce a novel strategy where the match matrix can be refined and updated along with non-consecutive feature matching. Our method can reliably and efficiently match the common features even with a coarse match matrix.

Engels et al. [11] proposed integrating wide-baseline local features with the tracked ones to improve SfM. The method creates small and independent submaps and links them via feature recognition. This approach also cannot produce many long and accurate point tracks. Short tracks are not enough for drift-free SfM estimation. In comparison, our method is effective in high-quality point track estimation. We also address the ubiquitous nondistinctive feature matching problem in dense frames. Similar to [15], we utilize track descriptors, instead of the feature descriptors, to reduce computation redundancy.
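This excerpt does not say how the track descriptors are built. The following sketch therefore assumes one simple choice: the L2-normalised mean of the per-frame descriptors along a track. Tracks from two subsequences are then paired by mutual nearest neighbour. The point is the redundancy reduction, since a track observed in 50 frames is compared once instead of 50 times. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def track_descriptor(observations: List[Sequence[float]]) -> List[float]:
    """Summarise a track (at least one observation) by its L2-normalised mean descriptor."""
    dim = len(observations[0])
    mean = [sum(d[k] for d in observations) / len(observations) for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid dividing a zero vector
    return [x / norm for x in mean]


def mutual_nn_matches(
    tracks_a: List[List[float]], tracks_b: List[List[float]]
) -> List[Tuple[int, int]]:
    """Pairs (i, j) where track i in A and track j in B are each other's nearest neighbour."""
    if not tracks_a or not tracks_b:
        return []
    a_to_b = [min(range(len(tracks_b)), key=lambda j: math.dist(a, tracks_b[j])) for a in tracks_a]
    b_to_a = [min(range(len(tracks_a)), key=lambda i: math.dist(b, tracks_a[i])) for b in tracks_b]
    # Keep only matches that agree in both directions.
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

Averaging suppresses per-frame noise but blurs appearance changes along a track. Other summaries, such as keeping a few representative descriptors, trade more computation for robustness.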

Wu et al. [49] proposed using dense 3D geometry infor-


Feature Tracking for Robust Structure-from-Motion

Robust scale-adaptive meanshift for tracking

Robust fragments-based tracking with adaptive feature selection

Robust Object Tracking via Sparsity-based Collaborative Model

Slippage-robust Gaze Tracking for Near-eye Display

Robust Visual Tracking using L1 Minimization - translation

Time Tracking for Github Issues - crx extension

Tracking Based Structure and Motion Recovery for Augmented Video Productions

Robust On-line Beat Tracking with Kalman Filtering and probabilistic data association

Robust scale-adaptive mean-shift for tracking.rar_VOT2016_mean s

Discriminatively Trained Particle Filters for Complex Multi-Object Tracking

Robust fragments-based tracking using the integral histogram

Incremental Learning for Robust Visual Tracking.

Robust second-order consensus tracking of multiple 3-DOF laboratory helicopters via output feedback

Tracking7-7-21.rar_ Tracking7-7-21_frame differencing_frame differencing images_object tracking

Tracking-Power-Control-Soft-Removal (TPC-SR) wireless network resource allocation MATLAB simulation

A Twofold Siamese Network for Real-Time Object Tracking

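Presentation - Robust Multi-Modality Multi-Object Tracking.pptx

Lecture notes_Robust Multi-Modality Multi-Object Tracking.docx

Code for the paper "Accurate scale estimation for robust visual tracking"

data-fusion-for-indoor-tracking-by-RFID - MATLAB plotting resources

code-Robust Object Tracking via Sparsity-based Collaborative Model

Robust articulated ICP for real-time hand gesture tracking

Paper study - Object tracking with online feature selection.pdf

Gait-Tracking-With-x-IMU-master.rar_Gait Tracking_IMU_gravity removal

Radiometric_Tracking_Techniques_for_Deep-Space_Navigation.pdf

YOLOv8-deepsort for intelligent vehicle detection + vehicle tracking + vehicle counting

Transformer model for long-term forecasting with result visualization (with code + dataset + explanation of principles)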
