Visual Odometry
Part I: The First 30 Years and Fundamentals
By Davide Scaramuzza and Friedrich Fraundorfer
Visual odometry (VO) is the process of estimating
the egomotion of an agent (e.g., vehicle, human,
and robot) using only the input of a single or
multiple cameras attached to it. Application domains
include robotics, wearable computing, augmented reality,
and automotive. The term VO was coined in 2004 by Nis-
ter in his landmark paper [1]. The term was chosen for its
similarity to wheel odometry, which incrementally esti-
mates the motion of a vehicle by integrating the number
of turns of its wheels over time. Likewise, VO operates by
incrementally estimating the pose of the vehicle through
examination of the changes that motion induces on the
images of its onboard cameras. For VO to work effec-
tively, there should be sufficient illumination in the envi-
ronment and a static scene with enough texture to allow
apparent motion to be extracted. Furthermore, consecu-
tive frames should be captured by ensuring that they have
sufficient scene overlap.
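
This incremental estimation can be summarized by a standard pose-concatenation relation (the notation here is illustrative, assuming a rigid-body transformation T_k between the camera frames at times k-1 and k):

    % Relative motion between consecutive frames (rotation R, translation t)
    T_k = \begin{bmatrix} R_{k,k-1} & t_{k,k-1} \\ 0 & 1 \end{bmatrix},
    \qquad
    % Current pose obtained by chaining all relative motions from the start
    C_k = C_{k-1}\, T_k = C_0\, T_1 T_2 \cdots T_k .

Because each new pose is obtained by composing the previous estimate with the latest relative motion, errors in the individual T_k accumulate along the chain, which is why drift reduction is a recurring theme in the works surveyed below.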
The advantage of VO with respect to wheel odometry is
that VO is not affected by wheel slip in uneven terrain or
other adverse conditions. It has been demonstrated that
compared to wheel odometry, VO provides more accurate
trajectory estimates, with relative position error ranging
from 0.1 to 2%. This capability makes VO an interesting
supplement to wheel odometry and, additionally, to other
navigation systems such as global positioning system
(GPS), inertial measurement units (IMUs), and laser
odometry (similar to VO, laser odometry estimates the
egomotion of a vehicle by scan-matching of consecutive
laser scans). In GPS-denied environments, such as under-
water and aerial ones, VO is of utmost importance.
This two-part tutorial and survey provides a broad
introduction to VO and the research that has been under-
taken from 1980 to 2011. Although the first two decades
witnessed many offline implementations, only in the third
decade did real-time working systems flourish, which has
led VO to be used on another planet by two Mars-explora-
tion rovers for the first time. Part I (this tutorial) presents a
historical review of the first 30 years of research in this field
and its fundamentals. After a brief discussion on camera
modeling and calibration, it describes the main motion-
estimation pipelines for both the monocular and binocular
schemes, outlining the pros and cons of each implementation.
Part II will deal with feature matching, robustness, and
applications. It will review the main point-feature detectors
used in VO and the different outlier-rejection schemes. Par-
ticular emphasis will be given to the random sample consen-
sus (RANSAC), and the distinct tricks devised to speed it up
will be discussed. Other topics covered will be error model-
ing, location recognition (or loop-closure detection), and
bundle adjustment.
This tutorial provides both the experienced and non-
expert user with guidelines and references to algorithms
to build a complete VO system. Since an ideal and unique
VO solution for every possible working environment
does not exist, the optimal solution should be chosen
carefully according to the specific navigation environ-
ment and the given computational resources.
History of Visual Odometry
The problem of recovering relative camera poses and
three-dimensional (3-D) structure from a set of camera
images (calibrated or uncalibrated) is known in the
computer vision community as structure from motion
(SFM). Its origins can be dated back to works such as [2]
and [3]. VO is a particular case of SFM. SFM is more gen-
eral and tackles the problem of 3-D reconstruction of
both the structure and camera poses from sequentially
ordered or unordered image sets. The final structure and
camera poses are typically refined with an offline optimi-
zation (i.e., bundle adjustment), whose computation time
grows with the number of images [4]. Conversely, VO
focuses on estimating the 3-D motion of the camera
sequentially—as a new frame arrives—and in real time.
Bundle adjustment can be used to refine the local estimate
of the trajectory.
The problem of estimating a vehicle’s egomotion from
visual input alone started in the early 1980s and was
described by Moravec [5]. It is interesting to observe that
most of the early research in VO [5]–[9] was done for
planetary rovers and was motivated by the NASA Mars
exploration program in the endeavor to provide all-terrain
rovers with the capability to measure their 6-degree-of-
freedom (DoF) motion in the presence of wheel slippage in
uneven and rough terrains.
The work of Moravec stands out not only for present-
ing the first motion-estimation pipeline —whose main
functioning blocks are still used today—but also for
describing one of the earliest corner detectors (after the
first one proposed in 1974 by Hannah [10]), which is
known today as the Moravec corner detector [11], a prede-
cessor of the one proposed by Forstner [12] and Harris
and Stephens [3], [82].
Moravec tested his work on a planetary rover equipped
with what he termed a slider stereo: a single camera sliding
on a rail. The robot moved in a stop-and-go fashion,
digitizing and analyzing images at every location. At each
stop, the camera slid horizontally taking nine pictures at
equidistant intervals. Corners were detected in an image
using his operator and matched along the epipolar lines of
the other eight frames using normalized cross correlation.
Potential matches at the next robot
locations were found again by correla-
tion using a coarse-to-fine strategy to
account for large-scale changes. Out-
liers were subsequently removed by
checking for depth inconsistencies in
the eight stereo pairs. Finally, motion
was computed as the rigid body
transformation to align the triangu-
lated 3-D points seen at two consecu-
tive robot positions. The system of
equations was solved via weighted
least squares, where the weights were
inversely proportional to the dis-
tance from the 3-D point.
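
As a concrete, minimal sketch of this 3-D-to-3-D alignment step (an illustration of the general technique, not Moravec's code; the point sets, weights, and function names are assumptions), the weighted rigid-body fit has a closed-form solution based on the SVD of a weighted covariance matrix:

    import numpy as np

    def align_3d_points(P_prev, P_curr, weights):
        """Weighted rigid alignment: find R, t minimizing
        sum_i w_i * || P_curr[i] - (R @ P_prev[i] + t) ||^2.
        P_prev, P_curr: (N, 3) corresponding triangulated points; weights: (N,)."""
        w = weights / weights.sum()                    # normalized weights
        mu_p = (w[:, None] * P_prev).sum(axis=0)       # weighted centroid, previous frame
        mu_c = (w[:, None] * P_curr).sum(axis=0)       # weighted centroid, current frame
        X = P_prev - mu_p
        Y = P_curr - mu_c
        S = (w[:, None] * X).T @ Y                     # 3x3 weighted covariance
        U, _, Vt = np.linalg.svd(S)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
        R = Vt.T @ D @ U.T
        t = mu_c - R @ mu_p
        return R, t

    # Hypothetical usage: weights inversely proportional to the point distance,
    # mirroring the weighting described above.
    P_prev = np.random.rand(50, 3) * 10.0
    P_curr = P_prev + np.array([0.5, 0.0, 0.1])        # pure translation for this toy example
    R_est, t_est = align_3d_points(P_prev, P_curr, 1.0 / np.linalg.norm(P_prev, axis=1))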
Although Moravec used a single sliding camera, his
work belongs to the class of stereo VO algorithms. This
terminology accounts for the fact that the relative 3-D
position of the features is directly measured by triangula-
tion at every robot location and used to derive the relative
motion. Trinocular methods belong to the same class of
algorithms. The alternative to stereo vision is to use a
single camera. In this case, only bearing information is
available. The disadvantage is that motion can only be
recovered up to a scale factor. The absolute scale can then
be determined from direct measurements (e.g., measuring
the size of an element in the scene), motion constraints, or
from the integration with other sensors, such as IMU, air-
pressure, and range sensors. The interest in monocular
methods is due to the observation that stereo VO can
degenerate to the monocular case when the distance to the
scene is much larger than the stereo baseline (i.e., the dis-
tance between the two cameras). In this case, stereo vision
becomes ineffective and monocular methods must be used.
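
This degeneracy follows from the standard stereo triangulation relations (a textbook back-of-the-envelope argument, not a derivation taken from the paper). With focal length f, baseline b, and disparity d,

    z = \frac{f\, b}{d},
    \qquad
    \sigma_z \approx \left| \frac{\partial z}{\partial d} \right| \sigma_d
            = \frac{z^2}{f\, b}\, \sigma_d ,

so when the depth z is much larger than the baseline b, the disparity shrinks toward the matching noise level and the depth error grows quadratically with distance; the stereo pair then provides little more than bearing information, just like a single camera.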
Over the years, monocular and stereo VOs have progressed
almost as two independent lines of research. In the
remainder of this section, we survey the related
work in these fields.
Stereo VO
Most of the research done in VO has been produced using
stereo cameras. Building upon Moravec’s work, Matthies
and Shafer [6], [7] used a binocular system and Moravec’s
procedure for detecting and tracking corners. Instead of
using a scalar representation of the uncertainty as Moravec
did, they took advantage of the error covariance matrix of
the triangulated features and incorporated it into the
motion estimation step. Compared to Moravec, they dem-
onstrated superior results in trajectory recovery for a
planetary rover, with 2% relative error on a 5.5-m path.
Olson et al. [9], [13] later extended that work by
introducing an absolute orientation sensor (e.g., compass
or omnidirectional camera) and using the Forstner corner
detector, which is significantly faster to compute than
Moravec’s operator. They showed that
the use of camera egomotion estimates
alone results in accumulation errors
with superlinear growth in the distance
traveled, leading to increased orienta-
tion errors. Conversely, when an abso-
lute orientation sensor is incorporated,
the error growth can be reduced to a
linear function of the distance traveled.
This led them to a relative position
error of 1.2% on a 20-m path.
Lacroix et al. [8] implemented a
stereo VO approach for planetary rovers similar to those
explained earlier. The difference lies in the selection of
key points. Instead of using the Forstner detector, they
used dense stereo and, then, selected the candidate key
points by analyzing the correlation function around its
peaks—an approach that was later exploited in [14], [15],
and other works. This choice was based on the observa-
tion that there is a strong correlation between the shape
of the correlation curve and the standard deviation of the
feature depth. This observation was later used by Cheng
et al. [16], [17] in their final VO implementation onboard
the Mars rovers. They improved on the earlier implemen-
tation by Olson et al. [9], [13] in two areas. First, after
using the Harris corner detector, they utilized the curva-
ture of the correlation function around the feature, as
proposed by Lacroix et al., to define the error covariance
matrix of the image point. Second, as proposed by Nister
et al. [1], they used random sample consensus (RANSAC)
[18] in the least-squares motion estimation step for
outlier rejection.
A different approach to motion estimation and outlier
removal for an all-terrain rover was proposed by Milella
and Siegwart [14]. They used the Shi-Tomasi approach
[19] for corner detection, and similar to Lacroix, they
retained those points with high confidence in the stereo
disparity map. Motion estimation was then solved by first
using least squares, as in the earlier methods, and then the
iterative closest point (ICP) algorithm [20]—an algorithm
popular for 3-D registration of laser scans—for pose
refinement. For robustness, an outlier removal stage was
incorporated into the ICP.
The works mentioned so far have in common that the
3-D points are triangulated for every stereo pair, and the
relative motion is solved as a 3-D-to-3-D point registration
(alignment) problem. A completely different approach was
proposed in 2004 by Nister et al. [1]. Their paper is known
not only for coining the term VO but also for providing
the first real-time long-run implementation with a robust
outlier rejection scheme. Nister et al. improved the earlier
implementations in several areas. First, contrary to all
previous works, they did not track features among frames
but detected features (Harris corners) independently in all
frames and only allowed matches between features. This
has the benefit of avoiding feature drift during cross-corre-
lation-based tracking. Second, they did not compute the
relative motion as a 3-D-to-3-D point registration problem
but as a 3-D-to-two-dimensional (2-D) camera-pose estima-
tion problem (these methods are described in the “Motion
Estimation” section). Finally, they incorporated RANSAC
outlier rejection into the motion estimation step.
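
The 3-D-to-2-D formulation can be illustrated with a short sketch that uses OpenCV's generic solvePnPRansac as a stand-in for the robust pose estimator (this is not Nister et al.'s implementation, and the data below are synthetic placeholders): given landmarks triangulated from an earlier frame and their matches in the current image, RANSAC repeatedly hypothesizes a camera pose from minimal point samples and keeps the hypothesis with the largest set of inliers.

    import numpy as np
    import cv2

    # Hypothetical inputs: pts3d are landmarks triangulated from a previous (stereo) frame,
    # pts2d are their matched observations in the current image, K is the camera matrix.
    K = np.array([[700.0, 0.0, 320.0],
                  [0.0, 700.0, 240.0],
                  [0.0, 0.0, 1.0]])
    pts3d = np.random.rand(100, 3) * np.array([2.0, 2.0, 6.0]) + np.array([-1.0, -1.0, 4.0])
    rvec_true = np.array([0.01, 0.02, 0.0])
    tvec_true = np.array([0.1, 0.0, 0.05])
    pts2d, _ = cv2.projectPoints(pts3d, rvec_true, tvec_true, K, None)
    pts2d = pts2d.reshape(-1, 2)

    # Robust 3-D-to-2-D camera-pose estimation: RANSAC draws minimal samples,
    # solves for a pose hypothesis, and keeps the one with the most inliers.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None,
        reprojectionError=2.0,      # inlier threshold in pixels
        iterationsCount=200)
    R, _ = cv2.Rodrigues(rvec)      # rotation of the current camera w.r.t. the landmark frame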
A different motion estimation scheme was introduced
by Comport et al. [21]. Instead of using 3-D-to-3-D point
registration or 3-D-to-2-D camera-pose estimation tech-
niques, they relied on the quadrifocal tensor, which
allows motion to be computed from 2-D-to-2-D image
matches without having to triangulate 3-D points in any of
the stereo pairs. The benefit of using the raw 2-D points
directly, in lieu of triangulated 3-D points, lies in a more
accurate motion computation.
Monocular VO
The difference from the stereo scheme is that, in
monocular VO, both the relative motion and the 3-D structure
must be computed from 2-D bearing data. Since the abso-
lute scale is unknown, the distance between the first two
camera poses is usually set to one. As a new image arrives,
the relative scale and camera pose with respect to the first
two frames are determined using either the knowledge of
3-D structure or the trifocal tensor [22].
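
A minimal sketch of this bootstrapping step is given below, built on OpenCV's generic routines rather than on any of the cited pipelines (the function name and inputs are assumptions for illustration): the essential matrix is estimated robustly from the 2-D-to-2-D matches, the relative pose is recovered with a unit-norm translation, which fixes the otherwise arbitrary scale of the first baseline, and the inlier correspondences are triangulated to obtain the initial, scale-free structure.

    import numpy as np
    import cv2

    def bootstrap_monocular(pts0, pts1, K):
        """pts0, pts1: (N, 2) float64 matched pixel coordinates in the first two frames.
        K: 3x3 camera matrix. Returns (R, t, X) with ||t|| = 1 (scale fixed arbitrarily)."""
        # Robust (five-point-style) estimation of the essential matrix.
        E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)
        # Decompose E and resolve the fourfold ambiguity via the cheirality check;
        # the returned translation has unit norm, i.e., the first baseline is set to one.
        _, R, t, mask_pose = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
        inl = mask_pose.ravel() > 0
        # Triangulate the inlier correspondences to obtain the initial 3-D map (up to scale).
        P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P1 = K @ np.hstack([R, t])
        X_h = cv2.triangulatePoints(P0, P1,
                                    np.ascontiguousarray(pts0[inl].T),
                                    np.ascontiguousarray(pts1[inl].T))
        X = (X_h[:3] / X_h[3]).T    # inlier points as an M x 3 array of Euclidean coordinates
        return R, t, X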
Successful results with a single camera over long distan-
ces (up to several kilometers) have been obtained in the
last decade using both perspective and omnidirectional
cameras [23]–[29]. Related works can be divided into three
categories: feature-based methods, appearance-based meth-
ods, and hybrid methods. Feature-based methods are based
on salient and repeatable features that are tracked over the
frames; appearance-based methods use the intensity infor-
mation of all the pixels in the image or subregions of it; and
hybrid methods use a combination of the previous two.
In the first category are the works by the authors in [1],
[24], [25], [27], and [30]–[32]. The first real-time, large-
scale VO with a single camera was presented by Nister et
al. [1]. They used RANSAC for outlier rejection and 3-D-
to-2-D camera-pose estimation to compute the new
upcoming camera pose. The novelty of their paper is the
use of a five-point minimal solver [33] to calculate the
motion hypotheses in RANSAC. After that paper, five-
point RANSAC became very popular in VO and was used
in several other works [23], [25], [27]. Corke et al. [24]
provided an approach for monocular VO based on omni-
directional imagery from a catadioptric camera and optical
flow. Lhuillier [25] and Mouragnon et al. [30] presented an
approach based on local windowed-bundle adjustment to
recover both the motion and the 3-D map (this means that
bundle adjustment is performed over a window of the last
m frames). Again, they used the five-point RANSAC in
[33] to remove the outliers. Tardif et al. [27] presented an