Visual Odometry
Part I: The First 30 Years and Fundamentals
By Davide Scaramuzza and Friedrich Fraundorfer
Visual odometry (VO) is the process of estimating
the egomotion of an agent (e.g., vehicle, human,
and robot) using only the input of a single or
multiple cameras attached to it. Application domains
include robotics, wearable computing, augmented reality,
and automotive. The term VO was coined in 2004 by Nis-
ter in his landmark paper [1]. The term was chosen for its
similarity to wheel odometry, which incrementally esti-
mates the motion of a vehicle by integrating the number
of turns of its wheels over time. Likewise, VO operates by
incrementally estimating the pose of the vehicle through
examination of the changes that motion induces on the
images of its onboard cameras. For VO to work effec-
tively, there should be sufficient illumination in the envi-
ronment and a static scene with enough texture to allow
apparent motion to be extracted. Furthermore, consecu-
tive frames should be captured by ensuring that they have
sufficient scene overlap.
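
This incremental estimation can be summarized by a standard pose-concatenation relation (the notation here is illustrative, assuming a rigid-body transformation T_k between the camera frames at times k-1 and k):

    % Relative motion between consecutive frames (rotation R, translation t)
    T_k = \begin{bmatrix} R_{k,k-1} & t_{k,k-1} \\ 0 & 1 \end{bmatrix},
    \qquad
    % Current pose obtained by chaining all relative motions from the start
    C_k = C_{k-1}\, T_k = C_0\, T_1 T_2 \cdots T_k .

Because each new pose is obtained by composing the previous estimate with the latest relative motion, errors in the individual T_k accumulate along the chain, which is why drift reduction is a recurring theme in the works surveyed below.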
The advantage of VO with respect to wheel odometry is
that VO is not affected by wheel slip in uneven terrain or
other adverse conditions. It has been demonstrated that
compared to wheel odometry, VO provides more accurate
trajectory estimates, with relative position error ranging
from 0.1 to 2%. This capability makes VO an interesting
supplement to wheel odometry and, additionally, to other
navigation systems such as global positioning system
(GPS), inertial measurement units (IMUs), and laser
odometry (similar to VO, laser odometry estimates the
egomotion of a vehicle by scan-matching of consecutive
laser scans). In GPS-denied environments, such as under-
water and aerial ones, VO is of utmost importance.
This two-part tutorial and survey provides a broad
introduction to VO and the research that has been under-
taken from 1980 to 2011. Although the first two decades
witnessed many offline implementations, only in the third
decade did real-time working systems flourish, which has
led VO to be used on another planet by two Mars-explora-
tion rovers for the first time. Part I (this tutorial) presents a
historical review of the first 30 years of research in this field
and its fundamentals. After a brief discussion on camera
modeling and calibration, it describes the main motion-
estimation pipelines for both the monocular and binocular
schemes, outlining the pros and cons of each implementation.
Part II will deal with feature matching, robustness, and
applications. It will review the main point-feature detectors
used in VO and the different outlier-rejection schemes. Par-
ticular emphasis will be given to the random sample consen-
sus (RANSAC), and the distinct tricks devised to speed it up
will be discussed. Other topics covered will be error model-
ing, location recognition (or loop-closure detection), and
bundle adjustment.
This tutorial provides both the experienced and non-
expert user with guidelines and references to algorithms
to build a complete VO system. Since an ideal and unique
VO solution for every possible working environment
does not exist, the optimal solution should be chosen
carefully according to the specific navigation environ-
ment and the given computational resources.
History of Visual Odometry
The problem of recovering relative camera poses and
three-dimensional (3-D) structure from a set of camera
images (calibrated or uncalibrated) is known in the
computer vision community as structure from motion
(SFM). Its origins can be dated back to works such as [2]
and [3]. VO is a particular case of SFM. SFM is more gen-
eral and tackles the problem of 3-D reconstruction of
both the structure and camera poses from sequentially
ordered or unordered image sets. The final structure and
camera poses are typically refined with an offline optimi-
zation (i.e., bundle adjustment), whose computation time
grows with the number of images [4]. Conversely, VO
focuses on estimating the 3-D motion of the camera
sequentially—as a new frame arrives—and in real time.
Bundle adjustment can be used to refine the local estimate
of the trajectory.
The problem of estimating a vehicle’s egomotion from
visual input alone started in the early 1980s and was
described by Moravec [5]. It is interesting to observe that
most of the early research in VO [5]–[9] was done for
planetary rovers and was motivated by the NASA Mars
exploration program in the endeavor to provide all-terrain
rovers with the capability to measure their 6-degree-of-
freedom (DoF) motion in the presence of wheel slippage in
uneven and rough terrains.
The work of Moravec stands out not only for present-
ing the first motion-estimation pipeline —whose main
functioning blocks are still used today—but also for
describing one of the earliest corner detectors (after the
first one proposed in 1974 by Hannah [10]), which is
known today as the Moravec corner detector [11], a prede-
cessor of the one proposed by Forstner [12] and Harris
and Stephens [3], [82].
Moravec tested his work on a planetary rover equipped
with what he termed a slider stereo: a single camera sliding
on a rail. The robot moved in a stop-and-go fashion,
digitizing and analyzing images at every location. At each
stop, the camera slid horizontally taking nine pictures at
equidistant intervals. Corners were detected in an image
using his operator and matched along the epipolar lines of
the other eight frames using normalized cross correlation.
Potential matches at the next robot
locations were found again by correla-
tion using a coarse-to-fine strategy to
account for large-scale changes. Out-
liers were subsequently removed by
checking for depth inconsistencies in
the eight stereo pairs. Finally, motion
was computed as the rigid body
transformation to align the triangu-
lated 3-D points seen at two consecu-
tive robot positions. The system of
equations was solved via weighted
least squares, where the weights were
inversely proportional to the dis-
tance from the 3-D point.
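
As a concrete, minimal sketch of this 3-D-to-3-D alignment step (an illustration of the general technique, not Moravec's code; the point sets, weights, and function names are assumptions), the weighted rigid-body fit has a closed-form solution based on the SVD of a weighted covariance matrix:

    import numpy as np

    def align_3d_points(P_prev, P_curr, weights):
        """Weighted rigid alignment: find R, t minimizing
        sum_i w_i * || P_curr[i] - (R @ P_prev[i] + t) ||^2.
        P_prev, P_curr: (N, 3) corresponding triangulated points; weights: (N,)."""
        w = weights / weights.sum()                    # normalized weights
        mu_p = (w[:, None] * P_prev).sum(axis=0)       # weighted centroid, previous frame
        mu_c = (w[:, None] * P_curr).sum(axis=0)       # weighted centroid, current frame
        X = P_prev - mu_p
        Y = P_curr - mu_c
        S = (w[:, None] * X).T @ Y                     # 3x3 weighted covariance
        U, _, Vt = np.linalg.svd(S)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
        R = Vt.T @ D @ U.T
        t = mu_c - R @ mu_p
        return R, t

    # Hypothetical usage: weights inversely proportional to the point distance,
    # mirroring the weighting described above.
    P_prev = np.random.rand(50, 3) * 10.0
    P_curr = P_prev + np.array([0.5, 0.0, 0.1])        # pure translation for this toy example
    R_est, t_est = align_3d_points(P_prev, P_curr, 1.0 / np.linalg.norm(P_prev, axis=1))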
Although Moravec used a single sliding camera, his
work belongs to the class of stereo VO algorithms. This
terminology accounts for the fact that the relative 3-D
position of the features is directly measured by triangula-
tion at every robot location and used to derive the relative
motion. Trinocular methods belong to the same class of
algorithms. The alternative to stereo vision is to use a
single camera. In this case, only bearing information is
available. The disadvantage is that motion can only be
recovered up to a scale factor. The absolute scale can then
be determined from direct measurements (e.g., measuring
the size of an element in the scene), motion constraints, or
from the integration with other sensors, such as IMU, air-
pressure, and range sensors. The interest in monocular
methods is due to the observation that stereo VO can
degenerate to the monocular case when the distance to the
scene is much larger than the stereo baseline (i.e., the dis-
tance between the two cameras). In this case, stereo vision
becomes ineffective and monocular methods must be used.
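
This degeneracy follows from the standard stereo triangulation relations (a textbook back-of-the-envelope argument, not a derivation taken from the paper). With focal length f, baseline b, and disparity d,

    z = \frac{f\, b}{d},
    \qquad
    \sigma_z \approx \left| \frac{\partial z}{\partial d} \right| \sigma_d
            = \frac{z^2}{f\, b}\, \sigma_d ,

so when the depth z is much larger than the baseline b, the disparity shrinks toward the matching noise level and the depth error grows quadratically with distance; the stereo pair then provides little more than bearing information, just like a single camera.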
Over the years, monocular and stereo VOs have progressed
almost as two independent lines of research. In the
remainder of this section, we survey the related
work in these fields.
Stereo VO
Most of the research done in VO has been produced using
stereo cameras. Building upon Moravec’s work, Matthies
and Shafer [6], [7] used a binocular system and Moravec’s
procedure for detecting and tracking corners. Instead of
using a scalar representation of the uncertainty as Moravec
did, they took advantage of the error covariance matrix of
the triangulated features and incorporated it into the
motion estimation step. Compared to Moravec, they dem-
onstrated superior results in trajectory recovery for a
planetary rover, with 2% relative error on a 5.5-m path.
Olson et al. [9], [13] later extended that work by
introducing an absolute orientation sensor (e.g., compass
or omnidirectional camera) and using the Forstner corner
detector, which is significantly faster to compute than
Moravec’s operator. They showed that
the use of camera egomotion estimates
alone results in accumulation errors
with superlinear growth in the distance
traveled, leading to increased orienta-
tion errors. Conversely, when an abso-
lute orientation sensor is incorporated,
the error growth can be reduced to a
linear function of the distance traveled.
This led them to a relative position
error of 1.2% on a 20-m path.
Lacroix et al. [8] implemented a
stereo VO approach for planetary rovers similar to those
explained earlier. The difference lies in the selection of
key points. Instead of using the Forstner detector, they
used dense stereo and, then, selected the candidate key
points by analyzing the correlation function around its
peaks—an approach that was later exploited in [14], [15],
and other works. This choice was based on the observa-
tion that there is a strong correlation between the shape
of the correlation curve and the standard deviation of the
feature depth. This observation was later used by Cheng
et al. [16], [17] in their final VO implementation onboard
the Mars rovers. They improved on the earlier implemen-
tation by Olson et al. [9], [13] in two areas. First, after
using the Harris corner detector, they utilized the curva-
ture of the correlation function around the feature, as
proposed by Lacroix et al., to define the error covariance
matrix of the image point. Second, as proposed by Nister
et al. [1], they used random sample consensus (RANSAC)
[18] in the least-squares motion estimation step for
outlier rejection.
A different approach to motion estimation and outlier
removal for an all-terrain rover was proposed by Milella
and Siegwart [14]. They used the Shi-Tomasi approach
[19] for corner detection, and similar to Lacroix, they
retained those points with high confidence in the stereo
disparity map. Motion estimation was then solved by first
using least squares, as in the earlier methods, and then the
iterative closest point (ICP) algorithm [20]—an algorithm
popular for 3-D registration of laser scans—for pose
refinement. For robustness, an outlier removal stage was
incorporated into the ICP.
The works mentioned so far have in common that the
3-D points are triangulated for every stereo pair, and the
relative motion is solved as a 3-D-to-3-D point registration
(alignment) problem. A completely different approach was
proposed in 2004 by Nister et al. [1]. Their paper is known
not only for coining the term VO but also for providing
the first real-time long-run implementation with a robust
outlier rejection scheme. Nister et al. improved the earlier
implementations in several areas. First, contrary to all
previous works, they did not track features among frames
but detected features (Harris corners) independently in all
frames and only allowed matches between features. This
has the benefit of avoiding feature drift during cross-corre-
lation-based tracking. Second, they did not compute the
relative motion as a 3-D-to-3-D point registration problem
but as a 3-D-to-two-dimensional (2-D) camera-pose estima-
tion problem (these methods are described in the “Motion
Estimation” section). Finally, they incorporated RANSAC
outlier rejection into the motion estimation step.
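
The 3-D-to-2-D formulation can be illustrated with a short sketch that uses OpenCV's generic solvePnPRansac as a stand-in for the robust pose estimator (this is not Nister et al.'s implementation, and the data below are synthetic placeholders): given landmarks triangulated from an earlier frame and their matches in the current image, RANSAC repeatedly hypothesizes a camera pose from minimal point samples and keeps the hypothesis with the largest set of inliers.

    import numpy as np
    import cv2

    # Hypothetical inputs: pts3d are landmarks triangulated from a previous (stereo) frame,
    # pts2d are their matched observations in the current image, K is the camera matrix.
    K = np.array([[700.0, 0.0, 320.0],
                  [0.0, 700.0, 240.0],
                  [0.0, 0.0, 1.0]])
    pts3d = np.random.rand(100, 3) * np.array([2.0, 2.0, 6.0]) + np.array([-1.0, -1.0, 4.0])
    rvec_true = np.array([0.01, 0.02, 0.0])
    tvec_true = np.array([0.1, 0.0, 0.05])
    pts2d, _ = cv2.projectPoints(pts3d, rvec_true, tvec_true, K, None)
    pts2d = pts2d.reshape(-1, 2)

    # Robust 3-D-to-2-D camera-pose estimation: RANSAC draws minimal samples,
    # solves for a pose hypothesis, and keeps the one with the most inliers.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None,
        reprojectionError=2.0,      # inlier threshold in pixels
        iterationsCount=200)
    R, _ = cv2.Rodrigues(rvec)      # rotation of the current camera w.r.t. the landmark frame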
A different motion estimation scheme was introduced
by Comport et al. [21]. Instead of using 3-D-to-3-D point
registration or 3-D-to-2-D camera-pose estimation tech-
niques, they relied on the quadrifocal tensor, which
allows motion to be computed from 2-D-to-2-D image
matches without having to triangulate 3-D points in any of
the stereo pairs. The benefit of using the raw 2-D points
directly, in lieu of triangulated 3-D points, lies in a more
accurate motion computation.
Monocular VO
The difference from the stereo scheme is that, in
monocular VO, both the relative motion and the 3-D structure
must be computed from 2-D bearing data. Since the abso-
lute scale is unknown, the distance between the first two
camera poses is usually set to one. As a new image arrives,
the relative scale and camera pose with respect to the first
two frames are determined using either the knowledge of
3-D structure or the trifocal tensor [22].
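
A minimal sketch of this bootstrapping step is given below, built on OpenCV's generic routines rather than on any of the cited pipelines (the function name and inputs are assumptions for illustration): the essential matrix is estimated robustly from the 2-D-to-2-D matches, the relative pose is recovered with a unit-norm translation, which fixes the otherwise arbitrary scale of the first baseline, and the inlier correspondences are triangulated to obtain the initial, scale-free structure.

    import numpy as np
    import cv2

    def bootstrap_monocular(pts0, pts1, K):
        """pts0, pts1: (N, 2) float64 matched pixel coordinates in the first two frames.
        K: 3x3 camera matrix. Returns (R, t, X) with ||t|| = 1 (scale fixed arbitrarily)."""
        # Robust (five-point-style) estimation of the essential matrix.
        E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                       prob=0.999, threshold=1.0)
        # Decompose E and resolve the fourfold ambiguity via the cheirality check;
        # the returned translation has unit norm, i.e., the first baseline is set to one.
        _, R, t, mask_pose = cv2.recoverPose(E, pts0, pts1, K, mask=mask)
        inl = mask_pose.ravel() > 0
        # Triangulate the inlier correspondences to obtain the initial 3-D map (up to scale).
        P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P1 = K @ np.hstack([R, t])
        X_h = cv2.triangulatePoints(P0, P1,
                                    np.ascontiguousarray(pts0[inl].T),
                                    np.ascontiguousarray(pts1[inl].T))
        X = (X_h[:3] / X_h[3]).T    # inlier points as an M x 3 array of Euclidean coordinates
        return R, t, X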
Successful results with a single camera over long distan-
ces (up to several kilometers) have been obtained in the
last decade using both perspective and omnidirectional
cameras [23]–[29]. Related works can be divided into three
categories: feature-based methods, appearance-based meth-
ods, and hybrid methods. Feature-based methods are based
on salient and repeatable features that are tracked over the
frames; appearance-based methods use the intensity infor-
mation of all the pixels in the image or subregions of it; and
hybrid methods use a combination of the previous two.
In the first category are the works by the authors in [1],
[24], [25], [27], and [30]–[32]. The first real-time, large-
scale VO with a single camera was presented by Nister et
al. [1]. They used RANSAC for outlier rejection and 3-D-
to-2-D camera-pose estimation to compute the new
upcoming camera pose. The novelty of their paper is the
use of a five-point minimal solver [33] to calculate the
motion hypotheses in RANSAC. After that paper, five-
point RANSAC became very popular in VO and was used
in several other works [23], [25], [27]. Corke et al. [24]
provided an approach for monocular VO based on omni-
directional imagery from a catadioptric camera and optical
flow. Lhuillier [25] and Mouragnon et al. [30] presented an
approach based on local windowed-bundle adjustment to
recover both the motion and the 3-D map (this means that
bundle adjustment is performed over a window of the last
m frames). Again, they used the five-point RANSAC in
[33] to remove the outliers. Tardif et al. [27] presented an