SLAM Paper Collection: Classic Visual SLAM Paper Resources (CSDN Library)

18 files in total: 17 PDF, 1 DOCX. ZIP archive, 59.92MB, uploaded 2017-10-23 22:59:47.


典型slam系统论文.zip (Typical SLAM System Papers; 18 files)

各种典型slam系统论文 (Papers on various typical SLAM systems)

ORB-SLAM

ORB-SLAM2-2016-an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras.pdf 3.93MB

ORB-SLAM-2015-a Versatile and Accurate Monocular SLAM System.pdf 4.01MB

ORB-SLAM-2015-精确多功能单目SLAM系统-中文翻译.docx (Chinese translation of the 2015 ORB-SLAM paper) 3.34MB

PTAM2007-Parallel Tracking and Mapping for Small AR Workspaces-KleinMurray2007ISMAR.pdf 1.54MB

DVO SLAM-Dense Visual SLAM for RGB-D Cameras-kerl2013iros.pdf 1.52MB

ElasticFusion2015

ElasticFusion_ Dense SLAM Without A Pose Graph-whelan2015rss.pdf 3.62MB

ElasticFusion_ Real-Time Dense SLAM and Light Source Estimation-Whelan2016ijrr.pdf 6.97MB

SVO-2014-Fast semi-direct monocular visual odometry.pdf 1.55MB

RGB-D SLAM

RGB-D slam-2014-3-D Mapping With an RGB-D Camera-2014-06594910.pdf 777KB

RGB-D SLAM-2012-An Evaluation of the RGB-D SLAM System-endres12icra.pdf 1.38MB

Other

前RGB-D Mapping Using Depth Cameras for Dense 3D Modeling of Indoor Environments.pdf 4.03MB

A Benchmark for the Evaluation of RGB-D SLAM Systems2012.pdf 1.41MB

后RGB-D mapping Using Kinect-style.pdf 6.69MB

Real-time dense appearance-based SLAM for RGB-D sensors2011.pdf 1.49MB

MonoSLAM

MonoSLAM-2007-Andrew Davison_etal_pami2007.pdf 9.34MB

Monocular SLAM-2008-Inverse Depth Parametrization for Monocular SLAM-2008.pdf 1.31MB

DTAM2011-Dense Tracking and Mapping in Real-Time-newcombe_davison__2011__dtam.pdf 6.33MB

KinectFusion-2011-Real-time dense surface mapping and tracking.pdf 2.7MB

MonoSLAM: Real-Time Single Camera SLAM

Andrew J. Davison, Ian D. Reid, Member, IEEE, Nicholas D. Molton, and

Olivier Stasse, Member, IEEE

Abstract—We present a real-time algorithm which can recover the 3D trajectory of a monocular camera, moving rapidly through a

previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from

mobile robotics to the “pure vision” domain of a single uncontrolled camera, achieving real time but drift-free performance inaccessible

to Structure from Motion approaches. The core of the approach is the online creation of a sparse but persistent map of natural

landmarks within a probabilistic framework. Our key novel contributions include an active approach to mapping and measurement, the

use of a general motion model for smooth camera movement, and solutions for monocular feature initialization and feature orientation

estimation. Together, these add up to an extremely efficient and robust algorithm which runs at 30 Hz with standard PC and camera

hardware. This work extends the range of robotic systems in which SLAM can be usefully applied, but also opens up new areas. We

present applications of MonoSLAM to real-time 3D localization and mapping for a high-performance full-size humanoid robot and live

augmented reality with a hand-held camera.

Index Terms—Autonomous vehicles, 3D/stereo scene analysis, tracking.

1 INTRODUCTION

The last 10 years have seen significant progress in

autonomous robot navigation and, specifically, Simulta-

neous Localization and Mapping (SLAM) has become well-

defined in the robotics community as the question of a moving

sensor platform constructing a representation of its environ-

ment on the fly while concurrently estimating its ego-motion.

SLAM today is routinely achieved in experimental robot

systems using modern methods of sequential Bayesian

inference and SLAM algorithms are now starting to cross

over into practical systems. Interestingly, however, and

despite the large computer vision research community, until

very recently the use of cameras has not been at the center of

progress in robot SLAM, with much more attention given to

other sensors such as laser range-finders and sonar.

This may seem surprising since for many reasons vision is

an attractive choice of SLAM sensor: cameras are compact,

accurate, noninvasive, and well-understood, and today

cheap and ubiquitous. Vision, of course, also has great

intuitive appeal as the sense humans and animals primarily

use to navigate. However, cameras capture the world’s

geometry only indirectly through photometric effects and it

was thought too difficult to turn the sparse sets of features

popping out of an image into reliable long-term maps

generated in real-time, particularly since the data rates

coming from a camera are so much higher than those from

other sensors.

Instead, vision researchers concentrated on reconstruc-

tion problems from small image sets, developing the field

known as Structure from Motion (SFM). SFM algorithms

have been extended to work on longer image sequences

(e.g., [1], [2], [3]), but these systems are fundamentally

offline in nature, analyzing a complete image sequence to

produce a reconstruction of the camera trajectory and scene

structure observed. To obtain globally consistent estimates

over a sequence, local motion estimates from frame-to-

frame feature matching are refined in a global optimization

moving backward and forward through the whole sequence

(called bundle adjustment). These methods are perfectly

suited to the automatic analysis of short image sequences

obtained from arbitrary sources—movie shots, consumer

video, or even decades-old archive footage—but do not scale

to consistent localization over arbitrarily long sequences in

real time.

Our work is highly focused on high frame-rate real-time

performance (typically 30 Hz) as a requirement. In

applications, real-time algorithms are necessary only if they are to

be used as part of a loop involving other components in the

dynamic world—a robot that must control its next motion

step, a human that needs visual feedback on his actions or

another computational process which is waiting for input.

In these cases, the most immediately useful information to

be obtained from a moving camera in real time is where it is,

rather than a fully detailed “final result” map of a scene

ready for display. Although localization and mapping are

intricately coupled problems and it has been proven in

SLAM research that solving either requires solving both, in

this work we focus on localization as the main output of

interest. A map is certainly built, but it is a sparse map of

landmarks optimized toward enabling localization.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 29, NO. 6, JUNE 2007

A.J. Davison is with the Department of Computing, Imperial College, 180 Queen’s Gate, SW7 2AZ, London, UK. E-mail: ajd@doc.ic.ac.uk.

I.D. Reid is with the Robotics Research Group, Department of Engineering Science, University of Oxford, OX1 3PJ, UK. E-mail: ian@robots.ox.ac.uk.

N.D. Molton is with Imagineer Systems Ltd., The Surrey Technology Centre, 40 Occam Road, The Surrey Research Park, Guildford GU2 7YG, UK. E-mail: ndm@imagineersystems.com.

O. Stasse is with the Joint Japanese-French Robotics Laboratory (JRL), CNRS/AIST, AIST Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki, 305-8568, Japan. E-mail: olivier.stasse@aist.go.jp.

Manuscript received 13 Dec. 2005; revised 29 June 2006; accepted 6 Sept. 2006; published online 18 Jan. 2007.

Recommended for acceptance by C. Taylor.

For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-0705-1205.

Digital Object Identifier no. 10.1109/TPAMI.2007.1049.

0162-8828/07/$25.00 © 2007 IEEE Published by the IEEE Computer Society

Further, real-time camera tracking scenarios will often

involve extended and looping motions within a restricted

environment (as a humanoid performs a task, a domestic

robot cleans a home, or a room is viewed from different angles

with graphical augmentations). Repeatable localization, in

which gradual drift from ground truth does not occur, will be

essential here and much more important than in cases where a

moving camera continually explores new regions without

returning. This is where our fully-probabilistic SLAM

approach comes into its own: It will naturally construct a

persistent map of scene landmarks to be referenced indefi-

nitely in a state-based framework and permit loop closures to

correct long-term drift. Forming a persistent world map

means that if camera motion is restricted, the processing

requirement of the algorithm is bounded and continuous

real-time operation can be maintained, unlike in tracking

approaches such as [4], where loop-closing corrections are

achieved by matching to a growing history of past poses.

1.1 Contributions of This Paper

Our key contribution is to show that it is indeed possible to

achieve real-time localization and mapping with a single

freely moving camera as the only data source. We achieve this

by applying the core of the probabilistic SLAM methodology

with novel insights specific to what here is a particularly

difficult SLAM scenario. The MonoSLAM algorithm we

explain and demonstrate achieves the efficiency required

for real-time operation by using an active, guided approach to

feature mapping and measurement, a general motion model

for smooth 3D camera movement to capture the dynamical

prior information inherent in a continuous video stream and a

novel top-down solution to the problem of monocular feature

initialization.

In a nutshell, when compared to SFM approaches to

sequence analysis, using SLAM we are able both to

implement on-the-fly probabilistic estimation of the state

of the moving camera and its map and benefit from this in

using the running estimates to guide efficient processing.

This aspect of SLAM is often overlooked. Sequential SLAM

is very naturally able for instance to select a set of highly

salient and trackable but efficiently spaced features to put

into its visual map, with the use of only simple mapping

heuristics. Sensible confidence bound assumptions allow all

but the most important image processing to be avoided and

at high frame-rates all but tiny search regions of incoming

images are completely ignored by our algorithm. Our

approach to mapping can be summarized as “a sparse map

of high quality features.”

In this paper, we are able to demonstrate real-time

MonoSLAM indoors in room-sized domains. A long term

goal in SLAM shared by many would be to achieve a system

with the following performance: A single low-cost camera

attached to a portable computer would be switched on at an

arbitrary location in an unknown scene, then carried off by a

fast-moving robot (perhaps flying or jumping) or even a

running human through an arbitrarily large domain, all the

time effortlessly recovering its trajectory in real time and

building a detailed, persistent map of all it has seen. While

others attack the large map issue, but continue to work with

the same slow-moving robots and multisensor platforms as

before, we are approaching the problem from the other

direction and solve issues relating to highly dynamic

3D motion, commodity vision-only sensing, processing

efficiency and relaxing platform assumptions. We believe

that our results are of both theoretical and practical

importance because they open up completely new avenues

for the application of SLAM techniques.

The current paper draws on earlier work published in

conference papers [5], [6], [7]. We also present new unpub-

lished results demonstrating the advanced use of the

algorithm in humanoid robotics and augmented reality

applications.

2 RELATED WORK

The work of Harris and Pike [8], whose DROID system built

visual maps sequentially using input from a single camera,

is perhaps the grandfather of our research and was far

ahead of its time. Impressive results showed 3D maps of

features from long image sequences, and a later real-time

implementation was achieved. A serious oversight of this

work, however, was the treatment of the locations of each of

the mapped visual features as uncoupled estimation

problems, neglecting the strong correlations introduced by

the common camera motion. Closely-related approaches

were presented by Ayache [9] and later Beardsley et al. [10]

in an uncalibrated geometrical framework, but these

approaches also neglected correlations, the result being

overconfident mapping and localization estimates and an

inability to close loops and correct drift.

Smith et al. [11] and, at a similar time, Moutarlier and

Chatila [12], had proposed taking account of all correlations

in general robot localization and mapping problems within a

single state vector and covariance matrix updated by the

Extended Kalman Filter (EKF). Work by Leonard [13],

Manyika [14], and others demonstrated increasingly sophis-

ticated robot mapping and localization using related EKF

techniques, but the single state vector and “full covariance”

approach of Smith et al. did not receive widespread attention

until the mid to late 1990s, perhaps when computing power

reached the point where it could be practically tested. Several

early implementations [15], [16], [17], [18], [19] proved the

single EKF approach for building modest-sized maps in real

robot systems and demonstrated convincingly the impor-

tance of maintaining estimate correlations. These successes

gradually saw very widespread adoption of the EKF as the

core estimation technique in SLAM and its generality as a

Bayesian solution was understood across a variety of

different platforms and sensors.

In the intervening years, SLAM systems based on the

EKF and related probabilistic filters have demonstrated

impressive results in varied domains. The methods deviat-

ing from the standard EKF have mainly aimed at building

large scale maps, where the EKF suffers problems of

computational complexity and inaccuracy due to lineariza-

tion, and have included submapping strategies (e.g., [20],

[21]) and factorized particle filtering (e.g., [22]). The most

impressive results in terms of mapping accuracy and scale

have come from robots using laser range-finder sensors.

These directly return accurate range and bearing scans over

a slice of the nearby scene, which can either be processed to

extract repeatable features to insert into a map (e.g., [23]) or

simply matched whole-scale with other overlapping scans

to accurately measure robot displacement and build a map

of historic robot locations each with a local scan reference

(e.g., [24], [25]).

2.1 Vision-Based SLAM

Our algorithm uses vision as the only outward-looking

sense. In Section 1, we mentioned the additional challenges


posed by vision over laser sensors, which include the very

high input data rate, the inherent 3D quality of visual data,

the lack of direct depth measurement and the difficulty in

extracting long-term features to map. These factors have

combined to mean that there have been relatively few

successful vision-only SLAM systems (where now we

define a SLAM system as one able to construct persistent

maps on the fly while closing loops to correct drift). In this

section, we review some of the most interesting and place

our work into context.

Neira et al. presented a simple system mapping vertical

line segments in 2D in a constrained indoor environment

[26], but the direct ancestor of the approach in the current

paper was the work by Davison and Murray [18], [27], [28]

whose system using fixating active stereo was the first

visual SLAM system with processing in real time (at 5 Hz),

able to build a 3D map of natural landmarks on the fly and

control a mobile robot. The robotic active head that was

used forced a one-by-one choice of feature measurements

and sparse mapping. Nevertheless, it was proven that a

small set of landmarks could provide a very accurate SLAM

reference if carefully chosen and spread. Davison and Kita

[29] extended this method to the case of a robot able to

localize while traversing nonplanar ramps by combining

stereo vision with an inclinometer.

In more recent work, vision-based SLAM has been used in

a range of different systems. Jung and Lacroix [30] presented a

stereo vision SLAM system using a downward-looking stereo

rig to localize a robotic airship and perform terrain mapping.

Their implementation was sequential, but did not run in real

time and relied on a wide baseline fixed stereo rig to obtain

depth measurements directly. Kim and Sukkarieh [31] used

monocular vision in combination with accurate inertial

sensing to map ground-based targets from a dynamically

maneuvering UAV in an impressive system, though the

targets were artificially placed and estimation of their

locations is made easier by the fact that they can be assumed

to lie in a plane.

Bosse et al. [20], [32] used omnidirectional vision in

combination with other sensors in their ATLAS submap-

ping framework, making particular use of lines in a man-

made environment as consistent bearing references. Most

recently Eustice et al. [33] have used a single downward-

looking camera and inertial sensing to localize an under-

water remote vehicle and produce detailed seabed recon-

structions from low frame-rate image sequences. Using an

efficient sparse information filter their approach scales well

to large-scale mapping in their experimental setup where

loop closures are relatively infrequent.

Recently published work by Sim et al. [34] uses an

algorithm combining SIFT features [35] and FastSLAM

filtering [22] to achieve particularly large-scale vision-only

SLAM mapping. Their method is processor-intensive and at

an average of 10 seconds processing time per frame is

currently a large factor away from real-time operation. The

commercial vSLAM system [36] also uses SIFT features,

though within a SLAM algorithm which relies significantly on

odometry to build a connected map of recognizable locations

rather than fully continuous accurate localization. There is

little doubt that invariant features such as SIFT provide a high

level of performance in matching and permit high fidelity

“location recognition” in the same way as they were designed

for use in visual object recognition. Their value in loop-closing

or for localizing a “lost robot,” which involve matching with

very weak priors, is clear. They are less suited to continuous

tracking, however, due to the high computational cost of

extracting them—a method like ours using active search will

always outperform invariant matching for speed.

A stress of our work is to simplify the hardware required

for SLAM to the simplest case possible, a single camera

connected to a computer, and to require a minimum of

assumptions about this camera’s free 3D movement. Several

authors have presented real-time camera tracking systems

with goals similar to our own. McLauchlan and Murray [37]

introduced the VSDF (Variable State-Dimension Filter) for

simultaneous structure and motion recovery from a moving

camera using a sparse information filter framework, but were

not able to demonstrate long-term tracking or loop closing.

The approach of Chiuso et al. [38] shared several of the ideas of

our work, including the propagation of map and localization

uncertainty using a single Extended Kalman Filter, but only

limited results of tracking small groups of objects with small

camera motions were presented. Their method used simple

gradient descent feature tracking and was therefore unable to

match features during high acceleration or close loops after

periods of neglect. Nistér et al. [39] presented a real-time

system based very much on the standard structure from

motion methodology of frame-to-frame matching of large

numbers of point features which was able to recover

instantaneous motions impressively but again had no ability

to rerecognize features after periods of neglect and, therefore,

would lead inevitably to rapid drift in augmented reality or

localization. Foxlin [40] has taken a different approach in a

single camera system by using fiducial markers attached to

the ceiling in combination with high-performance inertial

sensing. This system achieved very impressive and repea-

table localization results, but with the requirement for

substantial extra infrastructure and cost. Burschka and Hager

[41] demonstrated a small-scale visual localization and

mapping system, though by separating the localization and

mapping steps they neglect estimate correlations and the

ability of this method to function over long time periods is

doubtful.

In the following section, we will present our method step

by step in a form accessible to readers unfamiliar with the

details of previous SLAM approaches.

3 METHOD

3.1 Probabilistic 3D Map

The key concept of our approach, as in [11], is a probabilistic

feature-based map, representing at any instant a snapshot of

the current estimates of the state of the camera and all features

of interest and, crucially, also the uncertainty in these

estimates. The map is initialized at system start-up and

persists until operation ends, but evolves continuously and

dynamically as it is updated by the Extended Kalman Filter.

The probabilistic state estimates of the camera and features

are updated during camera motion and feature observation.

When new features are observed the map is enlarged with

new states and, if necessary, features can also be deleted.
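The predict-then-update cycle described above can be illustrated with a one-dimensional linear Kalman filter. This is only a minimal sketch under simplifying assumptions: a scalar state and purely additive noise. It is not the full Extended Kalman Filter used for the camera and feature map, and the function names and numeric values are hypothetical.

```python
def predict(mean, var, motion, motion_var):
    """Prediction step: shift the estimate by the motion and inflate its variance."""
    return mean + motion, var + motion_var


def update(mean, var, measurement, measurement_var):
    """Measurement step: blend prediction and measurement according to their variances."""
    gain = var / (var + measurement_var)           # scalar Kalman gain
    new_mean = mean + gain * (measurement - mean)  # correct by the weighted innovation
    new_var = (1.0 - gain) * var                   # uncertainty shrinks after measuring
    return new_mean, new_var


# Hypothetical numbers: start at 0 with variance 1, move by about 1, then measure 1.2.
mean, var = 0.0, 1.0
mean, var = predict(mean, var, motion=1.0, motion_var=0.5)
predicted_var = var
mean, var = update(mean, var, measurement=1.2, measurement_var=0.5)
print(mean, var)
```

Prediction always increases the variance and a measurement always reduces it, which is the behaviour the full map inherits in many dimensions.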

The probabilistic character of the map lies in the propaga-

tion over time not only of the mean “best” estimates of the

states of the camera and features but a first order uncertainty

distribution describing the size of possible deviations from

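these values. Mathematically, the map is represented by a

state vector $\mathbf{x}$ and covariance matrix $P$. State vector $\mathbf{x}$ is

composed of the stacked state estimates of the camera and

features and $P$ is a square matrix of equal dimension which

can be partitioned into submatrix elements as follows:

\[
\mathbf{x} = \begin{pmatrix} \mathbf{x}_v \\ \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \end{pmatrix}, \qquad
P = \begin{bmatrix}
P_{xx} & P_{xy_1} & P_{xy_2} & \cdots \\
P_{y_1 x} & P_{y_1 y_1} & P_{y_1 y_2} & \cdots \\
P_{y_2 x} & P_{y_2 y_1} & P_{y_2 y_2} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}. \tag{1}
\]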

In doing this, the probability distribution over all map

parameters is approximated as a single multivariate

Gaussian distribution in a space of dimension equal to the

total state vector size.

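Explicitly, the camera’s state vector $\mathbf{x}_v$ comprises a metric

3D position vector $\mathbf{r}^W$, orientation quaternion $\mathbf{q}^{WR}$, velocity

vector $\mathbf{v}^W$, and angular velocity vector $\boldsymbol{\omega}^R$ relative to a fixed

world frame W and “robot” frame R carried by the camera

(13 parameters):

\[
\mathbf{x}_v = \begin{pmatrix} \mathbf{r}^W \\ \mathbf{q}^{WR} \\ \mathbf{v}^W \\ \boldsymbol{\omega}^R \end{pmatrix}. \tag{2}
\]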

In this work, feature states $\mathbf{y}_i$ are the 3D position vectors of

the locations of point features. Camera and feature

geometry and coordinate frames are defined in Fig. 3a.
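As an illustration of this layout, the sketch below stacks a 13-parameter camera state and a list of 3D point features into one vector and picks out one off-diagonal block of the covariance. The helper names, the quaternion ordering (w, x, y, z), and the identity placeholder covariance are assumptions made for the example only.

```python
CAMERA_DIM = 13   # 3 position + 4 quaternion + 3 velocity + 3 angular velocity
FEATURE_DIM = 3   # each feature is a 3D point


def state_size(num_features):
    """Length of the stacked state vector for a given number of features."""
    return CAMERA_DIM + FEATURE_DIM * num_features


def feature_slice(i):
    """Index range of feature i (0-based) inside the stacked state vector."""
    start = CAMERA_DIM + FEATURE_DIM * i
    return slice(start, start + FEATURE_DIM)


def covariance_block(P, rows, cols):
    """Extract the sub-block of P (a list of lists) at the given row and column slices."""
    return [row[cols] for row in P[rows]]


# Hypothetical example: camera at the origin, identity orientation, at rest.
camera = [0.0, 0.0, 0.0,        # position r
          1.0, 0.0, 0.0, 0.0,   # quaternion q, assumed (w, x, y, z) order
          0.0, 0.0, 0.0,        # linear velocity v
          0.0, 0.0, 0.0]        # angular velocity omega
features = [[0.5, 0.0, 2.0], [-0.3, 0.1, 1.5]]
x = camera + [c for y in features for c in y]

n = state_size(len(features))
# Placeholder covariance (identity); a running filter would hold real estimates here.
P = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

# Cross-covariance block between the first and second feature.
P_y1y2 = covariance_block(P, feature_slice(0), feature_slice(1))
print(len(x), x[feature_slice(1)], P_y1y2)
```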

The role of the map is primarily to permit real-time

localization rather than serve as a complete scene description,

and we therefore aim to capture a sparse set of high-quality

landmarks. We assume that the scene is rigid and that each

landmark is a stationary world feature. Specifically, in this

work, each landmark is assumed to correspond to a well-

localized point feature in 3D space. The camera is modeled as

a rigid body needing translation and rotation parameters to

describe its position and we also maintain estimates of its

linear and angular velocity: This is important in our algorithm

since we will make use of motion dynamics as will be

explained in Section 3.4.

The map can be pictured as in Fig. 1a: All geometric

estimates can be considered as surrounded by ellipsoidal

regions representing uncertainty bounds (here correspond-

ing to three standard deviations). What Fig. 1a cannot show

is that the various ellipsoids are potentially correlated to

various degrees: In sequential mapping, a situation which

commonly occurs is that spatially close features which are

often observed simultaneously by the camera will have

position estimates whose difference (relative position) is

very well-known, while the position of the group as a whole

relative to the global coordinate frame may not be. This

situation is represented in the map covariance matrix P by

nonzero entries in the off-diagonal matrix blocks and comes

about naturally through the operation of the algorithm.
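A small numerical illustration of this effect, restricted to a single coordinate axis: for two estimates a and b, Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b), so a large positive covariance makes the relative position far more certain than either absolute position. The variances and covariance below are made-up values chosen only to show the arithmetic.

```python
import math


def difference_variance(var_a, var_b, cov_ab):
    """Variance of (a - b) for two jointly Gaussian scalar estimates."""
    return var_a + var_b - 2.0 * cov_ab


# Hypothetical values: each feature is uncertain by 0.2 m (std) along this axis,
# but the two estimates are almost perfectly correlated.
var_a = var_b = 0.04
cov_ab = 0.039

absolute_std = math.sqrt(var_a)
relative_std = math.sqrt(difference_variance(var_a, var_b, cov_ab))
uncorrelated_std = math.sqrt(difference_variance(var_a, var_b, 0.0))

print(absolute_std, relative_std, uncorrelated_std)
```

Dropping the off-diagonal term (the last case) would make the relative position look much less certain than it really is, which is the information an uncoupled treatment of features throws away.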

The total size of the map representation is order $O(N^2)$,

where N is the number of features and the complete SLAM

algorithm we use has $O(N^2)$ complexity. This means that

the number of features which can be maintained with

real-time processing is bounded: in our system, to around 100 in

the current 30 Hz implementation.
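The quadratic growth is easy to check by counting covariance entries. The sketch assumes the state layout described earlier (13 camera parameters plus 3 per point feature); the function name is hypothetical.

```python
def covariance_entries(num_features, camera_dim=13, feature_dim=3):
    """Number of scalar entries in the full covariance matrix P."""
    dim = camera_dim + feature_dim * num_features
    return dim * dim


for n in (10, 100, 1000):
    print(n, covariance_entries(n))
```

Going from 100 to 1000 features multiplies the storage (and the per-update work that touches every entry) by roughly a factor of 93, which is why the map is kept sparse.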

There are strong reasons why we have chosen, in this work,

to use the “standard” single, full covariance EKF approach to

SLAM rather than variants which use different probabilistic

representations. As we have stated, our current goal is long-

term, repeatable localization within restricted volumes. The

pattern of observation of features in one of our maps is quite

different from that seen in many other implementations of

SLAM for robot mapping, such as [25], [34], or [22]. Those

robots move largely through corridor-like topologies, follow-

ing exploratory paths until they infrequently come back to

places they have seen before, at that stage correcting drift

around loops. Relatively ad hoc approaches can be taken to

distributing the correction around the well-defined loops,

whether this is through a chain of uncertain pose-to-pose

transformations or submaps or by selecting from a potentially

impoverished discrete set of trajectory hypotheses repre-

sented by a finite number of particles.

In our case, as a free camera moves and rotates in 3D

around a restricted space, individual features will come in

and out of the field of view in varying sequences, various

subsets of features at different depths will be covisible as

the camera rotates, and loops of many different sizes and

interlinking patterns will be routinely closed. We have

considered it very important to represent the detailed,

flexible correlations which will arise between different parts

of the map accurately. Within the class of known methods,

this is only computationally feasible with a sparse map of

features maintained within a single state vector and

covariance matrix. One hundred well-chosen features turn

out to be sufficient with careful map management to span a

room. In our opinion, it remains to be proven whether a

method (for instance, FastSLAM [22], [42]) which can cope

with a much larger number of features, but represent

correlations less accurately will be able to give such good

repeatable localization results in agile single camera SLAM.

Fig. 1. (a) A snapshot of the probabilistic 3D map, showing camera position estimate and feature position uncertainty ellipsoids. In this and other figures in the paper, the feature color code is as follows: red = successfully measured, blue = attempted but failed measurement, and yellow = not selected for measurement on this step. (b) Visually salient feature patches detected to serve as visual landmarks and the 3D planar regions deduced by back-projection to their estimated world locations. These planar regions are projected into future estimated camera positions to predict patch appearance from new viewpoints.

3.2 Natural Visual Landmarks

Now, we turn specifically to the features which make up the

map. We have followed the approach of Davison and Murray

[5], [27], who showed that relatively large (11 × 11 pixels)

image patches are able to serve as long-term landmark

features, the large templates having more unique signatures

than standard corner features. However, we extend the

power of such features significantly by using the camera

localization information we have available to improve

matching over large camera displacements and rotations.

Salient image regions are originally detected automati-

cally (at times and in locations guided by the strategies of

Section 3.7) using the detection operator of Shi and Tomasi

[43] from the monochrome images obtained from the camera

(note that, in the current work, we use monochrome images

primarily for reasons of efficiency). The goal is to be able to

identify these same visual landmarks repeatedly during

potentially extreme camera motions and, therefore, straight-

forward 2D template matching (as in [5]) is very limiting, as

after only small degrees of camera rotation and translation the

appearance of a landmark can change greatly. To improve on

this, we make the approximation that each landmark lies on a

locally planar surface—an approximation that will be very

good in many cases and bad in others, but a great deal better

than assuming that the appearance of the patch will not

change at all. Further, since we do not know the orientation of

this surface, we make the assignment that the surface normal

is parallel to the vector from the feature to the camera at

initialization (in Section 3.8, we will present a method for

updating estimates of this normal direction). Once the

3D location, including depth, of a feature has been fully

initialized using the method of Section 3.6, each feature is

stored as an oriented planar texture (Fig. 1b). When making

measurements of a feature from new camera positions, its

patch can be projected from 3D to the image plane to produce

a template for matching with the real image. This template

will be a warped version of the original square template

captured when the feature was first detected. In general, this

will be a full projective warping, with shearing and

perspective distortion, since we just send the template

through backward and forward camera models. Even if the

orientation of the surface on which the feature lies is not

correct, the warping will still take care successfully of rotation

about the cyclotorsion axis and scale (the degrees of freedom

to which the SIFT descriptor is invariant) and some amount of

other warping.
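One common way to express such a warp for a locally planar patch is the textbook plane-induced homography H = K (R + t nᵀ / d) K⁻¹. The sketch below uses that standard form, which is not necessarily the exact formulation used in this paper. The assumptions are a simple pinhole camera with focal length f and principal point (cx, cy), a plane n · X = d expressed in the first camera frame, and a second view whose coordinates are X2 = R X1 + t. All names and numbers are illustrative.

```python
import math


def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(A, v):
    """3x3 matrix times 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def plane_homography(f, cx, cy, R, t, n, d):
    """Homography taking view-1 pixels to view-2 pixels for points on the plane n.X = d."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    M = [[R[i][j] + t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul(matmul(K, M), K_inv)


def warp_pixel(H, u, v):
    """Apply H to a pixel and normalise the homogeneous result."""
    x, y, w = matvec(H, [u, v, 1.0])
    return x / w, y / w


def project(f, cx, cy, X):
    """Pinhole projection of a 3D point given in the camera frame."""
    return f * X[0] / X[2] + cx, f * X[1] / X[2] + cy


# Hypothetical setup: 10 degree roll about the optical axis plus a small translation,
# and a fronto-parallel patch plane at depth 2 in the first view.
f, cx, cy = 500.0, 320.0, 240.0
a = math.radians(10.0)
R = [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]
t = [0.1, 0.0, 0.2]
n, d = [0.0, 0.0, 1.0], 2.0
H = plane_homography(f, cx, cy, R, t, n, d)

X1 = [0.2, -0.1, 2.0]                                  # a point on the plane
X2 = [p + q for p, q in zip(matvec(R, X1), t)]         # same point in view 2
u1 = project(f, cx, cy, X1)
u2_direct = project(f, cx, cy, X2)
u2_warped = warp_pixel(H, *u1)
print(u1, u2_direct, u2_warped)
```

Warping every pixel of the stored square template through H in this way gives a predicted appearance for the new viewpoint. If the assumed normal is wrong, the roll and scale components of the warp are still handled correctly for a patch centred on the feature.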

Note that we do not update the saved templates for

features over time—since the goal is repeatable localization,

we need the ability to exactly remeasure the locations of

features over arbitrarily long time periods. Templates which

are updated over time will tend to drift gradually from their

initial positions.

3.3 System Initialization

In most SLAM systems, the robot has no specific knowledge

about the structure of the world around it when first

switched on. It is free to define a coordinate frame within

which to estimate its motion and build a map and the

obvious choice is to fix this frame at the robot’s starting

position, defined as the origin. In our single camera SLAM

algorithm, we choose to aid system start-up with a small

amount of prior information about the scene in the shape of

a known target placed in front of the camera. This provides

several features (typically four) with known positions and

of known appearance. There are two main reasons for this:

1. In single camera SLAM, with no direct way to

measure feature depths or any odometry, starting

from a target of known size allows us to assign a

precise scale to the estimated map and motion—

rather than running with scale as a completely

unknown degree of freedom. Knowing the scale of

the map is desirable whenever it must be related to

other information such as priors on motion dynamics

or feature depths, and makes it much

easier to use in real applications.

2. Having some features in the map right from the start

means that we can immediately enter our normal

predict-measure-update tracking sequence without

any special first step. With a single camera, features

cannot be initialized fully into the map after only one

measurement because of their unknown depths and,

therefore, within our standard framework we would

be stuck without features to match to estimate the

camera motion from frames one to two. (Of course,

standard stereo algorithms provide a separate ap-

proach which could be used to bootstrap motion and

structure estimation.)
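The scale ambiguity behind the first reason can be checked numerically: with a pinhole camera, multiplying every scene point and the camera position by the same factor leaves all image measurements unchanged, so images alone cannot fix the scale. The sketch below assumes an unrotated camera looking along +z and a hypothetical focal length; it illustrates the ambiguity only, not the initialization procedure itself.

```python
def project(point, cam_pos, f=500.0):
    """Pinhole projection for an unrotated camera at cam_pos looking along +z."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return f * x / z, f * y / z


point = (0.4, -0.2, 3.0)   # a scene point (hypothetical)
cam = (0.1, 0.0, 0.5)      # camera position (hypothetical)
s = 2.5                    # arbitrary global scale factor

scaled_point = tuple(s * p for p in point)
scaled_cam = tuple(s * c for c in cam)

original_pixel = project(point, cam)
scaled_pixel = project(scaled_point, scaled_cam)
print(original_pixel, scaled_pixel)
```

Both configurations produce the same pixel, so a target of known metric size (or some other metric prior) is needed to pin down s.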

Fig. 2a shows the first step of tracking with a typical

initialization target. The known features—in this case, the


Fig. 2. (a) Matching the four known features of the initialization target on

the first frame of tracking. The large circular search regions reflect the

high uncertainty assigned to the starting camera position estimate.

(b) Visualization of the model for “smooth” motion: At each camera

position, we predict a most likely path together with alternatives with small

deviations.

Fig. 3. (a) Frames and vectors in camera and feature geometry. (b) Active

search for features in the raw images from the wide-angle camera.

Ellipses show the feature search regions derived from the uncertainty in

the relative positions of camera and features and only these regions are

searched.
