
CS231A Course Notes 4: Stereo Systems and Structure from Motion

Kenji Hata and Silvio Savarese

1 Introduction

In the previous notes, we covered how adding additional viewpoints of a scene can greatly enhance our knowledge of that scene. We focused on the epipolar geometry setup in order to relate points of one image plane to points in the other without extracting any information about the 3D scene. In these lecture notes, we will discuss how to recover information about the 3D scene from multiple 2D images.

2 Triangulation

One of the most fundamental problems in multiple view geometry is the problem of triangulation, the process of determining the location of a 3D point given its projections into two or more images.

Figure 1: The setup of the triangulation problem when given two views.

In the triangulation problem with two views, we have two cameras with known camera intrinsic parameters $K$ and $K'$ respectively. We also know the relative orientations and offsets $R, T$ of these cameras with respect to each other. Suppose that we have a point $P$ in 3D, which can be found in the images of the two cameras at $p$ and $p'$ respectively. Although the location of $P$ is currently unknown, we can measure the exact locations of $p$ and $p'$ in the image. Because $K, K', R, T$ are known, we can compute the two lines of sight $\ell$ and $\ell'$, which are defined by the camera centers $O_1, O_2$ and the image locations $p, p'$. Therefore, $P$ can be computed as the intersection of $\ell$ and $\ell'$.

Figure 2: The triangulation problem in real-world scenarios often involves minimizing the reprojection error.

Although this process appears both straightforward and mathematically sound, it does not work very well in practice. In the real world, because the observations $p$ and $p'$ are noisy and the camera calibration parameters are not precise, finding the intersection point of $\ell$ and $\ell'$ may be problematic. In most cases, it will not exist at all, as the two lines may never intersect.

2.1 A linear method for triangulation

In this section, we describe a simple linear triangulation method that works around the lack of an intersection point between rays. We are given two points in the images that correspond to each other, $p = MP = (x, y, 1)$ and $p' = M'P = (x', y', 1)$, where the equalities hold up to scale. Since $p$ and $MP$ are parallel as homogeneous vectors, the definition of the cross product gives $p \times (MP) = 0$. We can explicitly use the equalities generated by the cross product to form three constraints:

$$
\begin{aligned}
x(M_3 P) - (M_1 P) &= 0 \\
y(M_3 P) - (M_2 P) &= 0 \\
x(M_2 P) - y(M_1 P) &= 0
\end{aligned}
\tag{2.1}
$$

where $M_i$ is the i-th row of the matrix $M$. Only two of these three constraints are linearly independent, so each image contributes two rows. Similar constraints can be formulated for $p'$ and $M'$. Using the constraints from both images, we can formulate a linear equation of the form $AP = 0$ where

$$
A = \begin{bmatrix}
x M_3 - M_1 \\
y M_3 - M_2 \\
x' M'_3 - M'_1 \\
y' M'_3 - M'_2
\end{bmatrix}
\tag{2.2}
$$

This equation can be solved using SVD to find the best linear estimate of the point $P$: the estimate is the right singular vector of $A$ associated with its smallest singular value. Another interesting aspect of this method is that it can also triangulate from more than two views. To do so, one simply appends additional rows to $A$ corresponding to the constraints added by the new views.
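The linear method above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the course: the function names `triangulate_linear` and `project`, the two synthetic camera matrices, and the test point are all assumptions made for the example.

```python
import numpy as np

def project(M, P):
    """Project a 3D point P (3-vector) with a 3x4 camera matrix M to pixel coordinates."""
    u, v, w = M @ np.append(P, 1.0)
    return np.array([u / w, v / w])

def triangulate_linear(points, cameras):
    """Linear triangulation: stack two rows per view (Eq. 2.2) and solve AP = 0 by SVD.

    points  : list of (x, y) image observations, one per view
    cameras : list of 3x4 camera matrices, one per view
    """
    rows = []
    for (x, y), M in zip(points, cameras):
        rows.append(x * M[2] - M[0])  # x * M_3 - M_1
        rows.append(y * M[2] - M[1])  # y * M_3 - M_2
    A = np.stack(rows)
    # The right singular vector for the smallest singular value minimizes ||AP|| with ||P|| = 1.
    _, _, Vt = np.linalg.svd(A)
    P_h = Vt[-1]
    return P_h[:3] / P_h[3]  # dehomogenize (assumes the point is not at infinity)

# Synthetic two-view setup (assumed for illustration): identity intrinsics,
# second camera shifted by one unit along the x axis.
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
P_true = np.array([0.5, -0.2, 4.0])

observations = [project(M1, P_true), project(M2, P_true)]
P_est = triangulate_linear(observations, [M1, M2])
```

With noise-free observations the recovered `P_est` matches `P_true` up to floating-point error. Passing more than two observation/camera pairs adds two rows per extra view, as described above.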

This method, however, is not suitable for projective reconstruction, as it is not projective-invariant. For example, suppose we replace the camera matrices $M, M'$ with ones affected by a projective transformation, $MH^{-1}, M'H^{-1}$. The matrix of linear equations $A$ then becomes $AH^{-1}$. Therefore, a solution $P$ to the previous estimation of $AP = 0$ will correspond to a solution $HP$ for the transformed problem $(AH^{-1})(HP) = 0$. Recall that SVD solves the problem under the constraint $\|P\| = 1$, which is not invariant under a projective transformation $H$. Therefore, this method, although simple, is often not the optimal solution to the triangulation problem.
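As a short check of the claim that the system matrix picks up a factor of $H^{-1}$, note that the i-th row of $MH^{-1}$ is $M_i H^{-1}$, so each row of the transformed system factors as

$$
x\,(M_3 H^{-1}) - M_1 H^{-1} = (x M_3 - M_1)\,H^{-1},
$$

and likewise for the remaining rows. The transformed problem therefore minimizes $\|AH^{-1}Q\|$ subject to $\|Q\| = 1$. Writing $Q = HP$, the constraint reads $\|HP\| = 1$, which in general differs from $\|P\| = 1$, so with noisy data the two minimizers are generally not related by $H$. The symbol $Q$ is introduced here only for illustration.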

2.2 A nonlinear method for triangulation

Instead, the triangulation problem for real-world scenarios is often mathematically characterized as solving a minimization problem:

$$
\min_{\hat{P}} \; \|M\hat{P} - p\|^2 + \|M'\hat{P} - p'\|^2
\tag{2.3}
$$

In the above equation, we seek a $\hat{P}$ in 3D that best approximates $P$ by finding the best least-squares estimate of the reprojection error of $\hat{P}$ in both images. The reprojection error for a 3D point in an image is the distance between the projection of that point in the image and the corresponding observed point in the image plane. In the case of our example in Figure 2, since $M$ is the projective transformation from 3D space to image 1, the projected point of $\hat{P}$ in image 1 is $M\hat{P}$. The matching observation of $\hat{P}$ in image 1 is $p$. Thus, the reprojection error for image 1 is the distance $\|M\hat{P} - p\|$. The overall reprojection error found in Equation 2.3 is the sum of the squared reprojection errors across all images. For cases with more than two images, we would simply add more distance terms to the objective function:

$$
\min_{\hat{P}} \; \sum_i \|M_i\hat{P} - p_i\|^2
\tag{2.4}
$$

Here $M_i$ and $p_i$ denote the camera matrix and the observation of the i-th image, rather than the i-th row of a single matrix as in Equation 2.1.
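The objective in Equation 2.4 can be evaluated directly. The sketch below is an illustration under assumed names (`project`, `reprojection_error`) and an assumed synthetic two-camera setup; it applies the division by the homogeneous coordinate before comparing with the observed pixel.

```python
import numpy as np

def project(M, P):
    """Project a 3D point P (3-vector) with a 3x4 camera matrix M to pixel coordinates."""
    u, v, w = M @ np.append(P, 1.0)
    return np.array([u / w, v / w])

def reprojection_error(P, cameras, observations):
    """Sum over all views of the squared distance between projection and observation."""
    return sum(np.sum((project(M, P) - p) ** 2)
               for M, p in zip(cameras, observations))

# Synthetic setup (assumed for illustration).
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
P_true = np.array([0.5, -0.2, 4.0])

exact = [project(M1, P_true), project(M2, P_true)]
err_exact = reprojection_error(P_true, [M1, M2], exact)

# Shift the first observation by 0.01 along x: the error at P_true becomes 0.01**2.
noisy = [exact[0] + np.array([0.01, 0.0]), exact[1]]
err_noisy = reprojection_error(P_true, [M1, M2], noisy)
```

With exact observations the error at the true point is zero, and perturbing one observation by 0.01 along one axis raises it to 0.0001, since only one squared term changes.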

In practice, there exists a variety of very sophisticated optimization techniques that result in good approximations to the problem. However, for the scope of the class, we will focus on only one of these techniques, which is the Gauss-Newton algorithm for nonlinear least squares. The general nonlinear least squares problem is to find an $x \in \mathbb{R}^n$ that minimizes

$$
\|r(x)\|^2 = \sum_{i=1}^{m} r_i(x)^2
\tag{2.5}
$$

where $r$ is any residual function $r : \mathbb{R}^n \to \mathbb{R}^m$ such that $r(x) = f(x) - y$ for some function $f$, input $x$, and observation $y$. The nonlinear least squares problem reduces to the regular, linear least squares problem when the function $f$ is linear. However, recall that, in general, our camera matrices are not affine. Because the projection into the image plane often involves a division by the homogeneous coordinate, the projection into the image is generally nonlinear.
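To make the nonlinearity concrete, write $\tilde{P} = (X, Y, Z, 1)^T$ for the homogeneous coordinates of the 3D point and $M_1, M_2, M_3$ for the rows of a camera matrix $M$ (the symbol $\tilde{P}$ is introduced here only for illustration). The pixel coordinates are then

$$
f(X, Y, Z) = \left( \frac{M_1 \tilde{P}}{M_3 \tilde{P}}, \; \frac{M_2 \tilde{P}}{M_3 \tilde{P}} \right),
$$

a ratio of two functions that are each linear in $(X, Y, Z)$. For an affine camera $M_3 = (0, 0, 0, 1)$, so the denominator is the constant 1 and $f$ becomes linear; otherwise the denominator depends on the point and $f$ is nonlinear.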

Notice that if we set $e_i$ to be a $2 \times 1$ vector $e_i = M_i\hat{P} - p_i$, then we can reformulate our optimization problem to be:

$$
\min_{\hat{P}} \; \sum_i \|e_i(\hat{P})\|^2
\tag{2.6}
$$

which can be perfectly represented as a nonlinear least squares problem.

In these notes, we will cover how we can use the popular Gauss-Newton algorithm to find an approximate solution to this nonlinear least squares problem. First, let us assume that we have a somewhat reasonable estimate of the 3D point $\hat{P}$, which we can compute by the previous linear method. The Gauss-Newton algorithm iteratively updates this estimate, correcting it towards an even better estimate that lowers the reprojection error. At each step we want to update our estimate $\hat{P}$ by some $\delta_P$: $\hat{P} = \hat{P} + \delta_P$. But how do we choose the update parameter $\delta_P$? The key insight of the Gauss-Newton algorithm is to linearize the residual function near the current estimate $\hat{P}$. In the case of our problem, this means that the residual error $e$ of a point $P$ can be thought of as:

$$
e(\hat{P} + \delta_P) \approx e(\hat{P}) + \frac{\partial e}{\partial P}\,\delta_P
\tag{2.7}
$$

Subsequently, the minimization problem transforms into

$$
\min_{\delta_P} \; \left\| \frac{\partial e}{\partial P}\,\delta_P - \left(-e(\hat{P})\right) \right\|^2
\tag{2.8}
$$

When we formulate the residual like this, we can see that it takes the format of the standard linear least squares problem. For the triangulation problem with $N$ images, the linear least squares solution is

$$
\delta_P = -(J^T J)^{-1} J^T e
\tag{2.9}
$$
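The closed form in Equation 2.9 follows from the usual normal equations. As a brief sketch, with $J = \partial e / \partial P$ evaluated at $\hat{P}$, setting the gradient of the linearized objective to zero gives

$$
\begin{aligned}
\frac{\partial}{\partial \delta_P} \left\| J\,\delta_P + e \right\|^2 = 2\,J^T (J\,\delta_P + e) &= 0 \\
J^T J\,\delta_P &= -J^T e \\
\delta_P &= -(J^T J)^{-1} J^T e,
\end{aligned}
$$

where the last step assumes that $J^T J$ is invertible, that is, that $J$ has full column rank.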

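where

$$
e = \begin{bmatrix} e_1 \\ \vdots \\ e_N \end{bmatrix}
  = \begin{bmatrix} M_1\hat{P} - p_1 \\ \vdots \\ M_N\hat{P} - p_N \end{bmatrix}
\tag{2.10}
$$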

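and

$$
J = \begin{bmatrix}
\dfrac{\partial e_1}{\partial \hat{P}_1} & \dfrac{\partial e_1}{\partial \hat{P}_2} & \dfrac{\partial e_1}{\partial \hat{P}_3} \\
\vdots & \vdots & \vdots \\
\dfrac{\partial e_N}{\partial \hat{P}_1} & \dfrac{\partial e_N}{\partial \hat{P}_2} & \dfrac{\partial e_N}{\partial \hat{P}_3}
\end{bmatrix}
\tag{2.11}
$$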
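For a projective camera, the $2 \times 3$ block of $J$ belonging to image $i$ can be written out explicitly. The following is a sketch using notation introduced only for this illustration: let $(u_i, v_i, w_i)^T = M_i (\hat{P}_1, \hat{P}_2, \hat{P}_3, 1)^T$ be the homogeneous projection, so that the projected pixel is $(u_i / w_i, \; v_i / w_i)$, and let $m_{i,k}$ be the row vector formed by the first three entries of the k-th row of $M_i$. Applying the quotient rule to each pixel coordinate gives

$$
\frac{\partial e_i}{\partial \hat{P}} = \frac{1}{w_i^2}
\begin{bmatrix}
w_i\, m_{i,1} - u_i\, m_{i,3} \\
w_i\, m_{i,2} - v_i\, m_{i,3}
\end{bmatrix}.
$$

The observation $p_i$ is constant with respect to $\hat{P}$ and therefore does not appear in the derivative.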

Recall that the residual error vector of a particular image $e_i$ is a $2 \times 1$ vector because there are two dimensions in the image plane. Consequently, in the simplest two-camera case ($N = 2$) of triangulation, the residual vector $e$ is a $2N \times 1 = 4 \times 1$ vector and the Jacobian $J$ is a $2N \times 3 = 4 \times 3$ matrix. Notice how this method handles multiple views seamlessly, as additional images are accounted for by adding the corresponding rows to the $e$ vector and $J$ matrix. After computing the update $\delta_P$, we can simply repeat the process for a fixed number of steps or until it numerically converges. One important caveat of the Gauss-Newton algorithm is that the linearization of the residual function only holds near the current estimate, so it gives us no guarantee of convergence. Thus, it is always useful in practice to put an upper bound on the number of updates made to the estimate.
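The full iteration can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the function names, the synthetic cameras, the starting guess, and the stopping tolerance are all hypothetical choices, and the residual is taken as projection minus observation after dividing by the homogeneous coordinate.

```python
import numpy as np

def residuals_and_jacobian(P, cameras, observations):
    """Stack the 2N residuals e and the 2N x 3 Jacobian J at the estimate P."""
    e, J = [], []
    for M, p in zip(cameras, observations):
        u, v, w = M @ np.append(P, 1.0)
        e.extend([u / w - p[0], v / w - p[1]])
        # Quotient rule for the derivatives of u/w and v/w w.r.t. the 3D point.
        J.append((w * M[0, :3] - u * M[2, :3]) / w**2)
        J.append((w * M[1, :3] - v * M[2, :3]) / w**2)
    return np.array(e), np.array(J)

def gauss_newton_triangulate(P0, cameras, observations, max_iters=20, tol=1e-12):
    """Refine an initial 3D estimate P0 with Gauss-Newton updates (Eq. 2.9)."""
    P = np.asarray(P0, dtype=float).copy()
    for _ in range(max_iters):  # upper bound on updates: convergence is not guaranteed
        e, J = residuals_and_jacobian(P, cameras, observations)
        # Solve (J^T J) delta = -J^T e; raises LinAlgError if J^T J is singular.
        delta = -np.linalg.solve(J.T @ J, J.T @ e)
        P = P + delta
        if np.linalg.norm(delta) < tol:
            break
    return P

def project(M, P):
    u, v, w = M @ np.append(P, 1.0)
    return np.array([u / w, v / w])

# Synthetic setup (assumed for illustration).
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
P_true = np.array([0.5, -0.2, 4.0])
observations = [project(M1, P_true), project(M2, P_true)]

P_start = P_true + np.array([0.2, -0.1, 0.5])  # deliberately offset initial guess
P_refined = gauss_newton_triangulate(P_start, [M1, M2], observations)
```

In this noise-free setup the refined estimate returns to `P_true`. In practice the starting point would come from the linear method, and the observations would be noisy, so the final reprojection error would be small but nonzero.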
