Recently, Ranftl et al. [1] proposed a three-step approach to recover a dense 3D reconstruction of a general dynamic scene from two consecutive perspective frames. Concretely, it performs object-level motion segmentation, followed by per-object 3D reconstruction, and finally solves for the scale ambiguity. In a general dynamic setting, however, densely segmenting rigidly moving objects or parts is not trivial, and inferring motion models for deforming shapes is even more challenging. Furthermore, the success of object-level segmentation rests on the assumption of multiple rigid motions and therefore fails to describe more general scenarios, such as objects that are themselves deforming. Consequently, 3D reconstruction algorithms that depend on object-level motion segmentation suffer.
Motivated by these limitations, we propose a unified approach that neither performs object-level motion segmentation nor assumes any prior knowledge about the type of scene rigidity, and is still able to recover a scale-consistent dense reconstruction of a complex dynamic scene. Our formulation intrinsically encapsulates a solution to the scale ambiguity inherent in perspective structure from motion, which is a very challenging problem in general. We show that two prior assumptions, one about the 3D scene and one about its deformation, suffice to pin down the unknown relative scales and to obtain a globally consistent dense 3D reconstruction of a dynamic scene from two of its perspective views. The two basic assumptions we make about the dynamic scene are:
1) The dynamic scene can be approximated by a collection
of piecewise planar surfaces each having its own rigid
motion.
2) The deformation of the scene between two frames is
locally-rigid but globally as-rigid-as-possible.
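To see why the relative scale is unobservable in the first place, recall the standard pinhole projection (a textbook two-view fact, not specific to our method). A 3D point \(\mathbf{X}\) on a patch undergoing rigid motion \((R, \mathbf{t})\), viewed by a camera with intrinsics \(K\), projects to pixel \(\mathbf{x}\) via
\[
\lambda\,\mathbf{x} = K\,(R\,\mathbf{X} + \mathbf{t})
\;\Longrightarrow\;
(\alpha\lambda)\,\mathbf{x} = K\,\bigl(R\,(\alpha\mathbf{X}) + \alpha\,\mathbf{t}\bigr)
\quad \text{for any } \alpha > 0,
\]
so the image measurements are unchanged when a patch's structure and translation are scaled together. Each independently reconstructed patch is therefore known only up to its own scale, and it is exactly these per-patch relative scales that the second assumption allows us to resolve.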
• Piece-wise planar model: Our method models a dynamic scene as a collection of piece-wise planar regions. Given two perspective images I (reference image) and I′ (next image) of a general dynamic scene, our method first over-segments the reference image into superpixels. This collection of superpixels is taken as an approximation of the dynamic scene in projective space. One could argue that modeling the dynamic scene per pixel would be more compelling; however, modeling the scene with planar regions makes the problem computationally tractable for optimization and inference [25], [26]. A minimal sketch of this step follows.
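As an illustration only, the over-segmentation step could look like the following, assuming SLIC superpixels from scikit-image (the algorithm choice, parameters, and file name here are our assumptions, not prescriptions of this paper):

```python
import numpy as np
from skimage.io import imread
from skimage.segmentation import slic

ref = imread("reference_frame.png")  # reference image I (hypothetical file)

# Over-segment I into compact regions; each label marks one superpixel,
# i.e., one candidate planar patch of the dynamic scene.
labels = slic(ref, n_segments=1000, compactness=10.0)

# Group pixel coordinates by superpixel for later per-patch reconstruction.
patches = {s: np.argwhere(labels == s) for s in np.unique(labels)}
print(f"{len(patches)} superpixels (planar patches)")
```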
• Locally-rigid and globally as-rigid-as-possible: We implicitly assume that each local plane undergoes a rigid motion. If every superpixel corresponds to a small planar patch moving rigidly in 3D space, and dense optical flow between the two frames is given, we can estimate each patch's location in 3D using a rigid reconstruction pipeline [8], [27]. However, since the relative scales of these patches are not determined, the patches float in 3D space as an unorganized superpixel soup. Under the assumption that the change between the two frames is not arbitrary but rather regular and smooth, the scene can be assumed to deform globally in an as-rigid-as-possible manner. Using this intuition, our method finds for each superpixel an appropriate scale under which the entire set of superpixels can be assembled (glued) together coherently, forming a piece-wise smooth surface, as if playing the game of “3D jigsaw puzzle”. Hence, we call our method the “SuperPixel Soup” algorithm (see Fig. 2 for a conceptual visualization, and the simplified sketch after this bullet).

Fig. 2: Reconstructing a 3D surface from a soup of un-scaled superpixels via solving a 3D superpixel jigsaw puzzle problem.
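To convey the gluing intuition in code, here is a deliberately simplified sketch (our own illustration, not the paper's formulation): it relaxes the as-rigid-as-possible objective to a linear least-squares problem in the log-scales, asking only that neighbouring patches' depths agree along shared superpixel boundaries. The function name and toy data are hypothetical.

```python
import numpy as np

def solve_patch_scales(boundary_pairs, depth_ratios, n_patches):
    """Recover one scale per superpixel patch, up to a global scale.

    boundary_pairs : list of (i, j) pairs of adjacent superpixels.
    depth_ratios   : for each pair, the median ratio d_i / d_j of the
                     unscaled per-patch depths along the shared boundary.
    """
    m = len(boundary_pairs)
    A = np.zeros((m + 1, n_patches))
    b = np.zeros(m + 1)
    for k, ((i, j), r) in enumerate(zip(boundary_pairs, depth_ratios)):
        # Consistency: alpha_i * d_i ≈ alpha_j * d_j on the boundary,
        # i.e., log(alpha_i) - log(alpha_j) = -log(d_i / d_j).
        A[k, i], A[k, j] = 1.0, -1.0
        b[k] = -np.log(r)
    A[m, 0] = 1.0  # gauge freedom: fix patch 0 to unit scale
    log_alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.exp(log_alpha)

# Toy usage: three patches in a row. Patch 1's unscaled depth is half of
# patch 0's along their boundary; patches 1 and 2 already agree.
scales = solve_patch_scales([(0, 1), (1, 2)], [2.0, 1.0], 3)
print(scales)  # approx. [1.0, 2.0, 2.0]
```

The actual objective we optimize is richer than this convex toy problem and is solved with a combination of continuous and discrete algorithms, as noted next.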
In this paper, we show that the aforementioned assumptions can faithfully model most real-world dynamic scenes. Furthermore, we encapsulate these assumptions in a simple optimization problem, which is solved using a combination of continuous and discrete optimization algorithms [28], [29], [30]. We demonstrate the benefits of our approach on available benchmark datasets such as KITTI [31], MPI Sintel [24], and Virtual KITTI [32]. The statistical comparison shows that our algorithm outperforms many available state-of-the-art methods by a significant margin.
2 RELATED WORK
The solution to SfM has undergone prodigious development since its inception [2]. Yet even after such remarkable progress in this field, the choice of algorithm still depends on the complexity of the object motion and the environment. In this work, we exploit the idea of local rigidity to solve dense reconstruction of a general dynamic scene. The concept of rigidity is not new in the structure-from-motion problem [2], [33] and has been effectively applied as a global constraint to solve large-scale reconstruction problems [18]. Global rigidity has also been exploited to solve for structure and motion over multiple frames via a factorization approach [10].
The literature on structure from motion and its treatment of different scenarios is very extensive. For brevity, we therefore discuss only the previous works of direct relevance to dynamic 3D reconstruction from monocular images. Linear low-rank models have been used for dense non-rigid reconstruction: Kumar et al. [34], [35] and Garg et al. [36] solved the task with an orthographic camera model, assuming feature matches across multiple frames are given as input. Fayad et al. [37] recovered deformable surfaces with a quadratic approximation, again from multiple frames.
Taylor et al. [38] proposed a piecewise rigid solution using locally-rigid SfM to reconstruct a soup of rigid triangles. While the method of Taylor et al. [38] is conceptually similar to ours, there are major differences:
1) We achieve two-view dense reconstruction, while [38] relies on multiple views (N ≥ 4).
2) We use a perspective camera model, while they rely on an orthographic camera model.
3) We solve the scale-indeterminacy issue, an inherent ambiguity of 3D reconstruction under perspective projection; the method of Taylor et al. [38] does not suffer from it, but at the cost of being restricted to the orthographic camera model.
Recently, Russell et al. [39] and Ranftl et al. [1] used object-level segmentation for dense dynamic 3D reconstruction. In