Recurrent MVSNet for High-resolution Multi-view Stereo Depth Inference

Yao Yao¹∗  Zixin Luo¹  Shiwei Li¹  Tianwei Shen¹  Tian Fang²†  Long Quan¹

¹The Hong Kong University of Science and Technology
{yyaoag, zluoag, slibc, tshenaa, quan}@cse.ust.hk
²Shenzhen Zhuke Innovation Technology (Altizure)
fangtian@altizure.com
Abstract
Deep learning has recently demonstrated excellent performance for multi-view stereo (MVS). However, one major limitation of current learned MVS approaches is scalability: the memory-consuming cost volume regularization makes learned MVS difficult to apply to high-resolution scenes. In this paper, we introduce a scalable multi-view stereo framework based on the recurrent neural network. Instead of regularizing the entire 3D cost volume in one go, the proposed Recurrent Multi-view Stereo Network (R-MVSNet) sequentially regularizes the 2D cost maps along the depth direction via the gated recurrent unit (GRU). This dramatically reduces the memory consumption and makes high-resolution reconstruction feasible. We first show the state-of-the-art performance achieved by the proposed R-MVSNet on the recent MVS benchmarks. Then, we further demonstrate the scalability of the proposed method on several large-scale scenarios, where previous learned approaches often fail due to the memory constraint. Code is available at https://github.com/YoYo000/MVSNet.
1. Introduction
Multi-view stereo (MVS) aims to recover the dense representation of a scene given multi-view images and calibrated cameras. While traditional methods [24, 10, 29, 9] have achieved excellent reconstruction performance, recent works [14, 13, 30] show that learned approaches are able to produce results comparable to the traditional state of the art. In particular, MVSNet [30] proposed a deep architecture for depth map estimation, which significantly boosts reconstruction completeness and overall quality.
One of the key advantages of learning-based MVS is the cost volume regularization, where most networks apply multi-scale 3D CNNs [14, 15, 30] to regularize the 3D cost volume. However, this step is extremely memory-expensive: it operates on 3D volumes, and the memory requirement grows cubically with the model resolution (Fig. 1 (d)). Consequently, current learned MVS algorithms can hardly be scaled up to high-resolution scenarios.

∗Intern at Shenzhen Zhuke Innovation Technology (Altizure).
†Corresponding author.
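To make the cubic growth concrete, here is a back-of-the-envelope memory estimate. All sizes, the channel count, and the helper name are hypothetical, chosen only for illustration and not taken from the paper; the point is that doubling the model resolution doubles height, width, and depth together, multiplying memory by eight.

```python
# Rough memory estimate for a full 3D cost volume, as regularized in one
# go by multi-scale 3D CNNs (all sizes are hypothetical, for illustration).
def cost_volume_bytes(height, width, depth, channels=32, dtype_bytes=4):
    """Bytes needed to hold one H x W x D x C float32 cost volume."""
    return height * width * depth * channels * dtype_bytes

# Doubling the model resolution doubles H, W, and D at once,
# so memory grows cubically: 2 x 2 x 2 = 8x per doubling.
small = cost_volume_bytes(256, 256, 128)
large = cost_volume_bytes(512, 512, 256)
print(large / small)  # 8.0
```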
Recent works on deep learning for 3D also acknowledge this problem. OctNet [23] and O-CNN [27] exploit the sparsity in 3D data and introduce the octree structure to 3D CNNs. SurfaceNet [14] and DeepMVS [13] apply an engineered divide-and-conquer strategy to the MVS reconstruction. MVSNet [30] builds the cost volume upon the reference camera frustum to decouple the reconstruction into smaller problems of per-view depth map estimation. However, when it comes to high-resolution 3D reconstruction (e.g., volume size > 512³ voxels), these methods either fail or take a long time to process.
To this end, we present a novel scalable multi-view stereo framework, dubbed R-MVSNet, based on the recurrent neural network. The proposed network is built upon the MVSNet architecture [30], but regularizes the cost volume in a sequential manner using the convolutional gated recurrent unit (GRU) rather than 3D CNNs. With the sequential processing, the online memory requirement of the algorithm is reduced from cubic to quadratic in the model resolution (Fig. 1 (c)). As a result, R-MVSNet is applicable to high-resolution 3D reconstruction with unlimited depth-wise resolution.
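The sequential scheme can be sketched as follows. This is a toy NumPy version, not the paper's implementation: it uses per-pixel (1×1) gate weights where R-MVSNet uses spatial convolutions, and all sizes and random weights are made up for illustration. The key property it demonstrates is that only one 2D hidden state (plus the current 2D cost map) is ever resident, no matter how many depth planes are swept.

```python
import numpy as np

rng = np.random.default_rng(0)

def gru_step(cost_map, hidden, Wz, Wr, Wh):
    """One GRU-style update on a 2D cost map, simplified to per-pixel
    (1x1) gate weights; R-MVSNet uses convolutional gates instead."""
    x = np.stack([cost_map, hidden], axis=-1)        # concat input and state
    z = 1.0 / (1.0 + np.exp(-(x @ Wz).squeeze(-1)))  # update gate
    r = 1.0 / (1.0 + np.exp(-(x @ Wr).squeeze(-1)))  # reset gate
    xh = np.stack([cost_map, r * hidden], axis=-1)
    h_tilde = np.tanh((xh @ Wh).squeeze(-1))         # candidate state
    return (1.0 - z) * hidden + z * h_tilde          # gated combination

H, W, D = 64, 80, 192                 # hypothetical map size and depth planes
Wz, Wr, Wh = (rng.standard_normal((2, 1)) for _ in range(3))
hidden = np.zeros((H, W))             # the only persistent state is 2D
for d in range(D):                    # sweep front-to-back along depth
    cost_map = rng.standard_normal((H, W))  # stand-in for one cost slice
    hidden = gru_step(cost_map, hidden, Wz, Wr, Wh)
print(hidden.shape)  # (64, 80)
```

Because the loop touches one depth plane at a time, peak memory scales with H × W (quadratic in resolution) rather than H × W × D, which is what allows an effectively unlimited number of depth planes.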
We first evaluate R-MVSNet on the DTU [1], Tanks and Temples [17] and ETH3D [25] datasets, where our method produces results comparable to or even better than the state-of-the-art MVSNet [30]. Next, we demonstrate the scalability of the proposed method on several large-scale scenarios with a detailed analysis of the memory consumption. R-MVSNet is far more efficient in GPU memory than other methods and is the first learning-based approach applicable to such wide-depth-range scenes, e.g., the advanced set of the Tanks and Temples dataset [17].
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
978-1-7281-3293-8/19/$31.00 ©2019 IEEE
DOI 10.1109/CVPR.2019.00567