（ICCV2023）Parallax-TolerantUnsupervisedDeepImageStitching资源-CSDN文库

154 浏览量 2024-03-19 15:35:19 上传评论收藏 9.18MB PDF 举报

### ICCV 2023：Parallax-Tolerant Unsupervised Deep Image Stitching #### 概述本文介绍了一种新的图像拼接方法——Parallax-Tolerant Unsupervised Deep Image Stitching（UDIS++），该方法针对现有的图像拼接技术在处理大视差图像时遇到的问题进行了改进。UDIS++是一种无监督学习方法，能够在无需人工标记的情况下学习到鲁棒性强、适应性好的图像特征，从而实现高质量的图像拼接效果。 #### 背景与挑战传统的图像拼接技术通常依赖于几何特征（如点、线、边缘等），这些手工设计的特征对于具备足够几何结构的自然场景表现良好。然而，在纹理较少或视差较大的情况下，传统方法的表现就会受到影响。例如，当面对大量视差时，基于学习的方法可能会模糊视差区域；而在低纹理场景下，传统的基于几何特征的方法则可能因缺乏足够的几何特征而失败。 #### 方法论为了克服上述问题，研究团队提出了UDIS++，一种能够容忍视差变化的无监督深度图像拼接技术。该技术主要分为以下几个步骤： 1. **灵活的图像配准**：UDIS++通过一种从全局同构变换到局部薄板样条运动的鲁棒且灵活的配准方式来建模图像对齐过程。这种配准方法可以为重叠区域提供精确的对齐，并通过联合优化对齐和失真来保持非重叠区域的形状。 2. **增强的迭代策略**：为了提高模型的泛化能力，UDIS++还设计了一个简单但有效的迭代策略来增强配准适应性，尤其是在跨数据集和跨分辨率的应用中。 3. **消除视差伪影**：UDIS++进一步采取措施来消除视差伪影，确保最终的拼接结果既准确又自然。 #### 实验与验证研究团队利用了UDIS-D数据集中的两个具体案例来验证UDIS++的有效性和优越性： 1. **大视差案例**：在这个案例中，研究人员展示了UDIS++如何有效地处理具有大视差的图像，相比于之前的方法（例如UDIS），UDIS++不仅没有模糊视差区域，而且还保持了高清晰度和连贯性。 2. **低纹理场景案例**：另一个案例则聚焦于低纹理场景下的图像拼接。与传统方法（如LPC）相比，UDIS++能够在缺乏足够几何特征的情况下成功地完成拼接任务，这得益于其强大的自适应学习能力和对复杂场景的理解。 #### 结论与展望 UDIS++作为一种新的图像拼接技术，不仅克服了现有方法在处理大视差和低纹理场景时的局限性，而且还在提高图像拼接质量方面取得了显著的进步。随着未来对这一领域研究的深入，预计UDIS++将在更多实际应用场景中发挥重要作用，如虚拟现实、自动驾驶等领域。此外，UDIS++的成功也为后续的研究提供了有价值的参考方向，即如何结合深度学习技术和传统几何特征提取方法来解决更广泛的实际问题。

资源推荐

资源详情

资源评论

Parallax-Tolerant Unsupervised Deep Image Stitching

Lang Nie

1,2

, Chunyu Lin

1,2

, Kang Liao

1,2

, Shuaicheng Liu

, Yao Zhao

1,2

Institute of Information Science, Beijing Jiaotong University, Beijing, China

Beijing Key Laboratory of Advanced Information Science and Network, Beijing, China

University of Electronic Science and Technology of China, Chengdu, China

LPC UDIS Ours

Reference/

target image

LPC UDIS Ours

(a) A stitching case of large parallax from UDIS-D dataset.

(b) A stitching case of low texture from UDIS-D dataset.

Reference/

target image

Figure 1: Limitations of existing methods. (a) UDIS [41] (learning method) deals with large parallax by blurring parallax re-

gions (highlighted in red). (b) LPC [19] (traditional method) fails in low-texture scenes without sufﬁcient geometric features.

Instead, our solution is free from these limitations, achieving promising results in both of the challenging circumstances.

Abstract

Traditional image stitching approaches tend to lever-

age increasingly complex geometric features (e.g., point,

line, edge, etc.) for better performance. However, these

hand-crafted features are only suitable for speciﬁc natu-

ral scenes with adequate geometric structures. In con-

trast, deep stitching schemes overcome adverse conditions

by adaptively learning robust semantic features, but they

cannot handle large-parallax cases.

To solve these issues, we propose a parallax-tolerant

unsupervised deep image stitching technique (UDIS++).

First, we propose a robust and ﬂexible warp to model

the image registration from global homography to local

thin-plate spline motion. It provides accurate alignment

for overlapping regions and shape preservation for non-

overlapping regions by joint optimization concerning align-

ment and distortion. Subsequently, to improve the gen-

eralization capability, we design a simple but effective it-

erative strategy to enhance the warp adaption in cross-

dataset and cross-resolution applications. Finally, to fur-

ther eliminate the parallax artifacts, we propose to com-

posite the stitched image seamlessly by unsupervised learn-

ing for seam-driven composition masks. Compared with ex-

isting methods, our solution is parallax-tolerant and free

from laborious designs of complicated geometric features

for speciﬁc scenes. Extensive experiments show our superi-

ority over the SoTA methods, both quantitatively and qual-

itatively. The code is available at https://github.

com/nie-lang/UDIS2.

1. Introduction

Image stitching is a practical technology that aims to

construct a scene with a wide ﬁeld-of-view (FoV) from dif-

ferent images with limited FoV. It is useful in a wide range

of ﬁelds, such as autonomous driving, medical imaging,

surveillance videos, virtual reality, etc.

Over the past decades, traditional stitching approaches

tend to adopt increasingly complicated geometric features

to achieve better content alignment and shape preservation.

In the beginning, SIFT [38] is widely used in various im-

age stitching algorithms [4, 13, 50, 5, 34, 25] to extract dis-

arXiv:2302.08207v2 [cs.CV] 22 Jul 2023

criminative key points and calculate adaptive warps. Then,

the line segment is proved to be another unique feature to

achieve better stitching quality and preserve linear struc-

tures [31, 49, 32, 19]. Recently, the large-scale edge is also

introduced in [10] to preserve the contour structures. Be-

sides, there is a great variety of other geometric features

that are leveraged to improve the stitching quality, such as

depth maps [33], semantic planar regions [26], etc.

Having calculated the warps, seam cutting is usually

used to remove parallax artifacts. To explore an invisible

seam, various energy functions are designed using colors

[22], edges [35, 8], salient maps [30], depth [6], etc.

From the broad usage of geometric features, a clear de-

veloping trend has been discovered: increasingly sophisti-

cated features are leveraged. We ask: are these complex

designs practical in real applications? We attempt to an-

swer this question from two perspectives. 1) These elabo-

rate algorithms with complicated geometric features poorly

adapt to scenes without sufﬁcient geometric structures, such

as medical images, industrial images, and other natural im-

ages with low texture (Fig.9b), low light or low resolution.

2) When there exist abundant geometric structures, the run-

ning speed is intolerant (please refer to Table 2,3 for detail).

Such a trend seems to violate the “practical” original intent.

Recently, deep stitching technologies using convolu-

tional neural networks (CNNs) have aroused widespread

attention in the community. They abandon geometric fea-

tures and head for high-level semantic features that can be

adaptively learned in a data-driven pattern in a supervised

[24, 40, 44, 47, 23], weakly-supervised [46], or unsuper-

vised [41] manner. Although they are robust to various nat-

ural or unnatural conditions, they cannot handle large paral-

lax and demonstrate unsatisfactory generalization in cross-

dataset and cross-resolution conditions. A large-parallax

case is shown in Fig.9a, where the tree is in the middle of

the car in the reference image while it is on the left in the

target image. To deal with parallax, UDIS [41] reconstructs

stitched images from feature to pixel. However, the parallax

is so large that undesired blurs are produced as a side effect.

In this paper, we propose a parallax-tolerant unsuper-

vised deep image stitching technique, addressing the robust-

ness issue in traditional stitching and the large-parallax is-

sue in deep stitching simultaneously. Actually, the proposed

deep learning-based solution is naturally robust to various

scenes due to effective semantic feature extraction. Then,

it overcomes the large parallax via two stages: warp and

composition. In the ﬁrst stage, we propose a robust and

ﬂexible warp to model the image registration. Particularly,

we simultaneously parameterize homography transforma-

tion and thin-plate spline (TPS) transformation as uniﬁed

representations in a compact framework. The former offers

a global linear transformation, while the latter produces lo-

cal nonlinear deformation, allowing our warp to align im-

ages with parallax. Besides, this warp contributes to both

content alignment and shape preservation simultaneously

via combined optimization of alignment and distortion. In

the second stage, the existing reconstruction-based method

[41] treats artifact elimination as a reconstruction process

from feature to pixel, leading to inevitable blurs around the

parallax regions. To overcome this drawback, we cooper-

ate the motivation of seam-cutting into deep composition

and implicitly ﬁnd a “seam” through unsupervised learn-

ing for seam-driven composition masks. To this end, we

design boundary and smoothness constraints to restrict the

endpoints and route of a “seam”, compositing the stitched

image seamlessly. In addition to the two stages, we de-

sign a simple iterative strategy to enhance the generaliza-

tion, rapidly improving the registration performance of our

warp in different datasets and resolutions.

Furthermore, we conduct extensive experiments about

the warp and composition, demonstrating our superiority to

other SoTA solutions. The contributions center around:

• We propose a robust and ﬂexible warp by parameteriz-

ing the homography and thin-plate spline into uniﬁed

representations, realizing unsupervised content align-

ment and shape preservation in various scenes.

• A new composition approach is proposed to generate

seamless stitched images via unsupervised learning for

composition masks. Compared with the reconstruc-

tion [41], our composition eliminates parallax artifacts

without introducing undesirable blurs.

• We design a simple iterative strategy to enhance warp

adaption in different datasets and resolutions.

2. Related Work

2.1. Traditional Image Stitching

Adaptive warp. AutoStitch [4] leveraged SIFT [38] to

extract discriminative keypoints to construct a global ho-

mography transformation. After that, SIFT becomes an in-

dispensable feature to calculate various ﬂexible warps, such

as DHW [13], SVA [36] APAP [50], ELA [28], TFA [27]

for better alignment, SPHP [5], AANAP [34], GSP [7] for

better shape preservation. Then, DFW [13] adopted line

segments extracted by LSD [48] with keypoints together to

enrich structural information in artiﬁcial environments. Fur-

thermore, line-guided mesh deformation [49] is designed

by optimizing an energy function of various line-preserving

terms [32, 19]. To preserve the nonlinear structures, the

edge features are used in GES-GSP [10] to achieve a smooth

transition between local alignment and structural preserva-

tion. In addition to these basic geometric features (point,

line, and edge), the depth maps and semantic planes are

also used to assist the feature matching using extra depth

consistency [33] and planar consensus [26].

Seam cutting. The seam cutting is usually used as

a post-processing operation to composite stitched images,

which introduces an optimization problem of label assign-

ment along the seam. To obtain a plausible stitched result,

an extensive range of energy terms are deﬁned by penal-

izing photometric differences, such as the Euclidean-metric

color difference [22], gradient difference [1, 8], motion- and

exposure-aware difference [11], salient difference [30], etc.

Then these energy functions are minimized via graph-cut

optimization [22]. Besides that, seam cutting is also ap-

plied in image alignment to ﬁnd the best alignment warp

with minimal seam-based cost [14, 51, 35, 29].

These complex geometric features are beneﬁcial in nat-

ural scenes with adequate geometric structures. However,

there are two drawbacks: 1) Without sufﬁcient geomet-

ric structures, the strict feature requirements yield inferior

stitching quality, even failure. 2) With excessive geometric

structures, the computational cost leaps dramatically.

2.2. Deep Image Stitching

In contrast, deep stitching schemes are free from end-

less designs of geometric features. They learn to capture

high-level semantic features from extensive data automati-

cally in a supervised [24, 40, 44, 47, 23], weakly-supervised

[46], or unsupervised [41] fashion, making them robust to

various challenging scenes. Among them, the unsupervised

one [41] is more popular due to the unavailability of real

stitched labels. However, it cannot handle large parallax

due to the limitation of the homography-based alignment

model. The subsequent reconstruction would bring unde-

sirable blurs around parallax regions.

3. Methodology

The overview of our method is shown in Fig.2, where the

proposed framework is composed of two stages: warp and

composition. In the ﬁrst stage, our method takes a reference

image (I

) and a target image (I

) with overlapping regions

as input, and regresses a robust and ﬂexible warp. Then the

warped images (I

, I

) are input to the second stage to

predict composition masks (M

, M

). The stitched image

(S) can be seamlessly composited as follows:

S = M

× I

+ M

× I

. (1)

3.1. Unsupervised Warp Construction

3.1.1 Warp Parameterization

The homography transformation is an invertible mapping

from one image to another with 8 degrees of freedom: each

two for translation, rotation, scale, and lines at inﬁnity. To

guarantee the non-singularity [39] in a regression network,

it is commonly parameterized as the motions of four ver-

tices [9], which is solved as a 3 × 3 matrix using DLT [15].

However, if a non-planar scene is captured by cameras

with different shooting centers, the homography fails to

achieve accurate alignment. To solve it, the mesh-based

multi-homography scheme [50] is usually used in tradi-

tional stitching algorithms. But it cannot be efﬁciently par-

allel accelerated, which means it fails to be used in a deep

learning framework [43, 42]. Please refer to Section 2.3 of

the supplementary material for speciﬁc analysis. To over-

come this issue, we propose to leverage TPS transformation

[3, 18] to achieve efﬁcient local deformation.

TPS transformation is a nonlinear, ﬂexible transforma-

tion that is usually used to approximate the deformation of

non-rigid objects using a thin plate. It is determined by two

sets of control points, with a one-to-one correspondence be-

tween a ﬂat image and a warped image. Denote N con-

trol points on a ﬂat image as P = [p

, ..., p

]

and corre-

sponding points on the warped image as P

′

= [p

′

, ..., p

′

]

, p

′

∈ R

2×1

). By minimizing an energy function con-

sisting of a data term and a distortion term [20] (refer to

Section 2.1 of the supplementary material for more details),

the TPS transformation can be parameterized as Eq.2:

′

= T (p) = C + Mp +

i=1

O(∥ p − p

∥

), (2)

where p is an arbitrary point on the ﬂat image and p

′

the corresponding point on the warped image. C ∈ R

2×1

M ∈ R

2×2

, and w

∈ R

2×1

are the transformation pa-

rameters. O(r) = r

logr

is a radial basis function that

indicates the impact of each control point on p. To solve

these parameters, we formulate N data constraints using N

pairs of control points according to Eq.2, and impose extra

dimensional constraints [20] as described in Eq.3:

i=1

= 0 and

i=1

= 0. (3)

Then, these constraints can be rewritten in the form of ma-

trix calculation and the parameters can be solved as follows:













P K

0 0

0 0 P





−1





′





, (4)

where

is a N × 1 all-one matrix. Each element k

K ∈ R

N×N

is determined by O(∥ p

− p

∥

), and W =

, ..., w

]

Similar to the 4-pt parameterization of the homography,

TPS transformation can also be parameterized as the mo-

tions of control points. In this work, we deﬁne (U + 1) ×

(V +1) control points being evenly distributed on the target

image, and then predict the motions of each control point.

To bridge the global homography warp with the local TPS

warp, we regress the homography transformation ﬁrst to

provide initial motions of control points. Then we can pre-

dict the residual motions for further ﬂexible deformation.

剩余19页未读，继续阅读

评论收藏

内容反馈

Seung-YimYau

粉丝: 303
资源: 16

（ICCV 2023）Parallax-Tolerant Unsupervised Deep Image Stitching

最新资源

（ICCV 2023）Parallax-Tolerant Unsupervised Deep Image Stitching

Image-stitching

基于并行容忍的无监督深度图像拼接python实现源码+项目说明（使用UDIS-D数据集、无监督图像变形和合成）.zip

image_stitching

基于UDIS-D数据集+无监督的深度图像拼接方案python源码+数据集下载链接（可拼接特征到图像）.zip

Deshpande-Learning-Large-Scale-Automatic-ICCV-2015-paper

ICCV-2019-Paper-Statistics:接受率的统计和可视化，ICCV 2019主要计算机视觉会议（ICCV）接受论文的关键词

ICCV2013-Hybrid Deep Learning for Face Verification

深度学习三维重建 SurfaceNet-ICCV-2017（源码+原文）

ICCV 2023 论文和开源项目合集项目资源.zip

ICCV 图像分割

深度学习三维重建 PointMVSNet-ICCV-2019（源码、原文、译文、批注）

Su_Pose-Driven_Deep_Convolutional_ICCV_2017_paper-耿韶松1

Unsupervised-Adaptation-for-Deep-Stereo:“用于深度立体声的无监督自适应”的代码-ICCV17

“用于对象检测的深度直接训练脉冲神经网络”的正式实现（ICCV2023）.zip

Python_光胶局部特征匹配在光速ICCV 2023.zip

Python_了解领先的计算机视觉会议ICCV 2023的前沿研究，了解最新的计算机视觉和深度学习，包括代码支持视觉智.zip

ICCV2019-LearningToPaint：ICCV2019-一种绘画AI，可以使用深度强化学习逐笔复制绘画

Python_ICCV 2023 ProPainter改进视频喷漆的传播和变压器.zip

ICCV2017 Person Re-Id

python_iccv15.zip_GT3_image

Intrinsic3D -高质量的3D重建通过联合外观和几何优化与空间变化的照明(ICCV 2017) - NVlabs/ Intrinsic3D

SC-FEGAN：具有用户草图和颜色的面部编辑生成对抗网络（ICCV2019）-Python开发

Image Stitching.pdf

WB color gainer：可以提高图像分类和语义分割精度的WB增强器（ICCV 2019）-matlab开发

matlab逆变换代码-Pytorch-Implemented-Deep-SR-ITM:一个Pytorch实现了DeepSR-ITM(ICCV

Self-Guided-Network-for-Fast-Image-Denoising:ICCV 2019论文SGN的PyTorch实施

MCWNNM_ICCV2017-master.zip_MCWNNM_加权核范数_图像 去噪_图像去噪_联合开发

2015ICCV_person_re-id

最新资源

MCWNNM_ICCV2017-master.zip_MCWNNM_加权核范数_图像去噪_图像去噪_联合开发