Parallax-Tolerant Image Stitching with Epipolar Displacement Field
Jian Yu, Yi Yu and Feipeng Da
Southeast University
{yujian, yuyi, dafp}@seu.edu.cn
Abstract
Large parallax image stitching is a challenging task. Ex-
isting methods often struggle to maintain both the local and
global structures of the image while reducing alignment
artifacts and warping distortions. In this paper, we pro-
pose a novel approach that utilizes epipolar geometry to
establish a warping technique based on the epipolar dis-
placement field. Initially, the warping rule for pixels in
the epipolar geometry is established through the infinite
homography. Subsequently, the epipolar displacement
field, which represents the sliding distance of the
warped pixel along the epipolar line, is formulated by thin
plate splines based on the principle of local elastic defor-
mation. The stitching result can be generated by inversely
warping the pixels according to the epipolar displacement
field. This method incorporates the epipolar constraints
in the warping rule, which ensures high-quality alignment
and maintains the projectivity of the panorama. Qualitative
and quantitative comparative experiments demonstrate the
competitiveness of the proposed method in stitching images
with large parallax.
1. Introduction
Image stitching is a powerful technique that has already
made significant contributions in various fields such as au-
tonomous driving, medical imaging, surveillance video, and
virtual reality. It involves combining multiple images with
a limited field of view to create a scene with a wider field of
view. Despite the significant progress made in image stitch-
ing techniques over the past few decades, generating high-
quality panoramic images remains a challenge, particularly
when dealing with images with large parallax.
Image stitching commonly employs a 3×3 homography
matrix, which represents a 2D projective transformation, for
image alignment. However, real-world scenes are often
non-planar, or the viewpoints are not co-located, so a single
global homography cannot describe the required
transformations. As a result, image misalignment or
ghosting artifacts may occur.
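As a rough illustration of why a single homography breaks down, the following Python sketch (not taken from the paper) builds the homography induced by one scene plane between two synthetic pinhole views and measures the alignment error for points on and off that plane. The intrinsics `K`, the plane `n . X = d`, the sample points, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def project(K, R, t, X):
    """Project N x 3 points (camera-1 frame) into a camera with pose (R, t)."""
    x = (K @ (R @ X.T + t[:, None])).T
    return x[:, :2] / x[:, 2:3]

def plane_homography(K, R, t, n, d):
    """Homography induced by the plane n . X = d between camera 1 and camera 2."""
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def apply_homography(H, pts):
    """Map N x 2 pixel coordinates through a 3 x 3 homography."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = (H @ homog.T).T
    return mapped[:, :2] / mapped[:, 2:3]

def alignment_error(K, R, t, n, d, X):
    """Per-point pixel error of the plane-induced homography."""
    x1 = project(K, np.eye(3), np.zeros(3), X)   # view 1 at the origin
    x2 = project(K, R, t, X)                     # view 2 with pose (R, t)
    H = plane_homography(K, R, t, n, d)
    return np.linalg.norm(apply_homography(H, x1) - x2, axis=1)

# Assumed synthetic setup: shared intrinsics and a fronto-parallel plane z = 5.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
n, d = np.array([0.0, 0.0, 1.0]), 5.0
on_plane = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 5.0]])
off_plane = np.array([[0.0, 0.0, 2.0], [1.0, -0.5, 8.0]])

# Case 1: translated viewpoint (parallax present).
t = np.array([0.5, 0.0, 0.0])
err_on = alignment_error(K, np.eye(3), t, n, d, on_plane)
err_off = alignment_error(K, np.eye(3), t, n, d, off_plane)

# Case 2: co-located viewpoints that differ only by a rotation about the y-axis.
c, s = np.cos(0.1), np.sin(0.1)
R_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
err_colocated = alignment_error(K, R_y, np.zeros(3), n, d, off_plane)

print("on-plane error (px):   ", err_on)
print("off-plane error (px):  ", err_off)
print("co-located error (px): ", err_colocated)
```

Under these assumptions, `err_on` vanishes up to rounding, `err_off` is clearly nonzero once the second camera is translated, and `err_colocated` vanishes again when the two views share a camera center, where the mapping reduces to K R K^{-1} regardless of scene depth.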
To mitigate parallax artifacts, representative existing
methods include adaptive warping algorithms and shape
preservation methods. Adaptive warping algorithms seg-
ment the image and employ distinct warping models [6, 12,
14, 32, 34] for different regions to optimize the warping pro-
cess using an energy minimization framework, thus reduc-
ing parallax artifacts [14, 34]. Nevertheless, the use of mul-
tiple homography transformations may introduce inconsis-
tencies among the perspective transformations, which can
impact the natural appearance of the overall stitched image.
Shape preservation methods aim to maintain both local
and global geometric structures by leveraging geometric
features, leading to improved stitching results. Feature
points, together with line segments that retain the linear
structure of an image, are the most frequently used features
and provide strong constraints for homography estimation
when combined [9, 15, 16, 30]. More intricate geometric
features, such as edge contours [5], depth maps [17] and
semantic plane regions [13], are also used to design energy
functions that enhance content alignment and shape
preservation. Nonetheless, whether these intricate designs
suit real-world applications depends on factors such as
the availability of adequate geometric support in the scene
and the computational efficiency of the algorithm.
Recently, techniques utilizing Convolutional Neural Networks
(CNNs) for accurate homography estimation and
stitching have gradually emerged. These methods discard
geometric features in favor of high-level semantic features
that can be flexibly learned using supervised [21, 24, 27],
weakly supervised [28], or unsupervised [23] approaches.
Although these methods deliver robust performance in
stitching images with small baselines, they encounter
difficulties with substantial parallax and when applied
across different datasets and resolutions.
Upon examining the aforementioned methods, it is evi-
dent that they are no longer reliant on fixed projection mod-
els. Instead, they adapt the models based on image data or
salient geometric features in a data-driven manner, aiming
to accurately align with the data. This approach effectively
eliminates artifacts, but it also raises the possibility of vio-
arXiv:2311.16637v1 [cs.CV] 28 Nov 2023