Efficient Large-Scale Stereo Matching 3
smoothness implicitly. Optimization is usually performed using a winner-takes-
all strategy, which selects for each pixel the disparity with the smallest value
under some distance metric [2]. Weber et al. [3] achieved real-time performance
using the Census transform and a GPU implementation. However, as illustrated
by Fig. 1, traditional local methods [11] often suffer from border bleeding effects
or struggle with correspondence ambiguities. Approaches based on adaptive sup-
port windows [12, 13] adjust the window size or adapt the pixel weighting within
a fixed-size window to improve performance, especially close to border discon-
tinuities. Unfortunately, since for each pixel many weight factors have to be
computed, these methods are much slower than fixed-window ones [13].
Dense and accurate matching can be obtained by global methods, which en-
force smoothness explicitly by minimizing an MRF-based energy function which
can be decomposed as the sum of a data fitting term and a regularization term.
Since for most energies of practical use such an optimization is NP-hard, approx-
imate algorithms have been proposed, e.g. graph-cuts [4, 5], belief propagation
[6]. Klaus et al. [14] extend global methods to use mean-shift color segmentation,
followed by belief propagation on super-pixels. In [15], a parallel VLSI hardware
design for belief propagation that achieves real time performance on VGA im-
agery was proposed . The application of global methods to high-resolution images
is, however, limited by their high computational and memory requirements, es-
pecially in the presence of large disparity ranges. Furthermore, models based
on binary potentials between pixels favor fronto-parallel surfaces which leads to
errors in low-textured slanted surfaces. Higher order cliques can overcome these
problems [7], but they are even more computationally demanding.
Hirschm¨uller proposed semi-global matching [16], an approach which extends
polynomial time 1D scan-line methods to propagate information along 16 orien-
tations. While reducing streaking artifacts and improving accuracy compared to
traditional methods based on dynamic programming, computational complex-
ity increases with the number of computed paths. ‘ground control points’ are
used in [17] to improve the occlusion cost sensitivity of dynamic programming
algorithms. In [18, 19] disparities are ‘grown’ from a small set of initial corre-
spondence seeds. Though these methods produce accurate results and can be
faster than global approaches, they do not provide dense matching and strug-
gle with textureless and distorted image areas. Approaches to reduce the search
space have been investigated for global stereo methods [10, 20]. However, they
mainly focus on memory requirements and start with a full search using local
methods first. Furthermore, the use of graph-cuts imposes high computational
costs particularly for large-scale imagery.
In contrast, in this paper we propose a Bayesian approach to stereo matching
that is able to compute accurate disparity maps of high resolution images at
frame rates close to real time without the need for global optimization. The
remainder of this paper is structured as follows: In Section 3 we describe our
approach to efficient large-scale stereo matching. Experimental results on real-
world datasets and comparisons to a variety of other methods on large-scale