Segmentation-based Stereo and Motion with Occlusions

Thesis by Michael Bleyer

Abstract

Given two images recorded from slightly different perspectives, a shape-from-stereo approach identifies corresponding points in both images that are projections of the same point in the scene. In a standard stereo setup, such corresponding pixels are known to lie on the same horizontal scanline, so that this correspondence problem is reduced to a one-dimensional search task. The offset between x-coordinates in the left and right images is then referred to as disparity, and a pixel's disparity is inversely proportional to the pixel's depth. However, assigning each point to its correct disparity is a fundamental problem in computer vision. Although there is a large body of literature, common stereo methods still show poor performance in some image areas. Firstly, matching often fails in the absence of discriminative image features that can be uniquely matched in the other view (untextured regions). Secondly, some pixels' matching points are occluded in the second image. Since occlusions occur at disparity discontinuities, it is specifically challenging to precisely outline object boundaries. A large number of stereo algorithms fail in this respect, since the fact that there are occlusions is simply ignored.

In this thesis, we propose two novel stereo algorithms that tackle the inherent problems in stereo matching by dividing the reference image into segments of homogeneous colour. We assume that disparity inside such segments varies smoothly, while disparity discontinuities coincide with the segment borders. Both algorithms make use of a layered representation and model the stereo task as a two step problem. In the first (layer extraction) step, we answer the question: What are the dominant disparity planes (which we refer to as layers) likely to occur in the scene? These layers are extracted by clustering a set of initial disparity segments. In the second (layer assignment) step, we then focus on the more difficult question: Which part of the image is covered by which layer and where do occlusions occur? For the first algorithm presented in this thesis, we develop a novel global cost function that measures the goodness of an assignment of segments to layers by image warping. We show that image warping can as well be used to detect the occlusions in both images. This cost function is then optimized by a greedy algorithm, which is computationally efficient, but can get trapped in a local optimum. In order to overcome this weakness, we present a second algorithm for the layer assignment task that makes use of a recent robust optimization scheme, namely graph-cuts. The novelty of this approach lies in that we show how segmentation-based stereo matching can be formulated in a graph-cut approach with explicitly modelling occlusions. Both methods are then extended to the closely related optical flow (or motion) problem. As opposed to stereo, the displacement vector for this problem is a two-dimensional one.

In the experimental results, we demonstrate that our methods produce good-quality results, especially in regions of low texture and close to disparity/motion boundaries. Moreover, our stereo algorithms show excellent results on the Middlebury stereo evaluation website.

Reference

PhD thesis, Vienna University of Technology