Interactive Media Systems, TU Wien

Spatio-temporal Video Analysis for Semi-automatic 2D-to-3D Conversion

Thesis by Nicole Brosch

Supervision by Margrit Gelautz and Markus Rupp


This thesis addresses the problem of cost-efficiently converting monoscopic (2D) videos to stereoscopic (3D) videos. Common practices for performing such a 2D-to-3D conversion are labor-intensive manual conversions, which are typically used for high-quality 3D cinema productions, and fully automatic conversions of lower conversion quality, which may be integrated into, e.g., (auto-)stereoscopic TVs. In this thesis we focus on semi-automatic 2D-to-3D conversions, which can be seen as a compromise between fully automatic and manual techniques. Such approaches are typically based on sparse user-given disparity (or depth) information, which is propagated to each pixel in a 2D video by assuming a color constancy model. This process ideally requires only minimal user input and efficiently generates disparity maps of high conversion quality, which are suitable for rendering a second 2D video that completes the 3D video. In order to avoid common artifacts related to such propagations, e.g., over-smoothed results and spatio-temporal or perceptual incoherencies, we exploit spatio-temporal segmentation information. The thesis presents two novel semi-automatic 2D-to-3D conversion algorithms that view segmentation as an integral part of the conversion process and are based on comfortable user input in the form of sparse scribbles drawn in the first (and last) frame of a 2D video.
Our first 2D-to-3D conversion algorithm tackles 2D-to-3D conversion and segmentation in a joint approach. It propagates available disparities between neighboring pixels while assigning them to the same segment. In this manner, our algorithm generates disparity maps that capture object borders in the 2D video and contain smooth disparity changes within segments and over time, which is challenging for currently available algorithms. We also provide a scalable implementation that achieves interactive runtimes of one frame per second (at a resolution of approximately 0.3 megapixels).
The second 2D-to-3D conversion algorithm takes a step towards the generation of perceptually coherent disparity maps. In particular, it enables temporal disparity interpolations that are performed in accordance with motion-caused occlusions between segments. This results in spatio-temporally coherent disparity maps in which disparities of moving objects harmonize with those of nearby objects. The presented segmentation algorithm, used in the conversion algorithm, relies on a spatio-temporal filtering scheme and, thus, achieves fast processing speeds (250 frames per second for a video with a resolution of approximately 0.2 megapixels per frame). We compare our own algorithms with different semi-automatic 2D-to-3D conversion algorithms suggested in the literature and achieve results of high conversion quality. In this context, our algorithms outperform a well-established conversion algorithm. As opposed to most earlier studies, our final evaluation study is performed under consideration of different scribbling strategies and provides practical insights into the annotation process by investigating the performance of various scribble placement techniques in conjunction with different 2D image content.
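
The general idea of propagating sparse scribbled disparities under a color constancy assumption can be illustrated with a small sketch. This is not the algorithm presented in the thesis: it is a generic color-weighted diffusion on a single grayscale frame, without segmentation or temporal handling, and the function name `propagate_disparity` and its parameters `sigma` and `iterations` are assumptions chosen for illustration.

```python
import numpy as np

def propagate_disparity(gray, scribbles, mask, sigma=0.05, iterations=200):
    """Spread sparse disparities over one frame (illustrative sketch only).

    gray:      HxW intensities in [0, 1]
    scribbles: HxW disparities, only read where mask is True
    mask:      HxW booleans marking user-scribbled pixels
    """
    gray = np.asarray(gray, dtype=float)
    scribbles = np.asarray(scribbles, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    h, w = gray.shape
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-neighborhood

    # Color constancy assumption: neighbors with similar intensity
    # get a weight near 1, neighbors across an edge a weight near 0.
    gray_pad = np.pad(gray, 1, mode="edge")
    weights = []
    for dy, dx in offsets:
        neighbor = gray_pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        weights.append(np.exp(-np.abs(gray - neighbor) / sigma))
    total = sum(weights)

    disp = np.where(mask, scribbles, 0.0)
    for _ in range(iterations):
        disp_pad = np.pad(disp, 1, mode="edge")
        acc = np.zeros_like(disp)
        for (dy, dx), wgt in zip(offsets, weights):
            acc += wgt * disp_pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        # Weighted neighbor average; scribbled pixels keep the user value.
        disp = np.where(mask, scribbles, acc / total)
    return disp
```

In this sketch, a scribble placed inside a uniformly colored region fills that region with its disparity, while a strong intensity edge largely blocks the propagation. A plain diffusion of this kind can show the over-smoothing and incoherency artifacts mentioned above, which is the motivation the abstract gives for exploiting segmentation information.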


N. Brosch: "Spatio-temporal Video Analysis for Semi-automatic 2D-to-3D Conversion"; Supervisor, Reviewer: M. Gelautz, M. Rupp; Institut für Softwaretechnik und Interaktive Systeme, 2016; oral examination: 10-25-2016.

Additional Information

A higher-resolution version of the dissertation (24MB) is available under Downloads below.


Dissertation Nicole Brosch (24MB), PDF document, 22.8 MB

