The high computational demands of state-of-the-art video coding standards such as H.264 pose serious challenges for embedded processor architectures. A natural way to tackle this problem is the use of multi-processor systems. However, the efficient distribution of complex video coding algorithms among multiple processing units (PUs) is a non-trivial task. To use the available processing resources efficiently, an evenly balanced distribution of the coding algorithm onto the hardware units must be found. The system designer has to consider data dependencies as well as communication and synchronisation between the PUs. Furthermore, efficient software design is necessary to satisfy the resource limitations of an embedded environment, such as low computational power, small on-chip memories and low bus bandwidth. A parallel video coding implementation for an embedded system must be able to work under these resource restrictions. Given these strict requirements on runtime performance and resource usage, being able to predict the resource requirements of a parallel video coding application (VCA) is essential during the design of a video coding system (VCS).
This thesis contributes novel methods to support the complex design process of parallel VCS in an early phase of system design, when highly critical decisions on hardware and software are made. The contributions of this thesis can be summarised as follows.
(i) We propose the Data-Driven Profiling (DDP) method for analysing and visualising the runtime complexity of a VCS. This method maps traditional runtime profiles onto the coding elements and functional blocks of a video coding algorithm. It enables the system designer to relate runtime complexity to the application levels where parallelisation takes place and introduces means for analysing the workload distribution.
(ii) We demonstrate how to exploit DDPs for analysing complexity and deriving essential information for parallel system design. DDPs allow assumptions to be made about the performance of a VCA on a parallel architecture, potential work-balancing problems to be identified, and complexity variations in the functional blocks of a VCA's video coding elements to be analysed.
(iii) We introduce the Partition Assessment Simulation (PAS) methodology for enabling the exploration of complex parallel VCS designs. This methodology exploits the structural and functional similarities of modern video coding algorithms to predict a VCA's runtime on a "virtual" architecture.
(iv) We implement a simulator for the PAS concept. The PAS methodology is verified by modelling and simulating an existing multi-processor platform. We demonstrate the flexibility of the PAS to simulate complex parallel video coding platforms and to explore new parallel designs for functional as well as data-parallel H.264 decoder partitioning methods.
We believe that the contributed techniques enable system designers to address the challenges of parallel VCS design in an intuitive and time-efficient way, leading to application-tailored and cost-competitive VCS.
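As a rough illustration of the idea behind contributions (i) and (ii), the following minimal sketch aggregates per-function cycle counts by coding element and by functional block, then checks how evenly a simple data-parallel assignment would load two PUs. It is not the thesis's implementation: the sample values, the block names, the round-robin assignment and the helper names (`aggregate`, `assign_round_robin`, `balance`) are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical profiling samples: (coding element id, functional block, cycles).
# The numbers are invented for illustration only.
SAMPLES = [
    (0, "entropy_decoding", 1200),
    (0, "inverse_transform", 300),
    (0, "motion_compensation", 900),
    (1, "entropy_decoding", 800),
    (1, "inverse_transform", 300),
    (1, "deblocking", 400),
    (2, "entropy_decoding", 1500),
    (2, "motion_compensation", 1100),
    (2, "deblocking", 400),
]


def aggregate(samples):
    """Sum cycles per coding element and per functional block."""
    per_element = defaultdict(int)
    per_block = defaultdict(int)
    for element, block, cycles in samples:
        per_element[element] += cycles
        per_block[block] += cycles
    return dict(per_element), dict(per_block)


def assign_round_robin(per_element, num_pus):
    """Assumed data-parallel split: element i is processed by PU i % num_pus."""
    pu_load = {pu: 0 for pu in range(num_pus)}
    for element, cycles in sorted(per_element.items()):
        pu_load[element % num_pus] += cycles
    return pu_load


def balance(loads):
    """Ratio of mean load to maximum load; 1.0 means perfectly balanced."""
    values = list(loads.values())
    return sum(values) / (len(values) * max(values))


per_element, per_block = aggregate(SAMPLES)
pu_load = assign_round_robin(per_element, num_pus=2)
print("cycles per coding element:", per_element)
print("cycles per functional block:", per_block)
print("load per PU:", pu_load, "balance:", round(balance(pu_load), 3))
```

With these invented samples, the round-robin split places elements 0 and 2 on the same PU, so the balance value falls well below 1.0. Exposing this kind of imbalance before committing to a partitioning is what the profiles are meant to enable.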
F. Seitner: "Virtual HW/SW Prototyping for Design and Runtime Prediction of Parallel Video Coding Systems"; Supervisor, Reviewer: M. Gelautz, B. Rinner; Institut für Softwaretechnik und interaktive Systeme, 2013; oral examination: 11-25-2013.
A compressed version of the dissertation (<5MB) is available under Downloads below.
Dissertation Florian Seitner (compressed), PDF document, 4.79 MB