Interactive Media Systems, TU Vienna

Media Processing

Contact: Horst Eidenberger

Our ultimate goal in media processing is human-like understanding of multimedia content. We extract semantically relevant patterns from audio, biosignals, images, text, video and other data types and place them in context through intelligent categorization.

The machine of the 21st century perceives media with Mozart's eye and Van Gogh's ear. Every day, we work on reversing that.


In our teaching and research we aim to bridge the gap between works that focus exclusively on a single type of media. We do this by comparing the methods employed in different domains, emphasizing their commonalities and grouping them by fundamental types of strategies. Our publications provide a thorough introduction to the research areas of computer science that deal with the content analysis and categorization of digital media, including audio retrieval, biosignal processing, content-based image retrieval, environmental sound classification, face recognition, genome analysis, music genre classification, speech recognition, technical stock analysis, text retrieval, video analysis and video surveillance, to name a few. We group these areas under the term media understanding, because they share some very important properties:

  1. They exploit digital signals.
  2. Signals are summarized by signal processing.
  3. Summaries are classified by machine learning algorithms.
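
The three properties above can be read as a pipeline. The following is a minimal sketch under stated assumptions, not a description of any particular system: the features (mean, energy, zero-crossing rate), the nearest-prototype classifier, the function names and the synthetic sine-wave data are all chosen here for illustration only.

```python
import math


def summarize(signal):
    """Step 2: summarize a sampled signal as (mean, energy, zero-crossing rate)."""
    n = len(signal)
    mean = sum(signal) / n
    energy = sum(x * x for x in signal) / n
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return (mean, energy, crossings / (n - 1))


def classify(summary, prototypes):
    """Step 3: return the label of the prototype summary nearest to `summary`."""
    return min(prototypes, key=lambda label: math.dist(summary, prototypes[label]))


def tone(cycles, n=100):
    """Step 1: a synthetic digital signal (hypothetical stand-in for real media)."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


# One prototype summary per class, built from illustrative example signals.
prototypes = {"low": summarize(tone(2)), "high": summarize(tone(20))}

# An unseen signal is summarized and assigned to the closest prototype.
label = classify(summarize(tone(18)), prototypes)
print(label)
```

In this toy setting only the zero-crossing rate separates the two classes; real media would need richer summaries, but the division of labour between the three steps stays the same.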

Digital audio, biosignals, digital images and digital video are data sources that have been investigated in signal processing for many years. Text and bioinformation, by contrast, are usually not considered appropriate input for signal processing operations. Closer investigation, however, shows that the summarization methods employed on text and, for example, gene strings are comparable to sample-based signal processing operations. In short, media understanding aims to imitate the sensory pattern recognition capabilities of human beings.
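
As one possible illustration of this comparison, a gene string can be summarized by sliding a fixed-size window over its symbols, much as a windowed operation slides over the samples of a signal. The sketch below assumes a plain k-mer frequency profile as the summary; the function name `kmer_profile` and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_profile(sequence, k=2):
    """Relative frequency of every length-k window in a symbol sequence."""
    # Slide a window of k symbols over the sequence, one position at a time.
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    counts = Counter(windows)
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


# Illustrative input only; works equally for characters of a text.
profile = kmer_profile("ACGTACGA", k=2)
for kmer, freq in sorted(profile.items()):
    print(kmer, round(freq, 3))
```

The resulting profile is a fixed vocabulary of window frequencies, which plays the same role for a string that a feature summary plays for an audio or image signal.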

Media understanding aims to achieve more than just summarization: a computational understanding of media content that is comparable to human understanding. Therefore, machine learning algorithms are employed to interpret the summaries of digital media. Regardless of whether the data type is audio, image, video, text or something else, machine learning algorithms employ the same learning and classification strategies. Hence, very similar methods are used, for example, in the structural alignment of gene sequences and in the classification of video events based on prototypes.
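
To make the last point concrete, here is a minimal sketch in which a single sequence comparison serves both domains. It assumes that a plain edit distance can stand in for the alignment and prototype-matching methods mentioned above, which is a simplification; all labels and sequences are invented for illustration.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def nearest_prototype(query, prototypes):
    """Label of the prototype sequence with the smallest edit distance to query."""
    return min(prototypes, key=lambda label: edit_distance(query, prototypes[label]))


# Gene strings (hypothetical families).
gene_label = nearest_prototype(
    "ACGTTA", {"family_a": "ACGTAA", "family_b": "TTGCAC"})

# Video events as sequences of detected actions (hypothetical labels).
event_label = nearest_prototype(
    ["enter", "walk", "sit"],
    {"arrival": ["enter", "walk", "sit", "talk"],
     "departure": ["stand", "walk", "exit"]})

print(gene_label, event_label)
```

The same two functions handle character strings and lists of event labels unchanged, because the comparison only requires that sequence elements can be tested for equality.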

Featured Aspects of Our Work

More information about our research is available on the projects and publications pages.