tag:www.ims.tuwien.ac.at,2005:/publicationsInteractive Media Systems, TU Wien: Publications2024-03-01T00:00:39+01:00tag:www.ims.tuwien.ac.at,2005:Publication/34612021-12-20T17:27:36+01:002021-12-20T17:27:36+01:00mHealthINX - The mental Health experience concept<p>by Miroslav Sili, Martin Bachler, Elisabeth Broneder, Réne Luigies, and Niklas-Aron Hungerländer</p><p><em>M. Sili, M. Bachler, E. Broneder, R. Luigies, N. Hungerländer: "mHealthINX - The mental Health experience concept"; accepted as talk for: 4th International Conference on Human Systems Engineering and Design: Future Trends and Applications (IHSED 2021), Dubrovnik; 09-23-2021 - 09-25-2021; in: "Human Systems Engineering and Design 4", T. Ahram, W. Karwowski (ed.); Springer, 4 (2021), ISSN: 2194-5365; 10 pages.</em></p>Miroslav Silitag:www.ims.tuwien.ac.at,2005:Publication/34602021-12-20T17:27:31+01:002021-12-20T17:27:31+01:00StARboard & TrACTOr: Actuated Tangibles in an Educational TAR Application<p>by Emanuel Vonach, Christoph Schindler, and Hannes Kaufmann</p><p>We explore the potential of direct haptic interaction in a novel approach to Tangible Augmented Reality in an educational context. Employing our prototyping platform ACTO, we developed a tabletop Augmented Reality application StARboard for sailing students. In this personal viewpoint environment virtual objects, e.g., sailing ships, are physically represented by actuated micro robots. These align with virtual objects, allowing direct physical interaction with the scene. When a user tries to pick up a virtual ship, its physical robot counterpart is grabbed instead. We also developed a tracking solution TrACTOr, employing a depth sensor to allow tracking independent of the table surface. In this paper we present concept and development of StARboard and TrACTOr. We report results of our user study with 18 participants using our prototype. 
They show that direct haptic interaction in tabletop AR scores on par with traditional mouse interaction on a desktop setup in usability (mean SUS = 86.7 vs. 82.9) and performance (mean RTLX = 15.0 vs. 14.8), while outperforming the mouse in factors related to learning like presence (mean 6.0 vs. 3.1) and absorption (mean 5.4 vs. 4.2). It was also rated the most fun (13× vs. 0×) and most suitable for learning (9× vs. 4×).</p><p><em>E. Vonach, C. Schindler, H. Kaufmann: "StARboard &amp; TrACTOr: Actuated Tangibles in an Educational TAR Application"; Multimodal Technologies and Interaction (invited), 5 (2021), 2.</em></p>Emanuel Vonachtag:www.ims.tuwien.ac.at,2005:Publication/34582020-12-29T17:27:31+01:002021-12-20T17:27:31+01:00Local projections for high-dimensional outlier detection<p>by Thomas Ortner, Peter Filzmoser, Maia Rohm, Sarka Brodinova, and Christian Breiteneder</p><p>A novel approach for outlier detection is proposed, called local projections, which is based on concepts of the Local Outlier Factor (LOF) (Breunig et al. in LOF: identifying density-based local outliers. In: ACM SIGMOD Record, ACM, volume 29, pp. 93-104, 2000) and ROBPCA (Hubert et al. in Technometrics 47(1):64-79, 2005). By using aspects of both methods, this algorithm is robust towards noise variables and is capable of performing outlier detection in multi-group situations. The idea is to focus on local descriptions of the observations and their neighbors using linear projections. The outlyingness of an observation is determined by a weighted distance of the observation to all identified projection spaces, with weights depending on the appropriateness of the local description. Experiments with simulated and real data demonstrate the usefulness of this method when compared to existing outlier detection algorithms.</p><p><em>T. Ortner, P. Filzmoser, M. Rohm, S. Brodinova, C. 
Breiteneder: "Local projections for high-dimensional outlier detection"; METRON, 1 (2021), 79; 18 pages.</em></p>Thomas Ortnertag:www.ims.tuwien.ac.at,2005:Publication/34562020-11-04T17:27:36+01:002020-11-20T17:27:35+01:00Teaching Digital Fabrication to Early Intervention Specialists for Designing Their Own Tools<p>by Florian Güldenpfennig, Peter Fikar, and Roman Ganhör</p><p>We taught basic principles of digital fabrication to four early intervention therapists who specialized in the training of children with cerebral visual impairment and related disabilities. Here, our intention was threefold. First, we wanted to engage in digital fabrication together with the therapists to 'kick-off' a co-design project and get to know them; the project was about creating therapeutic toys, and we hadn't met our participants or co-designers before. Second, we wanted to give them an impression of the tools we use and the sorts of designs that we are capable of producing in the course of such a one-year design project. Third, we aimed at generating a first set of design ideas. In this paper, we show in which ways teaching digital fabrication enabled us to accomplish these goals. Interestingly, we did not anticipate one of our most interesting findings: as it turned out, the therapists continued creating their own designs after the project was completed, drawing on their newly developed digital fabrication skills. Hence, as a fourth outcome, we 'accidentally' empowered the participants to address their problems independently.</p><p><em>F. Güldenpfennig, P. Fikar, R. 
Ganhör: "Teaching Digital Fabrication to Early Intervention Specialists for Designing Their Own Tools"; Poster: Proceedings of the 22nd International ACM SIGACCESS Conference on Computers and Accessibility, Lisbon (Online due to COVID-19); 10-26-2020 - 10-28-2020; in: "ASSETS 2020", (2020), 5 pages.</em></p>Florian Güldenpfennigtag:www.ims.tuwien.ac.at,2005:Publication/34572020-11-04T17:27:39+01:002020-11-04T17:27:39+01:00Robust and sparse k-means clustering in high dimension<p>by Peter Filzmoser, Sarka Brodinova, Thomas Ortner, Christian Breiteneder, and Maia Rohm</p><p>We introduce a robust k-means-based clustering method for high-dimensional data where not only outliers but also a large number of noise variables are very likely to be present [4]. Although Kondo et al. [2] already addressed such an application scenario, our approach goes even further. Firstly, the introduced method is designed to identify clusters, informative variables, and outliers simultaneously. Secondly, the proposed clustering technique additionally aims at optimizing required parameters, e.g. the number of clusters. This is a great advantage over most existing methods. Moreover, the robustness aspect is achieved through a robust initialization [3] and a proposed weighting function using the Local Outlier Factor [1]. The weighting function provides a valuable source of information about the outlyingness of each observation for a subsequent outlier detection. In order to reveal both clusters and informative variables properly, the approach uses a lasso-type penalty [5]. The method has been thoroughly tested on simulated as well as on real high-dimensional datasets. The conducted experiments demonstrated a great ability of the clustering method to identify clusters, outliers, and informative variables.</p><p><em>P. Filzmoser, S. Brodinova, T. Ortner, C. Breiteneder, M. 
Rohm: "Robust and sparse k-means clustering in high dimension"; Talk: Seminar talk at JKU Linz, Linz (invited); 01-23-2020 - 01-24-2020.</em></p>Peter Filzmosertag:www.ims.tuwien.ac.at,2005:Publication/34592021-01-08T17:27:31+01:002021-01-08T17:27:31+01:00Immersive training of first responder squad leaders in untethered virtual reality<p>by Annette Mossel, Christian Schönauer, Mario Froeschl, Andreas Peer, Johannes Göllner, and Hannes Kaufmann</p><p>We present the VROnSite platform that supports immersive training of first responder units' on-site squad leaders. Our training platform is fully immersive, entirely untethered to ease use, and provides two means of navigation, abstract and natural walking, to simulate stress and exhaustion, two important factors for decision making. With the platform's capabilities, we close a gap in prior art for first responder training. Our research is closely interlocked with stakeholders from multiple fire brigades to gather early feedback in an iterative design process. In this paper, we present the system's design rationale, provide insight into the process of training scenario development and present results of a user study with 41 squad leaders from the firefighting domain. Virtual disaster environments with two different navigation types were evaluated using quantitative and qualitative measures. Participants considered our platform highly suitable for training of decision making in complex first responder scenarios and results show the importance of the provided navigation technologies in this context.</p><p><em>A. Mossel, C. Schönauer, M. Froeschl, A. Peer, J. Göllner, H. 
Kaufmann: "Immersive training of first responder squad leaders in untethered virtual reality"; Virtual Reality, 204 (2020), 15 pages.</em></p>Annette Mosseltag:www.ims.tuwien.ac.at,2005:Publication/34482019-11-14T17:27:36+01:002019-11-14T17:27:36+01:00Robust k-means-based clustering for high-dimensional data<p>by Peter Filzmoser, Sarka Brodinova, Thomas Ortner, Christian Breiteneder, and Maia Rohm</p><p>We introduce a robust k-means-based clustering method for high-dimensional data where not only outliers but also a large number of noise variables are very likely to be present. Although Kondo et al. [2] already addressed such an application scenario, our approach goes even further. Firstly, the introduced method is designed to identify clusters, informative variables, and outliers simultaneously. Secondly, the proposed clustering technique additionally aims at optimizing required parameters, e.g. the number of clusters. This is a great advantage over most existing methods. Moreover, the robustness aspect is achieved through a robust initialization [3] and a proposed weighting function using the Local Outlier Factor [1]. The weighting function provides a valuable source of information about the outlyingness of each observation for a subsequent outlier detection. In order to reveal both clusters and informative variables properly, the approach uses a lasso-type penalty [4]. The method has been thoroughly tested on simulated as well as on real high-dimensional datasets. The conducted experiments demonstrated a great ability of the clustering method to identify clusters, outliers, and informative variables.</p><p><em>P. Filzmoser, S. Brodinova, T. Ortner, C. Breiteneder, M. 
Rohm: "Robust k-means-based clustering for high-dimensional data"; Talk: International Conference on Robust Statistics (ICORS 2019), Guayaquil (invited); 05-28-2019 - 05-31-2019.</em></p>Peter Filzmosertag:www.ims.tuwien.ac.at,2005:Publication/34422019-04-11T17:27:35+02:002020-06-09T17:27:38+02:00Towards Eye-Friendly VR: How Bright Should It Be?<p>by Khrystyna Vasylevska, Hyunjin Yoo, Tara Akhavan, and Hannes Kaufmann</p><p>Visual information plays an important part in the perception of the world around us. Recently, head-mounted displays (HMD) came to the consumer market and became a part of everyday life of thousands of people. As with desktop screens and hand-held devices before, the public is concerned with the possible health consequences of prolonged usage and questions the adequacy of the default settings. It has been shown that the brightness and contrast of a display should be adjusted to match the external light to decrease eye strain and other symptoms. Currently, there is a noticeable mismatch in brightness between the screen and dark background of an HMD that might cause eye strain, insomnia, and other unpleasant symptoms. In this paper, we explore the possibility of significantly lowering the screen brightness in the HMD while successfully compensating for the loss of visual information on a dimmed screen. We designed a user study to explore the connection between the screen brightness in the HMD and task performance, cybersickness, users' comfort, and preferences. We have tested three levels of brightness: the default Full Brightness, the optional Night Mode, and a significantly lower brightness with original content and compensated content. Our results suggest that although users still prefer the brighter setting, HMDs can be successfully used with significantly lower screen brightness, especially if the low screen brightness is compensated.</p><p><em>K. Vasylevska, H. Yoo, T. Akhavan, H. 
Kaufmann: "Towards Eye-Friendly VR: How Bright Should It Be?"; Talk: IEEE Virtual Reality, Osaka, Japan; 03-23-2019 - 03-27-2019; in: "Proceedings of IEEE VR", (2019), 1 - 9.</em></p>Khrystyna Vasylevskatag:www.ims.tuwien.ac.at,2005:Publication/34472019-09-12T17:27:32+02:002019-12-17T17:27:36+01:00Semi-automatic post-processing of multi-view 2D-plus-depth video<p>by Braulio Sespede, Florian Seitner, and Margrit Gelautz</p><p>We propose a post-processing framework based on multiview interactive video segmentation for correcting 2D-plus-depth video footage. The suggested approach uses user-made scribbles to guide the multi-view segmentation process, which is based on an efficient cost-volume filtering algorithm. We extend the 2D algorithm to 3D and propose several improvements that increase precision and recall while also decreasing the need for user input. Our semi-automatic approach is supported by an interactive visualization tool that integrates both 2D and 3D views of the footage, allowing the user to explore novel views coherently and grasp a better understanding of the underlying data. We integrate our post-processing framework into a workflow for generating dynamic meshes from footage recorded by multiple stereo cameras, demonstrating the applicability of the technique.</p><p><em>B. Sespede, F. Seitner, M. 
Gelautz: "Semi-automatic post-processing of multi-view 2D-plus-depth video"; Poster: IS&amp;T International Symposium on Electronic Imaging - Stereoscopic Displays and Applications, Burlingame, USA; 01-26-2019 - 01-30-2019; in: "IS&amp;T International Symposium on Electronic Imaging", (2019), 5 pages.</em></p>Braulio Sespedetag:www.ims.tuwien.ac.at,2005:Publication/34432019-04-18T17:27:32+02:002020-01-11T16:16:01+01:00An End-to-end System for Real-Time Point Cloud Visualization<p>by Hansjörg Hofer, Florian Seitner, and Margrit Gelautz</p><p>The growing availability of RGB-D data, as delivered by current depth sensing devices, forms the basis for a variety of mixed reality (MR) applications, in which real and synthetic scene content is combined for interaction in real-time. The processing of dynamic point clouds with possible fast and unconstrained movement poses special challenges to the surface reconstruction and rendering algorithms. We propose an end-to-end system for dynamic point cloud visualization from RGB-D input data that takes advantage of the Unity3D game engine for efficient state-of-the-art rendering and platform-independence. We discuss specific requirements and key components of the overall system along with selected aspects of its implementation. Our experimental evaluation demonstrates that high-quality and versatile visualization results can be obtained for datasets of up to 5 million points in real-time.</p><p><em>H. Hofer, F. Seitner, M. 
Gelautz: "An End-to-end System for Real-Time Point Cloud Visualization"; Talk: 2018 International Conference on 3D Immersion (IC3D), Brussels, Belgium; 12-05-2018 - 12-07-2018; in: "2018 International Conference on 3D Immersion (IC3D)", (2019), ISBN: 978-1-5386-7590-8; 8 pages.</em></p>Hansjörg Hofertag:www.ims.tuwien.ac.at,2005:Publication/34532020-01-28T12:44:30+01:002020-01-28T12:44:30+01:00Visual Computing in Autonomous Driving and Human-Robot Interaction<p>by Margrit Gelautz</p>Margrit Gelautztag:www.ims.tuwien.ac.at,2005:Publication/34522020-01-28T12:41:24+01:002020-01-28T12:41:24+01:00SLAMANTIC - Leveraging Semantics to Improve VSLAM in Dynamic Environments<p>by Matthias Schörghuber, Daniel Steininger, Yohan Carbon, and Margrit Gelautz</p><p>In this paper, we tackle the challenge for VSLAM of handling non-static environments. We propose to include semantic information obtained by deep learning methods in the traditional geometric pipeline. Specifically, we compute a confidence measure for each map point as a function of its semantic class (car, person, building, etc.) and its detection consistency over time. The confidence is then applied to guide the usage of each point in the mapping and localization stage. Points with high confidence are used to verify points with low confidence in order to select the final set of points for pose computation and mapping. Furthermore, we can handle map points whose state may change between static and dynamic (a car can be parked or in motion). Evaluating our method on public datasets, we show that it can successfully solve challenging situations in dynamic environments which cause state-of-the-art baseline VSLAM algorithms to fail and that it maintains performance on static scenes. 
Code is available at github.com/mthz/slamantic</p>Matthias Schörghubertag:www.ims.tuwien.ac.at,2005:Publication/34542020-01-28T12:45:07+01:002020-01-28T12:45:07+01:00Digital Observation of Human Motion, Expression, and Intention<p>by Margrit Gelautz</p><p>The development of algorithms and software tools for observing and automatically interpreting human motion through video analysis has advanced considerably in recent years. While the focus originally lay on observing and tracking human motion, for example in connection with surveillance cameras or gesture recognition for control tasks, research is increasingly also addressing subtler aspects such as recognizing and interpreting human emotions and intentions from recorded footage. The early recognition of intended actions, such as a pedestrian's intention to cross the street, can provide valuable safety cues in the automatic analysis of traffic scenes or in human-robot interaction. On the other hand, the possibility of recognizing highly personal characteristics such as emotional states, intentions, or movement styles from recorded footage raises new questions regarding the protection of privacy and the handling of the associated data.</p>Margrit Gelautztag:www.ims.tuwien.ac.at,2005:Publication/34552020-01-28T12:45:43+01:002020-01-28T12:45:43+01:00Computer Vision Trends - Autonomous Driving and Human-Robot Interaction<p>by Margrit Gelautz</p><p>In recent years, computer vision research has been strongly influenced by the latest developments in the fields of artificial intelligence and deep learning. In this talk, we focus on computer vision algorithms for 2D and 3D environment perception in the context of assisted/autonomous driving and human-robot interaction. 
An important goal is to design algorithms that learn to reconstruct and interpret different types of traffic or robotic scenes based on large collections of suitable training data. Also, the vision-based analysis of human motion and the recognition of a person's expression/intention are gaining importance, in order to achieve trustworthy human-machine interaction and high user comfort. We discuss current trends and research challenges in the context of human-machine interaction along with potential societal implications.</p>Margrit Gelautztag:www.ims.tuwien.ac.at,2005:Publication/34492019-11-26T17:27:37+01:002019-11-26T17:27:37+01:00Mapping of Realism in Rendering onto Perception of Presence in Augmented Reality<p>by David Schüller-Reichl</p><p>Augmented Reality (AR) is about the seamless integration of virtual computer-generated objects into the real-world view. Ideally, virtual objects should blend into the real world so that the user feels like the virtual objects are "here". The sense of something being "here" is also known as the concept of presence. Presence is especially important if an AR application uses virtual humans to interact with the user. This thesis examines whether visual realism is essential to achieve the highest possible presence in an AR application. Two hypotheses were posed to examine the effect of realism on the perception of presence and the convenience of users within AR applications. H1: "Increasing the level of realism increases the sense of presence and convenience of users." H2: "The Uncanny Valley effect can be observed within the experiment."</p>
<p>The approach of this thesis to examine these hypotheses was to conduct a user study in which the participants experienced a virtual human with a specific visual realism level. Each visual realism level differs in geometry, texture, and lighting. The developed AR application included a rendering system that allows the level of realism of the virtual human to be set. The results partially supported the first hypothesis (H1) and indicated that visual realism is an important factor to achieve a higher sense of presence within an AR application. The second hypothesis (H2) was not supported, most probably due to technical limitations which did not allow such a realistic virtual representation of a human that the participant would believe it could be a real person.</p>
<p>The main novelty of this thesis is its focus on the presence of virtual humans within AR. Recent studies showed that the influence of visual realism on the sense of presence is different in AR than in VR. Future presence-demanding AR applications can take the results of this thesis as a basis to achieve a higher sense of presence, especially if virtual humans are used to interact with users.</p><p><em>D. Schüller-Reichl: "Mapping of Realism in Rendering onto Perception of Presence in Augmented Reality"; Supervisor: H. Kaufmann, P. Kán; Institute of Visual Computing and Human-Centered Technology, 2019; final examination: 03-04-2019.</em></p>David Schüller-Reichltag:www.ims.tuwien.ac.at,2005:Publication/34502019-12-17T10:40:45+01:002020-01-11T16:15:08+01:00Body Language in Human‐Robot Interaction<p>by Darja Stoeva</p><p><em>D. Stoeva: "Body Language in Human‐Robot Interaction"; Poster: Summer School TrustRobots 2019, Vienna; 09-15-2019 - 09-20-2019.</em></p>Darja Stoevatag:www.ims.tuwien.ac.at,2005:Publication/34512019-12-17T10:50:48+01:002020-01-11T16:02:14+01:00Camera-Based In-Cabin Monitoring - Applications and Business Cases<p>by Margrit Gelautz</p><p><em>M. Gelautz: "Camera-Based In-Cabin Monitoring - Applications and Business Cases"; Talk: Forum Automatisierte Mobilität, Vienna (invited); 10-02-2019.</em></p>Margrit Gelautztag:www.ims.tuwien.ac.at,2005:Publication/34442019-07-11T13:40:37+02:002020-01-11T16:02:35+01:00Body Language in Human‐Robot Interaction<p>by Margrit Gelautz, Darja Stoeva, and Dominik Schörkhuber</p>Margrit Gelautztag:www.ims.tuwien.ac.at,2005:Publication/34452019-07-11T13:43:35+02:002020-01-11T16:02:58+01:00Semantic Labelling of Objects in Street Scenes<p>by Andreas Wittmann</p><p>An automatic and robust semantic interpretation of street scenes is required in order to improve driving assistance systems and to reach fully autonomous driving. 
Recent publications achieved remarkable prediction performances by using Deep Learning. However, the calculation of Neural Networks is computationally demanding. Classical Machine Learning approaches can reduce the complexity of the algorithms and computational demand. In this diploma thesis, we first give a comprehensive literature review of classical machine learning approaches for semantic scene labelling with a focus on street scenes. Furthermore, we compare pixel-wise annotated, freely available datasets of street scenes for the training and evaluation of semantic scene labelling algorithms. The main part of this thesis documents the development and implementation of our semantic scene labelling system. We implement two texture- and context-based features and calculate them on-the-fly in a random forest. We extensively evaluate the influence of the feature parameters and random forest parameters on the prediction results and compare the performance of both features. Our results show that textural features in semantically unconnected regions fail to robustly detect small objects in challenging street scenes. Providing additional information by using a combination of multiple features and a pre-segmentation of the image in semantically connected regions could possibly improve the prediction results.</p>Andreas Wittmanntag:www.ims.tuwien.ac.at,2005:Publication/34462019-07-11T13:46:46+02:002020-01-11T16:03:11+01:00Evaluation Study on Semantic Object Labelling in Street Scenes<p>by Andreas Wittmann, Margrit Gelautz, and Florian Seitner</p><p>We present a processing pipeline for semantic scene labelling that was developed in view of autonomous driving applications. Our study focuses on two different methods for feature selection - Texture-layout-filter (TLF) and Single Histogram Class Models (SHCM) - whose influence on the performance of a random forest classifier is investigated. 
In tests on the Cityscapes dataset, we assess the effects of parameter variation and observe an improvement of the Intersection over Union score by 44 percent when replacing the TLF with the computationally more demanding SHCM feature.</p>Andreas Wittmanntag:www.ims.tuwien.ac.at,2005:Publication/34412019-03-20T17:27:33+01:002019-03-20T17:27:33+01:00Robust and sparse k-means clustering for high-dimensional data<p>by Sarka Brodinova, Peter Filzmoser, Thomas Ortner, Christian Breiteneder, and Maia Rohm</p><p><em>S. Brodinova, P. Filzmoser, T. Ortner, C. Breiteneder, M. Rohm: "Robust and sparse k-means clustering for high-dimensional data"; Advances in Data Analysis and Classification, 1 (2019), 1 - 28.</em></p>Sarka Brodinovatag:www.ims.tuwien.ac.at,2005:Publication/34402019-02-11T17:27:40+01:002019-02-11T17:27:40+01:00Walkable Multi-User VR: Effects of Physical and Virtual Colocation<p>by Iana Podkosova</p><p>The research presented in this dissertation focuses on multi-user VR, where multiple immersed users navigate the virtual world by physically walking in a large tracking area. In such a setup, different combinations of user colocation within the physical and the virtual space are possible. We consider a setup to be multi-user if at least one of these two spaces is shared. The dissertation starts with the classification of combinations of physical and virtual colocation. Four such combinations are defined: colocated shared VR, colocated non-shared VR, distributed shared VR and shared VR with mixed colocation. The characteristics of each of these four setups are discussed and the resulting problems and research questions outlined. The dissertation continues with the description of ImmersiveDeck - a large-scale multi-user VR platform that enables navigation by walking and natural interaction. Then, four experiments on multi-user walkable VR developed with the use of ImmersiveDeck are described. 
The first two experiments are set in colocated non-shared VR where walking users share a tracking space while being immersed into separate virtual worlds. We investigate users' mutual awareness in this setup and explore methods of preventing mutual collisions between walking users. The following two experiments study shared VR scenarios in situations of varied physical colocation. We investigate the effects that different modes of physical colocation have on locomotion, collision avoidance and proxemics patterns exhibited by walking users. The sense of copresence and social presence within the virtual world reported by users is investigated as well. The experiments in the colocated non-shared VR setup show that HMD-based VR can produce immersion so strong that users do not notice others being present in their immediate proximity, thus making collision prevention the task of utmost importance. In our proposed method of displaying notification avatars to prevent potential imminent collisions between colocated users, the suitability of a particular type of notification avatar was found to be dependent on the type of scenario experienced by users. The general result of the experiments in shared VR is that physical colocation affects locomotor and proxemics behavior of users as well as their subjective experience in terms of copresence. In particular, users are more cautious about possible collisions and more careful in their collision avoidance behavior in the colocated setup compared to the real environment. In the distributed setup, conventional collision avoidance is often abandoned.</p><p><em>I. Podkosova: "Walkable Multi-User VR: Effects of Physical and Virtual Colocation"; Supervisor, Reviewer: H. Kaufmann, G. Welch, A. 
Chalmers; 193, 2019; oral examination: 02-08-2019.</em></p>Iana Podkosovatag:www.ims.tuwien.ac.at,2005:Publication/34322018-12-14T17:27:49+01:002020-01-11T16:19:59+01:00An Evaluation Framework for Dynamic Scene Acquisitions Using Multiple Stereo Cameras<p>by Christian Kapeller, Braulio Sespede, Matej Nezveda, Matthias Labschütz, Simon Flöry, Florian Seitner, and Margrit Gelautz</p><p><em>C. Kapeller, B. Sespede, M. Nezveda, M. Labschütz, S. Flöry, F. Seitner, M. Gelautz: "An Evaluation Framework for Dynamic Scene Acquisitions Using Multiple Stereo Cameras"; Poster: The 15th ACM SIGGRAPH European Conference on Visual Media Production, London; 12-13-2018 - 12-14-2018.</em></p>Christian Kapellertag:www.ims.tuwien.ac.at,2005:Publication/34342018-12-14T17:27:50+01:002020-01-11T16:20:31+01:00A Post-processing Tool for Multi-view 2D-plus-depth Video<p>by Braulio Sespede, Florian Seitner, and Margrit Gelautz</p><p><em>B. Sespede, F. Seitner, M. Gelautz: "A Post-processing Tool for Multi-view 2D-plus-depth Video"; Poster: The 15th ACM SIGGRAPH European Conference on Visual Media Production, London; 12-13-2018 - 12-14-2018.</em></p>Braulio Sespedetag:www.ims.tuwien.ac.at,2005:Publication/34222018-12-11T17:27:39+01:002018-12-13T22:26:01+01:00An End-to-end System for Real-Time Point Cloud Visualization<p>by Hansjörg Hofer, Florian Seitner, and Margrit Gelautz</p><p>The growing availability of RGB-D data, as delivered by current depth sensing devices, forms the basis for a variety of mixed reality (MR) applications, in which real and synthetic scene content is combined for interaction in real-time. The processing of dynamic point clouds with possible fast and unconstrained movement poses special challenges to the surface reconstruction and rendering algorithms. We propose an end-to-end system for dynamic point cloud visualization from RGB-D input data that takes advantage of the Unity3D game engine for efficient state-of-the-art rendering and platform-independence. 
We discuss specific requirements and key components of the overall system along with selected aspects of its implementation. Our experimental evaluation demonstrates that high-quality and versatile visualization results can be obtained for datasets of up to 5 million points in real-time.</p><p><em>H. Hofer, F. Seitner, M. Gelautz: "An End-to-end System for Real-Time Point Cloud Visualization"; Talk: International Conference on 3D Immersion (IC3D), Brussels; 12-05-2018; in: "Proceedings of the International Conference on 3D Immersion (IC3D)", (2018).</em></p>Hansjörg Hofertag:www.ims.tuwien.ac.at,2005:Publication/34382018-12-28T17:27:42+01:002018-12-28T17:27:42+01:00Co-Presence and Proxemics in Shared Walkable Virtual Environments with Mixed Colocation<p>by Iana Podkosova and Hannes Kaufmann</p><p>The purpose of the experiment presented in this paper is to investigate co-presence and locomotory patterns in a walkable shared virtual environment. In particular, trajectories of users who use a walkable tracking space alone are compared to those of users who use the tracking space in pairs. Co-presence, in the sense of perceiving another person as being present in the same virtual space, is analyzed through subjective responses and behavioral markers. The results indicate that both perception and proxemics in relation to co-located and distributed players differ. The effect on perception is, however, mitigated if participants do not collide with the avatars of distributed co-players.</p><p><em>I. Podkosova, H. 
Kaufmann: "Co-Presence and Proxemics in Shared Walkable Virtual Environments with Mixed Colocation"; Talk: VRST 18, Tokyo; 11-28-2018 - 12-01-2018; in: "Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology", ACM Digital Library, (2018), ISBN: 978-1-4503-6086-9; 1 - 11.</em></p>Iana Podkosovatag:www.ims.tuwien.ac.at,2005:Publication/34132018-10-04T07:27:32+02:002018-10-04T07:27:32+02:00How Current Optical Music Recognition Systems Are Becoming Useful for Digital Libraries<p>by Jan jr. Hajič, Marta Kolárová, Alexander Pacha, and Jorge Calvo-Zaragoza</p><p>Optical Music Recognition (OMR) promises to make large collections of sheet music searchable by their musical content. It would open up novel ways of accessing the vast amount of written music that has never been recorded before. For a long time, OMR was not living up to that promise, as its performance was simply not good enough, especially on handwritten music or under non-ideal image conditions. However, OMR has recently seen a number of improvements, mainly due to the advances in machine learning. In this work, we take an OMR system based on the traditional pipeline and an end-to-end system, which represent the current state of the art, and illustrate in proof-of-concept experiments their applicability in retrieval settings. We also provide an example of a musicological study that can be replicated with OMR outputs at much lower cost. Taken together, this indicates that in some settings, current OMR can be used as a general tool for enriching digital libraries.</p><p><em>J. Hajič, M. Kolárová, A. Pacha, J. Calvo-Zaragoza: "How Current Optical Music Recognition Systems Are Becoming Useful for Digital Libraries"; Talk: 5th International Conference on Digital Libraries for Musicology, Paris, France; 09-28-2018; in: "Proceedings of the 5th International Conference on Digital Libraries for Musicology", (2018), 57 - 61.</em></p>Jan jr. 
Hajičtag:www.ims.tuwien.ac.at,2005:Publication/34162018-10-04T07:27:34+02:002018-10-04T07:27:34+02:00Optical Music Recognition in Mensural Notation with Region-Based Convolutional Neural Networks<p>by Alexander Pacha and Jorge Calvo-Zaragoza</p><p>In this work, we present an approach for the task of optical music recognition (OMR) using deep neural networks. Our intention is to simultaneously detect and categorize musical symbols in handwritten scores, written in mensural notation. We propose the use of region-based convolutional neural networks, which are trained in an end-to-end fashion for that purpose. Additionally, we make use of a convolutional neural network that predicts the relative position of a detected symbol within the staff, so that we cover the entire image-processing part of the OMR pipeline. This strategy is evaluated over a set of 60 ancient scores in mensural notation, with more than 15000 annotated symbols belonging to 32 different classes. The results reflect the feasibility and capability of this approach, with a weighted mean average precision of around 76% for symbol detection, and over 98% accuracy for predicting the position.</p><p><em>A. Pacha, J. Calvo-Zaragoza: "Optical Music Recognition in Mensural Notation with Region-Based Convolutional Neural Networks"; Talk: 19th International Society for Music Information Retrieval Conference, Paris, France; 09-23-2018 - 09-27-2018; in: "Proceedings of the 19th International Society for Music Information Retrieval Conference", (2018), 240 - 247.</em></p>Alexander Pachatag:www.ims.tuwien.ac.at,2005:Publication/34142018-10-04T07:27:34+02:002018-10-04T07:27:34+02:00Advancing OMR as a Community: Best Practices for Reproducible Research<p>by Alexander Pacha</p><p>Optical Music Recognition has been under investigation for over 60 years but remains an unsolved problem, because research is conducted in a distributed manner, often without reusability in mind. 
As scientists, one of our goals should be to share knowledge in a way that makes it accessible and useful for others to build upon. Without that, one's effort is often doomed to rot in a drawer. To counter this trend, not only the paper but also the source code, datasets, and executables should be made publicly available, so that the community can finally advance beyond a state where the wheel is reinvented every time a new researcher joins the field.</p><p><em>A. Pacha: "Advancing OMR as a Community: Best Practices for Reproducible Research"; Talk: 1st International Workshop on Reading Music Systems, Paris, France; 09-20-2018; in: "Proceedings of the 1st International Workshop on Reading Music Systems", (2018), 19 - 20.</em></p>Alexander Pachatag:www.ims.tuwien.ac.at,2005:Publication/34332018-12-14T17:27:50+01:002020-01-11T16:20:57+01:00Camera-based pose estimation in dynamic environments - concept and status<p>by Matthias Schörghuber, Martin Humenberger, and Margrit Gelautz</p><p>This PhD tackles the challenge of camera-based methods for navigation and environmental sensing in dynamic environments. The goal is to design a robust real-time localization and mapping algorithm which can reliably cope with dynamic (e.g. people, cars) and changing (e.g. structural changes, weather) environments.</p>
<p>We plan to introduce semantics into the traditional geometric processing cues, which allows for explicit treatment of dynamic and changing environments in order to improve mapping and, consequently, pose estimation. As a second goal, we leverage the semantic information to introduce enhanced image retrieval techniques to improve large-scale localization and map maintenance in multi-session scenarios.</p><p><em>M. Schörghuber, M. Humenberger, M. Gelautz: "Camera-based pose estimation in dynamic environments - concept and status"; Poster: Prairie Artificial Intelligence Summer School, Grenoble; 07-02-2018 - 07-06-2018.</em></p>Matthias Schörghuber