Interactive Media Systems, TU Vienna

Discrimination and Retrieval of Environmental Sounds

Thesis by Dalibor Mitrovic


The human auditory sense may be regarded as the second most important sense after sight. This valuation is reflected in the field of information retrieval, where until recently research concentrated on visual information retrieval. Even research in audio retrieval (AR) focused on a single aspect of hearing, namely the understanding of speech. With the emergence of large music databases in recent years, a second area of AR gained importance: music information retrieval (MIR). The goal of MIR is to enable efficient search and retrieval in these music databases. The most recent research area in audio retrieval is the retrieval of environmental sounds. One may argue that environmental sound retrieval deserves a more prominent role than it currently has: most sounds humans hear are neither speech nor music but various environmental sounds. Incorporating environmental sounds into retrieval systems makes a vast amount of additional information available. This thesis investigates the applicability of a range of audio features to environmental sound retrieval. Furthermore, state-of-the-art techniques in audio retrieval are identified through a broad survey of the relevant literature covering all three areas of AR (speech, music, and environmental sounds). The quality of the features is examined with three different classification techniques. Finally, a set of novel audio features, developed by the author, is compared to established features. Results indicate that further research is necessary. In particular, there is a lack of low-dimensional and computationally cheap audio descriptors suitable for use in environmental sound retrieval.


