Chair of
Multimedia Communications and Signal Processing
Prof. Dr.-Ing. André Kaup

Comparison of voice activity detection (VAD) methods for robot audition

Supervisor: Dipl.-Ing. Stefan Meier (Room 01.178)
Faculty: Prof. Dr.-Ing. Walter Kellermann
Student: Mack, Wolfgang

A common problem in audio signal processing is the detection of time frames with an active target source, which is typically a human speaker. Possible applications include automatic speech recognition, where the speech recognizer should only be active during target source activity, or system identification, where the acoustic path between the target source and the microphones should be estimated during noise pauses. Conventional voice activity detection methods exploit spectral and temporal characteristics of speech signals in order to distinguish speech (which is relatively nonstationary and exhibits harmonic structures) from background noise (which is typically stationary over longer time intervals).
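As an illustration of the conventional approach described above, the following minimal sketch implements a simple energy-based detector. It is not one of the methods to be evaluated, only a baseline under stated assumptions: the noise is stationary, the first few frames contain noise only, and a frame counts as speech when its short-time energy exceeds the estimated noise floor by a fixed margin. The function names, the frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate), and the 6 dB margin are hypothetical choices.

```python
import math


def frame_energies_db(signal, frame_len):
    """Short-time energy in dB for consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small constant avoids log10(0) for all-zero frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def energy_vad(signal, frame_len=160, noise_frames=5, margin_db=6.0):
    """Return one boolean speech/non-speech decision per frame.

    Assumption: the first `noise_frames` frames contain background noise
    only, so their mean energy serves as a fixed noise-floor estimate.
    """
    energies = frame_energies_db(signal, frame_len)
    if not energies:
        return []
    n = min(noise_frames, len(energies))
    noise_floor_db = sum(energies[:n]) / n
    return [e > noise_floor_db + margin_db for e in energies]
```

A fixed noise floor of this kind fails as soon as the background noise changes over time or resembles speech (e.g., babble noise), which is one reason why more elaborate methods exploit harmonic structure and temporal statistics.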

Various methods for voice activity detection have been proposed in the literature and are to be evaluated in this thesis. The thesis involves a thorough literature review to gain an overview of the state of the art. Promising methods should be implemented and compared with each other. The evaluation should cover different kinds of background noise (e.g., white noise, babble noise) and different environmental conditions (e.g., reverberation, signal-to-noise ratio). Finally, the most promising method(s) should be identified.
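One possible way to set up such an evaluation is sketched below, assuming that clean speech, a noise recording of equal length, and frame-wise reference labels are available. The first helper scales the noise to reach a target signal-to-noise ratio, the second compares detector decisions with the reference labels. Both function names and the choice of hit rate and false-alarm rate as metrics are assumptions made for illustration, not requirements of the thesis.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the mixture has the requested SNR in dB.

    Assumptions: both signals have the same length and non-zero power.
    The speech power is averaged over the whole signal, pauses included.
    """
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def detection_rates(decisions, reference):
    """Return (hit rate, false-alarm rate) for frame-wise boolean decisions."""
    hits = sum(d and r for d, r in zip(decisions, reference))
    false_alarms = sum(d and not r for d, r in zip(decisions, reference))
    n_speech = sum(reference)
    n_noise = len(reference) - n_speech
    hit_rate = hits / n_speech if n_speech else 0.0
    fa_rate = false_alarms / n_noise if n_noise else 0.0
    return hit_rate, fa_rate
```

Repeating the mixing step over a grid of SNR values and noise types, and collecting both rates for each method, gives a simple basis for the comparison. Reverberation would additionally require convolving the clean speech with a room impulse response before mixing.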

Well-documented and well-structured software is important. The thesis can be written in German or English.

Type: Research Internship