Chair of
Multimedia Communications and Signal Processing
Prof. Dr.-Ing. André Kaup

Evaluation of Numerical Optimization Methods for Robust Distant-Talking Speech Recognition based on REMOS

Supervisor: Prof. Dr.-Ing. Armin Sehr (Room 5.10)
Faculty: Prof. Dr.-Ing. Walter Kellermann
Student: Roland Maas
Info: Robust distant-talking speech recognition is highly desirable for many applications. Due to multi-path propagation in most acoustic environments, the microphone picks up not only the desired signal but also its reverberation. This seriously degrades the performance of state-of-the-art automatic speech recognition (ASR) systems. Since room reverberation has a dispersive effect on speech feature sequences, traditional signal enhancement and model adaptation approaches developed for additive distortions are not effective in reverberant environments.

A novel concept called REverberation MOdeling for Speech recognition (REMOS), which combines a Hidden Markov Model (HMM) with a reverberation model, yields very promising results even in strongly reverberant environments. The HMM models the clean speech, while the reverberation model describes the effect of room reverberation directly in the feature domain. For speech recognition, an extended version of the Viterbi algorithm is used, which performs an inner optimization in each iteration to determine the most likely contributions of the HMM and the reverberation model to the current reverberant observation. So far, the approach has been implemented only for mel-spectral features.
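To illustrate what such an inner optimization can look like, the following minimal sketch treats a single mel-spectral channel under strongly simplifying assumptions that are not taken from the REMOS formulation itself: the observation x is assumed to be the sum of a clean-speech component s and a reverberation component r, each described by one Gaussian. All function and parameter names are hypothetical. Under these assumptions, the most likely split of x into s and r has a closed-form solution.

```python
def split_observation_melspec(x, mu_s, var_s, mu_r, var_r):
    """Most likely (s, r) with s + r = x for two independent Gaussians.

    Assumption (illustrative only): s ~ N(mu_s, var_s) is the HMM output,
    r ~ N(mu_r, var_r) is the reverberation contribution, and the
    observation is their sum in the mel-spectral domain.
    """
    # Lagrange condition: (s - mu_s) / var_s == (r - mu_r) / var_r == lam
    lam = (x - mu_s - mu_r) / (var_s + var_r)
    s = mu_s + var_s * lam
    r = mu_r + var_r * lam
    return s, r


if __name__ == "__main__":
    s, r = split_observation_melspec(x=5.0, mu_s=2.0, var_s=3.0,
                                     mu_r=1.0, var_r=1.0)
    print(s, r)
```

The mismatch between x and the sum of the two means is distributed in proportion to the variances, so the component with the larger uncertainty absorbs more of it.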

To extend this approach to more powerful speech features, such as logarithmic mel-spectral features or MFCCs, numerical optimization methods have to be used for the inner optimization. In this thesis, different formulations of the inner optimization problem and different numerical solutions shall be evaluated. To implement the numerical methods, the available C code of the recognizer, which is based on the Hidden Markov Model Toolkit (HTK), shall be extended using IPOPT (Interior Point Optimizer), an open-source software package for large-scale nonlinear optimization.
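The need for numerical methods can be seen in a small sketch for one logarithmic mel-spectral channel. It rests on assumptions that are not part of the thesis description: the clean-speech component s and the reverberation component r are each modeled by one Gaussian in the log domain, and they combine as x = log(exp(s) + exp(r)). This constraint is nonlinear, so the sketch uses a plain golden-section search as a stand-in; it is not the IPOPT-based solution the thesis aims at, and it only finds a local optimum. All names are hypothetical.

```python
import math


def neg_log_joint(t, x, mu_s, var_s, mu_r, var_r):
    """Negative log joint density (up to a constant) for a split t in (0, 1).

    The substitution s = x + log(t), r = x + log(1 - t) satisfies
    exp(s) + exp(r) = exp(x) by construction.
    """
    s = x + math.log(t)
    r = x + math.log1p(-t)
    return (s - mu_s) ** 2 / (2.0 * var_s) + (r - mu_r) ** 2 / (2.0 * var_r)


def split_observation_logmel(x, mu_s, var_s, mu_r, var_r, iters=80):
    """Golden-section search over t; returns a locally optimal (s, r)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 1e-12, 1.0 - 1e-12          # keep both logarithms finite
    a = hi - inv_phi * (hi - lo)
    b = lo + inv_phi * (hi - lo)
    fa = neg_log_joint(a, x, mu_s, var_s, mu_r, var_r)
    fb = neg_log_joint(b, x, mu_s, var_s, mu_r, var_r)
    for _ in range(iters):
        if fa < fb:                      # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - inv_phi * (hi - lo)
            fa = neg_log_joint(a, x, mu_s, var_s, mu_r, var_r)
        else:                            # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + inv_phi * (hi - lo)
            fb = neg_log_joint(b, x, mu_s, var_s, mu_r, var_r)
    t = 0.5 * (lo + hi)
    return x + math.log(t), x + math.log1p(-t)


if __name__ == "__main__":
    s, r = split_observation_logmel(x=0.0, mu_s=-0.5, var_s=1.0,
                                    mu_r=-1.0, var_r=1.0)
    print(s, r, math.log(math.exp(s) + math.exp(r)))
```

A one-dimensional search suffices here only because each channel is treated independently. For MFCCs, where the channels are coupled through the cosine transform, a general-purpose constrained solver would presumably be the more natural choice.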

Type: Master Thesis