|Supervisor:||Prof. Dr.-Ing. Armin Sehr (Room 5.10)|
|Faculty:||Prof. Dr.-Ing. Walter Kellermann|
|Info:||Robust distant-talking speech recognition is desirable for many applications. Because of multi-path propagation in most acoustic environments, the microphone picks up not only the desired signal but also its reverberation. This seriously degrades the performance of state-of-the-art automatic speech recognition (ASR) systems. Since room reverberation has a dispersive effect on speech feature sequences, traditional signal enhancement and model adaptation approaches developed for additive distortions are not effective in reverberant environments.
A novel concept called REverberation MOdeling for Speech recognition (REMOS), which combines a Hidden Markov Model (HMM) with a reverberation model, yields very promising results even in strongly reverberant environments. The HMM models the clean speech, while the reverberation model describes the effect of the room reverberation directly in the feature domain. For speech recognition, an extended version of the Viterbi algorithm is used, which performs an inner optimization in each iteration to determine the most likely contributions of the HMM and the reverberation model to the current reverberant observation. So far, the approach has been implemented only for mel-spectral features.
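To make the idea of the inner optimization concrete, the following is a minimal toy sketch and not the REMOS formulation itself. It assumes a single scalar feature, an additive combination x = s + r of a clean-speech contribution s and a reverberant contribution r, and independent Gaussian models for both; the function name and all parameters are hypothetical. Under these assumptions the most likely split of an observation has a closed form.

```python
def split_observation(x, mu_s, var_s, mu_r, var_r):
    """Most likely (s, r) with s + r = x under independent Gaussians.

    Toy assumptions (not taken from the recognizer):
      s ~ N(mu_s, var_s)  clean-speech contribution of one HMM state
      r ~ N(mu_r, var_r)  contribution of the reverberation model
    """
    # Part of the observation that the two means do not explain.
    residual = x - (mu_s + mu_r)
    # The residual is shared in proportion to the variances: the less
    # certain model absorbs the larger part of the mismatch.
    gain = var_s / (var_s + var_r)
    s = mu_s + gain * residual
    r = x - s  # the constraint s + r = x holds by construction
    return s, r


if __name__ == "__main__":
    s, r = split_observation(x=10.0, mu_s=4.0, var_s=1.0, mu_r=3.0, var_r=2.0)
    print(s, r)
```

In a decoder built along these lines, such a split would be computed for every candidate state transition in every frame, which is why a closed-form solution is attractive when the features combine additively.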
To extend this approach to more powerful speech features, such as logarithmic mel-spectral features or MFCCs, numerical optimization approaches have to be used for the inner optimization. In this thesis, different formulations and different numerical solutions of the inner optimization problem shall be evaluated. To implement the numerical methods, the available C code of the recognizer, which is based on the Hidden Markov Model Toolkit (HTK), shall be extended using IPOPT (Interior Point OPTimizer), an open-source software package for large-scale nonlinear optimization.
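The following toy sketch illustrates why a numerical solver becomes necessary in the logarithmic domain. It assumes a single scalar log feature, the nonlinear combination x = log(exp(s) + exp(r)), and independent Gaussian models for s and r; these assumptions, the function names, and all parameters are hypothetical. A coarse grid search followed by a golden-section refinement stands in for IPOPT, which is not used here.

```python
import math


def neg_log_score(s, x, mu_s, var_s, mu_r, var_r):
    """Negative log-likelihood (up to constants) of a feasible split."""
    # Eliminate r through the constraint: r = log(exp(x) - exp(s)), s < x.
    r = x + math.log1p(-math.exp(s - x))
    return (s - mu_s) ** 2 / (2 * var_s) + (r - mu_r) ** 2 / (2 * var_r)


def split_log_observation(x, mu_s, var_s, mu_r, var_r,
                          span=30.0, grid=2000, iters=60):
    """Numerically find the most likely (s, r) with log(e^s + e^r) = x."""
    def cost(s):
        return neg_log_score(s, x, mu_s, var_s, mu_r, var_r)

    lo, hi = x - span, x - 1e-6  # s must stay strictly below x
    step = (hi - lo) / grid

    # Coarse grid search, so the refinement starts near the best basin.
    best = min(range(grid + 1), key=lambda i: cost(min(lo + i * step, hi)))
    a = max(lo, lo + (best - 1) * step)
    b = min(hi, lo + (best + 1) * step)

    # Golden-section refinement; assumes one minimum inside [a, b].
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if cost(c) < cost(d):
            b = d
        else:
            a = c

    s = 0.5 * (a + b)
    r = x + math.log1p(-math.exp(s - x))
    return s, r


if __name__ == "__main__":
    s, r = split_log_observation(x=2.0, mu_s=1.5, var_s=0.5,
                                 mu_r=0.5, var_r=1.0)
    print(s, r, math.log(math.exp(s) + math.exp(r)))
```

A one-dimensional search is sufficient only for this scalar toy problem. With full feature vectors and several reverberation frames, the problem becomes a constrained multivariate one, which is the setting a general nonlinear solver is designed for.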