Chair of Multimedia Communications and Signal Processing
Prof. Dr.-Ing. André Kaup


Field of activity: Audio and Acoustic Signal Processing
Research topic: Acoustic Scene Analysis
Staff: Prof. Dr.-Ing. Walter Kellermann
PhD Shmulik Markovich-Golan

Acoustic source localization aims at extracting the location information of one or several sound sources from signals captured by a number of spatially distinct microphones. By exploiting the spatial diversity offered by an array of microphones, acoustic source localization techniques make it possible to estimate the position of one or several sound sources in a two-dimensional plane or in three-dimensional space without any prior knowledge about the observed acoustic scene. Accurate localization can serve in many applications as a preliminary step for further processing, e.g., steering a beamformer or pointing a camera in the direction of a sound source. A wide variety of algorithms exist, each addressing different acoustic scenarios depending on the nature of the source (broadband or narrowband, stationary or non-stationary, ...), the room reverberation, or the amount of background noise. Figure 1 provides an overview of existing approaches. We can identify two different strategies: the direct approach and the TDOA-based approach, both described below.

At the LMS, the influence of temperature on acoustic source localization has also been investigated: changes in the speed of sound caused by varying room temperatures are taken into account.


Fig. 1: Overview of existing acoustic source localization approaches.
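Temperature matters because every TDOA is converted into a range difference through the speed of sound c. The sketch below illustrates this with the common dry-air approximation c ≈ 331.3·sqrt(1 + T/273.15) m/s (T in °C). The function names and the 0.5 ms example TDOA are hypothetical, and the temperature-aware methods developed at the LMS may model this differently.

```python
import math


def speed_of_sound(temp_celsius: float) -> float:
    """Approximate speed of sound in dry air (m/s).

    Assumption: ideal-gas approximation c = 331.3 * sqrt(1 + T / 273.15).
    """
    return 331.3 * math.sqrt(1.0 + temp_celsius / 273.15)


def path_difference(tdoa_s: float, temp_celsius: float) -> float:
    """Convert a TDOA (seconds) into a range difference (metres)."""
    return speed_of_sound(temp_celsius) * tdoa_s


if __name__ == "__main__":
    tdoa = 0.5e-3  # hypothetical measured TDOA of 0.5 ms
    for t in (10.0, 20.0, 30.0):
        print(f"{t:4.1f} degC: c = {speed_of_sound(t):6.1f} m/s, "
              f"range difference = {100 * path_difference(tdoa, t):5.2f} cm")
```

Between 10 °C and 30 °C this approximation changes c by roughly 3.5 %, and every range difference derived from a fixed TDOA scales by the same factor, which directly biases the estimated source positions if a constant c is assumed.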

The direct approach

In the direct approach, the position of the active sound source(s) is characterized by an acoustical energy map of the search space. Depending on the localization task, the search space can be a discrete set of grid points in a plane or in a 3D space. It can also be a discrete set of directions, when the range from the sources to the sensors is disregarded. The latter case is usually referred to as the far-field search (i.e., for sources located far away from the sensors), in contrast to the near-field search. Figure 2 depicts two exemplary search grids.

Fig. 2: Search grids for near-field (left) and far-field (right) acoustic source localization.
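The two search spaces differ in the propagation model that links a candidate location to the inter-microphone delays. A minimal 2-D sketch, assuming a spherical wavefront for near-field grid points, a plane wave for far-field directions, c = 343 m/s, and hypothetical grid sizes and function names:

```python
import math

C = 343.0  # assumed speed of sound in m/s (roughly 20 degC)


def near_field_delays(mics, point):
    """Relative delays (s) from a candidate grid point to each microphone.

    Spherical-wave model: the delays depend on the actual distances, so the
    source range matters. Delays are relative to the first microphone.
    """
    dists = [math.dist(m, point) for m in mics]
    return [(d - dists[0]) / C for d in dists]


def far_field_delays(mics, azimuth_deg):
    """Relative delays (s) for a plane wave arriving from `azimuth_deg`.

    Plane-wave model: only the direction matters, the range is ignored.
    A microphone located further along the arrival direction is reached
    earlier, hence the minus sign.
    """
    a = math.radians(azimuth_deg)
    u = (math.cos(a), math.sin(a))
    proj = [m[0] * u[0] + m[1] * u[1] for m in mics]
    return [-(p - proj[0]) / C for p in proj]


# Hypothetical search grids:
# near field: 1 m x 1 m area in front of the array, 10 cm spacing;
# far field: azimuths from 0 to 180 degrees in 1-degree steps.
near_grid = [(0.1 * i, 0.1 * j) for i in range(11) for j in range(1, 11)]
far_grid = list(range(0, 181))

if __name__ == "__main__":
    mics = [(0.0, 0.0), (0.1, 0.0)]
    src = (100 * math.cos(math.radians(60)), 100 * math.sin(math.radians(60)))
    print(near_field_delays(mics, src), far_field_delays(mics, 60))
```

For a source that is far away compared with the array aperture, both models yield nearly identical delays, which is why the far-field search can drop the range coordinate and scan directions only.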

Computed directly from the observed sensor signals, the energy map reflects the activity of the source(s) in the search space. The position of the active source(s) can then be estimated by identifying the local extrema in the energy map, as depicted in Fig. 3.

Fig. 3: Energy map reflecting the source position in the near field (left) and in the far field (right).
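Reading the source positions off the map is a peak-picking step. A minimal sketch for a 2-D map stored as a list of lists, assuming the map is built so that sources appear as maxima, that the number of sources is known, and that a peak is any cell not smaller than its eight neighbours (a simple rule chosen for illustration; practical systems often add thresholds or a minimum peak distance):

```python
def local_maxima(energy, num_sources=1):
    """Return grid indices (row, col) of the strongest local maxima.

    A cell counts as a local maximum if its value is >= every one of its
    (up to 8) neighbours; the candidates are then sorted by value.
    """
    rows, cols = len(energy), len(energy[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = energy[r][c]
            neighbours = [energy[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, rows))
                          for cc in range(max(c - 1, 0), min(c + 2, cols))
                          if (rr, cc) != (r, c)]
            if all(v >= n for n in neighbours):
                peaks.append((v, (r, c)))
    peaks.sort(reverse=True)  # strongest peaks first
    return [idx for _, idx in peaks[:num_sources]]
```

The returned indices still refer to the search grid, so they are mapped back to positions (near field) or directions (far field) through whichever grid was used to compute the map.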

Therefore, localization strategies following the direct approach differ only in the way the acoustic map is computed. Among the existing methods, we can identify three categories of algorithms:

    • Approaches based on beamforming maximize the output power of a steered beamformer, which amounts to a systematic scan of the environment. The conventional approach is based on the simple delay-and-sum beamformer and is known as the Steered Response Power (SRP) algorithm. A simple way to improve the localization performance of the conventional SRP in realistic environments is to apply a PHAse Transform (PHAT) to prewhiten the sensor signals, which leads to the widely used SRP-PHAT method [DiBiase 2000].

    • Approaches based on subspace decomposition perform an eigenvalue decomposition of the microphone signal correlation matrix. From the eigenstructure of this matrix, a signal subspace (as opposed to the remaining noise subspace) can be extracted, and the positions of several sources can be estimated. This class of algorithms includes the MUSIC [Schmidt 1986] and ESPRIT [Roy 1989] algorithms. Research at the Telecommunications Laboratory of the University of Erlangen-Nuremberg also led to Eigen-Beam ESPRIT (EB-ESPRIT), which relies on a wave-field decomposition to serve as a basis for ESPRIT.

    • Approaches based on blind interference cancellation exploit the self-steering capability of Independent Component Analysis (ICA) techniques when applied to convolutive mixtures. While accurate source location information is usually necessary to steer a conventional beamformer, an ICA-based blind adaptive system can recover the original source signals from a (possibly reverberant) sound mixture without this prior knowledge. Intuitively, this self-steering capability suggests that the demixing system of blind signal processing algorithms contains useful information on the location of each source. Different methods of extracting the location information from ICA filters have been investigated at the Telecommunications Laboratory of the University of Erlangen-Nuremberg, yielding the ICA Averaged Directivity Pattern (ICA-ADP) approach.
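The beamforming category can be illustrated with a compact SRP-PHAT sketch for a linear array and a far-field DOA grid. Assumptions: plane-wave propagation, c = 343 m/s, a plain O(N²) DFT instead of an FFT to stay dependency-free, and a synthetic test signal made of circularly shifted white noise (the helper `simulate_frames` is hypothetical). Setting `phat=False` gives the conventional delay-and-sum SRP.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(N^2) DFT, kept dependency-free for clarity (use an FFT in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def srp_phat_doa(frames, mic_x, fs, c=343.0, grid_deg=range(0, 181), phat=True):
    """Far-field SRP(-PHAT) DOA estimate for a linear array on the x-axis.

    frames: one equal-length block of samples per microphone
    mic_x:  microphone x-positions in metres
    Returns the grid angle (degrees from the array axis) with maximum
    steered response power.
    """
    n = len(frames[0])
    spectra = [dft(f) for f in frames]
    if phat:
        # PHAT weighting: discard magnitudes, keep only the phase of each bin.
        spectra = [[v / abs(v) if abs(v) > 1e-12 else 0j for v in s]
                   for s in spectra]
    best_angle, best_power = None, -1.0
    for ang in grid_deg:
        cos_a = math.cos(math.radians(ang))
        # Plane-wave delay of each mic relative to mic 0; a mic further along
        # the arrival direction is reached earlier (negative delay).
        taus = [-(x - mic_x[0]) * cos_a / c for x in mic_x]
        power = 0.0
        for k in range(1, n // 2):  # positive-frequency bins, DC excluded
            omega = 2 * math.pi * k * fs / n
            # Delay-and-sum in the frequency domain: undo each mic's delay.
            y = sum(s[k] * cmath.exp(1j * omega * tau)
                    for s, tau in zip(spectra, taus))
            power += abs(y) ** 2
        if power > best_power:
            best_angle, best_power = ang, power
    return best_angle


def simulate_frames(num_mics, shift_per_mic, n=128, seed=0):
    """Hypothetical test signal: white noise where mic m is advanced
    circularly by m * shift_per_mic samples (an ideal far-field plane wave)."""
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [[s[(t + m * shift_per_mic) % n] for t in range(n)]
            for m in range(num_mics)]
```

With four microphones spaced 8.575 cm apart, fs = 16 kHz and a shift of 2 samples per microphone, the synthetic plane wave corresponds to a DOA of 60°, so `srp_phat_doa(simulate_frames(4, 2), [m * 0.08575 for m in range(4)], fs=16000)` should select the 60° grid point. Real recordings would use windowed short frames and fractional delays, which the frequency-domain steering above already supports.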

The TDOA-based approach

Approaches based on Time Differences Of Arrival (TDOA) rely on a two-step procedure. In the first step, one or several time delays between different pairs of microphones (i.e., the TDOAs) are estimated. Figure 4 shows the locus of potential source positions corresponding to a given TDOA. The microphone pair is depicted by the two black balls. In general, this locus is one half (one sheet) of a hyperboloid of two sheets (see the left surface in Fig. 4), with the sensor positions as foci. The asymptotes of the hyperboloid are shown by red dashed lines in the figure. In the second step, using a set of TDOA estimates computed from different sensor pairs, the positions of the sources can be calculated as the intersection of the different hyperboloids. Assuming a source located far away from the sensors (i.e., in the far field), the hyperboloid can be approximated by a cone (see the right surface in Fig. 4). This reduces the dimensionality of the problem, since only the Direction-Of-Arrival (DOA) needs to be taken into account and the range coordinate is disregarded.

Fig. 4: Hyperboloid of potential positions for a given TDOA (left) and its far-field approximation by a cone (right).
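Under the far-field assumption, a single TDOA τ maps to a cone angle through cos θ = c·τ/d, where d is the microphone spacing and θ is measured from the microphone axis. A minimal 2-D sketch, assuming c = 343 m/s and hypothetical function names, which also computes the exact TDOA of a point source (the quantity that defines the hyperboloid) so that the approximation error can be checked:

```python
import math


def far_field_doa(tdoa_s, mic_distance_m, c=343.0):
    """Cone angle (degrees from the microphone axis) consistent with a TDOA
    under the far-field assumption: cos(theta) = c * tdoa / d."""
    ratio = c * tdoa_s / mic_distance_m
    ratio = max(-1.0, min(1.0, ratio))  # clip estimation noise outside [-1, 1]
    return math.degrees(math.acos(ratio))


def exact_tdoa(source, mic1, mic2, c=343.0):
    """Exact TDOA of a point source, (|r - m1| - |r - m2|) / c.

    All points sharing this value lie on one sheet of a hyperboloid with
    foci mic1 and mic2; positive values mean the source is closer to mic2.
    """
    return (math.dist(source, mic1) - math.dist(source, mic2)) / c
```

With d = 20 cm and a source at 60°, the far-field angle is accurate to well under 0.1° at a distance of 10 m, but is off by more than 1° when the source is only 30 cm away, which illustrates why the cone approximation is reserved for the far field.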

Most of the direct methods listed above can be reformulated to extract TDOAs by considering only a single pair of sensors. Other TDOA estimation techniques can be classified into two categories:

    • Approaches based on synchrony maximize a measure of synchrony between delayed versions of the sensor signals. Among the available synchrony measures, the Generalized Cross-Correlation with PHAse Transform (GCC-PHAT) is the most popular method [Knapp 1976]. Using a filtered version of the cross-correlation function between the two sensor signals as synchrony measure, it can be considered as a special case of the SRP-PHAT approach [DiBiase 2000] when using only two microphones. Alternative synchrony measures include the Averaged Magnitude Difference Function (AMDF) [Ross 1974] or the Averaged Magnitude Sum Function (AMSF) [Chen 2005].

    • Approaches based on Blind System Identification (BSI) focus on the estimation of the impulse responses between the sources and the microphones. For a single source, the Adaptive Eigenvalue Decomposition (AED) algorithm [Benesty 2000] can be used, which performs Single-Input-Multiple-Output (SIMO) BSI. Contrary to the simpler synchrony-based methods, the AED explicitly accounts for the room reverberation in its signal model and can therefore be expected to be more robust against reverberation. A generalization of the AED to Multiple-Input-Multiple-Output (MIMO) BSI has been developed at the Telecommunications Laboratory of the University of Erlangen-Nuremberg. It allows the localization of multiple sources and exploits an information-theoretic criterion from Blind Source Separation (BSS).
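For a single microphone pair, the GCC-PHAT estimator mentioned above can be sketched as follows: the cross-power spectrum is normalized to unit magnitude (the PHAT weighting), evaluated in the lag domain, and the TDOA is the lag with the largest value. This is a dependency-free sketch assuming integer-sample lags, circular correlation over one block, and a plain O(N²) DFT; the function names are hypothetical, and in practice `max_lag` would follow from the microphone spacing, e.g. ceil(d·fs/c).

```python
import cmath
import math
import random


def dft(x):
    """Plain O(N^2) DFT, kept dependency-free for clarity (use an FFT in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_phat_tdoa(x1, x2, max_lag):
    """Delay of x2 relative to x1 (in samples) estimated with GCC-PHAT.

    R(tau) = sum_k G(k) exp(j 2 pi k tau / N), with the PHAT-weighted
    cross-spectrum G(k) = X2(k) conj(X1(k)) / |X2(k) conj(X1(k))|.
    """
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = []
    for a, b in zip(X1, X2):
        g = b * a.conjugate()
        cross.append(g / abs(g) if abs(g) > 1e-12 else 0j)  # PHAT weighting

    def r(tau):
        # Inverse transform evaluated at one (possibly negative) lag.
        return sum(g * cmath.exp(2j * math.pi * k * tau / n)
                   for k, g in enumerate(cross)).real

    return max(range(-max_lag, max_lag + 1), key=r)


if __name__ == "__main__":
    rng = random.Random(0)
    fs, n = 16000, 128
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [s[(t - 7) % n] for t in range(n)]  # x2 lags s by 7 samples
    lag = gcc_phat_tdoa(s, x2, max_lag=20)
    print(lag, "samples =", 1e3 * lag / fs, "ms")
```

Because the PHAT weighting discards the magnitude of every bin, the correlation of an ideally delayed signal collapses to a single sharp peak, which is what makes GCC-PHAT comparatively robust to the spectral coloration introduced by reverberation.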



Publications

P. Annibale, R. Rabenstein: Closed-Form Estimation of the Speed of Propagating Waves from Time Measurements. Springer Journal on Multidimensional Systems and Signal Processing (MDSSP), Vol. 25, No. 2, pp. 361-378, 2014.

K. Kowalczyk, E.A.P. Habets, W. Kellermann, P.A. Naylor: Blind system identification using sparse learning for TDOA estimation of room reflections. IEEE Signal Processing Letters (IEEE SPL), Vol. 20, No. 7, pp. 653-656, 2013.

P. Annibale, J. Filos, P. A. Naylor, R. Rabenstein: TDOA-based Speed of Sound Estimation for Air Temperature and Room Geometry Inference. IEEE Transactions on Audio, Speech and Language Processing (IEEE TASLP), Vol. 21, No. 2, pp. 234-246, Feb. 2013.

H. Sun, W. Kellermann, E. Mabande, K. Kowalczyk: Localization of distinct reflections in rooms using spherical microphone array eigenbeam processing. J. Acoust. Soc. Am. (JASA), Vol. 131, No. 4, pp. 2828-2840, Apr. 2012.