Automatic speaker localization and tracking can be achieved by using an adaptive microphone array and sophisticated digital signal processing algorithms. We developed and implemented a filterbank beamformer based on the fast Fourier transform (FFT). The array pattern main lobe is automatically steered to a moving speaker.
The array pattern image shows the estimated azimuth trace of a moving speaker (yellow curve superimposed on the image). The main lobe (dark red area) clearly follows the speaker’s movement. The adaptive algorithm needs a settling period, visible in the interval from 0 to approximately 4 seconds.
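To illustrate what such an array pattern represents, the following minimal sketch computes the pattern of a plain delay-and-sum beamformer at a single frequency. It is not the adaptive FFT filterbank beamformer described here. The uniform linear geometry, the 8 microphones, the 4 cm spacing, the speed of sound, and the function name `array_response` are all assumptions made for illustration only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def array_response(look_deg, arrival_deg, freq_hz, num_mics=8, spacing_m=0.04):
    """Normalized magnitude response of a delay-and-sum beamformer.

    Hypothetical uniform linear array; azimuth is measured from the
    array axis, so 0 deg is endfire and 90 deg is broadside.
    look_deg    : direction the main lobe is steered to
    arrival_deg : direction of the incoming plane wave
    """
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    # Path-length mismatch per microphone between arrival and look direction
    mismatch = math.cos(math.radians(arrival_deg)) - math.cos(math.radians(look_deg))
    total = 0j
    for m in range(num_mics):
        total += cmath.exp(1j * wavenumber * m * spacing_m * mismatch)
    return abs(total) / num_mics


# Pattern at 2 kHz with the main lobe steered to 60 deg, sampled every degree
pattern = [array_response(60.0, a, 2000.0) for a in range(181)]
peak_azimuth = max(range(181), key=lambda a: pattern[a])
```

With these assumed parameters the spacing is below half a wavelength at 2 kHz, so the pattern has a single main lobe and `peak_azimuth` equals the steered direction. Repeating the computation for a sequence of look directions over time would give an image of the kind described above.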
The system has been tested with MATLAB. A program written in C runs in real time at a sampling frequency of 16 kHz on any modern PC.
A detailed description of the algorithms can be found in G. Doblinger, “Localization and Tracking of Acoustical Sources”, published in the book “Topics in Acoustic Echo and Noise Control”, Eberhard Hänsler and Gerhard Schmidt (Editors), Springer Verlag 2006, Chapter 4, pp. 91-120, ISBN: 3-540-33212-X.
Application examples: hands-free mobile communications in cars; teleconferencing.
MATLAB programs are available for all algorithms presented in the book. The programs are published under the GNU License (see the file COPYING supplied with the MATLAB files).
In the following demos, the speaker moves from azimuth 90° (broadside) to 0° (endfire), back to 90°, and finally to 180° (see array pattern image above). The demos are an online recording of a C program running under the Linux® operating system with ALSA and an 8-channel Terratec® EWS88MT sound system. A plugin is needed to run the demos. Alternatively, you can use the mp4 files to play the video clips. You can watch the array pattern while the speaker moves and listen to the speaker’s voice.
For each demo there is an swf file in the zip archive:
Demo 1 (file bf_90.swf, or bf_90.mp4):
The desired direction is fixed at 90° (broadside). Settling of the adaptive algorithm and suppression of sounds from other directions can be observed.
Demo 2 (file bf_0.swf, or bf_0.mp4):
The desired direction is fixed at 0° (endfire). Settling of the adaptive algorithm and suppression of sounds from other directions can be observed.
Demo 3 (file bf_doa.swf, or bf_doa.mp4):
Speaker tracking is activated. The array pattern follows speaker movements.
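The azimuth convention used in the demos (0° endfire, 90° broadside, 180° the opposite endfire) can be related to the time delay between two microphones under a far-field, plane-wave assumption. The sketch below shows only this geometric relation. It is not the tracking algorithm of the MATLAB or C programs, and the function name `azimuth_from_delay`, the 4 cm spacing, the speed of sound, and the sign convention of the delay are assumptions.

```python
import math


def azimuth_from_delay(delay_s, spacing_m, speed_of_sound=343.0):
    """Azimuth in degrees for a given inter-microphone time delay.

    Assumes a far-field plane wave and two microphones spacing_m apart.
    delay_s > 0 is taken to mean the source lies toward the 0 deg endfire
    direction (an assumed sign convention).
    """
    cos_theta = speed_of_sound * delay_s / spacing_m
    # Clamp so that noisy delay estimates cannot leave the domain of acos
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


spacing = 0.04  # m, hypothetical microphone spacing
broadside = azimuth_from_delay(0.0, spacing)             # zero delay
endfire = azimuth_from_delay(spacing / 343.0, spacing)   # maximum delay
```

A zero delay maps to broadside (90°), and the largest physically possible delay, `spacing / speed_of_sound`, maps to endfire (0°). At a 16 kHz sampling frequency one sample lasts 62.5 µs, which is less than the roughly 117 µs maximum delay for the assumed 4 cm spacing, so delays would need sub-sample resolution for a usable azimuth estimate in such a setup.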