Gerhard Doblinger noise reduction microphone array

Adaptive Microphone Array for Noise Reduction

Multi-channel noise reduction systems based on adaptive array processing can outperform single-channel systems because they exploit both the spatial and the temporal information of the sound field.


Figure 1 shows a sound capture setup in a noisy acoustical environment. A set of $N$ microphones, each followed by an FIR filter, is used to adaptively minimize the mean-squared error $E\{e^2[n]\}$, where $e[n] = s[n] - y[n]$ is the difference between the desired speech signal $s[n]$ and the output signal $y[n]$. The acoustical environment is described by the channel impulse responses $h_i[r,t]$, which include the direct path to each microphone as well as echoes and reverberation. The noise field comprises all disturbing sources (jammers).
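As a minimal time-domain illustration of the setup in Fig. 1, the following Python sketch (assuming NumPy) computes the filter-and-sum output $y[n] = \sum_i \sum_k w_i[k]\,x_i[n-k]$ of $N$ FIR-filtered microphone signals and a sample estimate of the mean-squared error. The function names `filter_and_sum` and `mse` are hypothetical, and the adaptation of the filter weights is not shown.

```python
import numpy as np

def filter_and_sum(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Output y[n] = sum_i sum_k w[i, k] * x[i, n - k] of an N-channel FIR array.

    x: microphone signals, shape (N, num_samples)
    w: FIR coefficients (hypothetical, e.g. from an adaptive algorithm), shape (N, L)
    """
    num_samples = x.shape[1]
    y = np.zeros(num_samples)
    for xi, wi in zip(x, w):
        # np.convolve returns the full linear convolution; keep the causal,
        # signal-length part so y[n] only depends on x_i[n - k], k >= 0.
        y += np.convolve(xi, wi)[:num_samples]
    return y

def mse(s: np.ndarray, y: np.ndarray) -> float:
    """Sample estimate of E{e^2[n]} with e[n] = s[n] - y[n]."""
    e = s - y
    return float(np.mean(e ** 2))
```

An adaptive algorithm would adjust `w` so that `mse(s, y)` decreases; in practice $s[n]$ is not directly observable, which is why the system described here relies on noise statistics instead of the clean reference.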


Fig. 2 sketches the frequency-domain beamformer/postfilter combination of the discrete-time microphone array. In the frequency domain, FIR filtering corresponds to multiplying the microphone signal spectra $X_i$ by complex-valued weights. These beamformer weights are matched to the actual spatio-spectral correlation matrix of the noise, $\mathbf{S}_{vv}$. The beamformer minimizes the output signal power while preserving signals from the desired direction. Since the beamformer cannot spatially suppress noise arriving from the desired speaker direction, a single-channel noise reduction system (postfilter) reduces this remaining noise. The whole system is implemented with an overlap-add FFT filterbank, and the beamformer weights and postfilter coefficients are optimized at each FFT frequency point and for each signal frame. A 16 kHz sampling frequency and a 512-point FFT are used. Real-time operation is possible on today’s signal processing hardware.

A detailed description of the algorithms is published in the paper “An Adaptive Microphone Array for Optimum Beamforming and Noise Reduction”, presented at the 14th European Signal Processing Conference (EUSIPCO 2006), September 4-8, 2006, Florence, Italy.
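The per-bin weight computation can be sketched with the minimum variance distortionless response (MVDR) beamformer, a standard way to minimize output power while keeping unit gain in the desired direction: $\mathbf{w} = \mathbf{S}_{vv}^{-1}\mathbf{d} / (\mathbf{d}^H \mathbf{S}_{vv}^{-1}\mathbf{d})$, with output $Y = \mathbf{w}^H\mathbf{X}$. This is a sketch under assumptions, not necessarily the exact algorithm of the EUSIPCO paper: the uniform linear array geometry, the far-field steering vector `d`, the speed of sound of 343 m/s, and all function names are hypothetical. Only the 16 kHz sampling frequency and the 512-point FFT come from the text.

```python
import numpy as np

FS = 16_000   # sampling frequency in Hz (from the text)
NFFT = 512    # FFT length (from the text)
C = 343.0     # speed of sound in m/s (assumption)

def bin_frequency(k: int) -> float:
    """Center frequency in Hz of FFT bin k."""
    return k * FS / NFFT

def steering_vector(f: float, mic_pos: np.ndarray, theta_deg: float) -> np.ndarray:
    """Far-field steering vector of a linear array (hypothetical geometry).

    theta_deg is measured from the array axis, so 90 degrees is broadside.
    """
    tau = mic_pos * np.cos(np.deg2rad(theta_deg)) / C   # relative delays in s
    return np.exp(-2j * np.pi * f * tau)

def mvdr_weights(S_vv: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Weights minimizing w^H S_vv w subject to w^H d = 1."""
    Sinv_d = np.linalg.solve(S_vv, d)   # S_vv^{-1} d without forming the inverse
    return Sinv_d / (d.conj() @ Sinv_d)

if __name__ == "__main__":
    mic_pos = 0.04 * np.arange(8)                     # 8 mics, 4 cm spacing (assumed)
    f = bin_frequency(64)                             # 2000 Hz
    d = steering_vector(f, mic_pos, 90.0)             # speaker at broadside
    a = steering_vector(f, mic_pos, 25.0)             # jammer at 25 degrees
    S_vv = np.outer(a, a.conj()) + 0.01 * np.eye(8)   # jammer + sensor noise (assumed)
    w = mvdr_weights(S_vv, d)
    X = d + 0.3 * a                                   # one hypothetical bin of spectra
    Y = w.conj() @ X                                  # beamformer output Y = w^H X
    print(abs(Y))
```

With these values each FFT bin spans 16000/512 = 31.25 Hz, so bin 64 lies at 2 kHz. In a frame-by-frame system this computation would be repeated for every bin and every frame, with $\mathbf{S}_{vv}$ re-estimated from the data.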

A representative experimental result:

Speech recordings were made in a real acoustical environment with strong reverberation and disturbing noise. The desired speaker direction is broadside (perpendicular to the array axis). A uni-directional noise jammer with an approximately 1/f power spectral density emits from 25° (measured with respect to the array axis). The 8-channel array is located in the middle of a large office room with a measured reverberation time of 0.84 seconds. Because of this strong reverberation, the disturbing noise is a mixture of uni-directional and diffuse noise. The measured input segmental SNR is 0 dB.
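The input segmental SNR quoted above is the average of per-frame SNRs in dB. The sketch below assumes that the clean speech and the noisy signal are available as time-aligned NumPy arrays; the frame length of 256 samples and the common practice of clamping each frame SNR to [-10, 35] dB are assumptions, not parameters taken from the experiment.

```python
import numpy as np

def segmental_snr(clean: np.ndarray, noisy: np.ndarray, frame_len: int = 256,
                  floor_db: float = -10.0, ceil_db: float = 35.0) -> float:
    """Mean of per-frame SNRs in dB, with the noise taken as noisy - clean.

    frame_len, floor_db and ceil_db are assumed values (a common convention).
    """
    noise = noisy - clean
    num_frames = len(clean) // frame_len
    snrs = []
    for m in range(num_frames):
        seg = slice(m * frame_len, (m + 1) * frame_len)
        p_s = np.sum(clean[seg] ** 2)
        p_n = np.sum(noise[seg] ** 2) + 1e-12      # guard against division by zero
        snr = 10.0 * np.log10(p_s / p_n + 1e-12)   # guard against log10(0)
        snrs.append(np.clip(snr, floor_db, ceil_db))
    return float(np.mean(snrs))
```

Averaging in the dB domain weights quiet and loud frames equally, so a segmental SNR of 0 dB indicates that speech and noise have comparable power across frames rather than only on average over the whole recording.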

Fig. 3 presents logarithmically scaled spectrograms of selected signals. The upper image shows the spectrogram at microphone input channel #1 (noisy input signal). The middle image is the spectrogram at the output of the “diffuse” beamformer, which is designed under the assumption of a (theoretical) diffuse noise field. The lower image is the output spectrogram of an improved beamformer whose design is matched to the estimated spatio-spectral correlation matrix. This matched design reduces noise far better than the conventional design based on a diffuse noise field.
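The difference between the two designs in Fig. 3 can be illustrated by computing MVDR-type weights once from the theoretical diffuse-noise coherence matrix, $\Gamma_{ij} = \sin(2\pi f d_{ij}/c)/(2\pi f d_{ij}/c)$, and once from the true noise correlation matrix, and then comparing the residual noise power $\mathbf{w}^H\mathbf{S}_{vv}\mathbf{w}$. This is a hypothetical sketch: the array spacing, the single frequency of 1 kHz, the noise mixture, and the diagonal loading of the diffuse design are assumed values, and $\mathbf{S}_{vv}$ is known exactly here, whereas the improved beamformer in the text works with an estimate.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumption)

def steering_vector(f, mic_pos, theta_deg):
    """Far-field steering vector; theta is measured from the array axis."""
    tau = mic_pos * np.cos(np.deg2rad(theta_deg)) / C
    return np.exp(-2j * np.pi * f * tau)

def diffuse_coherence(f, mic_pos):
    """Coherence matrix of a spherically isotropic (diffuse) noise field.

    Gamma_ij = sin(2 pi f d_ij / c) / (2 pi f d_ij / c). np.sinc(x) is
    sin(pi x) / (pi x), so it is called with 2 f d_ij / c.
    """
    dist = np.abs(mic_pos[:, None] - mic_pos[None, :])
    return np.sinc(2.0 * f * dist / C)

def mvdr_weights(M, d):
    """Weights minimizing w^H M w subject to w^H d = 1 (M Hermitian)."""
    Minv_d = np.linalg.solve(M, d)
    return Minv_d / (d.conj() @ Minv_d)

def output_noise_power(w, S_vv):
    """Residual noise power w^H S_vv w at the beamformer output."""
    return float(np.real(w.conj() @ S_vv @ w))

# Hypothetical scenario loosely modeled on the experiment: 8 microphones with
# 4 cm spacing (assumed), speaker at broadside, point jammer at 25 degrees
# plus a diffuse component and a little uncorrelated sensor noise.
mic_pos = 0.04 * np.arange(8)
f = 1000.0
d = steering_vector(f, mic_pos, 90.0)
a = steering_vector(f, mic_pos, 25.0)
Gamma = diffuse_coherence(f, mic_pos)
S_vv = np.outer(a, a.conj()) + 0.5 * Gamma + 0.01 * np.eye(8)

# "Diffuse" design: Gamma with diagonal loading (0.01, assumed) for robustness.
w_diffuse = mvdr_weights(Gamma + 0.01 * np.eye(8), d)
# Matched design: uses the (here exactly known) noise correlation matrix.
w_matched = mvdr_weights(S_vv, d)

p_diffuse = output_noise_power(w_diffuse, S_vv)
p_matched = output_noise_power(w_matched, S_vv)
print(f"diffuse design: {10 * np.log10(p_diffuse):.1f} dB, "
      f"matched design: {10 * np.log10(p_matched):.1f} dB")
```

Both weight vectors satisfy the same distortionless constraint $\mathbf{w}^H\mathbf{d} = 1$, and the matched weights minimize $\mathbf{w}^H\mathbf{S}_{vv}\mathbf{w}$ under that constraint, so their residual noise power cannot exceed that of the diffuse design for the same $\mathbf{S}_{vv}$.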


Fig. 4 shows the influence of the postfilter combined with a “diffuse” beamformer (upper image) and with an optimized beamformer matched to the actual noise field (middle image). For comparison, the lower image is the output spectrogram of the postfilter without beamformer pre-processing. The lower spectrogram shows significantly more residual noise (with musical tones) and notable speech distortion.
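A single-channel postfilter can be sketched as a per-bin spectral gain. The sketch below uses a simple Wiener-type rule, $G = \max(1 - \hat{\Phi}_{vv}/|Y|^2,\, G_{\min})$, with an assumed noise power estimate `noise_psd` and an assumed gain floor of 0.1; the postfilter in the paper may use a different estimator. It also hints at why the postfilter alone performs poorly: in noise-only bins $|Y|^2$ fluctuates randomly around the noise level, so the gain changes from bin to bin and frame to frame and leaves isolated spectral peaks, which are heard as musical tones.

```python
import numpy as np

def wiener_postfilter_gain(Y, noise_psd, g_min=0.1):
    """Per-bin gain G = max(1 - noise_psd / |Y|^2, g_min).

    Y:         complex spectrum of one frame (e.g. the beamformer output)
    noise_psd: estimated noise power per bin (assumed to be given)
    g_min:     spectral floor (assumed value) that limits musical noise
    """
    power = np.abs(Y) ** 2
    gain = 1.0 - noise_psd / np.maximum(power, 1e-12)   # avoid division by zero
    return np.maximum(gain, g_min)

def apply_postfilter(Y, noise_psd, g_min=0.1):
    """Enhanced spectrum S_hat = G * Y (the phase of Y is kept)."""
    return wiener_postfilter_gain(Y, noise_psd, g_min) * Y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Noise-only frame with unit noise power in 257 bins (512-point FFT):
    # |Y|^2 fluctuates around 1, so the gain jumps between the floor and
    # considerably higher values in randomly scattered bins.
    Y = (rng.standard_normal(257) + 1j * rng.standard_normal(257)) / np.sqrt(2)
    print(wiener_postfilter_gain(Y, np.ones(257))[:10])
```

A beamformer in front of the postfilter lowers the noise level before this gain is computed, so fewer bins are exposed to these random gain fluctuations.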

Audio examples:

The following audio files correspond to the spectrograms in Fig. 3 and Fig. 4, respectively.

x0dB1.mp3  (noisy speech at microphone input #1, segSNR = 0 dB)
yd.mp3  (output of “diffuse” beamformer)
ye.mp3  (output of improved beamformer matched to the actual noise field)

yde.mp3  (output of “diffuse” beamformer + postfilter)
yee.mp3  (output of improved beamformer + postfilter)
y1e.mp3  (output of postfilter without beamformer pre-processing, i.e., single-channel noise reduction)