Multichannel Binaural Speech Enhancement using Deep Complex Convolutional Recurrent Networks

The demos consist of binaural signals, so headphones are recommended for listening.

The binaural signals are generated using measured HRIRs from the Kayser database [2].
The speech signals are taken from the VCTK corpus [3].
The code for the implementation can be accessed below.

This page was generated using trackswitch.js [1].
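The page does not describe exactly how the binaural mixtures were rendered. A common approach is to convolve each mono source with the HRIR of every microphone channel for its direction, and to approximate isotropic (diffuse) noise by summing independent noise sources rendered from many directions. The minimal sketch below assumes that approach. It is not the authors' code: `spatialize` and `isotropic_noise` are hypothetical helpers, and short plain-Python lists stand in for real HRIR data from the Kayser database [2].

```python
import random


def convolve(x, h):
    """Full linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def spatialize(source, hrirs):
    """Render a mono source at one direction (hypothetical helper).

    `hrirs` holds one impulse response per microphone channel, e.g. the
    left and right in-ear channels, or several behind-the-ear microphones.
    """
    return [convolve(source, h) for h in hrirs]


def isotropic_noise(hrir_sets, n_samples, seed=0):
    """Approximate diffuse noise (hypothetical helper).

    Each entry of `hrir_sets` is the per-channel HRIR list for one direction.
    An independent white-noise source is rendered from every direction and
    the results are summed. All directions are assumed to share the same HRIR
    length, since zip truncates to the shortest channel.
    """
    rng = random.Random(seed)
    mix = None
    for hrirs in hrir_sets:
        src = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        channels = spatialize(src, hrirs)
        if mix is None:
            mix = channels
        else:
            mix = [[a + b for a, b in zip(m, c)] for m, c in zip(mix, channels)]
    return mix


# Toy two-channel example with 2-tap "HRIRs".
left, right = spatialize([1.0, 0.5], [[1.0, 0.0], [0.0, 0.5]])
# left == [1.0, 0.5, 0.0], right == [0.0, 0.5, 0.25]
```

With real data, each HRIR pair would come from the measured database for a given azimuth, and a denser grid of directions gives a closer approximation to an isotropic field.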

The listening files are organized as follows:

The input SNRs are -6, -3, 0, 6 and 12 dB.
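To reach each listed input SNR, the noise can be scaled by a gain g = sqrt(P_s / (P_n · 10^(SNR/10))) before it is added to the speech, where P_s and P_n are the speech and noise powers. The sketch below assumes the SNR is measured over the pooled power of all channels. The page does not say whether a per-ear or better-ear definition was used, and `mean_power` and `mix_at_snr` are hypothetical helpers.

```python
import math


def mean_power(channels):
    """Mean power over every sample of every channel."""
    total = sum(v * v for ch in channels for v in ch)
    count = sum(len(ch) for ch in channels)
    return total / count


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the pooled speech-to-noise power ratio equals
    `snr_db`, then add it to `speech` channel by channel.

    Assumes both inputs have the same number of channels and samples.
    """
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [[s + gain * n for s, n in zip(s_ch, n_ch)]
            for s_ch, n_ch in zip(speech, noise)]
```

Calling `mix_at_snr` once per value in (-6, -3, 0, 6, 12) with the same speech and noise would produce one mixture per listening condition.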

Isotropic Noise at -6 dB SNR

Anechoic speech with isotropic noise at -6 dB SNR.

Isotropic Noise at -3 dB SNR

Anechoic speech with isotropic noise at -3 dB SNR.

Isotropic Noise at 0 dB SNR

Anechoic speech with isotropic noise at 0 dB SNR.

Isotropic Noise at 6 dB SNR

Anechoic speech with isotropic noise at 6 dB SNR.

Isotropic Noise at 12 dB SNR

Anechoic speech with isotropic noise at 12 dB SNR.


References

[1] N. Werner et al., “trackswitch.js: A Versatile Web-Based Audio Player for Presenting Scientific Results,” in 3rd Web Audio Conference, London, UK, 2017.
[2] H. Kayser, S. D. Ewert, J. Anemüller, T. Rohdenburg, V. Hohmann, and B. Kollmeier, “Database of multichannel in-ear and behind-the-ear head-related and binaural room impulse responses,” EURASIP J. on Advances in Signal Process., vol. 2009, no. 1, p. 298605, Jul. 2009.
[3] J. Yamagishi, C. Veaux, and K. MacDonald, “CSTR VCTK Corpus: English multispeaker corpus for CSTR voice cloning toolkit (version 0.92),” University of Edinburgh, The Centre for Speech Technology Research (CSTR), 2019.
[4] C. Han, Y. Luo, and N. Mesgarani, “Real-Time Binaural Speech Separation with Preserved Spatial Cues,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), May 2020, pp. 6404–6408.
[5] V. Tokala, M. Brookes, and P. A. Naylor, “Binaural Speech Enhancement Using STOI-optimal Masks,” in 2022 International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2022, pp. 1–5.