Multichannel Binaural Speech Enhancement using Deep Complex Convolutional Recurrent Networks

The demos consist of binaural signals; headphones are recommended for listening.

  • The binaural signals are generated using measured HRIRs from the Kayser Database [2]
  • The speech signals are taken from the VCTK Corpus [3]
  • The code for the implementation can be accessed below
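As a rough illustration of how a binaural signal can be produced from a mono recording and a pair of HRIRs, the sketch below convolves the source with a left and a right impulse response. It is a minimal example under stated assumptions, not the processing used for these demos: the function name `render_binaural` is hypothetical, and the impulse responses are toy placeholders rather than measurements from the Kayser Database.

```python
import numpy as np


def render_binaural(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair.

    Both HRIRs are assumed to have the same length, so the two output
    channels can be stacked into an array of shape (2, n_samples).
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])


# Placeholder data (assumption): white noise stands in for a speech signal.
rng = np.random.default_rng(0)
mono = rng.standard_normal(1600)

# Toy HRIRs (assumption): the right ear receives a delayed, attenuated copy,
# which crudely mimics a source located on the listener's left.
hrir_left = np.zeros(64)
hrir_left[0] = 1.0
hrir_right = np.zeros(64)
hrir_right[8] = 0.5

binaural = render_binaural(mono, hrir_left, hrir_right)
```

With measured HRIRs, the interaural time and level differences that the toy delay and gain imitate here would come from the measurements themselves.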

This page was generated using trackswitch.js [1].

The listening files are organized as follows:

  • The input SNRs are -6, -3, 0, 6, and 12 dB
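For readers who want to reproduce mixtures at the SNRs listed above, the following sketch scales a noise signal so that the speech-to-noise power ratio matches a target value. It is only an assumption about how such a mix could be made: the helper `mix_at_snr` is hypothetical, power is averaged over both channels (the demos may define SNR differently), and white noise is used as a placeholder for both the speech and the isotropic noise.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Power is averaged over all channels and samples (an assumption).
    Returns the noisy mixture and the scaled noise.
    """
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise to the power required by the target SNR.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


# Placeholder two-channel signals (assumption): random data, not real recordings.
rng = np.random.default_rng(0)
speech = rng.standard_normal((2, 16000))
noise = 0.3 * rng.standard_normal((2, 16000))

# Check the achieved SNR for each input SNR listed on this page.
achieved = {}
for snr_db in (-6, -3, 0, 6, 12):
    noisy, scaled_noise = mix_at_snr(speech, noise, snr_db)
    achieved[snr_db] = 10.0 * np.log10(
        np.mean(speech ** 2) / np.mean(scaled_noise ** 2)
    )
```

Each entry of `achieved` should match its key up to floating-point error, since the gain is derived directly from the target power ratio.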

Isotropic Noise at -6 dB SNR

Anechoic speech with Isotropic Noise at -6 dB.

Anechoic speech with Isotropic Noise at -6 dB.

Isotropic Noise at -3 dB SNR

Anechoic speech with Isotropic Noise at -3 dB.

Anechoic speech with Isotropic Noise at -3 dB.

Isotropic Noise at 0 dB SNR

Anechoic speech with Isotropic Noise at 0 dB.

Anechoic speech with Isotropic Noise at 0 dB.

Isotropic Noise at 6 dB SNR

Anechoic speech with Isotropic Noise at 6 dB.

Anechoic speech with Isotropic Noise at 6 dB.

Isotropic Noise at 12 dB SNR

Anechoic speech with Isotropic Noise at 12 dB.

Anechoic speech with Isotropic Noise at 12 dB. (Files to be updated)


[1] N. Werner et al., “trackswitch.js: A Versatile Web-Based Audio Player for Presenting Scientific Results,” in 3rd Web Audio Conference, London, UK, 2017.
[2] H. Kayser, S. D. Ewert, J. Anemüller, T. Rohdenburg, V. Hohmann, and B. Kollmeier, “Database of multichannel in-ear and behind-the-ear head-related and binaural room impulse responses,” EURASIP J. on Advances in Signal Process., vol. 2009, no. 1, p. 298605, Jul. 2009.
[3] J. Yamagishi, C. Veaux, and K. MacDonald, “CSTR VCTK Corpus: English multispeaker corpus for CSTR voice cloning toolkit (version 0.92),” University of Edinburgh, The Centre for Speech Technology Research (CSTR), 2019.
[4] C. Han, Y. Luo, and N. Mesgarani, “Real-Time Binaural Speech Separation with Preserved Spatial Cues,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), May 2020, pp. 6404–6408.
[5] V. Tokala, M. Brookes, and P. A. Naylor, “Binaural Speech Enhancement Using STOI-optimal Masks,” in 2022 International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2022, pp. 1–5.