Binaural Speech Enhancement using Deep Complex Transformer Networks

The demos consist of binaural signals, so headphones are recommended for listening.

  • The binaural signals are generated using measured HRIRs from the Kayser database [2]
  • The speech signals are taken from the VCTK Corpus [3]
  • The code for the method can be found in the GitHub repository
  • This page was generated using trackswitch.js [1]
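
As a rough illustration of how measured HRIRs turn a single-channel speech signal into a binaural one, the sketch below convolves a mono signal with a left-ear and a right-ear impulse response. It is a minimal sketch and not the code from the repository: the function names `convolve` and `render_binaural` and the three-tap toy impulse responses are assumptions made for illustration, and real HRIRs from the database are far longer.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution.

    The output has len(signal) + len(impulse_response) - 1 samples.
    """
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right HRIR pair (hypothetical helper)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy data (assumed, not taken from any database):
mono = [1.0, 0.5, 0.25]
hrir_left = [1.0, 0.0, 0.0]   # near ear: direct path, no delay
hrir_right = [0.0, 0.0, 0.6]  # far ear: two-sample delay, attenuated

left, right = render_binaural(mono, hrir_left, hrir_right)
print("left :", left)
print("right:", right)
```

The delay and level difference between the two output channels are the interaural cues that make the source appear to come from one side. For full-length signals an FFT-based convolution from an array library would normally replace the nested loop.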

The listening files are organized as follows:

  • The input SNRs are -6, -3, 0, 6, and 12 dB
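
To make the SNR values concrete, the sketch below scales a noise signal so that a speech-plus-noise mixture reaches a chosen input SNR. It is a minimal single-channel sketch under stated assumptions: the page does not say how the SNR is measured for the two-channel signals, so the power-ratio definition, the helper names `power` and `mix_at_snr`, the sine-wave stand-in for speech, and the 16 kHz sample count are all illustrative choices.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add.

    Returns the mixture and the scaled noise (hypothetical helper).
    """
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


# Stand-in signals (assumed): a 220 Hz tone for speech, white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for snr_db in (-6, -3, 0, 6, 12):
    _, scaled = mix_at_snr(speech, noise, snr_db)
    achieved = 10 * math.log10(power(speech) / power(scaled))
    print(f"target {snr_db:+d} dB, achieved {achieved:+.2f} dB")
```

At -6 dB the noise carries roughly four times the power of the speech, while at 12 dB the speech carries roughly sixteen times the power of the noise.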

Isotropic Noise at -6 dB SNR

Anechoic speech with Isotropic Noise at -6 dB.

Anechoic speech with Isotropic Noise at -6 dB.

Anechoic speech with Isotropic Noise at -6 dB.

Isotropic Noise at -3 dB SNR

Anechoic speech with Isotropic Noise at -3 dB.

Anechoic speech with Isotropic Noise at -3 dB.

Anechoic speech with Isotropic Noise at -3 dB.


Isotropic Noise at 0 dB SNR

Anechoic speech with Isotropic Noise at 0 dB.

Anechoic speech with Isotropic Noise at 0 dB.

Anechoic speech with Isotropic Noise at 0 dB.


Isotropic Noise at 6 dB SNR

Anechoic speech with Isotropic Noise at 6 dB.

Anechoic speech with Isotropic Noise at 6 dB.

Anechoic speech with Isotropic Noise at 6 dB.


Isotropic Noise at 12 dB SNR

Anechoic speech with Isotropic Noise at 12 dB.

Anechoic speech with Isotropic Noise at 12 dB.

Anechoic speech with Isotropic Noise at 12 dB.


References

[1] Werner, Nils, et al. “trackswitch.js: A Versatile Web-Based Audio Player for Presenting Scientific Results,” 3rd Web Audio Conference, London, UK, 2017.
[2] H. Kayser, S. D. Ewert, J. Anemüller, T. Rohdenburg, V. Hohmann, and B. Kollmeier, “Database of multichannel in-ear and behind-the-ear head-related and binaural room impulse responses,” EURASIP J. on Advances in Signal Process., vol. 2009, no. 1, p. 298605, Jul. 2009.
[3] J. Yamagishi, C. Veaux, and K. MacDonald, “CSTR VCTK Corpus: English multispeaker corpus for CSTR voice cloning toolkit (version 0.92),” University of Edinburgh, The Centre for Speech Technology Research (CSTR), 2019.
[4] C. Han, Y. Luo, and N. Mesgarani, “Real-Time Binaural Speech Separation with Preserved Spatial Cues,” in Proc. IEEE Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), May 2020, pp. 6404–6408.
[5] V. Tokala, M. Brookes, and P. A. Naylor, “Binaural Speech Enhancement Using STOI-optimal Masks,” in 2022 International Workshop on Acoustic Signal Enhancement (IWAENC), Sep. 2022, pp. 1–5.