Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules for Noise and Reverberation Reduction

Hsin-Tien Chiang (1), Hao Zhang (2), Yong Xu (2), Meng Yu (2), Dong Yu (2)

(1) The University of Texas at Dallas, Richardson, TX, USA

(2) Tencent AI Lab, Bellevue, WA, USA

 

Abstract

In challenging environments with significant noise and reverberation, traditional speech enhancement (SE) methods often over-suppress speech, which creates audible artifacts and degrades the performance of downstream tasks. To overcome these limitations, we propose a novel approach called Restorative SE (RestSE), which combines a lightweight SE module with a generative codec module to progressively enhance and restore speech quality. The SE module first reduces noise; the codec module then performs dereverberation and restores the speech using its generative capabilities. We systematically explore various quantization techniques within the codec module to optimize performance. Additionally, we introduce a weighted loss function and a feature fusion scheme that merges the SE output with the original mixture, particularly in segments where the SE output is heavily distorted. Experimental results demonstrate the effectiveness of the proposed method in enhancing speech quality under adverse conditions.
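As a rough illustration of the feature fusion idea in the abstract, the sketch below blends SE-output features with mixture features frame by frame, leaning on the mixture wherever the SE output looks over-suppressed. This page does not specify the actual fusion rule, so everything here is an assumption made for illustration: the energy-ratio distortion measure, the `floor_db` threshold, the frame sizes, and the function names `frame_energy`, `fusion_weights`, and `fuse` are all hypothetical and are not taken from RestSE.

```python
import math


def frame_energy(signal, frame_len=256, hop=128):
    """Mean-square energy of each frame of a 1-D sample sequence."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    energies = []
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def fusion_weights(mixture, se_output, floor_db=-20.0, eps=1e-8):
    """Hypothetical per-frame weight in [0, 1] placed on the mixture.

    Assumption: a frame counts as more distorted the further the SE output
    energy drops below the mixture energy. At 0 dB or above the weight is 0;
    at floor_db or below it is 1.
    """
    weights = []
    for e_mix, e_se in zip(frame_energy(mixture), frame_energy(se_output)):
        ratio_db = 10.0 * math.log10((e_se + eps) / (e_mix + eps))
        weights.append(min(1.0, max(0.0, ratio_db / floor_db)))
    return weights


def fuse(mix_feats, se_feats, weights):
    """Blend per-frame feature vectors: w * mixture + (1 - w) * SE output."""
    return [
        [w * m + (1.0 - w) * s for m, s in zip(mix_row, se_row)]
        for mix_row, se_row, w in zip(mix_feats, se_feats, weights)
    ]
```

With these assumed defaults, a frame the SE module leaves untouched keeps its SE features unchanged, while a frame the SE module silences entirely falls back to the mixture features. A learned fusion could replace the fixed energy heuristic without changing the blending step.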

 

Results

Simulated

Audio samples: Mixture, LSTM (pretrained), RestSE, Wang et al. [1], Dry Clean

Real

Audio samples: Mixture, LSTM (pretrained), RestSE, Wang et al. [1]

 

References

[1] H. Wang, M. Yu, H. Zhang, C. Zhang, Z. Xu, M. Yang, Y. Zhang, and D. Yu, "Unifying robustness and fidelity: A comprehensive study of pretrained generative methods for speech enhancement in adverse conditions," arXiv preprint arXiv:2309.09028, 2023.

Last update: September 8, 2024