An Overview of Deep Learning Based Speech Separation Technology
Hui Song, 2020.07.25
CONTENTS
01 Formulation
• State of the art
• An encoder-separator-decoder formulation
02 Monaural Speech Separation
• Frequency-domain or time-domain?
• Frequency-domain speech separation/extraction methods
• Time-domain speech separation/extraction methods
• Some interesting variants
03 Array-based Speech Separation
• Separation-based methods
• Beamforming-based methods
04 Conclusions and Future Challenges
1 Formulation
State of the art
• Encoder
- Transforms the input signal into a domain (latent space) suitable for source separation.
• Separator (+ Extractor)
- Estimates a mask for each source in the latent space, then outputs an estimate of each source in the latent space by mask multiplication or beamforming.
• Decoder
- Transforms the extracted source signals back to the time domain.
Fig. 1. Generic view of source separation system [1]
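The encoder, separator, and decoder roles can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration, not the method of any particular system: a fixed windowed FFT stands in for the encoder (a learned encoder could be substituted), the frame and hop sizes are arbitrary, and the hypothetical `oracle_ratio_masks` helper replaces a trained mask-estimation network by computing masks from the reference sources, which a real system would not have access to. Only the mask-multiplication path is shown; beamforming is not covered.

```python
import numpy as np

FRAME = 256       # assumed frame length in samples
HOP = FRAME // 2  # assumed 50% overlap
# Periodic sqrt-Hann window: applied once at analysis and once at synthesis,
# so the overlapping frames sum back to the original signal.
WINDOW = np.sqrt(np.hanning(FRAME + 1)[:-1])


def encode(signal):
    """Encoder: map a 1-D waveform to a latent space (here, windowed FFT frames)."""
    tail = (-len(signal)) % HOP                 # pad to a whole number of hops
    padded = np.pad(signal, (HOP, HOP + tail))  # extra hop on each side for the edges
    n_frames = 1 + (len(padded) - FRAME) // HOP
    frames = np.stack([padded[i * HOP:i * HOP + FRAME] for i in range(n_frames)])
    return np.fft.rfft(frames * WINDOW, axis=1)


def oracle_ratio_masks(source_latents, eps=1e-8):
    """Hypothetical stand-in for a trained mask estimator.

    Uses the reference sources to build one ratio mask per source.
    """
    magnitudes = np.abs(source_latents)
    return magnitudes / (magnitudes.sum(axis=0, keepdims=True) + eps)


def separate(mixture_latent, masks):
    """Separator: one latent estimate per source by mask multiplication."""
    return masks * mixture_latent[np.newaxis]


def decode(latent, length):
    """Decoder: inverse FFT and overlap-add back to a waveform of `length` samples."""
    frames = np.fft.irfft(latent, n=FRAME, axis=1) * WINDOW
    out = np.zeros(HOP * (len(frames) - 1) + FRAME)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
    return out[HOP:HOP + length]                # drop the padding added by encode


def separate_waveform(mixture, references):
    """Run encoder -> separator -> decoder and return one waveform per source."""
    source_latents = np.stack([encode(r) for r in references])
    masks = oracle_ratio_masks(source_latents)
    estimates = separate(encode(mixture), masks)
    return np.stack([decode(e, len(mixture)) for e in estimates])
```

Calling `separate_waveform(s1 + s2, [s1, s2])` on two equal-length waveforms returns an array with one row per estimated source. In a trained system the mask estimator would see only the mixture, and in time-domain approaches the encoder and decoder would typically be learned rather than fixed.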