Workflow
For both training and test datasets:
• Clean the audio files (remove silence with detectSpeech)
• Concatenate the speech signals, with at most 2 s of silence between them
• Speech signal + noise = noisy signal (low SNR)
• Baseline: Speech-only signal
• Extract features
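The data-preparation steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the original implementation. The speech segments are assumed to be already trimmed (in this workflow that is done with MATLAB's detectSpeech). The sample rate, frame length, and target SNR are illustrative values. SNR in dB is taken as 10·log10(P_speech / P_noise), computed over the whole concatenated signal. The helper names (concat_with_gaps, mix_at_snr, frame_log_energy, frame_labels) are hypothetical, and log frame energy is only a toy stand-in for real features. Per-sample 0/1 labels are kept so that frame-level reference labels can be derived from the speech-only signal.

```python
import numpy as np

FS = 16_000        # assumed sample rate in Hz
MAX_GAP_S = 2.0    # maximum silence between concatenated segments (from the workflow)
FRAME = 400        # assumed frame length in samples (25 ms at 16 kHz)


def concat_with_gaps(segments, rng, fs=FS, max_gap_s=MAX_GAP_S):
    """Concatenate speech segments with random silences of at most max_gap_s.

    Returns the concatenated signal and a per-sample 0/1 speech label.
    """
    pieces, labels = [], []
    for i, seg in enumerate(segments):
        if i > 0:
            gap = int(rng.uniform(0.0, max_gap_s) * fs)  # silence length in samples
            pieces.append(np.zeros(gap))
            labels.append(np.zeros(gap, dtype=int))
        pieces.append(np.asarray(seg, dtype=float))
        labels.append(np.ones(len(seg), dtype=int))
    return np.concatenate(pieces), np.concatenate(labels)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that speech + noise has the requested SNR in dB.

    Assumes noise is at least as long as speech; powers are averaged over
    the whole signal (including the silent gaps).
    """
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


def frame_log_energy(x, frame=FRAME):
    """Toy feature: log energy of non-overlapping frames."""
    n = len(x) // frame
    frames = x[: n * frame].reshape(n, frame)
    return np.log(np.mean(frames ** 2, axis=1) + 1e-10)


def frame_labels(sample_labels, frame=FRAME):
    """A frame counts as speech if more than half of its samples are speech."""
    n = len(sample_labels) // frame
    return (sample_labels[: n * frame].reshape(n, frame).mean(axis=1) > 0.5).astype(int)


# Usage with synthetic stand-ins for trimmed speech segments and noise.
rng = np.random.default_rng(0)
segs = [rng.standard_normal(FS // 2) for _ in range(3)]
speech, labels = concat_with_gaps(segs, rng)
noisy = mix_at_snr(speech, rng.standard_normal(len(speech)), snr_db=-5.0)
X = frame_log_energy(noisy)   # features from the noisy signal
y = frame_labels(labels)      # reference labels from the speech-only signal
```

Features are computed from the noisy mixture, while the labels come from the speech-only signal, which is what lets the labels serve as the reference for both training and testing.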
Training (and validation):
• Train network using these features
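The training step can be illustrated with a deliberately small stand-in. The sketch below is not the network from this workflow. Instead it swaps in a single logistic unit (logistic regression, i.e. a one-neuron network) trained by batch gradient descent on synthetic two-dimensional features, to show the train/validation split. The helper names (standardize, train_logistic, predict), the learning rate, the epoch count, and the 80/20 split are all assumptions.

```python
import numpy as np


def standardize(X_train, X_other):
    """Normalize features with statistics from the training set only."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8
    return (X_train - mu) / sd, (X_other - mu) / sd


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit one logistic unit by batch gradient descent on cross-entropy loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted speech probability
        grad = p - y                             # d(loss)/d(logit) per frame
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(X, w, b):
    """Return 1 (speech) where the logit is positive, else 0."""
    return (X @ w + b > 0).astype(int)


# Synthetic, roughly separable frame features (not real data).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
X = y[:, None] * 2.0 + rng.standard_normal((1000, 2))

# Hold out 20% of frames for validation.
idx = rng.permutation(len(y))
tr, va = idx[:800], idx[800:]
X_tr, X_va = standardize(X[tr], X[va])
w, b = train_logistic(X_tr, y[tr])
val_acc = float(np.mean(predict(X_va, w, b) == y[va]))
```

Normalization statistics come from the training frames only, so the validation accuracy is not inflated by information leaking from the held-out frames.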
Testing:
• Compare the network’s accuracy with that of the VAD baseline
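For the test step, one simple way to compare the two detectors is frame-level accuracy against reference labels. The sketch below assumes that both the network and the VAD baseline produce one 0/1 (non-speech/speech) decision per frame, and that reference labels exist for the same frames. The function name frame_accuracy and the label arrays are hypothetical toy values, not results.

```python
import numpy as np


def frame_accuracy(pred, ref):
    """Fraction of frames where the predicted speech/non-speech label matches the reference."""
    pred, ref = np.asarray(pred), np.asarray(ref)
    if pred.shape != ref.shape:
        raise ValueError("prediction and reference must have the same number of frames")
    return float(np.mean(pred == ref))


# Toy frame labels: reference, trained-network decisions, baseline VAD decisions.
ref = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])
net = np.array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1])
vad = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 1])

print(f"network: {frame_accuracy(net, ref):.2f}, VAD baseline: {frame_accuracy(vad, ref):.2f}")
# prints "network: 0.90, VAD baseline: 0.60"
```

Accuracy alone hides which kind of error dominates; counting missed speech frames and false alarms separately is a natural extension.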