# Speech-Emotion-Recognition-using-ML-and-DL
Emotions are an important part of understanding human interactions. Researchers are looking for methods that can, at the very least, mimic the human ability to recognise emotions expressed through facial expressions, changes in tone while speaking, and so on. Speech Emotion Recognition (SER) is one such field. Using deep learning and machine learning algorithms together with the RAVDESS and TESS datasets, we aim to design an automatic emotion recognition system.
**Feature set information**
For this task, the dataset is built using 5252 samples from:
- the [Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset](https://zenodo.org/record/1188976#.XsAXemgzaUk)
- the [Toronto emotional speech set (TESS) dataset](https://tspace.library.utoronto.ca/handle/1807/24487)
The samples include:
- 1440 speech files and 1012 song files from **RAVDESS**. This dataset includes recordings of 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each file was rated 10 times on emotional validity, intensity, and genuineness. Ratings were provided by 247 individuals who were characteristic of untrained adult research participants from North America. A further set of 72 participants provided test-retest data. High levels of emotional validity, interrater reliability, and test-retest intrarater reliability were reported. Validation data is open-access and can be downloaded along with the RAVDESS paper from [PLoS ONE](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196391).
- 2800 files from **TESS**. A set of 200 target words were spoken in the carrier phrase "Say the word _____" by two actresses (aged 26 and 64 years), and recordings were made of the set portraying each of seven emotions (anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral). There are 2800 stimuli in total. The two actresses were recruited from the Toronto area. Both speak English as their first language, are university educated, and have musical training. Audiometric testing indicated that both actresses have thresholds within the normal range.
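As a quick sanity check, the sample counts quoted above can be recomputed. This is only a sketch of the arithmetic. The variable names are illustrative and are not taken from the repository code.

```python
# Counts as stated in this README (not read from disk).
RAVDESS_SPEECH = 1440
RAVDESS_SONG = 1012

# TESS: 200 target words x 7 emotions x 2 actresses.
TESS_WORDS, TESS_EMOTIONS, TESS_SPEAKERS = 200, 7, 2
tess_total = TESS_WORDS * TESS_EMOTIONS * TESS_SPEAKERS

total = RAVDESS_SPEECH + RAVDESS_SONG + tess_total
print(tess_total, total)  # 2800 5252
```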
**Metrics**
*Model summary*
![Link to model](media/model.png)
*Loss and accuracy plots*
![Link to loss](media/test7_2.JPG)
![Link to accuracy](media/test7_3.JPG)
*Classification report*
![Link to classification report](media/test7_1.JPG)
*Confusion matrix*
![Link to confusion matrix](media/test7_4.JPG)
**How to use the code inside this repository**
1) Clone the repository: ```git clone https://github.com/abhay8463/Speech-Emotion-Recognition-using-ML-and-DL.git```. The remaining steps are optional: follow them only if you want to rebuild the dataset, the features, or the model yourself.
2) Download Audio_Song_Actors_01-24.zip and Audio_Speech_Actors_01-24.zip, unzip them, and merge the contents of the folders (e.g. Actor_01 should include both Speech and Song). Then add the merged folders to the ```features``` folder.
3) Create two empty folders, ```Actor_25``` and ```Actor_26```, inside the ```features``` folder.
4) Download the TESS dataset and unzip it into the ```TESS_Toronto_emotional_speech_set_data``` folder.
The following steps expect this folder layout:
```
TESS_Toronto_emotional_speech_set_data
--OAF_angry
--OAF_disgust
--Other Folders..
```
5) Run ```tess_pipeline.py```: this will copy the files into the ```Actor_25``` and ```Actor_26``` folders with a usable naming convention. For details, read the docstrings of ```tess_pipeline.py```.
6) *ONLY IF YOU WANT TO CREATE NEW FEATURES*: run ```create_features.py```. Please note this is NOT necessary, as the ```features``` folder already contains the joblib files created with ```create_features.py```.
7) *ONLY IF YOU WANT TO CREATE A NEW MODEL*: run the Python notebook [SER(Deep_Learning).ipynb](https://github.com/abhay8463/Speech-Emotion-Recognition-using-ML-and-DL/blob/master/Deep%20Learning/SER(Deep_Learning).ipynb). Please note this is NOT necessary, as the ```Deep Learning``` folder already contains a [pre-trained model](https://github.com/abhay8463/Speech-Emotion-Recognition-using-ML-and-DL/blob/master/Deep%20Learning/SER_model.h5) to use.
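Before running ```tess_pipeline.py``` or ```create_features.py```, it can help to confirm that the folders described in the steps above exist. The sketch below is not part of the repository. The helper name `missing_folders` is hypothetical, and it assumes the RAVDESS folders are named ```Actor_01``` to ```Actor_24```, following the ```Actor_01``` example above.

```python
from pathlib import Path
from typing import List


def missing_folders(root: Path) -> List[str]:
    """Return the expected folders (relative to root) that do not exist.

    Assumed layout, based on the steps above: features/Actor_01 ..
    features/Actor_26 plus the unzipped TESS folder.
    """
    expected = [f"features/Actor_{i:02d}" for i in range(1, 27)]
    expected.append("TESS_Toronto_emotional_speech_set_data")
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Check the current working directory (assumed to be the repository root).
    missing = missing_folders(Path("."))
    print("Layout looks complete." if not missing else f"Missing: {missing}")
```

Run it from the repository root. An empty result only means the folders exist, not that they contain the right audio files.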
**How to test the model created in this work**
Let's be clear: recognising emotions is a very difficult task.
I have placed two files in the ```examples``` folder:
a) 03-01-01-01-01-02-05.wav is an example of a WRONG prediction: it is labelled NEUTRAL, but the model predicts CALM. Try listening to the audio yourself. Which emotion do you hear? To me, CALM seems a fair prediction: the speaker is labelled neutral, and he certainly does not sound angry. You see my point?
b) 10-16-07-29-82-30-63.wav is a DISGUST file. The model predicts it correctly.
Feel free to try other files or to record your own voice. I have not tried the latter yet, but I am very curious about the result.
*Important note*: the classes are encoded from 0 to 7 in the code, but from 01 to 08 in the dataset. Keep this in mind when testing: if the model predicts 0 for a NEUTRAL file (01), that is correct and expected. Keras expects class indices to start from 0 rather than 1, so the code shifts the labels accordingly.
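The offset between dataset codes and model classes can be sketched as follows. This is an illustration, not the repository's code. The helper names are hypothetical, and it assumes the emotion code is the third dash-separated field of the filename, as in the published RAVDESS naming convention.

```python
# Emotion codes from the RAVDESS filename convention (01 to 08).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def class_index(filename: str) -> int:
    """Map a filename to the zero-based class index used for training."""
    stem = filename.rsplit(".", 1)[0]   # drop the extension
    emotion_code = stem.split("-")[2]   # third field, e.g. "01"
    return int(emotion_code) - 1        # dataset 01..08 -> model 0..7


def label_for_prediction(pred: int) -> str:
    """Map a zero-based model prediction back to an emotion name."""
    return RAVDESS_EMOTIONS[f"{pred + 1:02d}"]


print(class_index("03-01-01-01-01-02-05.wav"))  # 0
print(label_for_prediction(0))                  # neutral
print(label_for_prediction(1))                  # calm
```

With this mapping, the first example file has true class 0 (NEUTRAL), and the wrong CALM prediction described above corresponds to a model output of 1.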
**Team members**
- Abhay Gupta
- Aditya Karmokar
- Khadija Mohamad Haneefa
- Chennaboina Hemantha Lakshmi