<p align="center">
<img src="assets/musicGenereClassification.png?raw=true" alt="MusicGenreClassification" width="250">
</p>
# MusicGenreClassification
Academic research in the field of **Deep Learning (Deep Neural Networks) and Sound Processing**, Tel Aviv University.
Featured in [Medium](https://medium.com/@matanlachmish/music-genre-classification-470aaac9833d).
## Abstract
This work discusses the task of classifying the music genre of a sound sample.
## Introduction
When I decided to work in the field of sound processing, I thought that genre classification would be a parallel problem to image classification. To my surprise, I did not find many deep learning works that tackled this exact problem. One paper that did tackle it is Tao Feng's paper [1] from the University of Illinois. I learned a lot from this paper, but honestly, the results it presented were not impressive.
So I had to look at other papers that were related but not exact matches. A very influential one was Deep Content-Based Music Recommendation [2], which applies deep learning techniques to content-based music recommendation. The way they obtained their dataset, and the preprocessing they applied to the sound, really informed my implementation. This paper was also mentioned recently on Spotify's blog [3]: Spotify recruited a deep learning intern who, building on the above work, implemented a music recommendation engine. His simple yet very effective network made me think that Tao's RBM was not the best approach, and therefore my implementation uses a CNN instead, as in the Spotify blog.

One very important note is that Tao's work published results only for 2-, 3- and 4-class classification. Naturally he got very good results for 2 classes, but the more classes he tried to classify, the poorer his results became. My work tackles the full 10-class challenge, a much more difficult task.

A side goal of this project was to learn a new deep learning SDK; I had been waiting for an opportunity to learn Google's new TensorFlow [4]. The project is implemented in Python, and the machine learning part uses TensorFlow.
## The Dataset
Getting the dataset might be the most time-consuming part of this work. Working with music is a big pain: every file is usually a couple of MBs, and recordings vary widely in quality and parameters (number of frequencies, bits per second, etc.). But the biggest pain is copyright: there is no legitimate dataset of famous songs, as they would cost money.

Tao's paper is based on a dataset called GTZAN [5]. This dataset is quite small (100 songs per genre × 10 genres = 1,000 songs overall), and its copyright status is questionable. From my perspective, this is one of the reasons that held him back from getting better results. So I looked into obtaining more data to learn from. Eventually I found the Million Song Dataset (MSD) [6], a freely available collection of audio features and metadata for a million contemporary popular music tracks — around 280 GB of pure metadata. A project on top of MSD called tagtraum [7] classifies MSD songs into genres.

The problem now was getting the sound itself, and here is where I got a little creative. I found that one of the tags every song has in the dataset is an id from a provider called 7digital [8]. 7digital is a SaaS provider for music applications; it basically lets you stream music for money. I signed up to 7digital as a developer, and after their approval I could access their API. Any song stream still costs money, but I found out that they allow previewing a random 30 seconds of a song before paying for it. This is more than enough for my deep learning task, so I wrote `previewDownloader.py`, which downloads a 30-second preview for every song in the MSD dataset. Unfortunately, I had only my laptop for this mission, so I had to settle for only 1% of the dataset (around 2.8 GB).
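The core of `previewDownloader.py` can be sketched as follows. This is a simplified illustration, not the project's actual code: the endpoint name follows 7digital's 1.2 API (`track/preview`), but the exact parameters and authentication requirements may differ, and `DIGITAL7_API_KEY` is a placeholder environment variable.

```python
import os
import urllib.request

API_KEY = os.environ.get("DIGITAL7_API_KEY", "")  # your 7digital consumer key

def preview_url(track_id, api_key=API_KEY):
    # 7digital's track/preview endpoint returns a ~30 s clip of the track.
    # (Endpoint and parameter names per the 1.2 API; treat as a sketch.)
    return ("https://api.7digital.com/1.2/track/preview"
            f"?trackid={track_id}&oauth_consumer_key={api_key}")

def download_preview(track_id, out_dir="previews"):
    # Save the preview clip as <out_dir>/<track_id>.mp3
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{track_id}.mp3")
    urllib.request.urlretrieve(preview_url(track_id), path)
    return path
```

In the real pipeline, the 7digital track id for each song is read from the MSD metadata before calling `download_preview`.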
The genres I am classifying are:
1. blues<br>
2. classical<br>
3. country<br>
4. disco <br>
5. hiphop<br>
6. jazz<br>
7. metal<br>
8. pop<br>
9. reggae<br>
10. rock<br>
<p align="center">
<img src="assets/music_popularity.png?raw=true" alt="Music genre popularity" width="500">
</p>
## Preprocessing the data
Having a big dataset isn't enough. In contrast to image tasks, I cannot work directly on the raw sound samples. A quick calculation: 30 seconds × 22,050 samples/second = a vector of length 661,500, which would be a heavy load for a conventional machine learning method.
Following all the papers I read, and after researching acoustic analysis a little, it is quite obvious that the industry uses Mel-frequency cepstral coefficients (MFCC) as the feature vector for sound samples; I used the librosa [9] implementation.
MFCCs are derived as follows:
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
I tried several window size and stride values; the best results I got were with a window size of 100 ms and a stride of 40 ms.
One more point: Tao's paper used MFCC features (step 5), while Sander used straight mel frequencies (step 2).
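As a rough illustration, the five steps above can be sketched in plain NumPy. This is a simplified, unnormalised filterbank and DCT for exposition only — the project itself relies on librosa's implementation, which handles padding, normalisation, and edge cases properly:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=22050, win_ms=100, hop_ms=40, n_mels=128, n_mfcc=13):
    win = int(sr * win_ms / 1000)   # 100 ms window -> 2205 samples
    hop = int(sr * hop_ms / 1000)   # 40 ms stride  ->  882 samples
    # 1. Fourier transform of windowed excerpts of the signal
    frames = np.array([signal[i:i + win] * np.hanning(win)
                       for i in range(0, len(signal) - win + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 2. Map the power spectrum onto the mel scale with triangular filters
    mel_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor(mel_pts * win / sr).astype(int)
    fb = np.zeros((n_mels, power.shape[1]))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fb[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[m - 1, k] = (hi - k) / max(hi - c, 1)
    # 3. Log of the powers at each mel frequency
    log_mel = np.log(power @ fb.T + 1e-10)
    # 4 + 5. DCT of the log mel powers; the amplitudes are the MFCCs
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5)
                 * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T   # shape: (n_frames, n_mfcc)
```

For the mel-frequency variant (stopping at step 2, as Sander did), one would keep `power @ fb.T` per frame instead of continuing to the DCT.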
<p align="center">
<img src="assets/mel_power_over_time.png?raw=true" alt="Mel power over time" width="650">
</p>
I tried both approaches and found that I got much better results using just the mel frequencies, but the trade-off was the training time, of course.
Before continuing to build a network, I wanted to visualise the preprocessed dataset, which I did with the t-SNE [10] algorithm. Below you can see the t-SNE graphs for MFCC (step 5) and mel frequencies (step 2):
<p align="center">
<img src="assets/tsne_mfcc.png?raw=true" alt="t-SNE MFCC samples as genres" width="500">
</p>
<p align="center">
<img src="assets/tsne_mel_spec.png?raw=true" alt="t-SNE mel-spectogram samples as genres" width="500">
</p>
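A t-SNE embedding like the ones above can be produced with scikit-learn. The sketch below runs on random stand-in vectors, since the real inputs are the per-clip feature matrices from the preprocessing step; `features` is a placeholder for those:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for per-clip feature vectors, e.g. each 30 s clip summarised
# by the mean of its MFCC frames (13-dimensional here)
features = rng.normal(size=(100, 13))

# Project the 13-D feature vectors down to 2-D for plotting
emb = TSNE(n_components=2, perplexity=10, init="random",
           random_state=0).fit_transform(features)
# emb has shape (100, 2); each row is one clip's position on the plot
```

Colouring each point by its genre label then gives plots like the two figures above.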
## The Graph
After seeing the results Tao and Sander reached, I decided to go with a convolutional neural network implementation. The network receives 599 frames of mel-frequency bins, each containing 128 frequencies that describe its window. The network consists of 3 hidden layers with max pooling between them. Finally, a fully connected layer followed by a softmax produces a 10-dimensional vector for our ten genre classes.
<p align="center">
<img src="assets/nural_network.png?raw=true" alt="Neural Network" width="500">
</p>
I also implemented another network for MFCC features instead of mel frequencies; the only differences are in the sizes (13 coefficients per window instead of 128 frequencies).
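The shape of such a network can be sketched in TensorFlow's Keras API. The filter counts and widths below are illustrative assumptions, not the hyperparameters actually used in the project; the point is the overall structure (three convolutional hidden layers with pooling, then a fully connected softmax over 10 genres):

```python
import tensorflow as tf

def build_model(n_genres=10):
    # Convolutions run along the 599 time frames; each frame is a
    # 128-bin mel-frequency vector. Filter sizes here are assumptions.
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(256, 4, activation="relu"),   # hidden layer 1
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(256, 4, activation="relu"),   # hidden layer 2
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(512, 4, activation="relu"),   # hidden layer 3
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(n_genres, activation="softmax"),
    ])

# A batch of two clips: 599 time frames x 128 mel bins each
model = build_model()
probs = model(tf.zeros((2, 599, 128)))  # shape (2, 10): one probability per genre
```

The MFCC variant would use the same structure with 13 coefficients per frame instead of 128.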
Visualisation of various filters (from Sander’s paper):
<p align="center">
<img src="assets/filters.png?raw=true" alt="Filters visualization" width="250">
</p>
* Filter 14 seems to pick up vibrato singing.
* Filter 242 picks up some kind of ringing ambience.
* Filter 250 picks up vocal thirds, i.e. multiple singers singing the same thing, but with notes a major third (4 semitones) apart.
* Filter 253 picks up various types of bass drum sounds.
## Results
As explained in the introduction, the papers I based my work on did not solve the exact problem I did; for example, Tao's paper published results for classifying 2, 3 and 4 classes (genres).
<p align="center">
<img src="assets/results_feng.png?raw=true" alt="Tao Feng's results" width="750">
</p>
I also looked for benchmarks outside the deep learning field, and I found a paper titled "A Benchmark Dataset for Audio Classification and Clustering" [11]. This paper benchmarks a task very similar to mine; the genres it classifies are: Blues, Electronic, Jazz, Pop, HipHop, Rock, Folk, Alternative,