# pyAudioProcessing
![pyaudioprocessing](https://user-images.githubusercontent.com/16875926/131924198-e34abe7e-12d8-41f9-926d-db199734dcaa.png)
A Python-based library for processing audio data into features (GFCC, MFCC, spectral, chroma) and building Machine Learning models.
This was initially written using `Python 3.7` and updated several times using `Python 3.8` and `Python 3.9`. It has been tested to work with Python >= 3.6, < 3.10.
## Getting Started
1. One way to install pyAudioProcessing and its dependencies is from PyPI using pip:
```bash
pip install pyAudioProcessing
```
To upgrade to the latest version of pyAudioProcessing, run:
```bash
pip install -U pyAudioProcessing
```
2. Alternatively, you can clone the project and set it up:
```bash
git clone git@github.com:jsingh811/pyAudioProcessing.git
cd pyAudioProcessing
pip install -e .
```
You can also install the requirements by running
```bash
pip install -r requirements/requirements.txt
```
## Contents
- [Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring)
- [Feature and classifier model options](https://github.com/jsingh811/pyAudioProcessing#options)
- [Pre-trained models](https://github.com/jsingh811/pyAudioProcessing#classifying-with-pre-trained-models)
- [Extracting numerical features from audio](https://github.com/jsingh811/pyAudioProcessing#extracting-features-from-audios)
- [Building custom classification models](https://github.com/jsingh811/pyAudioProcessing#training-and-classifying-audio-files)
- [Audio cleaning](https://github.com/jsingh811/pyAudioProcessing#audio-cleaning)
- [Audio format conversion](https://github.com/jsingh811/pyAudioProcessing#audio-format-conversion)
- [Audio visualization](https://github.com/jsingh811/pyAudioProcessing#audio-visualization)
Please refer to the [Wiki](https://github.com/jsingh811/pyAudioProcessing/wiki) for more details.
## Citation
Using pyAudioProcessing in your research? Please cite as follows.
```
Singh, J. (2022). pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling. In Proceedings of the Python in Science Conference. Python in Science Conference. SciPy. https://doi.org/10.25080/majora-212e5952-017
```
Bibtex
```
@InProceedings{ jyotika_singh-proc-scipy-2022,
  author    = { {J}yotika {S}ingh },
  title     = { py{A}udio{P}rocessing: {A}udio {P}rocessing, {F}eature {E}xtraction, and {M}achine {L}earning {M}odeling },
  booktitle = { {P}roceedings of the 21st {P}ython in {S}cience {C}onference },
  pages     = { 152 - 158 },
  year      = { 2022 },
  doi       = { 10.25080/majora-212e5952-017 }
}
```
To cite the software version
```
Jyotika Singh. (2021, July 22). jsingh811/pyAudioProcessing: Audio processing, feature extraction and classification (Version v1.2.0). Zenodo. http://doi.org/10.5281/zenodo.5121041
```
[![DOI](https://zenodo.org/badge/197088356.svg)](https://zenodo.org/badge/latestdoi/197088356)
Bibtex
```
@software{jyotika_singh_2021_5121041,
  author    = {Jyotika Singh},
  title     = {{jsingh811/pyAudioProcessing: Audio processing,
               feature extraction and classification}},
  month     = jul,
  year      = 2021,
  publisher = {Zenodo},
  version   = {v1.2.0},
  doi       = {10.5281/zenodo.5121041},
  url       = {https://doi.org/10.5281/zenodo.5121041}
}
```
## Options
### Feature options
You can choose between the features `gfcc`, `mfcc`, `spectral`, and `chroma`, or any combination of those (for example, `gfcc,mfcc,spectral,chroma`) to extract from your audio files, either for classification or simply to save the extracted features for other uses.
### Classifier options
You can choose between `svm`, `svm_rbf`, `randomforest`, `logisticregression`, `knn`, `gradientboosting`, and `extratrees`.
Hyperparameter tuning via grid search is included in the code for each classifier.
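As a rough illustration of what grid search over hyperparameters involves (this is not the library's actual implementation; the parameter names and scoring function below are made up for the sketch), every combination of candidate values is tried and the best-scoring one is kept:

```python
from itertools import product

# Hypothetical hyperparameter grid for an SVM-like classifier.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}

def cross_val_score(params):
    # Stand-in for real cross-validation scoring; peaks at C=1, gamma=0.1.
    return -abs(params["C"] - 1) - abs(params["gamma"] - 0.1)

best_params, best_score = None, float("-inf")
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    score = cross_val_score(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # -> {'C': 1, 'gamma': 0.1}
```

In practice the scoring step would be k-fold cross-validation on the extracted audio features, but the search loop itself has this shape.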
## Training and Testing Data structuring (Optional)
The library works with data structured as described in this section, or alternatively with an input dictionary object specifying the location paths of the audio files.
Let's say you have training data for 2 classes (music and speech) and want to use pyAudioProcessing to train a model using the available feature options. Save each class as a directory, with all the training audio .wav files under the respective class directories. Example:
```bash
.
├── training_data
├── music
│ ├── music_sample1.wav
│ ├── music_sample2.wav
│ ├── music_sample3.wav
│ ├── music_sample4.wav
├── speech
│ ├── speech_sample1.wav
│ ├── speech_sample2.wav
│ ├── speech_sample3.wav
│ ├── speech_sample4.wav
```
Similarly, structure any test data (with known labels) that you want to pass through the classifier as
```bash
.
├── testing_data
├── music
│ ├── music_sample5.wav
│ ├── music_sample6.wav
├── speech
│ ├── speech_sample5.wav
│ ├── speech_sample6.wav
```
If you want to classify audio samples without any known labels, structure the data as
```bash
.
├── data
├── unknown
│ ├── sample1.wav
│ ├── sample2.wav
```
## Classifying with Pre-trained Models
Three pre-trained models are provided with this project:
- `music genre`: a pre-trained SVM classifier that classifies audio into 10 music genres - blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock. This classifier was trained using MFCC, GFCC, spectral, and chroma features.
- `musicVSspeech`: a pre-trained SVM classifier that classifies audio into two possible classes - music and speech. This classifier was trained using MFCC, spectral, and chroma features.
- `musicVSspeechVSbirds`: a pre-trained SVM classifier that classifies audio into three possible classes - music, speech, and birds. This classifier was trained using GFCC, spectral, and chroma features.
There are three ways to specify the data you want to classify.
1. Classifying a single audio file specified by input `file`.
```python
from pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre
# musicVSspeech classification
results_music_speech = classify_ms(file="/Users/xyz/Documents/audio.wav")
# musicVSspeechVSbirds classification
results_music_speech_birds = classify_msb(file="/Users/xyz/Documents/audio.wav")
# music genre classification
results_music_genre = classify_genre(file="/Users/xyz/Documents/audio.wav")
```
2. Using `file_names` to specify the locations of audio files, as follows.
```python
# {"audios_1": [<path to audio>, <path to audio>, ...], "audios_2": [<path to audio>, ...], ...}
# Examples:
file_names = {
    "music": ["/Users/abc/Documents/opera.wav", "/Users/abc/Downloads/song.wav"],
    "birds": ["/Users/abc/Documents/b1.wav", "/Users/abc/Documents/b2.wav", "/Users/abc/Desktop/birdsound.wav"]
}
file_names = {
    "audios": ["/Users/abc/Documents/opera.wav", "/Users/abc/Downloads/song.wav", "/Users/abc/Documents/b1.wav", "/Users/abc/Documents/b2.wav", "/Users/abc/Desktop/birdsound.wav"]
}
```
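Rather than typing such a dictionary by hand, one could generate it from a folder that follows the structure described earlier. `build_file_names` below is a hypothetical helper written for this sketch, not part of the library:

```python
from pathlib import Path

def build_file_names(parent):
    """Map each sub-folder name (label) to its sorted list of .wav file paths."""
    return {
        sub.name: sorted(str(p) for p in sub.glob("*.wav"))
        for sub in sorted(Path(parent).iterdir())
        if sub.is_dir()
    }

# e.g. build_file_names("testing_data") would yield something like
# {"music": ["testing_data/music/music_sample5.wav", ...], "speech": [...]}
```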
The following commands in Python can be used to classify your data.
```python
from pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre
# musicVSspeech classification
results_music_speech = classify_ms(file_names=file_names)
# musicVSspeechVSbirds classification
results_music_speech_birds = classify_msb(file_names=file_names)
# music genre classification
results_music_genre = classify_genre(file_names=file_names)
```
3. Using data structured as specified in the [structuring guidelines](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) and passing the parent folder path as the `folder_path` input.
The following commands in Python can be used to classify your data.