# pyAudioProcessing
![pyaudioprocessing](https://user-images.githubusercontent.com/16875926/131924198-e34abe7e-12d8-41f9-926d-db199734dcaa.png)
A Python-based library for processing audio data into features (GFCC, MFCC, spectral, chroma) and building Machine Learning models.
This was initially written using `Python 3.7` and updated several times using `Python 3.8` and `Python 3.9`. It has been tested to work with Python >= 3.6, < 3.10.
## Getting Started
1. One way to install pyAudioProcessing and its dependencies is from PyPI using pip:
```bash
pip install pyAudioProcessing
```
To upgrade to the latest version of pyAudioProcessing, run:
```bash
pip install -U pyAudioProcessing
```
2. Alternatively, you can clone the project and set it up:
```bash
git clone git@github.com:jsingh811/pyAudioProcessing.git
cd pyAudioProcessing
pip install -e .
```
You can also install the requirements by running
```bash
pip install -r requirements/requirements.txt
```
## Contents
- [Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring)
- [Feature and classifier model options](https://github.com/jsingh811/pyAudioProcessing#options)
- [Pre-trained models](https://github.com/jsingh811/pyAudioProcessing#classifying-with-pre-trained-models)
- [Extracting numerical features from audio](https://github.com/jsingh811/pyAudioProcessing#extracting-features-from-audios)
- [Building custom classification models](https://github.com/jsingh811/pyAudioProcessing#training-and-classifying-audio-files)
- [Audio cleaning](https://github.com/jsingh811/pyAudioProcessing#audio-cleaning)
- [Audio format conversion](https://github.com/jsingh811/pyAudioProcessing#audio-format-conversion)
- [Audio visualization](https://github.com/jsingh811/pyAudioProcessing#audio-visualization)
Please refer to the [Wiki](https://github.com/jsingh811/pyAudioProcessing/wiki) for more details.
## Citation
Using pyAudioProcessing in your research? Please cite as follows.
```
Singh, J. (2022). pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling. In Proceedings of the Python in Science Conference. Python in Science Conference. SciPy. https://doi.org/10.25080/majora-212e5952-017
```
Bibtex
```
@InProceedings{ jyotika_singh-proc-scipy-2022,
  author    = { {J}yotika {S}ingh },
  title     = { py{A}udio{P}rocessing: {A}udio {P}rocessing, {F}eature {E}xtraction, and {M}achine {L}earning {M}odeling },
  booktitle = { {P}roceedings of the 21st {P}ython in {S}cience {C}onference },
  pages     = { 152 - 158 },
  year      = { 2022 },
  doi       = { 10.25080/majora-212e5952-017 }
}
```
To cite the software version
```
Jyotika Singh. (2021, July 22). jsingh811/pyAudioProcessing: Audio processing, feature extraction and classification (Version v1.2.0). Zenodo. http://doi.org/10.5281/zenodo.5121041
```
[![DOI](https://zenodo.org/badge/197088356.svg)](https://zenodo.org/badge/latestdoi/197088356)
Bibtex
```
@software{jyotika_singh_2021_5121041,
  author    = {Jyotika Singh},
  title     = {{jsingh811/pyAudioProcessing: Audio processing,
               feature extraction and classification}},
  month     = jul,
  year      = 2021,
  publisher = {Zenodo},
  version   = {v1.2.0},
  doi       = {10.5281/zenodo.5121041},
  url       = {https://doi.org/10.5281/zenodo.5121041}
}
```
## Options
### Feature options
You can choose between the features `gfcc`, `mfcc`, `spectral`, and `chroma`, or any combination of those (for example, `gfcc,mfcc,spectral,chroma`) to extract from your audio files, either for classification or simply to save the extracted features for other uses.
### Classifier options
You can choose between `svm`, `svm_rbf`, `randomforest`, `logisticregression`, `knn`, `gradientboosting`, and `extratrees`.
Hyperparameter tuning via grid search is included in the code for each classifier.
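As a rough illustration of what grid search over hyperparameters involves (this is not the library's actual implementation; the parameter names and scoring function below are made up for the sketch), every combination of candidate values is tried and the best-scoring one is kept:

```python
from itertools import product

# Hypothetical hyperparameter grid for an SVM-like classifier.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}

def cross_val_score(params):
    # Stand-in for real cross-validation scoring; peaks at C=1, gamma=0.1.
    return -abs(params["C"] - 1) - abs(params["gamma"] - 0.1)

best_params, best_score = None, float("-inf")
for values in product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    score = cross_val_score(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # -> {'C': 1, 'gamma': 0.1}
```

In practice the scoring step would be k-fold cross-validation on the extracted audio features, but the search loop itself has this shape.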
## Training and Testing Data structuring (Optional)
The library works with data structured as described in this section, or alternatively with an input dictionary object specifying the location paths of the audio files.
Let's say you have training data for 2 classes (music and speech) and want to use pyAudioProcessing to train a model using the available feature options. Save each class as a directory, with all the training audio .wav files under the respective class directories. Example:
```bash
.
├── training_data
├── music
│ ├── music_sample1.wav
│ ├── music_sample2.wav
│ ├── music_sample3.wav
│ ├── music_sample4.wav
├── speech
│ ├── speech_sample1.wav
│ ├── speech_sample2.wav
│ ├── speech_sample3.wav
│ ├── speech_sample4.wav
```
Similarly, structure any test data (with known labels) that you want to pass through the classifier as
```bash
.
├── testing_data
├── music
│ ├── music_sample5.wav
│ ├── music_sample6.wav
├── speech
│ ├── speech_sample5.wav
│ ├── speech_sample6.wav
```
If you want to classify audio samples without any known labels, structure the data as
```bash
.
├── data
├── unknown
│ ├── sample1.wav
│ ├── sample2.wav
```
## Classifying with Pre-trained Models
Three pre-trained models are provided with this project:
- `music genre`: a pre-trained SVM classifier that classifies audio into 10 music genres - blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock. This classifier was trained using MFCC, GFCC, spectral, and chroma features.
- `musicVSspeech`: a pre-trained SVM classifier that classifies audio into two possible classes - music and speech. This classifier was trained using MFCC, spectral, and chroma features.
- `musicVSspeechVSbirds`: a pre-trained SVM classifier that classifies audio into three possible classes - music, speech, and birds. This classifier was trained using GFCC, spectral, and chroma features.
There are three ways to specify the data you want to classify.
1. Classifying a single audio file specified by input `file`.
```python
from pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre
# musicVSspeech classification
results_music_speech = classify_ms(file="/Users/xyz/Documents/audio.wav")
# musicVSspeechVSbirds classification
results_music_speech_birds = classify_msb(file="/Users/xyz/Documents/audio.wav")
# music genre classification
results_music_genre = classify_genre(file="/Users/xyz/Documents/audio.wav")
```
2. Using `file_names` to specify the locations of audio files, as follows.
```python
# {"audios_1": [<path to audio>, <path to audio>, ...], "audios_2": [<path to audio>, ...], ...}
# Examples:
file_names = {
    "music": ["/Users/abc/Documents/opera.wav", "/Users/abc/Downloads/song.wav"],
    "birds": ["/Users/abc/Documents/b1.wav", "/Users/abc/Documents/b2.wav", "/Users/abc/Desktop/birdsound.wav"]
}
file_names = {
    "audios": ["/Users/abc/Documents/opera.wav", "/Users/abc/Downloads/song.wav", "/Users/abc/Documents/b1.wav", "/Users/abc/Documents/b2.wav", "/Users/abc/Desktop/birdsound.wav"]
}
```
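Rather than typing such a dictionary by hand, one could generate it from a folder that follows the structure described earlier. `build_file_names` below is a hypothetical helper written for this sketch, not part of the library:

```python
from pathlib import Path

def build_file_names(parent):
    """Map each sub-folder name (label) to its sorted list of .wav file paths."""
    return {
        sub.name: sorted(str(p) for p in sub.glob("*.wav"))
        for sub in sorted(Path(parent).iterdir())
        if sub.is_dir()
    }

# e.g. build_file_names("testing_data") would yield something like
# {"music": ["testing_data/music/music_sample5.wav", ...], "speech": [...]}
```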
The following commands in Python can be used to classify your data.
```python
from pyAudioProcessing.run_classification import classify_ms, classify_msb, classify_genre
# musicVSspeech classification
results_music_speech = classify_ms(file_names=file_names)
# musicVSspeechVSbirds classification
results_music_speech_birds = classify_msb(file_names=file_names)
# music genre classification
results_music_genre = classify_genre(file_names=file_names)
```
3. Using data structured as specified in the [structuring guidelines](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) and passing the parent folder path as the `folder_path` input.
The following commands in Python can be used to classify your data.