<p align="center">
<img src="https://raw.githubusercontent.com/speechbrain/speechbrain/develop/docs/images/speechbrain-logo.svg" alt="SpeechBrain Logo"/>
</p>
[![Typing SVG](https://readme-typing-svg.demolab.com?font=Fira+Code&size=40&duration=7000&pause=1000&random=false&width=1200&height=100&lines=Simplify+Conversational+AI+Development)](https://git.io/typing-svg)
| [Tutorials](https://speechbrain.github.io/tutorial_basics.html) | [Website](https://speechbrain.github.io/) | [Documentation](https://speechbrain.readthedocs.io/en/latest/index.html) | [Contributing](https://speechbrain.readthedocs.io/en/latest/contributing.html) | [HuggingFace](https://huggingface.co/speechbrain) | [YouTube](https://www.youtube.com/@SpeechBrainProject) | [X](https://twitter.com/SpeechBrain1) |
![GitHub Repo stars](https://img.shields.io/github/stars/speechbrain/speechbrain?style=social) *Please help our community project by starring it on GitHub!*
**Exciting News (January, 2024):** Discover what is new in SpeechBrain 1.0 [here](https://colab.research.google.com/drive/1IEPfKRuvJRSjoxu22GZhb3czfVHsAy0s?usp=sharing)!
#
# What SpeechBrain Offers
- SpeechBrain is an **open-source** [PyTorch](https://pytorch.org/) toolkit that accelerates **Conversational AI** development, i.e., the technology behind *speech assistants*, *chatbots*, and *large language models*.
- It is crafted for fast and easy creation of advanced technologies for **Speech** and **Text** Processing.
## Vision
- With the rise of [deep learning](https://www.deeplearningbook.org/), once-distant domains like speech processing and NLP are now very close. A well-designed neural network and large datasets are all you need.
- We think it is now time for a **holistic toolkit** that, mimicking the human brain, jointly supports diverse technologies for complex Conversational AI systems.
- This spans *speech recognition*, *speaker recognition*, *speech enhancement*, *speech separation*, *language modeling*, *dialogue*, and beyond.
## Training Recipes
- We share over 200 competitive training [recipes](https://github.com/speechbrain/speechbrain/tree/develop/recipes) on more than 40 datasets supporting 20 speech and text processing tasks (see below).
- We support both training from scratch and fine-tuning pretrained models such as [Whisper](https://huggingface.co/openai/whisper-large), [Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2), [WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm), [HuBERT](https://huggingface.co/docs/transformers/model_doc/hubert), [GPT-2](https://huggingface.co/gpt2), [Llama 2](https://huggingface.co/docs/transformers/model_doc/llama2), and beyond. Models hosted on [HuggingFace](https://huggingface.co/) can be easily plugged in and fine-tuned.
- For any task, you train the model with a single command:
```bash
python train.py hparams/train.yaml
```
- The hyperparameters are encapsulated in a YAML file, while the training process is orchestrated through a Python script.
- We maintain a consistent code structure across tasks.
- For better replicability, training logs and checkpoints are hosted on Dropbox.
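Because the hyperparameters live in the YAML file rather than in the script, they can be changed at launch time: SpeechBrain recipes accept command-line overrides such as `python train.py hparams/train.yaml --data_folder=/path/to/data`. The following is a simplified, standard-library sketch of that idea, not SpeechBrain's actual argument parser. The helpers `split_cli_args` and `apply_overrides` are hypothetical, the `defaults` dict stands in for values loaded from the YAML file, and casting each override to the type of its default is a simplification chosen for this illustration.

```python
from typing import Dict, List, Tuple


def split_cli_args(argv: List[str]) -> Tuple[str, Dict[str, str]]:
    """Split argv into the hparams file path and raw ``--key=value`` overrides.

    Hypothetical helper for illustration; not SpeechBrain's own parser.
    """
    if not argv or argv[0].startswith("--"):
        raise ValueError("usage: train.py <hparams.yaml> [--key=value ...]")
    hparams_file, overrides = argv[0], {}
    for arg in argv[1:]:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, value = arg[2:].split("=", 1)
        overrides[key] = value
    return hparams_file, overrides


def apply_overrides(hparams: Dict[str, object], overrides: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of ``hparams`` with overrides applied, cast to each default's type."""
    merged = dict(hparams)
    for key, raw in overrides.items():
        if key not in merged:
            raise KeyError(f"unknown hyperparameter: {key}")
        default = merged[key]
        if isinstance(default, bool):  # check bool before int: bool subclasses int
            merged[key] = raw.lower() in ("1", "true", "yes")
        elif isinstance(default, (int, float)):
            merged[key] = type(default)(raw)
        else:
            merged[key] = raw
    return merged


if __name__ == "__main__":
    # Stand-in for values that would normally be read from hparams/train.yaml
    defaults = {"batch_size": 16, "lr": 0.001, "data_folder": "data/", "use_amp": False}
    argv = ["hparams/train.yaml", "--batch_size=8", "--data_folder=/path/to/data"]
    path, overrides = split_cli_args(argv)
    print(path, apply_overrides(defaults, overrides))
```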
## <a href="https://huggingface.co/speechbrain" target="_blank"> <img src="https://huggingface.co/front/assets/huggingface_logo.svg" alt="drawing" width="40"/> </a> Pretrained Models and Inference
- Access over 100 pretrained models hosted on [HuggingFace](https://huggingface.co/speechbrain).
- Each model comes with a user-friendly interface for seamless inference. For example, transcribing speech using a pretrained model requires just three lines of code:
```python
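# SpeechBrain 1.0 exposes the inference interfaces under speechbrain.inference
from speechbrain.inference.ASR import EncoderDecoderASR
# Download (or reuse a cached copy of) the pretrained model and its hyperparameters
asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-conformer-transformerlm-librispeech", savedir="pretrained_models/asr-conformer-transformerlm-librispeech")
# Transcribe an audio file; the result is the transcript as a string
asr_model.transcribe_file("speechbrain/asr-conformer-transformerlm-librispeech/example.wav")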
```
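The `transcribe_file` call above processes one recording at a time; transcribing a whole folder only needs a thin loop around it. The sketch below assumes only that the model object has a `transcribe_file(path)` method returning the transcript as a string. The names `transcribe_all` and `EchoModel` are hypothetical, and the stub model exists only so the example runs without downloading anything.

```python
from pathlib import Path
from typing import Dict, Iterable, Protocol, Union


class FileTranscriber(Protocol):
    """Structural type: anything with a transcribe_file(path) -> str method."""

    def transcribe_file(self, path: str) -> str: ...


def transcribe_all(model: FileTranscriber, paths: Iterable[Union[str, Path]]) -> Dict[str, str]:
    """Map each audio path (as a string) to the model's transcript, preserving input order."""
    return {str(p): model.transcribe_file(str(p)) for p in paths}


class EchoModel:
    """Hypothetical stub: returns the file name stem in upper case instead of real ASR output."""

    def transcribe_file(self, path: str) -> str:
        return Path(path).stem.upper()


if __name__ == "__main__":
    print(transcribe_all(EchoModel(), ["clips/hello.wav", "clips/world.wav"]))
```

With SpeechBrain installed, you would pass the `asr_model` object from the snippet above instead of `EchoModel()`, and a path list such as `sorted(Path("my_recordings").glob("*.wav"))`, where `my_recordings` is a placeholder folder name.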
## <a href="https://speechbrain.github.io/" target="_blank"> <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Google_Colaboratory_SVG_Logo.svg/1200px-Google_Colaboratory_SVG_Logo.svg.png" alt="drawing" width="50"/> </a> Documentation
- We are deeply dedicated to promoting inclusivity and education.
- We have authored over 30 [tutorials](https://speechbrain.github.io/) on Google Colab that not only describe how SpeechBrain works but also help users familiarize themselves with Conversational AI.
- Every class or function has clear explanations and examples that you can run. Check out the [documentation](https://speechbrain.readthedocs.io/en/latest/index.html) for more details.
## Use Cases
- **Research Acceleration**: Speeding up academic and industrial research. You can develop and integrate new models effortlessly, comparing their performance against our baselines.
- **Rapid Prototyping**: Ideal for quick prototyping in time-sensitive projects.
- **Educational Tool**: SpeechBrain's simplicity makes it a valuable educational resource. It is used by institutions like [Mila](https://mila.quebec/en/), [Concordia University](https://www.concordia.ca/), [Avignon University](https://univ-avignon.fr/en/), and many others for student training.
#
# Quick Start
To get started with SpeechBrain, follow these simple steps:
## Installation
### Install via PyPI
1. Install SpeechBrain using PyPI:
```bash
pip install speechbrain
```
2. Access SpeechBrain in your Python code:
```python
import speechbrain as sb
```
### Install from GitHub
This installation is recommended for users who wish to conduct experiments and customize the toolkit according to their needs.
1. Clone the GitHub repository and install the requirements:
```bash
git clone https://github.com/speechbrain/speechbrain.git
cd speechbrain
pip install -r requirements.txt
pip install --editable .
```
2. Access SpeechBrain in your Python code:
```python
import speechbrain as sb
```
Any modifications made to the `speechbrain` package will be automatically reflected, thanks to the `--editable` flag.
## Test Installation
Ensure your installation is correct by running the following commands:
```bash
pytest tests
pytest --doctest-modules speechbrain
```
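A faster check than the full test suite is to ask Python which SpeechBrain distribution it can see. This standard-library sketch uses `importlib.metadata`; the helper name `installed_version` is illustrative, and `speechbrain` is the distribution name used by the `pip install speechbrain` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "speechbrain") -> Optional[str]:
    """Return the installed version string of ``dist_name``, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"speechbrain {version}" if version else "speechbrain is not installed")
```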
## Running an Experiment
In SpeechBrain, you can train a model for any task with the following commands:
```bash
cd recipes/<dataset>/<task>/
python experiment.py params.yaml
```
The results will be saved in the `output_folder` specified in the YAML file.
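A recipe folder may offer more than one YAML configuration. To see what a cloned repository provides before picking one, a standard-library sketch can list every YAML file stored in an `hparams` folder under `recipes/`. It assumes the layout seen in the training command shown earlier (`hparams/train.yaml` inside a recipe folder, sometimes one level deeper per model); `list_hparams_files` is a hypothetical helper, not part of SpeechBrain.

```python
from pathlib import Path
from typing import List


def list_hparams_files(recipes_root: str = "recipes") -> List[str]:
    """List YAML files that sit directly inside an ``hparams`` folder under ``recipes_root``.

    Paths are returned relative to ``recipes_root``, with forward slashes, sorted.
    """
    root = Path(recipes_root)
    if not root.is_dir():
        return []
    found = (p for p in root.rglob("*.yaml") if p.parent.name == "hparams")
    return sorted(p.relative_to(root).as_posix() for p in found)


if __name__ == "__main__":
    # Run from the root of a cloned speechbrain repository
    for rel in list_hparams_files():
        print(rel)
```

Any listed file can then be passed to the training script of its recipe, as in the commands above.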
## Learning SpeechBrain
- **Website:** Explore general information on the [official website](https://speechbrain.github.io).
- **Tutorials:** Start with [basic tutorials](https://speechbrain.github.io/tutorial_basics.html) covering fundamental functionalities. Find advanced tutorials and topics in the Tutorials menu on the [SpeechBrain website](https://speechbrain.github.io).
- **Documentation:** Detailed information on the SpeechBrain API, contribution guidelines, and code is available in the [documentation](https://speechbrain.readthedocs.io/en/latest/index.html).
#
# Supported Technologies
- SpeechBrain is a versatile framework designed for implementing a wide range of technologies within the field of Conversational AI.
- It excels not only in individual task implementations but also in combining various technologies into complex pipelines.
## Speech/Audio Processing
| Tasks | Datasets | Technologies/Models |
| ------------- |-------------| -----|
| Speech Recognition | [AISHELL-1](https://github.com/speechbrain/speechbrain/tree/develop/recipes/AISHELL-1), [CommonVoice](https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice), [DVoice](https://github.com/speechbrain/speechbrain/tree/develop/recipes/DVoice), [KsponSpeech](https:/