# wav2vec 2.0
wav2vec 2.0 learns speech representations on unlabeled data as described in [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)](https://arxiv.org/abs/2006.11477).
We also learned cross-lingual speech representations from multiple languages in [Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)](https://arxiv.org/abs/2006.13979).
We also combined wav2vec 2.0 with self-training in [Self-training and Pre-training are Complementary for Speech Recognition (Xu et al., 2020)](https://arxiv.org/abs/2010.11430).
We combined speech data from multiple domains in [Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu et al., 2021)](https://arxiv.org/abs/2104.01027).
We finetuned XLSR-53 on multiple languages to transcribe unseen languages in [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition (Xu et al., 2021)](https://arxiv.org/abs/2109.11680).
## Pre-trained models
Model | Finetuning split | Dataset | Checkpoint
|---|---|---|---
Wav2Vec 2.0 Base | No finetuning | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small.pt)
Wav2Vec 2.0 Base | 10 minutes | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small_10m.pt)
Wav2Vec 2.0 Base | 100 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small_100h.pt)
Wav2Vec 2.0 Base | 960 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small_960h.pt)
Wav2Vec 2.0 Large | No finetuning | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/libri960_big.pt)
Wav2Vec 2.0 Large | 10 minutes | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_big_10m.pt)
Wav2Vec 2.0 Large | 100 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_big_100h.pt)
Wav2Vec 2.0 Large | 960 hours | [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_big_960h.pt)
Wav2Vec 2.0 Large (LV-60)* | No finetuning | [Libri-Light](https://github.com/facebookresearch/libri-light) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_new.pt)
Wav2Vec 2.0 Large conformer - rel_pos (LV-60)* | No finetuning | [Libri-Light](https://github.com/facebookresearch/libri-light) | [download](s3://dl.fbaipublicfiles.com/fairseq/conformer/wav2vec2/librilight/LL_relpos_PT_no_FT)
Wav2Vec 2.0 Large conformer - rope (LV-60)* | No finetuning | [Libri-Light](https://github.com/facebookresearch/libri-light) | [download](s3://dl.fbaipublicfiles.com/fairseq/conformer/wav2vec2/librilight/LL_rope_PT_no_FT)
Wav2Vec 2.0 Large (LV-60)* | 10 minutes | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_10m_new.pt)
Wav2Vec 2.0 Large (LV-60)* | 100 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_100h_new.pt)
Wav2Vec 2.0 Large conformer - rel_pos (LV-60)* | 100 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) | [download](s3://dl.fbaipublicfiles.com/fairseq/conformer/wav2vec2/librilight/LL_relpos_PT_100h_FT.pt)
Wav2Vec 2.0 Large conformer - rope (LV-60)* | 100 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) | [download](s3://dl.fbaipublicfiles.com/fairseq/conformer/wav2vec2/librilight/LL_rope_PT_100h_FT.pt)
Wav2Vec 2.0 Large (LV-60)* | 960 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec2_vox_960h_new.pt)
Wav2Vec 2.0 Large conformer - rel_pos (LV-60)* | 960 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) | [download](s3://dl.fbaipublicfiles.com/fairseq/conformer/wav2vec2/librilight/LL_relpos_PT_960h_FT.pt)
Wav2Vec 2.0 Large conformer - rope (LV-60)* | 960 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) | [download](s3://dl.fbaipublicfiles.com/fairseq/conformer/wav2vec2/librilight/LL_rope_PT_960h_FT.pt)
Wav2Vec 2.0 Large (LV-60) + Self Training * | 10 minutes | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_10m_pl.pt)
Wav2Vec 2.0 Large (LV-60) + Self Training * | 100 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_100h_pl.pt)
Wav2Vec 2.0 Large (LV-60) + Self Training * | 960 hours | [Libri-Light](https://github.com/facebookresearch/libri-light) + [Librispeech](http://www.openslr.org/12) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_960h_pl.pt)
Wav2Vec 2.0 Large (LV-60 + CV + SWBD + FSH) ** | No finetuning | [Libri-Light](https://github.com/facebookresearch/libri-light) + [CommonVoice](https://commonvoice.mozilla.org/en/languages) + [Switchboard](https://catalog.ldc.upenn.edu/LDC97S62) + [Fisher](https://catalog.ldc.upenn.edu/LDC2004T19) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/w2v_large_lv_fsh_swbd_cv.pt)
Wav2Vec 2.0 Large (LV-60 + CV + SWBD + FSH) ** | 960 hours Librispeech | [Libri-Light](https://github.com/facebookresearch/libri-light) + [CommonVoice](https://commonvoice.mozilla.org/en/languages) + [Switchboard](https://catalog.ldc.upenn.edu/LDC97S62) + [Fisher](https://catalog.ldc.upenn.edu/LDC2004T19) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/w2v_large_lv_fsh_swbd_cv_ftls960_updated.pt)
Wav2Vec 2.0 Large (LV-60 + CV + SWBD + FSH) ** | 300 hours Switchboard | [Libri-Light](https://github.com/facebookresearch/libri-light) + [CommonVoice](https://commonvoice.mozilla.org/en/languages) + [Switchboard](https://catalog.ldc.upenn.edu/LDC97S62) + [Fisher](https://catalog.ldc.upenn.edu/LDC2004T19) | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/w2v_large_lv_fsh_swbd_cv_ftsb300_updated.pt)
\* updated (Oct. 24, 2020)\
** updated (Nov. 13, 2021)
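The checkpoints above follow a fixed naming scheme under one host, so a small lookup table can make scripts less error-prone than copying links by hand. The sketch below is a hypothetical helper, not part of fairseq: the `checkpoint_url` name and the `base` / `large` / `large-lv60` keys are illustrative assumptions, and only the `https://` links from the table (not the conformer `s3://` entries) are included.

```python
"""Hypothetical lookup table for some of the wav2vec 2.0 checkpoints listed above.

The key naming scheme is an illustrative assumption, not an official fairseq API.
"""
from typing import Dict, Optional, Tuple

BASE_URL = "https://dl.fbaipublicfiles.com/fairseq/wav2vec/"

# (model, finetuning split) -> checkpoint file name; None means "no finetuning".
CHECKPOINTS: Dict[Tuple[str, Optional[str]], str] = {
    ("base", None): "wav2vec_small.pt",
    ("base", "10m"): "wav2vec_small_10m.pt",
    ("base", "100h"): "wav2vec_small_100h.pt",
    ("base", "960h"): "wav2vec_small_960h.pt",
    ("large", None): "libri960_big.pt",
    ("large", "10m"): "wav2vec_big_10m.pt",
    ("large", "100h"): "wav2vec_big_100h.pt",
    ("large", "960h"): "wav2vec_big_960h.pt",
    ("large-lv60", None): "wav2vec_vox_new.pt",
    ("large-lv60", "10m"): "wav2vec_vox_10m_new.pt",
    ("large-lv60", "100h"): "wav2vec_vox_100h_new.pt",
    ("large-lv60", "960h"): "wav2vec2_vox_960h_new.pt",
}


def checkpoint_url(model: str, split: Optional[str] = None) -> str:
    """Return the download URL for a (model, finetuning split) pair."""
    try:
        return BASE_URL + CHECKPOINTS[(model, split)]
    except KeyError:
        # List the known combinations to make typos easy to spot.
        known = sorted(f"{m}/{s or 'none'}" for m, s in CHECKPOINTS)
        raise KeyError(f"unknown checkpoint {model}/{split or 'none'}; known: {known}") from None


if __name__ == "__main__":
    print(checkpoint_url("base", "960h"))
    # https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small_960h.pt
```

Once downloaded, a checkpoint file can typically be loaded with fairseq's `checkpoint_utils.load_model_ensemble_and_task([path])`; check the fairseq version you have installed for the exact return values.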
We also release multilingual pre-trained wav2vec 2.0 (XLSR) models:
Model | Architecture | Hours | Languages | Datasets | Checkpoint
|---|---|---|---|---|---
XLSR-53 | Large | 56k | 53 | MLS, CommonVoice, BABEL | [download](https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr_53_56k.pt)
The XLSR model uses the following datasets for multilingual pretraining:
* **[MLS: Multilingual LibriSpeech](https://indico2.conference4me.psnc.pl/event/35/contributions/3585/attachments/1060/1101/Wed-2-6-10.pdf)** (8 languages, 50.7k hours): *Dutch, English, French, German, Italian, Polish, Portuguese, Spanish*
* **[CommonVoice](https://commonvoice.mozilla.org/en/languages)** (36 languages, 3.6k hours): *Arabic, Basque, Breton, Chinese (CN), Chinese (HK), Chinese (TW), Chuvash, Dhivehi, Dutch, English, Esperanto, Estonian, French, German, Hakha Chin, Indonesian, Interlingua, Irish, Italian, Japanese, Kabyle, Kinyarwanda, Kyrgyz, Latvian, Mongolian, Persian, Portuguese, Russian, Sakha, Slovenian, Spanish, Swedish, Tamil, Tatar, Turkish, Welsh* (see also [finetuning splits](https://dl.fbaipublicfiles.com/cpc_audio/common_voices_splits.tar.gz) from [this paper](https://arxiv.org/abs/2002.02848)).
* **[Babel](https://catalog.ldc.upenn.edu/byyear)** (17 languages, 1.7k hours): *Assamese, Bengali, Cantonese, Cebuano, Georgian, Haitian, Kazakh, Kurmanji, Lao, Pashto, Swahili, Tagalog, Tamil, Tok Pisin, Turkish, Vietnamese, Zulu*
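As a quick consistency check, the per-dataset hours listed above add up to the 56k hours given for XLSR-53, and they show how heavily MLS dominates the pretraining mix. The sketch below is illustrative only: the `dataset_shares` name is hypothetical, and hours are stored as integers in units of 100 hours to avoid floating-point rounding.

```python
from typing import Dict, Tuple

# Pretraining hours per dataset from the list above, in units of 100 hours
# (e.g. 50.7k hours -> 507), so the sum stays exact.
HOURS_X100: Dict[str, int] = {"MLS": 507, "CommonVoice": 36, "BABEL": 17}


def dataset_shares(hours_x100: Dict[str, int]) -> Tuple[int, Dict[str, float]]:
    """Return the total (in units of 100 hours) and each dataset's share in percent."""
    total = sum(hours_x100.values())
    shares = {name: 100 * h / total for name, h in hours_x100.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = dataset_shares(HOURS_X100)
    print(f"total: {total / 10}k hours")  # total: 56.0k hours
    for name, pct in shares.items():
        print(f"{name}: {pct:.1f}%")
    # MLS: 90.5%
    # CommonVoice: 6.4%
    # BABEL: 3.0%
```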
We also finet