# Neural Machine Translation
## Pre-trained models
Model | Description | Dataset | Download
---|---|---|---
`conv.wmt14.en-fr` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.newstest2014.tar.bz2) <br> newstest2012/2013: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.ntst1213.tar.bz2)
`conv.wmt14.en-de` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT14 English-German](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-de.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-de.newstest2014.tar.bz2)
`conv.wmt17.en-de` | Convolutional <br> ([Gehring et al., 2017](https://arxiv.org/abs/1705.03122)) | [WMT17 English-German](http://statmt.org/wmt17/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt17.v2.en-de.fconv-py.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt17.v2.en-de.newstest2014.tar.bz2)
`transformer.wmt14.en-fr` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT14 English-French](http://statmt.org/wmt14/translation-task.html#Download) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt14.en-fr.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt14.en-fr.joined-dict.newstest2014.tar.bz2)
`transformer.wmt16.en-de` | Transformer <br> ([Ott et al., 2018](https://arxiv.org/abs/1806.00187)) | [WMT16 English-German](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) | model: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/wmt16.en-de.joined-dict.transformer.tar.bz2) <br> newstest2014: <br> [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/data/wmt16.en-de.joined-dict.newstest2014.tar.bz2)
`transformer.wmt18.en-de` | Transformer <br> ([Edunov et al., 2018](https://arxiv.org/abs/1808.09381)) <br> WMT'18 winner | [WMT'18 English-German](http://www.statmt.org/wmt18/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt18.en-de.ensemble.tar.gz) <br> See NOTE in the archive
`transformer.wmt19.en-de` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 English-German](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-de.joined-dict.ensemble.tar.gz)
`transformer.wmt19.de-en` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 German-English](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.de-en.joined-dict.ensemble.tar.gz)
`transformer.wmt19.en-ru` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 English-Russian](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.en-ru.ensemble.tar.gz)
`transformer.wmt19.ru-en` | Transformer <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) <br> WMT'19 winner | [WMT'19 Russian-English](http://www.statmt.org/wmt19/translation-task.html) | model: <br> [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/wmt19.ru-en.ensemble.tar.gz)
## Example usage (torch.hub)
Interactive translation via PyTorch Hub:
```python
import torch
# List available models
torch.hub.list('pytorch/fairseq') # [..., 'transformer.wmt16.en-de', ... ]
# Load a transformer trained on WMT'16 En-De
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt')
# The underlying model is available under the *models* attribute
assert isinstance(en2de.models[0], fairseq.models.transformer.TransformerModel)
# Translate a sentence
en2de.translate('Hello world!')
# 'Hallo Welt!'
```
## Example usage (CLI tools)
Generation with the binarized test sets can be run in batch mode as follows, e.g. for WMT 2014 English-French on a GTX-1080ti:
```bash
mkdir -p data-bin
curl https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf - -C data-bin
curl https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.newstest2014.tar.bz2 | tar xvjf - -C data-bin
fairseq-generate data-bin/wmt14.en-fr.newstest2014 \
--path data-bin/wmt14.en-fr.fconv-py/model.pt \
--beam 5 --batch-size 128 --remove-bpe | tee /tmp/gen.out
# ...
# | Translated 3003 sentences (96311 tokens) in 166.0s (580.04 tokens/s)
# | Generate test with beam=5: BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
# Compute BLEU score
grep ^H /tmp/gen.out | cut -f3- > /tmp/gen.out.sys
grep ^T /tmp/gen.out | cut -f2- > /tmp/gen.out.ref
fairseq-score --sys /tmp/gen.out.sys --ref /tmp/gen.out.ref
# BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
```
## Preprocessing
These scripts provide an example of pre-processing data for the NMT task.
### prepare-iwslt14.sh
Provides an example of pre-processing for IWSLT'14 German to English translation task: ["Report on the 11th IWSLT evaluation campaign" by Cettolo et al.](http://workshop2014.iwslt.org/downloads/proceeding.pdf)
Example usage:
```bash
cd examples/translation/
bash prepare-iwslt14.sh
cd ../..
# Binarize the dataset:
TEXT=examples/translation/iwslt14.tokenized.de-en
fairseq-preprocess --source-lang de --target-lang en \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/iwslt14.tokenized.de-en
# Train the model (better for a single GPU setup):
mkdir -p checkpoints/fconv
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
--lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--lr-scheduler fixed --force-anneal 200 \
--arch fconv_iwslt_de_en --save-dir checkpoints/fconv
# Generate:
fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/fconv/checkpoint_best.pt \
--batch-size 128 --beam 5 --remove-bpe
```
To train transformer model on IWSLT'14 German to English:
```bash
# Preparation steps are the same as for fconv model.
# Train the model (better for a single GPU setup):
mkdir -p checkpoints/transformer
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
-a transformer_iwslt_de_en --optimizer adam --lr 0.0005 -s de -t en \
--label-smoothing 0.1 --dropout 0.3 --max-tokens 4000 \
--min-lr '1e-09' --lr-scheduler inverse_sqrt --weight-decay 0.0001 \
--criterion label_smoothed_cross_entropy --max-update 50000 \
--warmup-updates 4000 --warmup-init-lr '1e-07' \
--adam-betas '(0.9, 0.98)' --save-dir checkpoints/transformer
# Average 10 latest checkpoints:
python scripts/average_checkpoints.py --inputs checkpoints/transformer \
--num-epoch-checkpoints 10 --output checkpoints/transformer/model.pt
# Generate:
fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/transformer/model.pt \
--batch-size 128 --beam 5 --remove-bpe
```
### prepare-wmt14en2de.sh
The WMT English to German dataset can be preprocessed using the `prepare-wmt14en2de.sh` script.
By default it will produce a dataset that was mode
没有合适的资源?快使用搜索试试~ 我知道了~
温馨提示
基于CMLM的语义一致性数据增强方法python实现源码(提高神经机器翻译的性能、IWSLT14 DE-EN数据集验证 这个项目是一个用于神经机器翻译的语义一致性数据增强方法,基于条件掩码语言模型(CMLM)。该项目是从fairseq v0.8.0 fork 而来。 主要功能点 提出了一种基于CMLM的语义一致性数据增强方法,用于提高神经机器翻译的性能。 在IWSLT14 DE-EN数据集上进行了实验验证,展示了该方法的有效性。 提供了训练CMLM模型和使用CMLM进行数据增强的代码示例。 技术栈 Python fairseq 框架
资源推荐
资源详情
资源评论
收起资源包目录
基于CMLM的语义一致性数据增强方法python实现源码(提高神经机器翻译的性能、IWSLT14 DE-EN数据集验证).zip (308个子文件)
make.bat 805B
docutils.conf 25B
libbleu.cpp 3KB
module.cpp 791B
theme_overrides.css 192B
fairseq.gif 2.54MB
.gitignore 16B
LICENSE 1KB
convert_model.lua 3KB
convert_dictionary.lua 787B
Makefile 607B
README.md 13KB
README.md 10KB
README.md 8KB
README.md 5KB
README.glue.md 4KB
README.custom_classification.md 4KB
README.cqa.md 4KB
README.md 4KB
README.md 4KB
README.wsc.md 4KB
README.md 3KB
README.md 3KB
README.md 3KB
README.race.md 2KB
README.md 2KB
README.md 2KB
README.md 2KB
README.md 1KB
README.md 1KB
CONTRIBUTING.md 1KB
README.md 625B
CODE_OF_CONDUCT.md 241B
fairseq_logo.png 71KB
bert.py 52KB
lightconv.py 33KB
transformer.py 33KB
vggtransformer.py 30KB
options.py 28KB
fconv.py 27KB
sequence_generator.py 25KB
lstm.py 24KB
fconv_self_att.py 23KB
test_binaries.py 22KB
trainer.py 20KB
test_noising.py 19KB
semisupervised_translation.py 19KB
asr_test_base.py 19KB
wav2vec.py 18KB
multilingual_translation.py 16KB
indexed_dataset.py 16KB
fairseq_model.py 15KB
masked_lm.py 15KB
test_sequence_generator.py 15KB
transformer_lm.py 14KB
multihead_attention.py 14KB
fp16_optimizer.py 14KB
model.py 14KB
train.py 13KB
train.py 13KB
adam.py 13KB
checkpoint_utils.py 13KB
block_pair_dataset.py 13KB
masked_lm_dataset.py 12KB
noising.py 12KB
language_pair_dataset.py 12KB
transformer_layer.py 12KB
utils.py 11KB
search.py 11KB
dictionary.py 11KB
iterators.py 10KB
file_utils.py 10KB
fairseq_task.py 10KB
preprocess.py 10KB
preprocess.py 10KB
dynamic_convolution.py 10KB
lightconv_lm.py 10KB
progress_bar.py 10KB
downsampled_multihead_attention.py 10KB
adafactor.py 9KB
lightweight_convolution.py 9KB
wsc_task.py 9KB
multilingual_transformer.py 9KB
language_modeling.py 9KB
translation_moe.py 9KB
data_utils.py 9KB
token_block_dataset.py 8KB
translation.py 8KB
cmlm_dataset.py 8KB
masked_lm.py 8KB
hub_interface.py 8KB
eval_lm.py 8KB
eval_lm.py 8KB
transformer_sentence_encoder.py 8KB
infer.py 8KB
wsc_utils.py 8KB
hub_utils.py 7KB
generate.py 7KB
generate.py 7KB
monolingual_dataset.py 7KB
共 308 条
- 1
- 2
- 3
- 4
资源评论
.whl
- 粉丝: 3824
- 资源: 4648
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功