## Introduction
This directory contains our TensorFlow implementation of Transformer-XL. Note that the state-of-the-art results reported in the paper were obtained by training the model on a large-scale TPU cluster, and our GPU codebase currently does not support distributed training. Here we provide two sets of hyperparameters and scripts:
- `*large_tpu.sh` are for the SoTA setting on TPUs. These are exactly the commands we used to obtain our best results.
- `*base_gpu.sh` are for the base models which can be run on a few GPUs.
## Prerequisites
- Python 2.7
- TensorFlow [1.12.0](https://github.com/tensorflow/tensorflow/releases/tag/v1.12.0)
## Obtain and evaluate pretrained SoTA models
#### 1. Download preprocessed data (vocab) & pretrained models
(a) Set your own `DATA_ROOT` in `sota/download.sh` (defaults to `./`); it will be the root directory for the downloaded models.
(b) Then, download the models & data with `bash sota/download.sh`. After downloading, the expected directory structure is as follows:
```markdown
pretrained_xl
tf_enwik8/
data/
cache.pkl
corpus-info.json
model/
checkpoint
model.ckpt*
tf_wt103/
...
...
```
**Note**: we include preprocessed data in the download files to make sure the **same vocabulary** is used. Please see the code `tf/data_utils.py` to understand the data structure.
#### 2. Run evaluation scripts to replicate SoTA results on GPUs
- **enwik8**: modify the script `sota/enwik8.sh` accordingly (see below; a sketch of the edits follows this list)
    - set `DATA_ROOT` to the same folder used in the download step (defaults to `./`)
    - set `TEST_NUM_CORE` (number of GPUs to use): we recommend 2 GPUs => about 60 mins
    - run the script: `bash sota/enwik8.sh`
- **lm1b**: modify the script `sota/lm1b.sh` accordingly (see below)
    - set `DATA_ROOT` to the same folder used in the download step (defaults to `./`)
    - set `TEST_NUM_CORE` (number of GPUs to use): we recommend 1 GPU => less than 5 mins
    - run the script: `bash sota/lm1b.sh`
- **wt103**: modify the script `sota/wt103.sh` accordingly (see below)
    - set `DATA_ROOT` to the same folder used in the download step (defaults to `./`)
    - set `TEST_NUM_CORE` (number of GPUs to use): we recommend 1 GPU => less than 5 mins
    - run the script: `bash sota/wt103.sh`
- **text8**: modify the script `sota/text8.sh` accordingly (see below)
    - set `DATA_ROOT` to the same folder used in the download step (defaults to `./`)
    - set `TEST_NUM_CORE` (number of GPUs to use): we recommend 2 GPUs => about 60 mins
    - run the script: `bash sota/text8.sh`
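As a concrete sketch of these edits, assuming `DATA_ROOT` and `TEST_NUM_CORE` appear as plain shell assignments near the top of the script and using the recommended values for enwik8:

```sh
# 1) Edit these two assignments in sota/enwik8.sh (exact placement may differ):
#        DATA_ROOT=./          # same folder used by sota/download.sh
#        TEST_NUM_CORE=2       # 2 GPUs => about 60 minutes on enwik8
# 2) Then run the evaluation from this directory:
bash sota/enwik8.sh
```

The lm1b, wt103, and text8 scripts are edited the same way, using the `TEST_NUM_CORE` values recommended above.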
#### 3. Resources Needed for SoTA Model Training
We used 32, 32, 64, and 512 TPU cores for training our best models on enwik8, text8, wt103, and lm1b respectively. The training time for each model ranges from 2 to 5 days.
## Train "Transformer-XL" from scratch with GPUs or TPUs
### 1. Download raw data
`bash getdata.sh`
### 2. Preprocess, training and evaluation
For `dataset` in `[enwik8, lm1b, wt103, text8]`:
- check out `scripts/dataset_base_gpu.sh` for GPU training and evaluation (e.g., `scripts/enwik8_base_gpu.sh`)
- check out `scripts/dataset_large_tpu.sh` for TPU training and evaluation (e.g., `scripts/enwik8_large_tpu.sh`)
#### (1) Preprocess raw data and create tfrecords
**NOTE**: The preprocessing for GPU and TPU is different, so you have to run it separately for each.
GPU:
- create training and validation data: `bash scripts/dataset_base_gpu.sh train_data`
- create test data: `bash scripts/dataset_base_gpu.sh test_data`
TPU:
- Set the Google Storage URLs in `scripts/dataset_large_tpu.sh` (example values are sketched after this list):
- `GSDATA`: data URL
- `GSEXP`: experiment URL
- create training and validation data: `bash scripts/dataset_large_tpu.sh train_data`
- create test data: `bash scripts/dataset_large_tpu.sh test_data`
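As a sketch of the two assignments above, assuming they are plain shell variables inside `scripts/dataset_large_tpu.sh`; the bucket name and paths are placeholders, not values from the repository:

```sh
# Hypothetical Google Storage locations; replace the bucket and paths with your own.
GSDATA=gs://your-bucket/transformer-xl/data   # data URL (where the tfrecords go)
GSEXP=gs://your-bucket/transformer-xl/exp     # experiment URL (training outputs)
```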
#### (2) Run training
Base models on GPUs:
- Modify the configurations in `scripts/dataset_base_gpu.sh` according to your needs.
- `bash scripts/dataset_base_gpu.sh train`
- If enough resources are available, increase the model size (e.g., `N_LAYER`, `D_MODEL`, `D_EMBED`, `D_HEAD`, `D_INNER`) so that it is closer to the values defined in `scripts/dataset_large_tpu.sh`. Likewise, when resources are limited, decrease the model size. It is recommended to ensure that `D_MODEL == D_EMBED` and `D_MODEL == N_HEAD x D_HEAD`. When the model size increases, remember to increase `warmup_steps` accordingly to alleviate optimization difficulties (see the sizing sketch after this list).
- Adjust the `NUM_CORE` parameter to reflect the number of GPUs to use.
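For illustration only, here is one sizing that satisfies the recommendations above; these numbers are a made-up example, not the defaults from the scripts:

```sh
# Illustrative sizing only, not the defaults shipped in scripts/*_base_gpu.sh.
N_LAYER=16
D_MODEL=512
D_EMBED=512      # keep D_EMBED == D_MODEL
N_HEAD=8
D_HEAD=64        # N_HEAD * D_HEAD = 8 * 64 = 512 = D_MODEL
D_INNER=2048
NUM_CORE=4       # number of GPUs used for training
# If you grow the model beyond this, also raise warmup_steps to ease optimization.
```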
Larger models on TPUs:
- Modify the configurations in `scripts/dataset_large_tpu.sh` according to your needs.
- `bash scripts/dataset_large_tpu.sh train`
#### (3) Run evaluation
Base models on GPUs:
- `bash scripts/dataset_base_gpu.sh eval --eval_ckpt_path PATH_TO_CKPT`
Larger models on TPUs:
- `bash scripts/dataset_large_tpu.sh eval --eval_ckpt_path PATH_TO_CKPT`
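In both cases, `PATH_TO_CKPT` should point to a trained checkpoint; a hypothetical GPU example (the checkpoint directory and step count below are placeholders) might look like:

```sh
# Hypothetical checkpoint prefix; substitute the model.ckpt-* file your run produced.
bash scripts/wt103_base_gpu.sh eval --eval_ckpt_path EXP_DIR/model.ckpt-100000
```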