# PyTorch implementation of OpenAI's Finetuned Transformer Language Model
This is a PyTorch implementation of the [TensorFlow code](https://github.com/openai/finetune-transformer-lm) provided with OpenAI's paper ["Improving Language Understanding by Generative Pre-Training"](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
This implementation comprises **a script to load into the PyTorch model the weights pre-trained by the authors** with the TensorFlow implementation.
![Transformer Language Model](assets/ftlm.png)
The model classes and loading script are located in [model_pytorch.py](model_pytorch.py).
The names of the modules in the PyTorch model follow the names of the Variables in the TensorFlow implementation. This implementation tries to follow the original code as closely as possible to minimize discrepancies.
This implementation thus also comprises the modified Adam optimization algorithm used in OpenAI's paper, with:
- fixed weight decay following the work of [Loshchilov et al.](https://arxiv.org/abs/1711.05101), and
- a scheduled learning rate, as [commonly used for Transformers](http://nlp.seas.harvard.edu/2018/04/03/attention.html#optimizer) (a sketch of such a schedule follows this list).
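For intuition, the warmup-then-decay schedule described in the linked Annotated Transformer post can be written as follows (a minimal sketch with illustrative hyperparameters; the exact schedule and values used by the optimizer in this repo may differ):
```python
def transformer_lr(step, d_model=768, warmup=2000):
    # Linear warmup for the first `warmup` steps, then inverse-square-root
    # decay, scaled by the model dimension (Annotated Transformer formulation).
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```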
## Requirements
To use the model itself by importing [model_pytorch.py](model_pytorch.py), you just need:
- PyTorch (version >=0.4)
To run the classifier training script in [train.py](train.py) you will additionally need (one way to install them is shown after this list):
- tqdm
- sklearn
- spacy
- ftfy
- pandas
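For instance, all five can be installed with pip (note that `sklearn` is published on PyPI as `scikit-learn`):
```bash
pip install tqdm scikit-learn spacy ftfy pandas
```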
You can download the weights of the OpenAI pre-trained version by cloning [Alec Radford's repo](https://github.com/openai/finetune-transformer-lm) and placing the `model` folder containing the pre-trained weights in the present repo.
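For example, from the root of the present repo (assuming `git` is available):
```bash
git clone https://github.com/openai/finetune-transformer-lm
cp -r finetune-transformer-lm/model .
```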
## Using the pre-trained model as a Transformer Language Model
The model can be used as a transformer language model with OpenAI's pre-trained weights as follows:
```python
from model_pytorch import TransformerModel, load_openai_pretrained_model, DEFAULT_CONFIG
args = DEFAULT_CONFIG
model = TransformerModel(args)
load_openai_pretrained_model(model)
```
This model generates the Transformer's hidden states. You can use the `LMHead` class in [model_pytorch.py](model_pytorch.py) to add a decoder tied to the weights of the encoder and get a full language model. You can also use the `ClfHead` class in [model_pytorch.py](model_pytorch.py) to add a classifier on top of the transformer and get a classifier as described in OpenAI's publication. (See an example of both in the `__main__` function of [train.py](train.py).)
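For instance, a language-modeling head can be attached as follows (a minimal sketch assuming the `LMHead(model, cfg)` signature in [model_pytorch.py](model_pytorch.py); the sizes below are illustrative):
```python
import torch
from model_pytorch import (TransformerModel, LMHead,
                           load_openai_pretrained_model, DEFAULT_CONFIG)

args = DEFAULT_CONFIG
model = TransformerModel(args)
lm_head = LMHead(model, args)  # decoder tied to the encoder's embedding weights
load_openai_pretrained_model(model)

# Dummy batch: the last dimension pairs each token id with its position id;
# position ids are offset by the vocabulary size (see the next paragraph).
n_vocab, seq_len = 40478, 77  # illustrative sizes
x = torch.zeros(1, seq_len, 2, dtype=torch.long)
x[:, :, 1] = torch.arange(n_vocab, n_vocab + seq_len)

hidden_states = model(x)
lm_logits = lm_head(hidden_states)  # logits over the vocabulary
```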
To use the positional encoder of the transformer, you should encode your dataset using the `encode_dataset()` function of [utils.py](utils.py). Please refer to the beginning of the `__main__` function in [train.py](train.py) to see how to properly define the vocabulary and encode your dataset.
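Roughly speaking, position indices are placed after the token vocabulary so that tokens and positions index a single embedding matrix. A hypothetical NumPy illustration of this pairing (the real bookkeeping, including special tokens, is handled by `encode_dataset()` and the `__main__` of [train.py](train.py)):
```python
import numpy as np

def add_position_ids(token_ids, n_vocab):
    # Pair each token id with a position id offset by the vocabulary size,
    # so both index the same embedding table inside the transformer.
    n = len(token_ids)
    x = np.zeros((n, 2), dtype=np.int64)
    x[:, 0] = token_ids
    x[:, 1] = np.arange(n_vocab, n_vocab + n)
    return x
```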
## Fine-tuning the pre-trained model on a classification task
This model can also be integrated in a classifier as detailed in [OpenAI's paper](https://blog.openai.com/language-unsupervised/). An example of fine-tuning on the ROCStories Cloze task is included with the training code in [train.py](train.py).
The ROCStories dataset can be downloaded from the associated [website](http://cs.rochester.edu/nlp/rocstories/).
As with the [TensorFlow code](https://github.com/openai/finetune-transformer-lm), this code implements the ROCStories Cloze Test result reported in the paper, which can be reproduced by running:
```bash
python -m spacy download en
python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]
```
#### First experiments on the ROCStories test set
Fine-tuning the PyTorch model for 3 epochs on ROCStories takes about 10 minutes on a single NVIDIA K80.
The single-run test accuracy of this PyTorch version is 85.84%, while the authors report a median accuracy of 85.8% with the TensorFlow code, and the paper reports a best single-run accuracy of 86.5%.
The authors' implementation uses 8 GPUs and can thus accommodate a batch of 64 samples, while the present implementation is single-GPU and is consequently limited to 20 instances on a K80 for memory reasons. In our test, increasing the batch size from 8 to 20 samples increased the test accuracy by 2.5 points. Better accuracy may be obtained with a multi-GPU setup (not tried yet).
The previous SOTA on the ROCStories dataset is 77.6%, set by the "Hidden Coherence Model" of Chaturvedi et al. in "Story Comprehension for Predicting What Happens Next", EMNLP 2017 (which is a very nice paper too!).