# SeqMatchSeq
Implementations of the models described in three papers on sequence matching:
- [Learning Natural Language Inference with LSTM](https://arxiv.org/abs/1512.08849) by Shuohang Wang, Jing Jiang
- [Machine Comprehension Using Match-LSTM and Answer Pointer](https://arxiv.org/abs/1608.07905) by Shuohang Wang, Jing Jiang
- [A Compare-Aggregate Model for Matching Text Sequences](https://arxiv.org/abs/1611.01747) by Shuohang Wang, Jing Jiang
# Learning Natural Language Inference with LSTM
### Requirements
- [Torch7](https://github.com/torch/torch7)
- [nn](https://github.com/torch/nn)
- [nngraph](https://github.com/torch/nngraph)
- [optim](https://github.com/torch/optim)
- Python 2.7
### Datasets
- [The Stanford Natural Language Inference (SNLI) Corpus](http://nlp.stanford.edu/projects/snli/)
- [GloVe: Global Vectors for Word Representation](http://nlp.stanford.edu/data/glove.840B.300d.zip)
### Usage
```
sh preprocess.sh snli
cd main
th main.lua -task snli -model mLSTM -dropoutP 0.3 -num_classes 3
```
`sh preprocess.sh snli` downloads the datasets and preprocesses the SNLI corpus into the files
(train.txt, dev.txt, test.txt) under the path "data/snli/sequence" with the format:
>sequence1(premise) \t sequence2(hypothesis) \t label(from 1 to num_classes) \n

`main.lua` first converts the preprocessed data and word embeddings into a Torch format and
then runs the algorithm. `dropoutP` is the main parameter we tuned.
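For illustration, the tab-separated format above can be read with a few lines of Python. The helper name `parse_snli_line` and the example sentence pair are ours, not part of the repository:

```python
# Hypothetical helper for reading the tab-separated files produced by
# preprocess.sh (train.txt / dev.txt / test.txt under data/snli/sequence).
def parse_snli_line(line):
    """Split one line into (premise, hypothesis, label)."""
    premise, hypothesis, label = line.rstrip("\n").split("\t")
    return premise, hypothesis, int(label)  # label runs from 1 to num_classes

print(parse_snli_line("A man is eating.\tA person eats food.\t1\n"))
```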
### Docker
You can also run the code with Docker.
- [Docker Install](https://github.com/codalab/codalab-worksheets/wiki/Installing-Docker)
- [Image](https://hub.docker.com/r/shuohang/seqmatchseq/): docker pull shuohang/seqmatchseq:1.0
After installation, run the following commands (replace /PATH/SeqMatchSeq with your local path):
```
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh snli"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua"
```
# Machine Comprehension Using Match-LSTM and Answer Pointer
### Requirements
- [Torch7](https://github.com/torch/torch7)
- [nn](https://github.com/torch/nn)
- [nngraph](https://github.com/torch/nngraph)
- [optim](https://github.com/torch/optim)
- [parallel](https://github.com/clementfarabet/lua---parallel)
- Python 2.7
- Python Packages: [NLTK](http://www.nltk.org/install.html), collections, json, argparse
- [NLTK Data](http://www.nltk.org/data.html): punkt
- Multiple-cores CPU
### Datasets
- [Stanford Question Answering Dataset (SQuAD)](https://rajpurkar.github.io/SQuAD-explorer/)
- [GloVe: Global Vectors for Word Representation](http://nlp.stanford.edu/data/glove.840B.300d.zip)
### Usage
```
sh preprocess.sh squad
cd main
th mainDt.lua
```
`sh preprocess.sh squad` downloads the datasets and preprocesses the SQuAD corpus into the files
(train.txt, dev.txt) under the path "data/squad/sequence" with the format:
>sequence1(Document) \t sequence2(Question) \t sequence of the positions where the answer appears in the Document (e.g. 3 4 5 6) \n

`mainDt.lua` first converts the preprocessed data and word embeddings into a Torch format and
then runs the algorithm. As this code runs across multiple CPU cores, the initial parameters are
written in the file "main/init.lua".
- `opt.num_processes`: 5. The number of threads used.
- `opt.batch_size`: 6. Batch size for each thread (so the effective mini-batch size is 5 × 6 = 30).
- `opt.model` : boundaryMPtr / sequenceMPtr
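As a concrete sketch of the SQuAD file format described above, the answer field is a space-separated list of token positions in the document. The helper name and the example line below are ours, not part of the repository:

```python
# Hypothetical helper for the SQuAD files written by preprocess.sh
# (train.txt / dev.txt under data/squad/sequence).
def parse_squad_line(line):
    """Split one line into (document, question, answer token positions)."""
    document, question, answer_span = line.rstrip("\n").split("\t")
    positions = [int(tok) for tok in answer_span.split()]
    return document, question, positions

print(parse_squad_line("The cat sat on the mat .\tWhere did the cat sit ?\t5 6\n"))
```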
### Docker
You can also run the code with Docker.
- [Docker Install](https://github.com/codalab/codalab-worksheets/wiki/Installing-Docker)
- [Image](https://hub.docker.com/r/shuohang/seqmatchseq/): docker pull shuohang/seqmatchseq:1.0
After installation, run the following commands (replace /PATH/SeqMatchSeq with your local path):
```
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh squad"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th mainDt.lua"
```
# A Compare-Aggregate Model for Matching Text Sequences
### Requirements
- [Torch7](https://github.com/torch/torch7)
- [nn](https://github.com/torch/nn)
- [nngraph](https://github.com/torch/nngraph)
- [optim](https://github.com/torch/optim)
- Python 2.7
### Datasets
- [The Stanford Natural Language Inference (SNLI) Corpus](http://nlp.stanford.edu/projects/snli/)
- [MovieQA: Story Understanding Benchmark](http://movieqa.cs.toronto.edu/home/)
- [InsuranceQA Corpus V1: Answer Selection Task](https://github.com/shuzi/insuranceQA)
- [WikiQA: A Challenge Dataset for Open-Domain Question Answering](https://www.microsoft.com/en-us/research/publication/wikiqa-a-challenge-dataset-for-open-domain-question-answering/)
- [GloVe: Global Vectors for Word Representation](http://nlp.stanford.edu/data/glove.840B.300d.zip)
For now, this code only supports the SNLI and WikiQA datasets.
### Usage
SNLI task (The preprocessed format follows the previous description):
```
sh preprocess.sh snli
cd main
th main.lua -task snli -model compAggSNLI -comp_type submul -learning_rate 0.002 -mem_dim 150 -dropoutP 0.3
```
WikiQA task (first download the file "WikiQACorpus.zip" to the path SeqMatchSeq/data/wikiqa/ from https://www.microsoft.com/en-us/download/details.aspx?id=52419):
```
sh preprocess.sh wikiqa
cd main
th main.lua -task wikiqa -model compAggWikiqa -comp_type mul -learning_rate 0.004 -dropoutP 0.04 -batch_size 10 -mem_dim 150
```
- `model` (model name) : compAggSNLI / compAggWikiqa
- `comp_type` (8 different types of word comparison): submul / sub / mul / weightsub / weightmul / bilinear / concate / cos
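To make a few of the `comp_type` options concrete, here is a plain-Python sketch of some of the element-wise comparison functions, following their definitions in the paper. The function name and vectors are ours for illustration; the actual implementations live in the Lua model files:

```python
import math

# Hedged sketch of some comparison functions selected by -comp_type.
# a and h are same-length lists standing in for word vectors.
def comp(a, h, comp_type):
    if comp_type == "sub":      # element-wise (a - h) squared
        return [(x - y) ** 2 for x, y in zip(a, h)]
    if comp_type == "mul":      # element-wise product a * h
        return [x * y for x, y in zip(a, h)]
    if comp_type == "submul":   # concatenation of "sub" and "mul"
        return comp(a, h, "sub") + comp(a, h, "mul")
    if comp_type == "cos":      # scalar cosine similarity
        dot = sum(x * y for x, y in zip(a, h))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in h))
        return [dot / norm]
    raise ValueError("unsupported comp_type in this sketch: " + comp_type)

print(comp([1.0, 2.0], [3.0, 4.0], "submul"))
```

The weighted, bilinear, and concatenation variants additionally involve learned parameters, so they are omitted from this stateless sketch.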
### Docker
You can also run the code with Docker.
- [Docker Install](https://github.com/codalab/codalab-worksheets/wiki/Installing-Docker)
- [Image](https://hub.docker.com/r/shuohang/seqmatchseq/): docker pull shuohang/seqmatchseq:1.0
After installation, run the following commands (replace /PATH/SeqMatchSeq with your local path):
For SNLI:
```
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh snli"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua -task snli -model compAggSNLI -comp_type submul -learning_rate 0.002 -mem_dim 150 -dropoutP 0.3"
```
For WikiQA:
```
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt shuohang/seqmatchseq:1.0 /bin/bash -c "sh preprocess.sh wikiqa"
docker run -it -v /PATH/SeqMatchSeq:/opt --rm -w /opt/main shuohang/seqmatchseq:1.0 /bin/bash -c "th main.lua -task wikiqa -model compAggWikiqa -comp_type mul -learning_rate 0.004 -dropoutP 0.04 -batch_size 10 -mem_dim 150"
```
# Copyright
Copyright 2015 Singapore Management University (SMU). All Rights Reserved.