# knowledge-driven-dialogue-2019-lic
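5th-place solution on leaderboard B of the 2019 Language and Intelligence Challenge, [Knowledge-Driven Dialogue](http://lic2019.ccf.org.cn/talk) track.<br>
Because online deployment imposed a time limit, the version finally submitted for human evaluation dropped some global topic features. This lowered the model's results, and it ranked 9th in the final human evaluation. Leaderboard A: 4th place; leaderboard B: 5th place.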
## Overview
To build a proactive dialogue chatbot, we used a generation-reranking method. First, the generative models (Multi-Seq2Seq) produce a set of candidate replies. Next, the re-ranking model performs query-answer matching to choose, from those candidates, a reply that is as informative as possible. A detailed paper describing our solution is available at https://arxiv.org/pdf/1907.03590.pdf.
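The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the repository's code: `rerank`, `generator_a`, `generator_b`, and `toy_score` are hypothetical names, and the toy scorer simply stands in for the trained re-ranking model.

```python
from typing import Callable, List, Sequence


def rerank(context: str,
           generators: Sequence[Callable[[str], List[str]]],
           score: Callable[[str, str], float]) -> str:
    """Collect candidates from every generator and return the best-scoring one."""
    candidates: List[str] = []
    for generate in generators:
        for reply in generate(context):
            if reply not in candidates:  # drop exact duplicates, keep order
                candidates.append(reply)
    if not candidates:
        raise ValueError("no candidate replies were generated")
    return max(candidates, key=lambda reply: score(context, reply))


# Toy stand-ins for the trained models (hypothetical, for illustration only).
def generator_a(context: str) -> List[str]:
    return ["I like it too.", "It stars Tom Hanks."]


def generator_b(context: str) -> List[str]:
    return ["It stars Tom Hanks.", "Yes."]


def toy_score(context: str, reply: str) -> float:
    # Pretend that longer replies are more informative.
    return float(len(reply))


best = rerank("Who is in the movie?", [generator_a, generator_b], toy_score)
```

In the real pipeline, each generator would be one of the trained Seq2Seq models and the scorer would be the ranking model's prediction for a (context, candidate) pair.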
### Data Augmentation
We used four data augmentation techniques (Entity Generalization, Knowledge Selection, Switch, and Conversation Extraction) to construct multiple datasets for training the Seq2Seq models. The scripts Seq2Seq/preclean_*.py can be used, with slight modification of their parameters, to produce 6 datasets.
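As an illustration of the first technique only, here is one common way entity generalization is done: entity mentions are replaced with placeholder tokens before training and restored after decoding. This is a sketch under that assumption; the placeholder format and the function names `generalize` and `restore` are hypothetical and may differ from what the preclean scripts actually do.

```python
from typing import Dict, List, Tuple


def generalize(text: str, entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each entity mention with a placeholder; return text and mapping."""
    placeholders = {entity: f"<entity_{i}>" for i, entity in enumerate(entities)}
    # Replace longer names first so a short name nested in a longer one
    # does not break the longer match.
    for entity in sorted(placeholders, key=len, reverse=True):
        text = text.replace(entity, placeholders[entity])
    mapping = {token: entity for entity, token in placeholders.items()}
    return text, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original entity names back into a generated reply."""
    for token, entity in mapping.items():
        text = text.replace(token, entity)
    return text


sentence = "Have you seen Forrest Gump by Robert Zemeckis?"
generalized, mapping = generalize(sentence, ["Forrest Gump", "Robert Zemeckis"])
```

The model then sees `generalized` instead of the raw sentence, which reduces the number of rare entity tokens it has to learn.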
### Seq2Seq Model
For ensembling, we used different encoder and decoder architectures, namely LSTM cells and the Transformer. <br>
#### Training
- python preprocess.py
- python train.py
#### Testing
python translate.py <br>
All training and testing configuration can be modified in config/*.yml. <br>
In total, we trained 27 Seq2Seq models for the ensemble.
### Answer rank
We used a GBDT regressor for ranking. One may ask why we did not use a neural network such as BERT for ranking. We tried that, but it did not work well.
#### Creating ranking dataset
python create_gbdt_dataset.py
#### Feature extraction
python feature_util_multiprocess.py <br>
The feature extraction code is partly based on [Kaggle_HomeDepot](https://github.com/ChenglongChen/Kaggle_HomeDepot) by ChenglongChen.
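To give a flavor of the hand-crafted matching features a GBDT ranker can consume, the sketch below computes character n-gram overlap between a candidate reply and a piece of knowledge text. It is an assumption-laden example, not the contents of feature_util_multiprocess.py: the function names and the choice of Jaccard and Dice coefficients are illustrative.

```python
from typing import Dict, Set


def char_ngrams(text: str, n: int) -> Set[str]:
    """Return the set of character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size divided by union size (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: Set[str], b: Set[str]) -> float:
    """Twice the intersection size divided by the summed set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap_features(reply: str, knowledge: str, max_n: int = 2) -> Dict[str, float]:
    """Build one feature per (metric, n) pair for a reply/knowledge pair."""
    features: Dict[str, float] = {}
    for n in range(1, max_n + 1):
        r, k = char_ngrams(reply, n), char_ngrams(knowledge, n)
        features[f"jaccard_{n}gram"] = jaccard(r, k)
        features[f"dice_{n}gram"] = dice(r, k)
    return features


feats = overlap_features("abc", "abd", max_n=2)
```

Character n-grams are used here because they need no word segmentation, which is convenient for Chinese text; word-level n-grams would work the same way after tokenization.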
### Checkpoints
It might take some extra time to upload the checkpoints because they are rather large in size.