# BERT
**\*\*\*\*\* New November 15th, 2018: SOTA SQuAD 2.0 System \*\*\*\*\***
We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is
currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the
README for details.
**\*\*\*\*\* New November 5th, 2018: Third-party PyTorch and Chainer versions of
BERT available \*\*\*\*\***
NLP researchers from HuggingFace made a
[PyTorch version of BERT available](https://github.com/huggingface/pytorch-pretrained-BERT)
which is compatible with our pre-trained checkpoints and is able to reproduce
our results. Sosuke Kobayashi also made a
[Chainer version of BERT available](https://github.com/soskek/bert-chainer)
(Thanks!) We were not involved in the creation or maintenance of the PyTorch
implementation, so please direct any questions towards the authors of that
repository.
**\*\*\*\*\* New November 3rd, 2018: Multilingual and Chinese models available
\*\*\*\*\***
We have made two new BERT models available:
* **[`BERT-Base, Multilingual`](https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip)**:
102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Base, Chinese`](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip)**:
Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M
parameters
We use character-based tokenization for Chinese, and WordPiece tokenization for
all other languages. Both models should work out-of-the-box without any code
changes. We did update the implementation of `BasicTokenizer` in
`tokenization.py` to support Chinese character tokenization, so please update if
you forked it. However, we did not change the tokenization API.
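As a quick sanity check of the tokenizer, here is a minimal sketch (not part of the release itself) that runs `FullTokenizer` from `tokenization.py` on a mixed Chinese/English string; the vocab path and the `do_lower_case` setting are placeholders that should match whichever checkpoint you downloaded.
```
# Minimal sketch: tokenize mixed Chinese/English text with tokenization.py.
# The vocab path and do_lower_case flag are assumptions -- set them to match
# the checkpoint you actually use.
import tokenization  # tokenization.py from this repository

tokenizer = tokenization.FullTokenizer(
    vocab_file="chinese_L-12_H-768_A-12/vocab.txt",
    do_lower_case=True)

# Chinese characters are split one by one; everything else goes through
# WordPiece, so English words may be broken into sub-tokens.
print(tokenizer.tokenize(u"BERT 在 维基百科 上 预训练"))
```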
For more, see the
[Multilingual README](https://github.com/google-research/bert/blob/master/multilingual.md).
**\*\*\*\*\* End new information \*\*\*\*\***
## Introduction
**BERT**, or **B**idirectional **E**ncoder **R**epresentations from
**T**ransformers, is a new method of pre-training language representations which
obtains state-of-the-art results on a wide array of Natural Language Processing
(NLP) tasks.
Our academic paper which describes BERT in detail and provides full results on a
number of tasks can be found here:
[https://arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805).
To give a few numbers, here are the results on the
[SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/) question answering
task:
SQuAD v1.1 Leaderboard (Oct 8th 2018) | Test EM | Test F1
------------------------------------- | :------: | :------:
1st Place Ensemble - BERT | **87.4** | **93.2**
2nd Place Ensemble - nlnet | 86.0 | 91.7
1st Place Single Model - BERT | **85.1** | **91.8**
2nd Place Single Model - nlnet | 83.5 | 90.1
And several natural language inference tasks:
System | MultiNLI | Question NLI | SWAG
----------------------- | :------: | :----------: | :------:
BERT | **86.7** | **91.1** | **86.3**
OpenAI GPT (Prev. SOTA) | 82.2 | 88.1 | 75.0
Plus many other tasks.
Moreover, these results were all obtained with almost no task-specific neural
network architecture design.
If you already know what BERT is and you just want to get started, you can
[download the pre-trained models](#pre-trained-models) and
[run a state-of-the-art fine-tuning](#fine-tuning-with-bert) in only a few
minutes.
## What is BERT?
BERT is a method of pre-training language representations, meaning that we train
a general-purpose "language understanding" model on a large text corpus (like
Wikipedia), and then use that model for downstream NLP tasks that we care about
(like question answering). BERT outperforms previous methods because it is the
first *unsupervised*, *deeply bidirectional* system for pre-training NLP.
*Unsupervised* means that BERT was trained using only a plain text corpus, which
is important because an enormous amount of plain text data is publicly available
on the web in many languages.
Pre-trained representations can also either be *context-free* or *contextual*,
and contextual representations can further be *unidirectional* or
*bidirectional*. Context-free models such as
[word2vec](https://www.tensorflow.org/tutorials/representation/word2vec) or
[GloVe](https://nlp.stanford.edu/projects/glove/) generate a single "word
embedding" representation for each word in the vocabulary, so `bank` would have
the same representation in `bank deposit` and `river bank`. Contextual models
instead generate a representation of each word that is based on the other words
in the sentence.
BERT was built upon recent work in pre-training contextual representations —
including [Semi-supervised Sequence Learning](https://arxiv.org/abs/1511.01432),
[Generative Pre-Training](https://blog.openai.com/language-unsupervised/),
[ELMo](https://allennlp.org/elmo), and
[ULMFit](http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html)
— but crucially these models are all *unidirectional* or *shallowly
bidirectional*. This means that each word is only contextualized using the words
to its left (or right). For example, in the sentence `I made a bank deposit` the
unidirectional representation of `bank` is only based on `I made a` but not
`deposit`. Some previous work does combine the representations from separate
left-context and right-context models, but only in a "shallow" manner. BERT
represents "bank" using both its left and right context — `I made a ... deposit`
— starting from the very bottom of a deep neural network, so it is *deeply
bidirectional*.
BERT uses a simple approach for this: We mask out 15% of the words in the input,
run the entire sequence through a deep bidirectional
[Transformer](https://arxiv.org/abs/1706.03762) encoder, and then predict only
the masked words. For example:
```
Input: the man went to the [MASK1] . he bought a [MASK2] of milk.
Labels: [MASK1] = store; [MASK2] = gallon
```
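As a rough illustration of this masking step (and not the exact logic in `create_pretraining_data.py`, which also sometimes keeps the original token or swaps in a random one), a toy sketch could look like this:
```
import random

def mask_tokens(tokens, mask_prob=0.15):
    """Replace roughly 15% of tokens with [MASK]; return the masked tokens
    plus a map from masked positions to the original tokens (the labels)."""
    masked = list(tokens)
    labels = {}
    for i, token in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = token
            masked[i] = "[MASK]"
    return masked, labels

tokens = "the man went to the store . he bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)
print(masked)   # e.g. [..., '[MASK]', '.', 'he', 'bought', ...]
print(labels)   # {position: original token to predict}
```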
In order to learn relationships between sentences, we also train on a simple
task which can be generated from any monolingual corpus: Given two sentences `A`
and `B`, is `B` the actual next sentence that comes after `A`, or just a random
sentence from the corpus?
```
Sentence A: the man went to the store .
Sentence B: he bought a gallon of milk .
Label: IsNextSentence
```
```
Sentence A: the man went to the store .
Sentence B: penguins are flightless .
Label: NotNextSentence
```
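A toy sketch of how such pairs can be generated from a corpus follows (again, the real pipeline is in `create_pretraining_data.py`; `documents` here is just a hypothetical list of documents, each a list of sentences):
```
import random

def make_nsp_example(documents):
    """documents: list of documents, each a list of at least two sentences."""
    doc = random.choice(documents)
    i = random.randrange(len(doc) - 1)
    sent_a = doc[i]
    if random.random() < 0.5:
        return sent_a, doc[i + 1], "IsNextSentence"
    # Otherwise pair A with a random sentence (which may, rarely, still be
    # the true next sentence in this toy version).
    other = random.choice(random.choice(documents))
    return sent_a, other, "NotNextSentence"

docs = [
    ["the man went to the store .", "he bought a gallon of milk ."],
    ["penguins are flightless .", "they live mostly in the southern hemisphere ."],
]
print(make_nsp_example(docs))
```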
We then train a large model (12-layer to 24-layer Transformer) on a large corpus
(Wikipedia + [BookCorpus](http://yknzhu.wixsite.com/mbweb)) for a long time (1M
update steps), and that's BERT.
Using BERT has two stages: *Pre-training* and *fine-tuning*.
**Pre-training** is fairly expensive (four days on 4 to 16 Cloud TPUs), but it is a
one-time procedure for each language (English, multilingual, and Chinese models
are now available; see the announcements above). We are releasing a number of
pre-trained models from the paper, which were pre-trained at Google. Most NLP
researchers will never need to pre-train their own model from scratch.
**Fine-tuning** is inexpensive. All of the results in the paper can be
replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU,
starting from the exact same pre-trained model. SQuAD, for example, can be
trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of
91.0%, which is the single system state-of-the-art.
The other important aspect of BERT is that it can be adapted to many types of
NLP tasks very easily. In the paper, we demonstrate state-of-the-art results on
sentence-level (e.g., SST-2), sentence-pair-level (e.g., MultiNLI), word-level
(e.g., NER), and span-level (e.g., SQuAD) tasks with almost no task-specific
modifications.
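To make this concrete, here is a minimal, hypothetical sketch of a sentence-level classification head on top of the pre-trained model via `modeling.py`; the sequence length, config path, and number of labels are placeholders, and the full fine-tuning logic lives in `run_classifier.py`.
```
import tensorflow as tf   # TensorFlow 1.x, as used by this repository
import modeling           # modeling.py from this repository

# Placeholders for a batch of examples; 128 is an assumed max sequence length.
input_ids = tf.placeholder(tf.int32, shape=[None, 128])
input_mask = tf.placeholder(tf.int32, shape=[None, 128])
segment_ids = tf.placeholder(tf.int32, shape=[None, 128])

bert_config = modeling.BertConfig.from_json_file("bert_config.json")
model = modeling.BertModel(
    config=bert_config,
    is_training=False,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids)

# The pooled [CLS] representation; a single dense layer on top is the only
# task-specific piece needed for a sentence-level task.
pooled_output = model.get_pooled_output()
num_labels = 2  # task-dependent, e.g. 3 for MultiNLI
logits = tf.layers.dense(pooled_output, num_labels)
```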
## What has been released in this repository?
We are releasing the following:
* TensorFlow code for the BERT model architecture (which is mostly a standard
  Transformer architecture).