# BERT
**\*\*\*\*\* New November 15th, 2018: SOTA SQuAD 2.0 System \*\*\*\*\***
We released code changes to reproduce our 83% F1 SQuAD 2.0 system, which is
currently 1st place on the leaderboard by 3%. See the SQuAD 2.0 section of the
README for details.
**\*\*\*\*\* New November 5th, 2018: Third-party PyTorch and Chainer versions of
BERT available \*\*\*\*\***
NLP researchers from HuggingFace made a
[PyTorch version of BERT available](https://github.com/huggingface/pytorch-pretrained-BERT)
which is compatible with our pre-trained checkpoints and is able to reproduce
our results. Sosuke Kobayashi also made a
[Chainer version of BERT available](https://github.com/soskek/bert-chainer)
(Thanks!) We were not involved in the creation or maintenance of the PyTorch
implementation, so please direct any questions towards the authors of that
repository.
**\*\*\*\*\* New November 3rd, 2018: Multilingual and Chinese models available
\*\*\*\*\***
We have made two new BERT models available:
* **[`BERT-Base, Multilingual`](https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip)**:
102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Base, Chinese`](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip)**:
Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M
parameters
We use character-based tokenization for Chinese, and WordPiece tokenization for
all other languages. Both models should work out-of-the-box without any code
changes. We did update the implementation of `BasicTokenizer` in
`tokenization.py` to support Chinese character tokenization, so please update if
you forked it. However, we did not change the tokenization API.
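As a minimal sketch (assuming the repository's `tokenization.py` is importable and a checkpoint's `vocab.txt` has been downloaded; the path below is only an example), the same `FullTokenizer` API covers both cases: Chinese characters come out one per token, while other text falls back to WordPiece:
```python
# Minimal sketch using tokenization.py from this repository.
# The vocab path is an example; point it at an unzipped checkpoint.
import tokenization

tokenizer = tokenization.FullTokenizer(
    vocab_file="multilingual_L-12_H-768_A-12/vocab.txt",
    do_lower_case=True)

# Chinese characters are split one per token; everything else uses WordPiece.
tokens = tokenizer.tokenize(u"BERT 是一个预训练模型")
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens, ids)
```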
For more, see the
[Multilingual README](https://github.com/google-research/bert/blob/master/multilingual.md).
**\*\*\*\*\* End new information \*\*\*\*\***
## Introduction
**BERT**, or **B**idirectional **E**ncoder **R**epresentations from
**T**ransformers, is a new method of pre-training language representations which
obtains state-of-the-art results on a wide array of Natural Language Processing
(NLP) tasks.
Our academic paper which describes BERT in detail and provides full results on a
number of tasks can be found here:
[https://arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805).
To give a few numbers, here are the results on the
[SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/) question answering
task:
SQuAD v1.1 Leaderboard (Oct 8th 2018) | Test EM | Test F1
------------------------------------- | :------: | :------:
1st Place Ensemble - BERT | **87.4** | **93.2**
2nd Place Ensemble - nlnet | 86.0 | 91.7
1st Place Single Model - BERT | **85.1** | **91.8**
2nd Place Single Model - nlnet | 83.5 | 90.1
And several natural language inference tasks:
System | MultiNLI | Question NLI | SWAG
----------------------- | :------: | :----------: | :------:
BERT | **86.7** | **91.1** | **86.3**
OpenAI GPT (Prev. SOTA) | 82.2 | 88.1 | 75.0
Plus many other tasks.
Moreover, these results were all obtained with almost no task-specific neural
network architecture design.
If you already know what BERT is and you just want to get started, you can
[download the pre-trained models](#pre-trained-models) and
[run a state-of-the-art fine-tuning](#fine-tuning-with-bert) in only a few
minutes.
## What is BERT?
BERT is a method of pre-training language representations, meaning that we train
a general-purpose "language understanding" model on a large text corpus (like
Wikipedia), and then use that model for downstream NLP tasks that we care about
(like question answering). BERT outperforms previous methods because it is the
first *unsupervised*, *deeply bidirectional* system for pre-training NLP.
*Unsupervised* means that BERT was trained using only a plain text corpus, which
is important because an enormous amount of plain text data is publicly available
on the web in many languages.
Pre-trained representations can also either be *context-free* or *contextual*,
and contextual representations can further be *unidirectional* or
*bidirectional*. Context-free models such as
[word2vec](https://www.tensorflow.org/tutorials/representation/word2vec) or
[GloVe](https://nlp.stanford.edu/projects/glove/) generate a single "word
embedding" representation for each word in the vocabulary, so `bank` would have
the same representation in `bank deposit` and `river bank`. Contextual models
instead generate a representation of each word that is based on the other words
in the sentence.
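To make the distinction concrete, here is an illustrative sketch (not code from this repository): a context-free model is a fixed lookup table, whereas a contextual model computes each word's vector from the whole sentence:
```python
import numpy as np

# Context-free: one fixed vector per word, independent of the sentence.
table = {"bank": np.array([0.1, 0.3]), "deposit": np.array([0.7, 0.2]),
         "river": np.array([0.4, 0.9])}
same_in_both = table["bank"]  # identical for "bank deposit" and "river bank"

# Contextual (schematic stand-in for an encoder): the vector for a word is a
# function of every word in the sentence, so "bank" differs across sentences.
def contextual(sentence, index):
    vecs = [table.get(w, np.zeros(2)) for w in sentence]
    return vecs[index] + 0.5 * np.mean(vecs, axis=0)

print(contextual(["bank", "deposit"], 0))
print(contextual(["river", "bank"], 1))
```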
BERT was built upon recent work in pre-training contextual representations —
including [Semi-supervised Sequence Learning](https://arxiv.org/abs/1511.01432),
[Generative Pre-Training](https://blog.openai.com/language-unsupervised/),
[ELMo](https://allennlp.org/elmo), and
[ULMFit](http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html)
— but crucially these models are all *unidirectional* or *shallowly
bidirectional*. This means that each word is only contextualized using the words
to its left (or right). For example, in the sentence `I made a bank deposit` the
unidirectional representation of `bank` is only based on `I made a` but not
`deposit`. Some previous work does combine the representations from separate
left-context and right-context models, but only in a "shallow" manner. BERT
represents "bank" using both its left and right context — `I made a ... deposit`
— starting from the very bottom of a deep neural network, so it is *deeply
bidirectional*.
BERT uses a simple approach for this: We mask out 15% of the words in the input,
run the entire sequence through a deep bidirectional
[Transformer](https://arxiv.org/abs/1706.03762) encoder, and then predict only
the masked words. For example:
```
Input: the man went to the [MASK1] . he bought a [MASK2] of milk.
Labels: [MASK1] = store; [MASK2] = gallon
```
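A minimal sketch of this masking step is below (the released `create_pretraining_data.py` does the real work and is more involved, e.g., it sometimes keeps the original word or substitutes a random one):
```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Randomly replace ~15% of tokens with [MASK] and return the labels."""
    masked = list(tokens)
    labels = {}
    for i in range(len(tokens)):
        if random.random() < mask_prob:
            labels[i] = tokens[i]   # the model must predict this word
            masked[i] = mask_token
    return masked, labels

tokens = "the man went to the store . he bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)
print(masked)
print(labels)
```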
In order to learn relationships between sentences, we also train on a simple
task which can be generated from any monolingual corpus: Given two sentences `A`
and `B`, is `B` the actual next sentence that comes after `A`, or just a random
sentence from the corpus?
```
Sentence A: the man went to the store .
Sentence B: he bought a gallon of milk .
Label: IsNextSentence
```
```
Sentence A: the man went to the store .
Sentence B: penguins are flightless .
Label: NotNextSentence
```
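A minimal sketch of generating such pairs from an ordered list of sentences (again, the released `create_pretraining_data.py` is the actual implementation):
```python
import random

def make_nsp_example(sentences, i):
    """Build one next-sentence-prediction example starting at sentence i."""
    sent_a = sentences[i]
    if random.random() < 0.5 and i + 1 < len(sentences):
        # Use the real next sentence half of the time.
        return sent_a, sentences[i + 1], "IsNextSentence"
    # Otherwise pair it with a random sentence that is not the real next one.
    candidates = [s for j, s in enumerate(sentences) if j != i + 1]
    return sent_a, random.choice(candidates), "NotNextSentence"

corpus = ["the man went to the store .",
          "he bought a gallon of milk .",
          "penguins are flightless ."]
print(make_nsp_example(corpus, 0))
```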
We then train a large model (12-layer to 24-layer Transformer) on a large corpus
(Wikipedia + [BookCorpus](http://yknzhu.wixsite.com/mbweb)) for a long time (1M
update steps), and that's BERT.
Using BERT has two stages: *Pre-training* and *fine-tuning*.
**Pre-training** is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a
one-time procedure for each language (current models are English-only, but
multilingual models will be released in the near future). We are releasing a
number of pre-trained models from the paper which were pre-trained at Google.
Most NLP researchers will never need to pre-train their own model from scratch.
**Fine-tuning** is inexpensive. All of the results in the paper can be
replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU,
starting from the exact same pre-trained model. SQuAD, for example, can be
trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of
91.0%, which is the single system state-of-the-art.
The other important aspect of BERT is that it can be adapted to many types of
NLP tasks very easily. In the paper, we demonstrate state-of-the-art results on
sentence-level (e.g., SST-2), sentence-pair-level (e.g., MultiNLI), word-level
(e.g., NER), and span-level (e.g., SQuAD) tasks with almost no task-specific
modifications.
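For a sentence-level task, the task-specific modification typically amounts to one classification layer on top of the pooled `[CLS]` vector. A hedged sketch using this repository's `modeling.py` (the config path, sequence length, and number of classes below are illustrative assumptions; see `run_classifier.py` for the full implementation):
```python
import tensorflow as tf
import modeling  # from this repository

bert_config = modeling.BertConfig.from_json_file("bert_config.json")
input_ids = tf.placeholder(tf.int32, [None, 128])
input_mask = tf.placeholder(tf.int32, [None, 128])
segment_ids = tf.placeholder(tf.int32, [None, 128])

model = modeling.BertModel(
    config=bert_config,
    is_training=True,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids)

# [batch, hidden] vector for the [CLS] token; add one dense layer for the task.
pooled = model.get_pooled_output()
logits = tf.layers.dense(pooled, units=2)  # e.g., 2 classes for SST-2
```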
## What has been released in this repository?
We are releasing the following:
* TensorFlow code for the BERT model architecture (which is mostly a standard Transformer architecture).