【免费】深度之眼比赛总结整理资料

共2055个文件

py：1313个

yaml：175个

txt：104个

自然语言处理

python

需积分: 0 141 浏览量 2024-01-23 19:04:06 上传评论收藏 338.57MB ZIP 举报

资源推荐

资源详情

资源评论

收起资源包目录

深度之眼比赛总结整理资料（2055个子文件）

_stats.c 922KB

glove.c 25KB

cooccur.c 19KB

shuffle.c 8KB

vocab_count.c 7KB

common.c 5KB

.DS_Store 14KB

.DS_Store 10KB

.DS_Store 8KB

.DS_Store 6KB

normalization_rule.h 6.87MB

sentencepiece_model.pb.h 185KB

unicode_script_map.h 104KB

repeated_field.h 99KB

descriptor.h 94KB

wire_format_lite.h 82KB

extension_set.h 77KB

coded_stream.h 68KB

darts.h 51KB

map.h 47KB

sentencepiece.pb.h 40KB

strutil.h 39KB

parse_context.h 32KB

map_type_handler.h 31KB

generated_message_table_driven_lite.h 31KB

map_util.h 30KB

arena.h 29KB

message_lite.h 27KB

map_entry_lite.h 24KB

string_view.h 20KB

sentencepiece_processor.h 19KB

arena_impl.h 18KB

stringpiece.h 17KB

callback.h 17KB

zero_copy_stream_impl_lite.h 17KB

arenastring.h 16KB

unknown_field_set.h 14KB

port.h 12KB

zero_copy_stream_impl.h 12KB

generated_message_table_driven.h 12KB

extension_set_inl.h 12KB

int128.h 12KB

bytestream.h 11KB

util.h 11KB

spec_parser.h 10KB

zero_copy_stream.h 10KB

generated_message_util.h 10KB

model_interface.h 9KB

logging.h 9KB

共 2055 条

# BERT **\*\*\*\*\* New November 3rd, 2018: Multilingual and Chinese models avalable \*\*\*\*\*** We have made two new BERT models available: * **[`BERT-Base, Multilingual`](https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip)**: 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters * **[`BERT-Base, Chinese`](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip)**: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters We use character-based tokenization for Chinese, and WordPiece tokenization for all other languages. Both models should work out-of-the-box without any code changes. We did update the implementation of `BasicTokenizer` in `tokenization.py` to support Chinese character tokenization, so please update if you forked it. However, we did not change the tokenization API. For more, see the [Multilingual README](https://github.com/google-research/bert/blob/master/multilingual.md). **\*\*\*\*\* End new information \*\*\*\*\*** ## Introduction **BERT**, or **B**idirectional **E**ncoder **R**epresentations from **T**ransformers, is a new method of pre-training language representations which obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. Our academic paper which describes BERT in detail and provides full results on a number of tasks can be found here: [https://arxiv.org/abs/1810.04805](https://arxiv.org/abs/1810.04805). To give a few numbers, here are the results on the [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/) question answering task: SQuAD v1.1 Leaderboard (Oct 8th 2018) | Test EM | Test F1 ------------------------------------- | :------: | :------: 1st Place Ensemble - BERT | **87.4** | **93.2** 2nd Place Ensemble - nlnet | 86.0 | 91.7 1st Place Single Model - BERT | **85.1** | **91.8** 2nd Place Single Model - nlnet | 83.5 | 90.1 And several natural language inference tasks: System | MultiNLI | Question NLI | SWAG ----------------------- | :------: | :----------: | :------: BERT | **86.7** | **91.1** | **86.3** OpenAI GPT (Prev. SOTA) | 82.2 | 88.1 | 75.0 Plus many other tasks. Moreover, these results were all obtained with almost no task-specific neural network architecture design. If you already know what BERT is and you just want to get started, you can [download the pre-trained models](#pre-trained-models) and [run a state-of-the-art fine-tuning](#fine-tuning-with-bert) in only a few minutes. ## What is BERT? BERT is method of pre-training language representations, meaning that we train a general-purpose "language understanding" model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first *unsupervised*, *deeply bidirectional* system for pre-training NLP. *Unsupervised* means that BERT was trained using only a plain text corpus, which is important because an enormous amount of plain text data is publicly available on the web in many languages. Pre-trained representations can also either be *context-free* or *contextual*, and contextual representations can further be *unidirectional* or *bidirectional*. Context-free models such as [word2vec](https://www.tensorflow.org/tutorials/representation/word2vec) or [GloVe](https://nlp.stanford.edu/projects/glove/) generate a single "word embedding" representation for each word in the vocabulary, so `bank` would have the same representation in `bank deposit` and `river bank`. Contextual models instead generate a representation of each word that is based on the other words in the sentence. BERT was built upon recent work in pre-training contextual representations — including [Semi-supervised Sequence Learning](https://arxiv.org/abs/1511.01432), [Generative Pre-Training](https://blog.openai.com/language-unsupervised/), [ELMo](https://allennlp.org/elmo), and [ULMFit](http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html) — but crucially these models are all *unidirectional* or *shallowly bidirectional*. This means that each word is only contextualized using the words to its left (or right). For example, in the sentence `I made a bank deposit` the unidirectional representation of `bank` is only based on `I made a` but not `deposit`. Some previous work does combine the representations from separate left-context and right-context models, but only in a "shallow" manner. BERT represents "bank" using both its left and right context — `I made a ... deposit` — starting from the very bottom of a deep neural network, so it is *deeply bidirectional*. BERT uses a simple approach for this: We mask out 15% of the words in the input, run the entire sequence through a deep bidirectional [Transformer](https://arxiv.org/abs/1706.03762) encoder, and then predict only the masked words. For example: ``` Input: the man went to the [MASK1] . he bought a [MASK2] of milk. Labels: [MASK1] = store; [MASK2] = gallon ``` In order to learn relationships between sentences, we also train on a simple task which can be generated from any monolingual corpus: Given two sentences `A` and `B`, is `B` the actual next sentence that comes after `A`, or just a random sentence from the corpus? ``` Sentence A: the man went to the store . Sentence B: he bought a gallon of milk . Label: IsNextSentence ``` ``` Sentence A: the man went to the store . Sentence B: penguins are flightless . Label: NotNextSentence ``` We then train a large model (12-layer to 24-layer Transformer) on a large corpus (Wikipedia + [BookCorpus](http://yknzhu.wixsite.com/mbweb)) for a long time (1M update steps), and that's BERT. Using BERT has two stages: *Pre-training* and *fine-tuning*. **Pre-training** is fairly expensive (four days on 4 to 16 Cloud TPUs), but is a one-time procedure for each language (current models are English-only, but multilingual models will be released in the near future). We are releasing a number of pre-trained models from the paper which were pre-trained at Google. Most NLP researchers will never need to pre-train their own model from scratch. **Fine-tuning** is inexpensive. All of the results in the paper can be replicated in at most 1 hour on a single Cloud TPU, or a few hours on a GPU, starting from the exact same pre-trained model. SQuAD, for example, can be trained in around 30 minutes on a single Cloud TPU to achieve a Dev F1 score of 91.0%, which is the single system state-of-the-art. The other important aspect of BERT is that it can be adapted to many types of NLP tasks very easily. In the paper, we demonstrate state-of-the-art results on sentence-level (e.g., SST-2), sentence-pair-level (e.g., MultiNLI), word-level (e.g., NER), and span-level (e.g., SQuAD) tasks with almost no task-specific modifications. ## What has been released in this repository? We are releasing the following: * TensorFlow code for the BERT model architecture (which is mostly a standard [Transformer](https://arxiv.org/abs/1706.03762) architecture). * Pre-trained checkpoints for both the lowercase and cased version of `BERT-Base` and `BERT-Large` from the paper. * TensorFlow code for push-button replication of the most important fine-tuning experiments from the paper, including SQuAD, MultiNLI, and MRPC. All of the code in this repository works out-of-the-box with CPU, GPU, and Cloud TPU. ## Pre-trained models We are releasing the `BERT-Base` and `BERT-Large` models from the paper. `Uncased` means that the text has been lowercased before WordPiece tokenization, e.g., `John Smith` becomes `john smith`. The `Uncased` model also strips out any accent markers. `Cased` means that the true case and accent markers are preserved. Typically, the `Uncased` model is better unless you know tha

评论收藏

内容反馈