ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang
Baidu Inc., Beijing, China
{sunyu02, wangshuohuan, tianhao, wu hua, wanghaifeng}@baidu.com
Abstract
Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurring information, there exists other valuable lexical, syntactic and semantic information in training corpora, such as named entities, semantic closeness and discourse relations. In order to extract the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0, which incrementally builds pre-training tasks and then learns pre-trained models on these constructed tasks via continual multi-task learning. Based on this framework, we construct several tasks and train the ERNIE 2.0 model to capture lexical, syntactic and semantic aspects of information in the training data. Experimental results demonstrate that the ERNIE 2.0 model outperforms BERT and XLNet on 16 tasks, including English tasks on the GLUE benchmark and several similar tasks in Chinese. The source code and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE.
Introduction
Pre-trained language representations such as ELMo (Peters et al. 2018), OpenAI GPT (Radford et al. 2018), BERT (Devlin et al. 2018), ERNIE 1.0 (Sun et al. 2019)¹ and XLNet (Yang et al. 2019) have been proven to be effective for improving the performance of various natural language understanding tasks, including sentiment classification (Socher et al. 2013), natural language inference (Bowman et al. 2015), named entity recognition (Sang and De Meulder 2003) and so on.
Generally, pre-training procedures train the model based on the co-occurrence of words and sentences. In fact, there is other lexical, syntactic and semantic information worth examining in training corpora beyond co-occurrence. For example, named entities such as person names, location names, and organization names may contain conceptual information. Information like sentence order and sentence proximity enables the models to learn structure-aware representations, and semantic similarity at the document level or discourse relations among sentences allow the models to learn semantic-aware representations. In order to discover all valuable information in training corpora, be it lexical, syntactic or semantic representations, we propose a continual pre-training framework named ERNIE 2.0, which can incrementally build and train a large variety of pre-training tasks through continual multi-task learning.

¹ To distinguish the ERNIE 2.0 framework from the ERNIE model, the latter is referred to as ERNIE 1.0 (Sun et al. 2019).
Our ERNIE framework supports the continual introduction of various customized tasks, which is realized through continual multi-task learning. When given one or more new tasks, the continual multi-task learning method trains the newly introduced tasks together with the original tasks simultaneously and efficiently, without forgetting previously learned knowledge. In this way, our framework can incrementally train the distributed representations based on the parameters it has previously learned. Moreover, in this framework all the tasks share the same encoding networks, thus making it possible to encode lexical, syntactic and semantic information across different tasks.
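To make this mechanism concrete, the following is a minimal, illustrative sketch of continual multi-task learning with a shared encoder and one lightweight head per pre-training task, where each newly introduced task is trained jointly with all previously added tasks. The module names, head design, and round-robin schedule are our own simplifying assumptions (written in PyTorch), not the released PaddlePaddle implementation.

import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    # Stand-in for the Transformer encoder shared by all pre-training tasks.
    def __init__(self, vocab_size=30000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, token_ids):
        return self.encoder(self.embed(token_ids))

class MultiTaskModel(nn.Module):
    # Shared encoder plus one small output head per task (hypothetical head design).
    def __init__(self, hidden=256):
        super().__init__()
        self.hidden = hidden
        self.encoder = SharedEncoder(hidden=hidden)
        self.heads = nn.ModuleDict()

    def add_task(self, name, num_labels):
        # A newly introduced task only adds a head; the encoder stays shared.
        self.heads[name] = nn.Linear(self.hidden, num_labels)

    def forward(self, task, token_ids):
        hidden_states = self.encoder(token_ids)       # (batch, seq, hidden)
        return self.heads[task](hidden_states[:, 0])  # classify from the first token

def continual_multitask_train(model, task_batches, steps=100):
    # Jointly optimize every task added so far (new and original), so earlier
    # knowledge keeps being refreshed instead of being forgotten.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for task, (tokens, labels) in task_batches.items():  # simple round-robin
            optimizer.zero_grad()
            loss_fn(model(task, tokens), labels).backward()
            optimizer.step()

When a new pre-training stage begins, one would call add_task for the new objective and rerun continual_multitask_train over batches from both the new task and all earlier tasks, starting from the previously trained parameters.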
In summary, our contributions are as follows:
• We propose a continual pre-training framework, ERNIE 2.0, which efficiently supports customized training tasks and continual multi-task learning in an incremental way.
• We construct three kinds of unsupervised language processing tasks to verify the effectiveness of the proposed framework. Experimental results demonstrate that ERNIE 2.0 achieves significant improvements over BERT and XLNet on 16 tasks, including the English GLUE benchmark and several Chinese tasks.
• Our fine-tuning code of ERNIE 2.0 and the models pre-trained on English corpora are available at https://github.com/PaddlePaddle/ERNIE.
Related Work
Unsupervised Learning for Language Representation
It is effective to learn general language representation by
pre-training a language model with a large amount of unan-