In this paper, we explore a semi-supervised approach for language understanding tasks using a
combination of unsupervised pre-training and supervised fine-tuning. Our goal is to learn a universal
representation that transfers with little adaptation to a wide range of tasks. We assume access to
a large corpus of unlabeled text and several datasets with manually annotated training examples
(target tasks). Our setup does not require these target tasks to be in the same domain as the unlabeled
corpus. We employ a two-stage training procedure. First, we use a language modeling objective on
the unlabeled data to learn the initial parameters of a neural network model. Subsequently, we adapt
these parameters to a target task using the corresponding supervised objective.
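To make the two-stage procedure concrete, the sketch below shows a language-modeling pre-training step followed by a supervised fine-tuning step. It is a minimal PyTorch sketch: the toy TransformerLM, the classification head, and all hyperparameters are illustrative assumptions, not the configuration used in this work.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerLM(nn.Module):
    # Toy stand-in for the pre-trained Transformer language model.
    def __init__(self, vocab_size, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        h = self.embed(tokens)
        # Causal mask: each position attends only to earlier positions.
        seq_len = tokens.size(1)
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.blocks(h, mask=mask)
        return self.lm_head(h), h

def pretrain_step(model, opt, tokens):
    # Stage 1: next-token prediction (language modeling) on unlabeled text.
    logits, _ = model(tokens[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def finetune_step(model, clf_head, opt, tokens, labels):
    # Stage 2: adapt the pre-trained parameters with a supervised objective,
    # classifying from the hidden state of the final token.
    _, h = model(tokens)
    loss = F.cross_entropy(clf_head(h[:, -1]), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

In use, stage 1 would iterate pretrain_step over the unlabeled corpus, after which stage 2 would iterate finetune_step on the labeled target-task data with an optimizer covering both the pre-trained parameters and the newly added classification head.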
For our model architecture, we use the Transformer [62], which has been shown to perform strongly on
various tasks such as machine translation [62], document generation [34], and syntactic parsing [29].
This model choice provides us with a more structured memory for handling long-term dependencies in
text, compared to alternatives like recurrent networks, resulting in robust transfer performance across
diverse tasks. During transfer, we utilize task-specific input adaptations derived from traversal-style
approaches [52], which process structured text input as a single contiguous sequence of tokens. As
we demonstrate in our experiments, these adaptations enable us to fine-tune effectively with minimal
changes to the architecture of the pre-trained model.
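As an illustration of such an input adaptation, the sketch below flattens structured inputs (e.g., a premise and hypothesis for entailment, or a context with candidate answers) into one contiguous token sequence. The special token names and the whitespace tokenizer are illustrative assumptions rather than the exact tokens and vocabulary used by the model.

# Traversal-style input adaptation (sketch): structured inputs become a
# single contiguous token sequence bracketed by special tokens.
START, DELIM, EXTRACT = "<start>", "<delim>", "<extract>"

def tokenize(text):
    # Stand-in tokenizer; a subword vocabulary would be used in practice.
    return text.lower().split()

def entailment_input(premise, hypothesis):
    # Premise and hypothesis concatenated with a delimiter between them.
    return [START] + tokenize(premise) + [DELIM] + tokenize(hypothesis) + [EXTRACT]

def multiple_choice_inputs(context, answers):
    # One sequence per candidate answer; each is scored independently.
    return [[START] + tokenize(context) + [DELIM] + tokenize(ans) + [EXTRACT]
            for ans in answers]

print(entailment_input("A man plays a guitar.", "A person makes music."))

Because every task is rendered as a single token sequence of this form, only a small task-specific output layer needs to be added on top of the pre-trained model.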
We evaluate our approach on four types of language understanding tasks – natural language inference,
question answering, semantic similarity, and text classification. Our general task-agnostic model
outperforms discriminatively trained models that employ architectures specifically crafted for each
task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance,
we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test) [40], 5.7% on
question answering (RACE) [30], 1.5% on textual entailment (MultiNLI) [66] and 5.5% on the recently
introduced GLUE multi-task benchmark [64]. We also analyze the zero-shot behavior of the pre-trained
model in four different settings and demonstrate that it acquires useful linguistic knowledge for
downstream tasks.
2 Related Work
Semi-supervised learning for NLP
Our work broadly falls under the category of semi-supervised
learning for natural language. This paradigm has attracted significant interest, with applications to
tasks like sequence labeling [24, 33, 57] or text classification [41, 70]. The earliest approaches used
unlabeled data to compute word-level or phrase-level statistics, which were then used as features in a
supervised model [33]. Over the last few years, researchers have demonstrated the benefits of using
word embeddings [11, 39, 42], which are trained on unlabeled corpora, to improve performance on a
variety of tasks [8, 11, 26, 45]. These approaches, however, mainly transfer word-level information,
whereas we aim to capture higher-level semantics.
Recent approaches have investigated learning and utilizing more than word-level semantics from
unlabeled data. Phrase-level or sentence-level embeddings, which can be trained using an unlabeled
corpus, have been used to encode text into suitable vector representations for various target
tasks [28, 32, 1, 36, 22, 12, 56, 31].
Unsupervised pre-training
Unsupervised pre-training is a special case of semi-supervised learning
where the goal is to find a good initialization point instead of modifying the supervised learning
objective. Early works explored the use of the technique in image classification [20, 49, 63] and
regression tasks [3]. Subsequent research [15] demonstrated that pre-training acts as a regularization
scheme, enabling better generalization in deep neural networks. In recent work, the method has
been used to help train deep neural networks on various tasks like image classification [69], speech
recognition [68], entity disambiguation [17] and machine translation [48].
The closest line of work to ours involves pre-training a neural network using a language modeling
objective and then fine-tuning it on a target task with supervision. Dai et al. [13] and Howard and
Ruder [21] follow this method to improve text classification. However, although the pre-training
phase helps capture some linguistic information, their usage of LSTM models restricts their prediction
ability to a short range. In contrast, our choice of transformer networks allows us to capture longer-
range linguistic structure, as demonstrated in our experiments. Further, we also demonstrate the
effectiveness of our model on a wider range of tasks including natural language inference, paraphrase
detection and story completion. Other approaches [43, 44, 38] use hidden representations from a