[Paper Reading] Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
"In-context Learning"(上下文学习)
是一种机器学习技术,旨在利用上下文信息来提高模型的学习效果。在这种方
法中,模型通过与特定环境或场景进行交互,从中获取信息并改进自身的表
现。这种学习方式尤其适用于自然语言处理等任务,因为语言通常是在特定的
语境中产生和理解的。通过在具体的上下文中进行学习,模型可以更好地理解
语言的含义和语境,并做出更准确的预测或响应。
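To make this concrete, here is a minimal sketch of how a few-shot prompt is typically assembled for a sentiment task. The template, field names, and examples are illustrative assumptions, not the paper's exact format.

```python
# Minimal sketch of in-context learning prompt construction (illustrative
# template, not the paper's exact format). k input-label pairs are placed
# before the test input, and the LM completes the final label with no
# parameter updates.

def build_icl_prompt(demonstrations, test_input):
    """Concatenate k demonstrations, then the unlabeled test input."""
    blocks = [f"Review: {x}\nSentiment: {y}" for x, y in demonstrations]
    blocks.append(f"Review: {test_input}\nSentiment:")
    return "\n\n".join(blocks)

demos = [
    ("A gripping, beautifully shot film.", "positive"),
    ("Dull plot and wooden acting.", "negative"),
]
print(build_icl_prompt(demos, "An unexpected delight from start to finish."))
```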
"Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?" is by Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer, from the University of Washington, Meta AI, and the Allen Institute for AI.
The paper investigates the mechanism behind in-context learning in large language models (LMs): how a model can infer a new task from a small number of input-label pairs (demonstrations) and predict outputs for new inputs.
The paper's core claims and findings include:
1. Effectiveness of in-context learning: large language models can perform new tasks through in-context learning even with only a few demonstrations, and this consistently outperforms zero-shot inference across a wide range of tasks.
2. Ground truth in demonstrations: the authors find that ground truth labels in the demonstrations are not required. Randomly replacing the labels barely affects performance on classification and multi-choice tasks, indicating that the model does not rely on the input-label mapping in the demonstrations to perform the task (see the sketch after this list).
3. Which aspects of demonstrations matter: further analysis shows that the few examples a demonstration provides are key to conveying the label space, the distribution of the input text, and the overall format of the sequence; together, these factors make in-context learning effective.
4. Effect of meta-training: models trained with a meta-training objective (such as MetaICL) lean even more on the format of the demonstrations rather than on the input-label mapping, suggesting that meta-training pushes models to exploit the simpler aspects of demonstrations.
5. Experimental setup: the authors run experiments on 12 different models, including the GPT-3 family, and evaluate on 26 datasets covering sentiment analysis, paraphrase detection, natural language inference, hate speech detection, question answering, and sentence completion.
6. Discussion and conclusions: the paper discusses learning at test time, language model capacity, the connection to instruction-following models, and the possibility of substantially improving zero-shot performance. The authors also note limitations, such as the influence of task types and datasets and the challenge of extending the experiments to generation tasks.
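As a rough illustration of the ablation in point 2, the sketch below replaces each demonstration's gold label with one drawn uniformly at random from the label space. Function and variable names are ours, not from the authors' code.

```python
import random

# Sketch of the core ablation: swap each gold label for a label sampled
# uniformly at random from the label space. Names are illustrative.

def randomize_labels(demonstrations, label_space, seed=0):
    """Return the demonstrations with labels resampled at random."""
    rng = random.Random(seed)
    return [(text, rng.choice(label_space)) for text, _ in demonstrations]

demos = [
    ("A gripping, beautifully shot film.", "positive"),
    ("Dull plot and wooden acting.", "negative"),
]
# The paper's finding: prompting with these corrupted demonstrations barely
# hurts classification and multi-choice accuracy.
print(randomize_labels(demos, ["positive", "negative"]))
```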
Overall, the paper offers a new understanding of how large language models perform in-context learning and points to directions for future research, particularly on how to better exploit these models for zero-shot learning.
[Tencent Docs]
Rethinking_the_Role_of_Demonstrations-_What_Makes_In-Context_Learning_Work_.pdf
https://docs.qq.com/pdf/DVmpzQnZlelFvYWhS
Rethinking the Role of Demonstrations:
What Makes In-Context Learning Work?

Sewon Min¹,²  Xinxi Lyu¹  Ari Holtzman¹  Mikel Artetxe²  Mike Lewis²  Hannaneh Hajishirzi¹,³  Luke Zettlemoyer¹,²
¹University of Washington  ²Meta AI  ³Allen Institute for AI
{sewon,alrope,ahai,hannaneh,lsz}@cs.washington.edu
{artetxe,mikelewis}@meta.com
Abstract

Large language models (LMs) are able to in-context learn: perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs. However, there has been little understanding of how the model learns and which aspects of the demonstrations contribute to end task performance. In this paper, we show that ground truth demonstrations are in fact not required: randomly replacing labels in the demonstrations barely hurts performance on a range of classification and multi-choice tasks, consistently over 12 different models including GPT-3. Instead, we find that other aspects of the demonstrations are the key drivers of end task performance, including the fact that they provide a few examples of (1) the label space, (2) the distribution of the input text, and (3) the overall format of the sequence. Together, our analysis provides a new way of understanding how and why in-context learning works, while opening up new questions about how much can be learned from large language models through inference alone.
1 Introduction

Large language models (LMs) have shown impressive performance on downstream tasks by simply conditioning on a few input-label pairs (demonstrations); this type of inference has been referred to as in-context learning (Brown et al., 2020). Despite in-context learning consistently outperforming zero-shot inference on a wide range of tasks (Zhao et al., 2021; Liu et al., 2021), there is little understanding of how it works and which aspects of the demonstrations contribute to end task performance.

In this paper, we show that ground truth demonstrations are in fact not required for effective in-context learning (Section 4). Specifically, replacing the labels in demonstrations with random labels barely hurts performance in a range of classification and multi-choice tasks (Figure 1). The result is consistent over 12 different models including the GPT-3 family (Radford et al., 2019; Min et al., 2021b; Wang and Komatsuzaki, 2021; Artetxe et al., 2021; Brown et al., 2020). This strongly suggests, counter-intuitively, that the model does not rely on the input-label mapping in the demonstrations to perform the task.

[Figure 1: Results in classification (top) and multi-choice tasks (bottom), using three LMs with varying size. Reported on six datasets on which GPT-3 is evaluated; the channel method is used. See Section 4 for the full results. In-context learning performance drops only marginally when labels in the demonstrations are replaced by random labels.]
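The Figure 1 caption mentions the channel method (Min et al., 2021b). As a rough sketch of the distinction, and not the authors' code: direct inference scores P(label | input) while channel inference scores P(input | label). The `lm_logprob` parameter below is a hypothetical stand-in for a real LM's conditional log-probability.

```python
# Hedged sketch of the two inference strategies referenced in Figure 1.
# `lm_logprob(prefix, continuation)` is a hypothetical stand-in for the
# log-probability a real LM assigns to `continuation` given `prefix`.

def direct_predict(lm_logprob, prompt, test_input, labels):
    """Direct method: argmax over labels y of P(y | demonstrations, x)."""
    return max(labels, key=lambda y: lm_logprob(f"{prompt}{test_input}\n", y))

def channel_predict(lm_logprob, prompt, test_input, labels):
    """Channel method: argmax over labels y of P(x | demonstrations, y)."""
    return max(labels, key=lambda y: lm_logprob(f"{prompt}{y}\n", test_input))
```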
Further analysis investigates which parts of demonstrations actually do contribute to the performance. We identify possible aspects of demonstrations (e.g., the label space and the distribution of the input text) and evaluate a series of variants of the demonstrations to quantify the impact of each (Section 5). We find that: (1) the label space and the distribution of the input text specified by the demonstrations are both key to in-context learning (regardless of whether the labels are correct for individual inputs); (2) specifying the overall format is also crucial, e.g., when the label space is unknown, using random English words as labels is significantly better than using no labels; and

(arXiv:2202.12837v2 [cs.CL], 20 Oct 2022)
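To illustrate the kinds of demonstration variants Section 5 compares, here is a sketch that builds demonstrations with random English words as labels and with labels removed. The naming and the word list are our assumptions, not the paper's exact conditions.

```python
import random

# Sketch of demonstration variants in the spirit of Section 5 (our naming,
# not the paper's exact conditions): labels replaced by random English
# words, and labels removed entirely.

RANDOM_ENGLISH_WORDS = ["cloud", "marble", "violin", "harbor"]  # illustrative

def with_random_word_labels(demonstrations, seed=0):
    """Replace every label with a random English word (format preserved)."""
    rng = random.Random(seed)
    return [(x, rng.choice(RANDOM_ENGLISH_WORDS)) for x, _ in demonstrations]

def without_labels(demonstrations):
    """Drop labels entirely, breaking the input-label format."""
    return [(x, "") for x, _ in demonstrations]
```

On the paper's account, the first variant preserves the overall sequence format and so should outperform the second when the label space is unknown.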
(Preview ends here; the remaining 29 pages are not shown. Continue reading in the full PDF linked above.)