1 Introduction
Language modeling has long been an important research area since Shannon (1951) estimated the information in
language with next word prediction. Modeling began with n-gram based approaches (Kneser & Ney, 1995) but rapidly
advanced with LSTMs (Hochreiter & Schmidhuber, 1997; Graves, 2014). Later work showed that language modeling
also led to language understanding (Dai & Le, 2015). With increased scale and the Transformer architecture (Vaswani
et al., 2017), large language models (LLMs) have shown strong performance in language understanding and generation
capabilities over the last few years, leading to breakthrough performance in reasoning, math, science, and language tasks
(Howard & Ruder, 2018; Brown et al., 2020; Du et al., 2022; Chowdhery et al., 2022; Rae et al., 2021; Lewkowycz
et al., 2022; Tay et al., 2023; OpenAI, 2023b). Key factors in these advances have been scaling up model size (Brown
et al., 2020; Rae et al., 2021) and the amount of data (Hoffmann et al., 2022). To date, most LLMs follow a standard
recipe of mostly monolingual corpora with a language modeling objective.
We introduce PaLM 2, the successor to PaLM (Chowdhery et al., 2022), a language model unifying modeling advances,
data improvements, and scaling insights. PaLM 2 incorporates the following diverse set of research advances:
• Compute-optimal scaling: Recently, compute-optimal scaling (Hoffmann et al., 2022) showed that data size is at least as important as model size. We validate this study for larger amounts of compute and similarly find that data and model size should be scaled roughly 1:1 to achieve the best performance for a given amount of training compute (as opposed to past trends, which scaled the model 3× faster than the dataset); a short illustrative calculation follows this list.
• Improved dataset mixtures: Previous large pre-trained language models typically used a dataset dominated by English text (e.g., ∼78% of the non-code data in Chowdhery et al. (2022)). We designed a more multilingual and diverse pre-training mixture, which extends across hundreds of languages and domains (e.g., programming languages, mathematics, and parallel multilingual documents). We show that larger models can handle more disparate non-English datasets without causing a drop in English language understanding performance, and apply deduplication to reduce memorization (Lee et al., 2021).
• Architectural and objective improvements: Our model architecture is based on the Transformer. Past LLMs have almost exclusively used a single causal or masked language modeling objective. Given the strong results of UL2 (Tay et al., 2023), we use a tuned mixture of different pre-training objectives to train the model to understand different aspects of language.
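To make the 1:1 scaling observation above concrete, the following is a minimal sketch of the arithmetic, assuming the commonly used approximation C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × tokens) and a tokens-per-parameter ratio of roughly 20; both constants are illustrative values drawn from the compute-optimal scaling literature (Hoffmann et al., 2022), not PaLM 2's fitted scaling coefficients.

```python
# Illustrative sketch of 1:1 compute-optimal scaling, not PaLM 2's actual scaling fit.
# Assumes C ~ 6 * N * D (training FLOPs ~ 6 x parameters x tokens) and an assumed
# tokens-per-parameter ratio; with D proportional to N, both grow as sqrt(C).

def compute_optimal_allocation(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a FLOPs budget between parameters (N) and training tokens (D)."""
    # C = 6 * N * D and D = tokens_per_param * N  =>  N = sqrt(C / (6 * tokens_per_param))
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal_allocation(budget)
        print(f"C={budget:.0e} FLOPs -> N~{n:.2e} params, D~{d:.2e} tokens")
    # A 100x larger compute budget yields ~10x more parameters and ~10x more tokens,
    # in contrast to earlier practice of growing the model ~3x faster than the data.
```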
The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses
more training compute. Our evaluation results show that PaLM 2 models significantly outperform PaLM on a variety
of tasks, including natural language generation, translation, and reasoning. These results suggest that model scaling
is not the only way to improve performance. Instead, performance can be unlocked by meticulous data selection
and efficient architectures and objectives. Moreover, a smaller but higher-quality model significantly improves inference
efficiency, reduces serving cost, and enables the model to be deployed for more downstream applications and users.
PaLM 2 demonstrates significant multilingual language, code generation and reasoning abilities, which we illustrate in
Figures 2 and 3. More examples can be found in Appendix B.¹ PaLM 2 performs significantly better than PaLM on
real-world advanced language proficiency exams and passes exams in all evaluated languages (see Figure 1). For some
exams, this is a level of language proficiency sufficient to teach that language. In this report, generated samples and
measured metrics are from the model itself without any external augmentations such as Google Search or Translate.
PaLM 2 includes control tokens to enable inference-time control over toxicity, modifying only a fraction of pre-training
as compared to prior work (Korbak et al., 2023). Special ‘canary’ token sequences were injected into the PaLM 2 pre-training data to enable improved measures of memorization across languages (Carlini et al., 2019, 2021), as sketched after this paragraph. We find
that PaLM 2 has lower average rates of verbatim memorization than PaLM, and for tail languages we observe that
memorization rates increase above English only when data is repeated several times across documents. We show that
PaLM 2 has improved multilingual toxicity classification capabilities, and evaluate potential harms and biases across a
range of potential downstream uses. We also include an analysis of the representation of people in pre-training data.
These sections help downstream developers assess potential harms in their specific application contexts (Shelby et al.,
2023), so that they can prioritize additional procedural and technical safeguards earlier in development. The rest of this
report focuses on describing the considerations that went into designing PaLM 2 and evaluating its capabilities.
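As a rough illustration of how injected canaries support the memorization measurements mentioned above (in the spirit of Carlini et al., 2019, 2021), the sketch below generates random canary strings and estimates how often a model reproduces them verbatim from a prefix. The canary format and the `generate` callable are hypothetical stand-ins for illustration, not PaLM 2's actual canary design or serving API.

```python
# Minimal sketch of canary-based memorization measurement (after Carlini et al., 2019, 2021).
# The canary format and the `generate(prompt) -> str` callable are hypothetical stand-ins.
import secrets
from typing import Callable, List

def make_canaries(num: int, length: int = 12) -> List[str]:
    """Create random strings to be injected into pre-training data as canaries."""
    return [" ".join(secrets.token_hex(2) for _ in range(length)) for _ in range(num)]

def verbatim_extraction_rate(generate: Callable[[str], str],
                             canaries: List[str],
                             prefix_len: int = 6) -> float:
    """Fraction of canaries whose suffix the model completes verbatim given the prefix."""
    hits = 0
    for canary in canaries:
        tokens = canary.split()
        prefix, target = " ".join(tokens[:prefix_len]), " ".join(tokens[prefix_len:])
        if generate(prefix).strip().startswith(target):
            hits += 1
    return hits / len(canaries)

if __name__ == "__main__":
    canaries = make_canaries(num=100)

    def toy_model(prompt: str) -> str:
        """Stand-in for a trained LM; a real study would query the model under test."""
        return ""  # never memorizes, so the measured rate is 0.0

    print(f"verbatim extraction rate: {verbatim_extraction_rate(toy_model, canaries):.2%}")
```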
¹ Note that not all capabilities of PaLM 2 are currently exposed via PaLM 2 APIs.