Sparks of Artificial General Intelligence:

Early experiments with GPT-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

Microsoft Research

Abstract

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4 [Ope23], was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google’s PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.

arXiv:2303.12712v1 [cs.CL] 22 Mar 2023

Contents

1 Introduction
1.1 Our approach to studying GPT-4’s intelligence
1.2 Organization of our demonstration
2 Multimodal and interdisciplinary composition
2.1 Integrative ability
2.2 Vision
2.2.1 Image generation beyond memorization
2.2.2 Image generation following detailed instructions (à la Dall-E)
2.2.3 Possible application in sketch generation
2.3 Music
3 Coding
3.1 From instructions to code
3.1.1 Coding challenges
3.1.2 Real world scenarios
3.2 Understanding existing code
4 Mathematical abilities
4.1 A mathematical conversation with GPT-4
4.1.1 A first generalization of the original question
4.1.2 A second variant of the original question
4.1.3 Analysis of the limitations highlighted by conversation
4.2 Performance on mathematical problem datasets
4.3 Mathematical modeling in various domains
4.4 Higher level mathematics
5 Interaction with the world
5.1 Tool use
5.1.1 Using multiple tools to solve more complex tasks
5.1.2 Discussion
5.2 Embodied Interaction
5.2.1 Warmup: navigating a map
5.2.2 Text-based games
5.2.3 Real world problems
5.2.4 Discussion
6 Interaction with humans
6.1 Understanding Humans: Theory of Mind
6.1.1 Testing specific aspects of theory of mind
6.1.2 Testing theory of mind in realistic scenarios
6.1.3 Discussion
6.2 Talking to Humans: Explainability
7 Discriminative Capabilities
7.1 PII Detection
7.2 Misconceptions and Fact-Checking
7.2.1 Why Are Current Metrics Insufficient?
7.2.2 GPT-4 as a Judge
8 Limitations of autoregressive architecture highlighted by GPT-4
8.1 Warm-up with two basic examples
8.2 Lack of planning in arithmetic/reasoning problems
8.3 Lack of planning in text generation
9 Societal influences
9.1 Challenges of erroneous generations
9.2 Misinformation and manipulation
9.3 Bias
9.4 Human expertise, jobs, and economics
9.5 Constellation of influences and considerations
10 Directions and Conclusions
10.1 Definitions of intelligence, AI, and AGI
10.2 On the path to more general artificial intelligence
10.3 What is actually happening?
A GPT-4 has common sense grounding
B Appendix for multimodal and interdisciplinary composition
B.1 Further details on integrative ability results
B.2 Further details on vision results
B.3 Graphic novel design example
C Appendix for the Coding section
C.1 Measuring human performance on LeetCode
C.2 Example of GPT-4 visualizing IMDb data
C.3 More examples on visualization
C.4 Example for 2D HTML game development
C.5 Example for graphical user interface programming
C.6 Example for reverse engineering
C.7 Testing GPT-4’s ability to execute (pseudo) code
D Additional examples for mathematical reasoning
D.1 Limitations
D.2 Further examples
D.3 Generating math problems with GPT-4
D.4 Mitigating calculation errors via external code execution
E Additional Interpretability Examples
E.1 Explanation Agent Mismatches
F Additional examples for interaction with the world
F.1 Interact with tools
F.2 Examples for interaction with environments

Something unknown is doing we don’t know what.

– Sir Arthur Eddington

1 Introduction

Intelligence is a multifaceted and elusive concept that has long challenged psychologists, philosophers, and computer scientists. An attempt to capture its essence was made in 1994 by a group of 52 psychologists who signed onto a broad definition published in an editorial about the science of intelligence [Got97]. The consensus group defined intelligence as a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. This definition implies that intelligence is not limited to a specific domain or task, but rather encompasses a broad range of cognitive skills and abilities. Building an artificial system that exhibits the kind of general intelligence captured by the 1994 consensus definition is a long-standing and ambitious goal of AI research. In early writings, the founders of the modern discipline of artificial intelligence (AI) research called out sets of aspirational goals for understanding intelligence [MMRS06]. Over decades, AI researchers have pursued principles of intelligence, including generalizable mechanisms for reasoning (e.g., [NSS59], [LBFL93]) and construction of knowledge bases containing large corpora of commonsense knowledge [Len95]. However, many of the more recent successes in AI research can be described as being narrowly focused on well-defined tasks and challenges, such as playing chess or Go, which were mastered by AI systems in 1996 and 2016, respectively. In the late 1990s and into the 2000s, there were increasing calls for developing more general AI systems (e.g., [SBD+96]) and scholarship in the field has sought to identify principles that might underlie more generally intelligent systems (e.g., [Leg08, GHT15]). The phrase “artificial general intelligence” (AGI) was popularized in the early 2000s (see [Goe14]) to emphasize the aspiration of moving from the “narrow AI”, as demonstrated in the focused, real-world applications being developed, to broader notions of intelligence, harkening back to the long-term aspirations and dreams of earlier AI research. We use AGI to refer to systems that demonstrate broad capabilities of intelligence as captured in the 1994 definition above, with the additional requirement, perhaps implicit in the work of the consensus group, that these capabilities are at or above human-level. We note however that there is no single definition of AGI that is broadly accepted, and we discuss other definitions in the conclusion section.

The most remarkable breakthrough in AI research of the last few years has been the advancement of natural language processing achieved by large language models (LLMs). These neural network models are based on the Transformer architecture [VSP+17] and trained on massive corpora of web-text data, using at their core a self-supervised objective of predicting the next word in a partial sentence. In this paper, we report on evidence that a new LLM developed by OpenAI, which is an early and non-multimodal version of GPT-4 [Ope23], exhibits many traits of intelligence, according to the 1994 definition. Despite being purely a language model, this early version of GPT-4 demonstrates remarkable capabilities on a variety of domains and tasks, including abstraction, comprehension, vision, coding, mathematics, medicine, law, understanding of human motives and emotions, and more. We interacted with GPT-4 during its early development by OpenAI using purely natural language queries (prompts)¹. In Figure 1.1, we display some preliminary examples of outputs from GPT-4, asking it to write a proof of infinitude of primes in the form of a poem, to draw a unicorn in TikZ (a language for creating graphics in LaTeX), to create a complex animation in Python, and to solve a high-school level mathematical problem. It easily succeeds at all these tasks, and produces outputs that are essentially indistinguishable from (or even better than) what humans could produce. We also compare GPT-4’s performance to that of previous LLMs, most notably ChatGPT, which is a fine-tuned version of (an improved) GPT-3 [BMR+20]. In Figure 1.2, we display the results of asking ChatGPT for both the infinitude of primes poem and the TikZ unicorn drawing. While the system performs non-trivially on both tasks, there is no comparison with the outputs from GPT-4. These preliminary observations will repeat themselves throughout the paper, on a great variety of tasks. The combination of the generality of GPT-4’s capabilities, with numerous abilities spanning a broad swath of domains, and its performance on a wide spectrum of tasks at or beyond human-level, makes us comfortable with saying that GPT-4 is a significant step towards AGI.

¹ As GPT-4’s development continued after our experiments, one should expect different responses from the final version of GPT-4. In particular, all quantitative results should be viewed as estimates of the model’s potential, rather than definitive numbers. We repeat this caveat throughout the paper to clarify that the experience on the deployed model may differ. Moreover we emphasize that the version we tested was text-only for inputs, but for simplicity we refer to it as GPT-4 too.
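For readers who want the "predicting the next word" objective mentioned above in symbols, one standard way to write it is the autoregressive negative log-likelihood below. The notation is an assumption for illustration and is not taken from the paper: $x_1, \dots, x_T$ denotes a token sequence and $p_\theta$ the model with parameters $\theta$.

```latex
\mathcal{L}(\theta) \;=\; -\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

Minimizing this quantity over a large text corpus trains the model to assign high probability to each next token given the tokens before it. The text does not say whether GPT-4's training uses exactly this form.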

GPT-4

Prompt: Can you write a proof that there are infinitely many primes, with every line that rhymes?

GPT-4:

Yes, I think I can,

Though it might take a clever plan.

I’ll start by noting Euclid’s proof,

Which shows that primes aren’t just aloof.

Assume we have a finite list,

Of primes, and that none have been missed.

Multiply them all together,

And add one, just to be clever.

The result will be a number,

That has no prime factors, oh wonder!

But every number has a prime,

That divides it, it’s just a matter of time.

So we’ve found a contradiction,

And our finite list needs eviction.

There must be infinitely many primes,

And that’s the end of my rhyming lines.
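The argument the poem follows (Euclid's) can be checked numerically. The sketch below is an illustration added here, not code from the paper, and the helper names `smallest_prime_factor` and `new_prime_outside` are hypothetical. It multiplies a finite list of primes, adds one, and confirms that the result has a prime factor missing from the list. Strictly speaking, the product plus one has no prime factor *in the list*, which is the precise form of the poem's "has no prime factors".

```python
from math import prod
from typing import List


def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found up to sqrt(n), so n itself is prime


def new_prime_outside(primes: List[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    candidate = prod(primes) + 1
    p = smallest_prime_factor(candidate)
    # candidate leaves remainder 1 when divided by any listed prime,
    # so its prime factor p cannot be one of them.
    assert p not in primes
    return p


print(new_prime_outside([2, 3, 5, 7, 11, 13]))  # 30031 = 59 * 509, prints 59
```

The product plus one need not be prime itself (30031 is not). The argument only needs it to have some prime factor outside the list.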

Prompt: Draw a unicorn in TiKZ.

GPT-4: [Produces LaTeX compiling to following picture.]

Prompt: Can you generate a pyplot for the following data: x = [1, 3, 5, 6, 8], y1 = [2, 3, 5, 18, 1], y2 = [3, 5, 6, 8, 1], y3 = [5, 1, 2, 3, 4], y4 = [9, 7, 2, 3, 1]. I want y1, y2 to be in the same plot, but y3 is in another plot next to that plot, y4 is in below. I want the legend of y1 to be “bob”, y2 to be “alice”, y3 to be “bilbo”, y4 to be “allie”. I want the x-axis to be labeled with “time” and y axis to be labeled with “money”. I want to add a 10%-40% random error bar to each curve, through all times (including non-integers). I want smoothed curves to show the plot, and smoothed error bar. Do not use linear interpolation, use smooth interpolation! I want to also add some small zig-zag to the smoothed curve to make it look more real. I want to put a baseline as the mean of each line. I want to put a pie chart below indicating the fraction of the four people in each time step. I also want an animation to show how the fractions are changing in the pie chart in **continuous time**. Interpolate the missing fractions! I also want vertical line animation in other three plots to match the pie chart. I want the fanciest plot. Please add as many fancy things as possible.

Here is a possible code to generate a pyplot for the following data, with some comments to explain the steps:

GPT-4: [We give two snapshots from the animation resulting from the produced code]

Prompt: Suppose g(x) = f^{-1}(x), g(0) = 5, g(4) = 7, g(3) = 2, g(7) = 9, g(9) = 6 what is f(f(f(6)))?

GPT-4: There are different ways to approach this problem, but one possible method is to use the fact that g(x) = f^{-1}(x) implies that f(g(x)) = x for any x in the domain of g. That is, applying f to both sides of the equation g(x) = f^{-1}(x) gives f(g(x)) = f(f^{-1}(x)) = x. This means that we can undo the effect of f by applying g, and vice versa. Using this idea, we can try to find f(6) by looking for the value of x such that g(x) = 6. From the given information, we see that g(9) = 6, so f(6) = f(g(9)) = 9. Similarly, we can find f(9) by looking for the value of x such that g(x) = 9. We see that g(7) = 9, so f(9) = f(g(7)) = 7. Finally, we can find f(7) by looking for the value of x such that g(x) = 7. We see that g(4) = 7, so f(7) = f(g(4)) = 4. Therefore, f(f(f(6))) = f(f(9)) = f(7) = 4.
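The lookup chain in this answer can be reproduced mechanically. The sketch below is an illustration added here, not output from GPT-4, and the helper name `apply_f` is hypothetical. It stores the given values of g in a dictionary, inverts it to obtain the known values of f, and applies f three times starting from 6.

```python
# Values of g = f^{-1} given in the prompt.
g = {0: 5, 4: 7, 3: 2, 7: 9, 9: 6}

# Because g is the inverse of f, f sends each value g(x) back to x.
f = {gx: x for x, gx in g.items()}


def apply_f(value: int, times: int) -> int:
    """Apply f repeatedly; raises KeyError if f is not known at some step."""
    for _ in range(times):
        value = f[value]
    return value


print(apply_f(6, 3))  # 6 -> 9 -> 7 -> 4, prints 4
```

Inverting the dictionary is valid here only because the given values of g are all distinct.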

Figure 1.1: Preliminary examples of GPT-4’s capabilities in language, vision, coding, and mathematics.
