Sparks of Artificial General Intelligence:
Early experiments with GPT-4
Sébastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke
Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott Lundberg
Harsha Nori Hamid Palangi Marco Tulio Ribeiro Yi Zhang
Microsoft Research
Abstract
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs)
that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding
of learning and cognition. The latest model developed by OpenAI, GPT-4 [Ope23], was trained using an
unprecedented scale of compute and data. In this paper, we report on our investigation of an early version
of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of)
GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google’s PaLM for example) that exhibit
more general intelligence than previous AI models. We discuss the rising capabilities and implications of
these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult
tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any
special prompting. Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level
performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of
GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version
of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis
on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more
comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond
next-word prediction. We conclude with reflections on societal influences of the recent technological leap and
future research directions.
Contents
1 Introduction 4
1.1 Our approach to studying GPT-4’s intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Organization of our demonstration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Multimodal and interdisciplinary composition 13
2.1 Integrative ability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Image generation beyond memorization . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 Image generation following detailed instructions (à la Dall-E) . . . . . . . . . . . . . . 17
2.2.3 Possible application in sketch generation . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Coding 21
3.1 From instructions to code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.1 Coding challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.1.2 Real world scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Understanding existing code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
arXiv:2303.12712v1 [cs.CL] 22 Mar 2023