The Effects of Generative AI on High Skilled Work:
Evidence from Three Field Experiments with
Software Developers*
Kevin Zheyuan Cui, Mert Demirer, Sonia Jaffe,
Leon Musolff, Sida Peng, and Tobias Salz
September 2024
Abstract
This study evaluates the impact of generative AI on software developer produc-
tivity by analyzing data from three randomized controlled trials conducted at Mi-
crosoft, Accenture, and an anonymous Fortune 100 electronics manufacturing com-
pany. These field experiments, which were run by the companies as part of their
ordinary course of business, provided a randomly selected subset of developers with
access to GitHub Copilot, an AI-based coding assistant that suggests intelligent code
completions. Though each separate experiment is noisy, combined across all three
experiments and 4,867 software developers, our analysis reveals a 26.08% increase
(SE: 10.3%) in the number of completed tasks among developers using the AI tool.
Notably, less experienced developers showed higher adoption rates and greater pro-
ductivity gains.
* Cui: Princeton. Demirer: MIT. Jaffe: Microsoft. Musolff: The Wharton School of the University of
Pennsylvania. Peng: Microsoft. Salz: MIT. We are grateful to Avi Goldfarb, Shane Greenstein, Anton
Korinek, Ethan Mollick, as well as the participants of the IIOC2024 conference and the AI, Cognition, and
the Economy Workshop. Thanks also for data help from employees at Microsoft, GitHub, and Accenture:
Phillip Coppney, Wen Du, Ya Gao, Lizzie Redford, Ryan J. Salva, Daniel A. Schocke, Amanda Silver, An-Jen
Tai, Dan Tetrick, Jeff Wilcox.
1 Introduction
Many economists expect generative AI to profoundly affect the organization of economic
activity (Agrawal, Gans, and Goldfarb 2019; Frank et al. 2019; Furman and Seamans
2019). Eloundou et al. 2023 estimate that generative AI can perform tasks associated with
over 80% of U.S. jobs and that AI task coverage is notably higher for occupations that
require advanced degrees. The ability of generative AI to perform tasks required in such
high-skilled occupations — allowing it to assist doctors in diagnosing diseases, lawyers
in drafting legal documents, and software engineers with code development — has led
to predictions of substantial productivity gains from the adoption of such technologies
(Baily, Brynjolfsson, and Korinek 2023). Others, however, are less optimistic about such
productivity gains (Acemoglu 2024).
Uncertainty around firms’ willingness to adopt these technologies and their capacity to make necessary complementary investments (Bresnahan 2024; Brynjolfsson, Rock, and Syverson 2021) currently makes it difficult to assess empirically whether optimism about productivity gains is justified.[1] Nevertheless, some applications of generative AI have already matured and are integrated into existing workflows. An example is software development, where commercial coding assistants based on generative AI have gained widespread adoption.[2]
In this project, we ask how generative AI affects the productivity of knowledge work-
ers, using software developers as an example. We analyze three large-scale randomized
controlled trials in a real-world environment. These experiments randomly assigned
access to Copilot, a coding assistant developed by GitHub in collaboration with Ope-
nAI, to just under five thousand software developers at Microsoft, Accenture, and an
anonymous Fortune 100 electronics manufacturing company (henceforth Anonymous
Company). These experiments were run as part of the ordinary course of business at
these companies to decide whether or how extensively to adopt these technologies, and
the companies kindly shared the resulting data with us.[3]
Our preferred estimates suggest that usage of the coding assistant causes a 26.08%
(SE: 10.3%) increase in the weekly number of completed tasks. When we look at outcomes
of secondary interest, our results support this interpretation, with a 13.55% (SE: 10.0%)
increase in the number of code updates (commits) and a 38.38% (SE: 12.55%) increase in
the number of times code was compiled. For Microsoft we observe both the developers’
tenure and their seniority as measured by job title. We find that Copilot significantly
raises task completion for more recent hires and those in more junior positions but not for
developers with longer tenure and in more senior positions. Prior work has shown that
[1] In addition, it is hard to predict further breakthroughs in the architecture of these models, which may lead to further improvements in quality or decreases in the cost of inference and training.
[2] Prior academic work has shown that generative AI can pass mock interviews for coding jobs at Amazon in the top decile of human performance, performs at human level on a database of coding challenges that measure programming logic and proficiency, and can write entire programs for simple video games from a few lines of instructions (Bubeck et al. 2023). Copilot is used by 1.3 million subscribers and more than 50,000 businesses.
[3] The exact implementation of these experiments was somewhat ad hoc, as they were driven by business considerations at these companies rather than research goals.
when workers are conducting the same tasks, generative AI helps lower-ability or lower-experience workers more (e.g., Brynjolfsson, Li, and Raymond 2023; Noy and Zhang 2023). Our results extend this finding by showing that even when workers perform tasks that vary with their tenure or seniority, generative AI increases productivity more for lower-ability workers.
Our preferred estimate pools estimates across all three experiments and places more
weight on periods with larger differences in treatment status. We make these choices
because our analysis must confront challenges related to statistical power, despite the
large number of software developers in the experiments. These challenges arise due to
large variation in measured outcomes and factors that reduce the take-up and duration
of the three experiments.[4] The experiment at Microsoft started before ChatGPT and
Copilot were widely known, and initial uptake was low. Shortly after a larger fraction
of developers in the treatment group started using it, the control group was also allowed
access. At Accenture, only a few hundred people participated in the experiment. Lastly,
at Anonymous Company, the treatment consisted of a staggered roll-out with only a
short period of time with differences in treatment status.
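The pooling described above can be illustrated with an inverse-variance-weighted average of per-experiment estimates. This is a minimal sketch under stated assumptions: the numbers and the weighting scheme below are illustrative stand-ins, not the paper's actual specification or data.

```python
# Illustrative sketch: pooling treatment-effect estimates from several
# experiments by weighting each estimate by the inverse of its variance,
# so noisier experiments count for less. All numbers are made up.

def pool_estimates(estimates, std_errors):
    """Return the inverse-variance-weighted estimate and its standard error."""
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    return pooled, pooled_se

# Hypothetical per-experiment effects (log points) and standard errors.
est, se = pool_estimates([0.30, 0.15, 0.25], [0.15, 0.20, 0.18])
print(f"pooled estimate: {est:.3f} (SE: {se:.3f})")
```

Note that the pooled standard error is smaller than any single experiment's, which is why combining three individually noisy experiments can yield a statistically meaningful estimate.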
Most studies of the impact of generative AI tools on worker productivity have been
conducted in controlled lab-like experiments (Peng et al. 2023; Vaithilingam, Zhang, and Glassman 2022; Campero et al. 2022; Noy and Zhang 2023). In a lab-in-the-field experiment on consultants employed by Boston Consulting Group, Dell’Acqua et al. 2023 find
that productivity on 18 tasks designed to mimic the day-to-day work at a consulting
company increased by 12%-25%. Evidence from these experiments generally suggests
significant productivity effects of generative AI. The exception is Vaithilingam, Zhang,
and Glassman 2022, which did not find a statistically significant difference in completion
time. A second common observation from these studies is that generative AI has the
largest effect on the least productive and least experienced workers (e.g., Noy and Zhang
2023).
While lab experiments offer a valuable opportunity to examine the short-term impli-
cations of generative AI, challenges and complex interactions arise when these tools are
deployed in real-world environments (Jaffe et al. 2024). There are some observational
studies of the effects of generative AI in an actual workplace setting (Hoffmann et al.
2024; Yeverechyahu, Mayya, and Oestreicher-Singer 2024) that do not have the benefit of
random experimental assignment of these technologies.
Our work complements both the literature on lab experiments as well as these obser-
vational studies by studying the impact of generative AI using a field experiment in an
actual workplace setting. To date, there is still a dearth of experimental studies exam-
ining the effect of generative AI in a field setting. In a notable exception, Brynjolfsson,
Li, and Raymond 2023 find that an AI-based conversational assistant increases the pro-
ductivity of customer chat support agents by 14%. Our study complements theirs by
examining a field experiment with high-skilled and highly paid knowledge workers, a
group that is particularly relevant given the prediction that high-skilled jobs will be most
affected by this technology. Although we examine a different part of the skill distribu-
[4] We observe large variation in the output of software developers due to significant heterogeneity in their seniority, with more senior managers being less likely to engage in coding activities.
tion, we find similar productivity increases. Like Brynjolfsson, Li, and Raymond 2023,
we also find that these gains are primarily driven by improved output from recent hires
and employees in more junior roles.
2 Setting and Experiments
2.1 What Is AI-Assisted Software Development?
AI assistants for software development offer intelligent code suggestions and autocompletion within integrated development environments. Prominent examples include
GitHub Copilot, Amazon CodeWhisperer, and Replit Ghostwriter. In our study, we ex-
amine the effects of one of these tools, GitHub Copilot. GitHub Copilot was developed
by GitHub in partnership with OpenAI. The development of Copilot involved combining
advanced machine learning techniques and natural language processing. A substantial
amount of code from public GitHub repositories was used to train Copilot. This exten-
sive dataset allowed the AI model to learn from real-world coding practices, patterns,
and styles across various programming languages and frameworks.
As developers write software code or plain text comments, Copilot analyzes the con-
text and generates relevant code snippets, comments, and documentation. It can au-
tocomplete code that developers might manually type or suggest snippets they would
otherwise need to search for online. This capability can save developers time and poten-
tially improve code quality by offering suggestions the developer might not be aware of.
However, like all LLM-based tools, Copilot can make mistakes. If developers rely on it
without review, it could potentially introduce errors or decrease code quality.
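The workflow described above can be made concrete with a small example. This is purely illustrative, not an actual Copilot transcript: a developer types the comment and the function signature, and the body shows the kind of completion such a tool could offer.

```python
# Illustrative only: the sort of suggestion an AI coding assistant might
# produce from a short comment prompt. The completion below is a plausible
# example written for this sketch, not recorded Copilot output.

# Parse an ISO 8601 date string and return (year, month, day) as ints.
def parse_iso_date(s):
    year, month, day = s.split("-")
    return int(year), int(month), int(day)

print(parse_iso_date("2022-09-01"))
```

A completion like this saves a round trip to documentation or a search engine, but it also shows why review matters: the suggested body silently accepts malformed input such as `"2022-09"` failing with an unpacking error rather than a clear message.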
2.2 Experiments
We analyze three randomized experiments conducted with software developers at Mi-
crosoft, Accenture, and Anonymous Company. In the Microsoft and Accenture experi-
ments, one group of developers (the treated group) was randomly assigned to be able to
access GitHub Copilot, whereas the other group (the control group) did not have access
to the tool for a period of seven (Microsoft) or four (Accenture) months. In the Anony-
mous Company experiment, all users gained access to the tool over a period of two
months, but access dates were randomized, with some teams gaining access six weeks
before others.
Microsoft The experiment at Microsoft started in the first week of September 2022, in-
volving a sample size of 1,746 developers primarily located in the United States. Of these
developers, 50.4% were randomly selected to receive access to GitHub Copilot. Randomization was implemented at both the individual and the team levels. The developers
work on building a wide range of software within Microsoft, with tasks that include en-
gineering, designing, and testing software products and services. They occupy various
positions in the company, ranging from entry-level developers to team managers. They
may work in a team or individually, depending on their task and team structure.
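Team-level randomization of the kind used here can be sketched as follows. This is a generic illustration of cluster assignment, assuming a fixed treatment share and a seeded random shuffle; the team names are hypothetical and the paper does not describe its exact procedure.

```python
import random

# Sketch of cluster (team-level) randomization: whole teams are assigned
# to treatment or control so that teammates share a status, avoiding
# spillovers within a team. Team names are hypothetical.

def assign_teams(teams, treat_share=0.5, seed=0):
    """Randomly assign each team (and so all its members) to treatment."""
    rng = random.Random(seed)
    shuffled = list(teams)
    rng.shuffle(shuffled)
    n_treated = round(len(shuffled) * treat_share)
    treated = set(shuffled[:n_treated])
    return {team: (team in treated) for team in teams}

assignment = assign_teams(["infra", "search", "devtools", "mobile"])
```

Assigning at the team level sacrifices some statistical power relative to individual assignment, but it keeps treatment status constant within groups that work closely together.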
Participants in the treated group were informed via email about the opportunity to
sign up for GitHub Copilot. The email also included information introducing GitHub
Copilot as a productivity-enhancing tool and outlining its potential impact on their coding tasks (see Figure 6 in Appendix). Beyond this email, treated participants did not receive any specific instructions regarding their workload or workflow, to ensure they used GitHub Copilot in their natural work environment. Control group participants did not receive any communication as part of the study.[5] The experiment ended on May 3rd,
2023, as growing awareness of AI-assisted coding tools led the control group participants
to seek access to Copilot.
Accenture The Accenture experiment started in the last week of July 2023 and included
a number of Accenture offices located in Southeast Asia. Randomization occurred at the
developer level, with 61.3% of the 320 developers assigned to the treatment group. Treat-
ment group participants were informed over email that they were eligible to sign up
for GitHub Copilot. They also participated in a training session, which explained what
GitHub Copilot is, how to use it, and the potential benefits. Finally, the participating
managers were asked to encourage their reports’ adoption of GitHub Copilot. The con-
trol group was granted access to Copilot in December 2023, though uptake was lower
than in the treatment group.
Anonymous Company The Anonymous Company experiment started in October 2023.
It involved 3,054 developers who were all eventually invited to use Copilot. The invita-
tion dates were randomized, with new invites being sent out weekly between September
2023 and October 2023.
2.3 Variables and Outcome Measures
Measuring the productivity of modern knowledge work is notoriously difficult. Our
setting has the advantage that almost all professional software development follows a
highly structured workflow, where specific tasks are defined and tracked through version
control software. Internally defined goals and tasks are, therefore, quantifiable. All three
participating organizations use the version control software GitHub. By observing the
developers’ GitHub activity, we can observe many of the variables that are part of their
workflow.
A main outcome of interest is “pull requests”, which can be thought of as a unit of
work for software developers. Within an organization, the scope of a pull request is likely
to remain relatively stable over time, shaped by organizational norms and conventions,
even though different organizations may define this scope differently. For instance, a
pull request might ask for a feature to be added to a larger software project. A
pull request will lead to a code review, often by a more senior software developer. If this
review is passed, the code will be merged and thereby become part of the larger software
project.
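The main outcome, weekly completed pull requests per developer, can be sketched as a simple aggregation over merge events. The record format and field names below are assumptions made for illustration; they are not GitHub's API or the paper's actual data pipeline.

```python
from collections import Counter
from datetime import date

# Hedged sketch: turning hypothetical pull-request merge records into a
# weekly count of completed tasks per developer. Developer names and the
# (developer, merge_date) record shape are illustrative assumptions.

def weekly_merged_prs(events):
    """Count merged pull requests per (developer, ISO year, ISO week)."""
    counts = Counter()
    for dev, merged_on in events:
        year, week, _ = merged_on.isocalendar()
        counts[(dev, year, week)] += 1
    return counts

events = [
    ("alice", date(2022, 9, 5)),
    ("alice", date(2022, 9, 7)),
    ("bob", date(2022, 9, 6)),
]
counts = weekly_merged_prs(events)
```

Using ISO weeks keeps the panel's time unit consistent across calendar years, which matters when an experiment spans a year boundary, as the Microsoft experiment did.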
We use three additional outcome variables related to the developers’ workflow. Before
submitting a pull request, a developer will work separately on her code, tracking smaller
[5] A small number of developers in the control group nevertheless got access to Copilot because they were working on related tools.