The Effects of Generative AI on High Skilled Work:
Evidence from Three Field Experiments with
Software Developers*
Kevin Zheyuan Cui, Mert Demirer, Sonia Jaffe,
Leon Musolff, Sida Peng, and Tobias Salz
September 2024
Abstract
This study evaluates the impact of generative AI on software developer produc-
tivity by analyzing data from three randomized controlled trials conducted at Mi-
crosoft, Accenture, and an anonymous Fortune 100 electronics manufacturing com-
pany. These field experiments, which were run by the companies as part of their
ordinary course of business, provided a randomly selected subset of developers with
access to GitHub Copilot, an AI-based coding assistant that suggests intelligent code
completions. Though each separate experiment is noisy, combined across all three
experiments and 4,867 software developers, our analysis reveals a 26.08% increase
(SE: 10.3%) in the number of completed tasks among developers using the AI tool.
Notably, less experienced developers showed higher adoption rates and greater pro-
ductivity gains.
* Cui: Princeton. Demirer: MIT. Jaffe: Microsoft. Musolff: The Wharton School of the University of
Pennsylvania. Peng: Microsoft. Salz: MIT. We are grateful to Avi Goldfarb, Shane Greenstein, Anton
Korinek, Ethan Mollick, as well as the participants of the IIOC2024 conference and the AI, Cognition, and
the Economy Workshop. Thanks also for data help from employees at Microsoft, GitHub, and Accenture:
Phillip Coppney, Wen Du, Ya Gao, Lizzie Redford, Ryan J. Salva, Daniel A. Schocke, Amanda Silver, An-Jen
Tai, Dan Tetrick, Jeff Wilcox.
1 Introduction
Many economists expect generative AI to profoundly affect the organization of economic
activity (Agrawal, Gans, and Goldfarb 2019; Frank et al. 2019; Furman and Seamans
2019). Eloundou et al. 2023 estimate that generative AI can perform tasks associated with
over 80% of U.S. jobs and that AI task coverage is notably higher for occupations that
require advanced degrees. The ability of generative AI to perform tasks required in such
high-skilled occupations — allowing it to assist doctors in diagnosing diseases, lawyers
in drafting legal documents, and software engineers with code development — has led
to predictions of substantial productivity gains from the adoption of such technologies
(Baily, Brynjolfsson, and Korinek 2023). Others, however, are less optimistic about such
productivity gains (Acemoglu 2024).
Uncertainty around firms’ willingness to adopt these technologies and their capacity to make necessary complementary investments (Bresnahan 2024; Brynjolfsson, Rock, and Syverson 2021) currently makes it difficult to assess empirically whether optimism about productivity gains is justified.[1] Nevertheless, some applications of generative AI have already matured and are integrated into existing workflows. An example is software development, where commercial coding assistants based on generative AI have gained widespread adoption.[2]
In this project, we ask how generative AI affects the productivity of knowledge work-
ers, using software developers as an example. We analyze three large-scale randomized
controlled trials in a real-world environment. These experiments randomly assigned
access to Copilot, a coding assistant developed by GitHub in collaboration with Ope-
nAI, to just under five thousand software developers at Microsoft, Accenture, and an
anonymous Fortune 100 electronics manufacturing company (henceforth Anonymous
Company). These experiments were run as part of the ordinary course of business at
these companies to decide whether or how extensively to adopt these technologies, and
the companies kindly shared the resulting data with us.[3]
Our preferred estimates suggest that usage of the coding assistant causes a 26.08%
(SE: 10.3%) increase in the weekly number of completed tasks. When we look at outcomes
of secondary interest, our results support this interpretation, with a 13.55% (SE: 10.0%)
increase in the number of code updates (commits) and a 38.38% (SE: 12.55%) increase in
the number of times code was compiled. For Microsoft we observe both the developers’
tenure and their seniority as measured by job title. We find that Copilot significantly
raises task completion for more recent hires and those in more junior positions but not for
developers with longer tenure and in more senior positions. Prior work has shown that
[1] In addition, it is hard to predict further breakthroughs in the architecture of these models, which may lead to further improvements in quality or decreases in the cost of inference and training.
[2] Prior academic work has shown that generative AI can pass mock interviews for coding jobs at Amazon in the top decile of human performance, performs at human level on a database of coding challenges that measure programming logic and proficiency, and can write entire programs for simple video games from a few lines of instructions (Bubeck et al. 2023). Copilot is used by 1.3 million subscribers and more than 50,000 businesses.
[3] The exact implementation of these experiments was somewhat ad hoc, as they were driven by business considerations at these companies rather than research goals.
when workers are conducting the same tasks, generative AI helps lower-ability or lower-experience workers more (e.g., Brynjolfsson, Li, and Raymond 2023; Noy and Zhang 2023). Our results extend this finding by showing that even when workers perform tasks that vary with their tenure or seniority, generative AI increases productivity more for lower-ability workers.
Our preferred estimate pools estimates across all three experiments and places more
weight on periods with larger differences in treatment status. We make these choices
because our analysis must confront challenges related to statistical power, despite the
large number of software developers in the experiments. These challenges arise due to
large variation in measured outcomes and factors that reduce the take-up and duration
of the three experiments.[4] The experiment at Microsoft started before ChatGPT and
Copilot were widely known, and initial uptake was low. Shortly after a larger fraction
of developers in the treatment group started using it, the control group was also allowed
access. At Accenture, only a few hundred people participated in the experiment. Lastly,
at Anonymous Company, the treatment consisted of a staggered roll-out with only a
short period of time with differences in treatment status.
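The pooling described above can be illustrated with an inverse-variance-weighted average of per-experiment estimates. This is a minimal sketch under stated assumptions: the numbers and the weighting scheme below are illustrative stand-ins, not the paper's actual specification or data.

```python
# Illustrative sketch: pooling treatment-effect estimates from several
# experiments by weighting each estimate by the inverse of its variance,
# so noisier experiments count for less. All numbers are made up.

def pool_estimates(estimates, std_errors):
    """Return the inverse-variance-weighted estimate and its standard error."""
    weights = [1.0 / se**2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    return pooled, pooled_se

# Hypothetical per-experiment effects (log points) and standard errors.
est, se = pool_estimates([0.30, 0.15, 0.25], [0.15, 0.20, 0.18])
print(f"pooled estimate: {est:.3f} (SE: {se:.3f})")
```

Note that the pooled standard error is smaller than any single experiment's, which is why combining three individually noisy experiments can yield a statistically meaningful estimate.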
Most studies of the impact of generative AI tools on worker productivity have been
conducted in controlled lab-like experiments (Peng et al. 2023; Vaithilingam, Zhang, and Glassman 2022; Campero et al. 2022; Noy and Zhang 2023). In a lab-in-the-field experiment on consultants employed by Boston Consulting Group, Dell’Acqua et al. 2023 find
that productivity on 18 tasks designed to mimic the day-to-day work at a consulting
company increased by 12%-25%. Evidence from these experiments generally suggests
significant productivity effects of generative AI. The exception is Vaithilingam, Zhang,
and Glassman 2022, which did not find a statistically significant difference in completion
time. A second common observation from these studies is that generative AI has the
largest effect on the least productive and least experienced workers (e.g., Noy and Zhang
2023).
While lab experiments offer a valuable opportunity to examine the short-term impli-
cations of generative AI, challenges and complex interactions arise when these tools are
deployed in real-world environments (Jaffe et al. 2024). There are some observational
studies of the effects of generative AI in an actual workplace setting (Hoffmann et al.
2024; Yeverechyahu, Mayya, and Oestreicher-Singer 2024) that do not have the benefit of
random experimental assignment of these technologies.
Our work complements both the literature on lab experiments as well as these obser-
vational studies by studying the impact of generative AI using a field experiment in an
actual workplace setting. To date, there is still a dearth of experimental studies exam-
ining the effect of generative AI in a field setting. In a notable exception, Brynjolfsson,
Li, and Raymond 2023 find that an AI-based conversational assistant increases the pro-
ductivity of customer chat support agents by 14%. Our study complements theirs by
examining a field experiment with high-skilled and highly paid knowledge workers, a
group that is particularly relevant given the prediction that high-skilled jobs will be most
affected by this technology. Although we examine a different part of the skill distribu-
[4] We observe large variation in the output of software developers due to significant heterogeneity in their seniority, with more senior managers being less likely to engage in coding activities.
tion, we find similar productivity increases. Like Brynjolfsson, Li, and Raymond 2023,
we also find that these gains are primarily driven by improved output from recent hires
and employees in more junior roles.
2 Setting and Experiments
2.1 What Is AI-Assisted Software Development?
AI assistants for software development offer intelligent code suggestions and autocompletion within integrated development environments. Prominent examples include
GitHub Copilot, Amazon CodeWhisperer, and Replit Ghostwriter. In our study, we ex-
amine the effects of one of these tools, GitHub Copilot. GitHub Copilot was developed
by GitHub in partnership with OpenAI. The development of Copilot involved combining
advanced machine learning techniques and natural language processing. A substantial
amount of code from public GitHub repositories was used to train Copilot. This exten-
sive dataset allowed the AI model to learn from real-world coding practices, patterns,
and styles across various programming languages and frameworks.
As developers write software code or plain text comments, Copilot analyzes the con-
text and generates relevant code snippets, comments, and documentation. It can au-
tocomplete code that developers might manually type or suggest snippets they would
otherwise need to search for online. This capability can save developers time and poten-
tially improve code quality by offering suggestions the developer might not be aware of.
However, like all LLM-based tools, Copilot can make mistakes. If developers rely on it
without review, it could potentially introduce errors or decrease code quality.
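The workflow described above can be made concrete with a small example. This is purely illustrative, not an actual Copilot transcript: a developer types the comment and the function signature, and the body shows the kind of completion such a tool could offer.

```python
# Illustrative only: the sort of suggestion an AI coding assistant might
# produce from a short comment prompt. The completion below is a plausible
# example written for this sketch, not recorded Copilot output.

# Parse an ISO 8601 date string and return (year, month, day) as ints.
def parse_iso_date(s):
    year, month, day = s.split("-")
    return int(year), int(month), int(day)

print(parse_iso_date("2022-09-01"))
```

A completion like this saves a round trip to documentation or a search engine, but it also shows why review matters: the suggested body silently accepts malformed input such as `"2022-09"` failing with an unpacking error rather than a clear message.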
2.2 Experiments
We analyze three randomized experiments conducted with software developers at Mi-
crosoft, Accenture, and Anonymous Company. In the Microsoft and Accenture experi-
ments, one group of developers (the treated group) was randomly assigned to be able to
access GitHub Copilot, whereas the other group (the control group) did not have access
to the tool for a period of seven (Microsoft) or four (Accenture) months. In the Anony-
mous Company experiment, all users gained access to the tool over a period of two
months, but access dates were randomized, with some teams gaining access six weeks
before others.
Microsoft The experiment at Microsoft started in the first week of September 2022, in-
volving a sample size of 1,746 developers primarily located in the United States. Of these
developers, 50.4% were randomly selected to receive access to GitHub Copilot. Randomization was implemented at both the individual and the team levels. The developers
work on building a wide range of software within Microsoft, with tasks that include en-
gineering, designing, and testing software products and services. They occupy various
positions in the company, ranging from entry-level developers to team managers. They
may work in a team or individually, depending on their task and team structure.
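Team-level randomization of the kind used here can be sketched as follows. This is a generic illustration of cluster assignment, assuming a fixed treatment share and a seeded random shuffle; the team names are hypothetical and the paper does not describe its exact procedure.

```python
import random

# Sketch of cluster (team-level) randomization: whole teams are assigned
# to treatment or control so that teammates share a status, avoiding
# spillovers within a team. Team names are hypothetical.

def assign_teams(teams, treat_share=0.5, seed=0):
    """Randomly assign each team (and so all its members) to treatment."""
    rng = random.Random(seed)
    shuffled = list(teams)
    rng.shuffle(shuffled)
    n_treated = round(len(shuffled) * treat_share)
    treated = set(shuffled[:n_treated])
    return {team: (team in treated) for team in teams}

assignment = assign_teams(["infra", "search", "devtools", "mobile"])
```

Assigning at the team level sacrifices some statistical power relative to individual assignment, but it keeps treatment status constant within groups that work closely together.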
Participants in the treated group were informed via email about the opportunity to
sign up for GitHub Copilot. The email also included information introducing GitHub
Copilot as a productivity-enhancing tool and outlining its potential impact on their coding tasks (see Figure 6 in Appendix). Beyond this email, treated participants did not receive any specific instructions regarding their workload or workflow, to ensure they used GitHub Copilot in their natural work environment. Control group participants did not receive any communication as part of the study.[5] The experiment ended on May 3rd,
2023, as growing awareness of AI-assisted coding tools led the control group participants
to seek access to Copilot.
Accenture The Accenture experiment started in the last week of July 2023 and included
a number of Accenture offices located in Southeast Asia. Randomization occurred at the
developer level, with 61.3% of the 320 developers assigned to the treatment group. Treat-
ment group participants were informed over email that they were eligible to sign up
for GitHub Copilot. They also participated in a training session, which explained what
GitHub Copilot is, how to use it, and the potential benefits. Finally, the participating
managers were asked to encourage their reports’ adoption of GitHub Copilot. The con-
trol group was granted access to Copilot in December 2023, though uptake was lower
than in the treatment group.
Anonymous Company The Anonymous Company experiment started in October 2023.
It involved 3,054 developers who were all eventually invited to use Copilot. The invita-
tion dates were randomized, with new invites being sent out weekly between September
2023 and October 2023.
2.3 Variables and Outcome Measures
Measuring the productivity of modern knowledge work is notoriously difficult. Our
setting has the advantage that almost all professional software development follows a
highly structured workflow, where specific tasks are defined and tracked through version
control software. Internally defined goals and tasks are, therefore, quantifiable. All three
participating organizations use the version control software GitHub. By observing the
developers’ GitHub activity, we can observe many of the variables that are part of their
workflow.
A main outcome of interest is “pull requests”, which can be thought of as a unit of
work for software developers. Within an organization, the scope of a pull request is likely
to remain relatively stable over time, shaped by organizational norms and conventions,
even though different organizations may define this scope differently. For instance, a
pull request might ask for a feature to be added to a larger software project. A
pull request will lead to a code review, often by a more senior software developer. If this
review is passed, the code will be merged and thereby become part of the larger software
project.
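The main outcome, weekly completed pull requests per developer, can be sketched as a simple aggregation over merge events. The record format and field names below are assumptions made for illustration; they are not GitHub's API or the paper's actual data pipeline.

```python
from collections import Counter
from datetime import date

# Hedged sketch: turning hypothetical pull-request merge records into a
# weekly count of completed tasks per developer. Developer names and the
# (developer, merge_date) record shape are illustrative assumptions.

def weekly_merged_prs(events):
    """Count merged pull requests per (developer, ISO year, ISO week)."""
    counts = Counter()
    for dev, merged_on in events:
        year, week, _ = merged_on.isocalendar()
        counts[(dev, year, week)] += 1
    return counts

events = [
    ("alice", date(2022, 9, 5)),
    ("alice", date(2022, 9, 7)),
    ("bob", date(2022, 9, 6)),
]
counts = weekly_merged_prs(events)
```

Using ISO weeks keeps the panel's time unit consistent across calendar years, which matters when an experiment spans a year boundary, as the Microsoft experiment did.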
We use three additional outcome variables related to the developers’ workflow. Before
submitting a pull request, a developer will work separately on her code, tracking smaller
[5] A small number of developers in the control group nevertheless got access to Copilot because they were working on related tools.