Content overview: This paper examines whether autonomous coordination can arise when goods and services are priced by artificial intelligence, in particular reinforcement-learning algorithms such as Q-learning. The researchers design a repeated price-competition oligopoly model and let Q-learning algorithms interact in this environment many times. The study shows that, without communicating, these algorithms learn to charge monopoly-like prices, to impose punishments of finite duration, and to gradually return to cooperation. The collusive behavior chiefly takes the form of prices above the Bertrand equilibrium but below the full monopoly level, indicating that the algorithms can sustain stable, tacit coordination that circumvents market competition. Such behavior challenges existing antitrust rules and may call for new policy responses.
Intended audience: scholars, graduate students, and practitioners in economics, finance, and algorithms; especially valuable for researchers interested in algorithmic game theory.
Use cases and goals: the research provides theoretical grounding and technical support for understanding dynamic pricing in modern digital markets. A deeper understanding of algorithmic collusion can help policymakers address the resulting risks of distorted competition and safeguard market fairness and efficiency. The paper also offers guidance for future work in machine learning and artificial intelligence.
Other notes: although the study shows that spontaneous coordination among algorithms can emerge under certain conditions, more research on algorithm heterogeneity and market complexity is needed before drawing conclusions for practice; speeding up the algorithms' learning is another important direction for future work.
ARTIFICIAL INTELLIGENCE, ALGORITHMIC PRICING AND COLLUSION†
Emilio Calvano∗‡, Giacomo Calzolari&‡§, Vincenzo Denicolò∗§, and Sergio Pastorello∗
December 2019
Increasingly, algorithms are supplanting human decision-makers in pricing goods and services. To analyze the possible consequences, we study experimentally the behavior of algorithms powered by Artificial Intelligence (Q-learning) in a workhorse oligopoly model of repeated price competition. We find that the algorithms consistently learn to charge supra-competitive prices, without communicating with one another. The high prices are sustained by collusive strategies with a finite phase of punishment followed by a gradual return to cooperation. This finding is robust to asymmetries in cost or demand, changes in the number of players, and various forms of uncertainty.
Keywords: Artificial Intelligence, Pricing-Algorithms, Collusion, Reinforcement Learning, Q-Learning.
J.E.L. codes: L41, L13, D43, D83.
† We are grateful to the Editor, Jeffrey Ely, and three anonymous referees for many detailed instructions for revising the paper. We also thank, without implicating, Susan Athey, Ariel Ezrachi, Joshua Gans, Joe Harrington, Bruno Jullien, Timo Klein, Kai-Uwe Kühn, Patrick Legros, David Levine, Wally Mullin, Yossi Spiegel, Steve Tadelis, Emanuele Tarantino and participants at numerous conferences and seminars for useful comments. Financial support from the Digital Chair initiative at the Toulouse School of Economics is gratefully acknowledged.
Corresponding author: Giacomo Calzolari, giacomo.calzolari@eui.eu
∗ Bologna University; ‡ Toulouse School of Economics; & European University Institute; § CEPR
Electronic copy available at: https://ssrn.com/abstract=3304991
1. INTRODUCTION
Software programs are increasingly being adopted by firms to price their goods and services, and this tendency is likely to continue.[1] In this paper, we ask whether pricing algorithms may “autonomously” learn to collude. The possibility arises because of the recent evolution of the software, from rule-based to reinforcement learning programs. The new programs, powered by Artificial Intelligence (AI), are indeed much more autonomous than their precursors. They can develop their pricing strategies from scratch, engaging in active experimentation and adapting to changing environments. In this learning process, they require little or no external guidance.
In the light of these developments, concerns have been voiced, by scholars and policymakers alike, that AI pricing algorithms may raise their prices above the competitive level in a coordinated fashion, even if they have not been specifically instructed to do so and even if they do not communicate with one another.[2] This form of tacit collusion would defy current antitrust policy, which typically targets only explicit agreements among would-be competitors (Harrington, 2018).
But how real is the risk of tacit collusion among algorithms? That is a difficult question to answer, both empirically and theoretically. On the empirical side, collusion is notoriously hard to detect from market outcomes,[3] and firms typically do not disclose details of the pricing software they use. On the theoretical side, the interaction among reinforcement-learning algorithms in pricing games generates stochastic dynamic systems so complex that analytical results seem currently out of reach.[4]
To make some progress, this paper takes an experimental approach. We construct AI pricing agents and let them interact repeatedly in computer-simulated marketplaces. The challenge of this approach is to choose realistic economic environments, and algorithms representative of those employed in practice. We discuss in detail how we address these challenges as we proceed. Any conclusions are necessarily tentative at this stage, but our findings do suggest that algorithmic collusion is more than a remote theoretical possibility.

[1] While revenue management programs have been used for decades in such industries as hotels and airlines, the diffusion of pricing software has boomed with the advent of online marketplaces. For example, in a sample of over 1,600 best-selling items listed on Amazon, Chen, Mislove and Wilson (2016) find that in 2015 more than a third of the vendors had already automated their pricing. Since then, a repricing-software industry has arisen, which supplies turnkey pricing systems to smaller vendors and customizes software for the larger ones. But pricing software is increasingly used also in traditional off-line sectors such as gas stations: see e.g. “Why do gas station prices constantly change? Blame the algorithms,” The Wall Street Journal, May 8, 2017.
[2] For the scholarly debate see, for instance, Ezrachi and Stucke (2016, 2017), Harrington (2018), Kühn and Tadelis (2018) and Schwalbe (2019). As for policy, the possibility of algorithmic collusion has been extensively discussed, for instance, at the 7th session of the FTC Hearings on competition and consumer protection (November 2018) and has been the subject of white papers independently issued in 2018 by the Canadian Competition Bureau and the British Competition and Markets Authority.
[3] With very rich data, however, the problem may not be insurmountable (Byrne and De Roos, 2019).
[4] One notable theoretical contribution is Salcedo (2015), who argues that optimized algorithms will inevitably reach a collusive outcome. But this claim hinges crucially on the assumption that each algorithm can periodically observe and “decode” the others, which in the meantime stay unchanged. The practical relevance of Salcedo's result thus remains controversial.
The results indicate that, indeed, relatively simple pricing algorithms systematically learn to play collusive strategies. The algorithms typically coordinate on prices that are somewhat below the monopoly level but substantially above the static Bertrand equilibrium. The strategies that support these outcomes crucially involve punishments of defections. Such punishments are finite in duration, with a gradual return to the pre-deviation prices. The algorithms learn these strategies purely by trial and error. They are not designed or instructed to collude, they do not communicate with one another, and they have no prior knowledge of the environment in which they operate.
Our baseline model is a symmetric duopoly with deterministic demand, but we conduct
an extensive robustness analysis. The degree of collusion decreases as the number of
competitors rises. However, substantial collusion continues to prevail when the active firms
are three or four in number. The algorithms display a stubborn propensity to collude even
when they are asymmetric, and when they operate in stochastic environments.
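For concreteness, a stylized version of such a marketplace can be sketched in a few lines of code. The sketch below is illustrative only: it assumes a logit demand system, a common marginal cost, and a discretized price grid, and all function names and parameter values are ours rather than the specification actually used in the simulations, which is described in Section 3.

import numpy as np

def logit_demand(prices, a=2.0, a0=0.0, mu=0.25):
    # Market shares of differentiated products under logit demand
    # (an assumption made here purely for illustration).
    p = np.asarray(prices, dtype=float)
    u = np.exp((a - p) / mu)
    return u / (np.exp(a0 / mu) + u.sum())

def profits(prices, cost=1.0):
    # Per-period profit of each firm, given the vector of posted prices.
    p = np.asarray(prices, dtype=float)
    return (p - cost) * logit_demand(p)

# A coarse grid of feasible prices over which the algorithms experiment.
price_grid = np.linspace(1.0, 2.0, 15)

# One-shot payoff matrix of firm 1 (rows: own price, columns: rival's price).
payoff_1 = np.array([[profits((p1, p2))[0] for p2 in price_grid]
                     for p1 in price_grid])

From such a one-shot payoff matrix one can compute the static Bertrand-Nash and joint-monopoly benchmark prices against which the prices learned by the algorithms are compared.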
Other papers have simulated reinforcement-learning algorithms in oligopoly, but ours is the first to clearly document the emergence of collusive strategies among autonomous pricing agents. The previous literature in both computer science and economics has focused on outcomes rather than strategies.[5] But the observation of supra-competitive prices is not, per se, genuine proof of collusion. To us economists, collusion is not simply a synonym of high prices but crucially involves “a reward-punishment scheme designed to provide the incentives for firms to consistently price above the competitive level” (Harrington (2018), p. 336). The reward-punishment scheme ensures that the supra-competitive outcomes may be obtained in equilibrium and do not result from a failure to optimize.
The difference is critical. For example, in their pioneering study of repeated Cournot competition among Q-learning algorithms, computer scientists Waltman and Kaymak (2008) find that the algorithms reduce output, and hence raise prices, with respect to the Nash equilibrium of the one-shot game.[6] They refer to this as collusion. When the algorithms are far-sighted and are able to condition their current choices on past actions, so that defections can be punished, their findings could indeed be consistent with collusive behavior according to economists' usage of the term. But Waltman and Kaymak also consider the case where the algorithms are myopic and have no memory of past actions – conditions under which collusion is either infeasible or cannot emerge in equilibrium – and find that in these cases the output reduction is even larger. This suggests that what they observe may not be collusion but a failure to learn an optimal strategy.[7]

[5] Moreover, the vast majority of the literature does not use the canonical model of collusion, where firms play an infinitely repeated game, pricing simultaneously in each stage and conditioning their prices on past history. Rather, it uses frameworks similar to Maskin and Tirole's (1988) model of staggered pricing. In this model, two firms alternate in moving, commit to a price level for two periods, and condition their pricing only on the rival's current price. The postulate of price commitment is however controversial, as software algorithms can adjust prices very quickly. And probably the postulate is not innocuous. Commitment may indeed facilitate coordination, as argued theoretically by Maskin and Tirole (1988) and experimentally by Leufkens and Peeters (2011). At any rate, the best-executed paper in this line of research is probably Klein (2018), which also provides a survey of the earlier literature.
Verifying whether the high prices are supported by equilibrium strategies is not just a
theoretical curiosity. Algorithms that grossly fail to optimize would, in all likelihood,
be dismissed quickly and thus could hardly become a matter of antitrust concern. The
implications are instead very different if, as we show, the supra-competitive prices are set
by optimizing, or quasi-optimizing, programs.
Yet, there is an important caveat to keep in mind. To present a proof-of-concept demonstration of algorithmic collusion, in this paper we concentrate on what the algorithms eventually learn and pay less attention to the speed of learning. Thus, we focus on algorithms that by design learn slowly, in a completely unsupervised fashion, and in our simulations we allow them to explore widely and interact as many times as is needed to stabilize their behavior. As a result, the number of repetitions required for completing the learning is typically high, on the order of hundreds of thousands. In fact, the algorithms start to raise their prices much earlier. However, the time scale still remains an open issue; it will be discussed further below.
The rest of the paper is organized as follows. The next section provides a self-contained
description of the class of Q-learning algorithms, which we use in our simulations. Section 3
describes the economic environments where the algorithms operate. Section 4 shows that
collusive outcomes are common and are generated by optimizing, or quasi-optimizing,
behavior. Section 5 then provides a more in-depth analysis of the collusive strategies that
support these outcomes. Section 6 reports on a number of robustness checks. Section 7
discusses the issue of the speed of learning. Section 8 concludes with a brief discussion of
the possible implications for policy.
[6] Other papers that study reinforcement learning algorithms in a Cournot oligopoly include Kimbrough and Murphy (2009) and Siallagan et al. (2013).
[7] According to Cooper, Homem-de-Mello and Kleywegt (2015), such “collusion by mistake” may sometimes emerge also among revenue management systems that do not condition their current prices on rivals' past prices. This may happen in particular when the programs disregard competitors altogether in the process of demand estimation, which biases the estimated elasticity downwards.
2. Q-LEARNING
Following Waltman and Kaymak (2008), we concentrate on Q-learning algorithms. Even if reinforcement learning comes in many different varieties,[8] there are several reasons for this choice. First, one would like to experiment with algorithms that are commonly adopted in practice, and although little is known about the specific software that firms actually use, Q-learning is certainly highly popular among computer scientists. Second, Q-learning algorithms are simple and can be fully characterized by just a few parameters, the economic interpretation of which is clear. This makes it possible to keep possibly arbitrary modeling choices to a minimum, and to conduct a comprehensive comparative statics analysis with respect to the characteristics of the algorithms. Third, Q-learning algorithms share the same architecture as the more sophisticated programs that have recently obtained spectacular successes, achieving superhuman performance in such tasks as playing the ancient board game Go (Silver et al., 2016), the Atari video-games (Mnih et al., 2015), and, more recently, chess (Silver et al., 2018).[9] The downside of Q-learning is that the learning process is slow, for reasons that will become clear in a moment.

In the rest of this section, we provide a brief introduction to Q-learning. Readers familiar with this model may proceed directly to Section 3.
2.1. Single agent problems
Like all reinforcement-learning algorithms, Q-learning programs adapt their behavior to past experience, taking actions that have proven successful more frequently and unsuccessful ones less frequently. In this way, they may learn an optimal policy, or a policy that approximates the optimum, with no prior knowledge of the particular problem at hand.[10]
Originally, Q-learning was proposed by Watkins (1989) to tackle Markov decision processes. In a stationary Markov decision process, in each period t = 0, 1, 2, ... an agent observes a state variable s_t ∈ S and then chooses an action a_t ∈ A(s_t). For any s_t and a_t, the agent obtains a reward π_t, and the system moves on to the next state s_{t+1}, according to a time-invariant (and possibly degenerate) probability distribution F(π_t, s_{t+1} | s_t, a_t). Q-learning deals with the version of this model where S and A are finite, and A is not state-dependent.
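The logic can be illustrated with a minimal tabular Q-learning agent. The sketch below applies the standard one-step Q-learning update of Watkins (1989); the class name, the ε-greedy exploration scheme, and the parameter values are our own illustrative assumptions and are not meant to reproduce the exact design used in the simulations.

import numpy as np

class QLearner:
    # Minimal tabular Q-learning agent (Watkins, 1989). Illustrative sketch only:
    # exploration scheme and parameter values are assumptions, not the exact
    # design used in the paper's simulations.
    def __init__(self, n_states, n_actions, alpha=0.15, delta=0.95, seed=0):
        self.alpha = alpha                        # learning rate
        self.delta = delta                        # discount factor
        self.Q = np.zeros((n_states, n_actions))  # Q-matrix over states and actions
        self.rng = np.random.default_rng(seed)

    def choose(self, state, epsilon):
        # Explore with probability epsilon, otherwise play the greedy action.
        if self.rng.random() < epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(self.Q[state].argmax())

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update:
        # Q(s,a) <- (1-alpha) Q(s,a) + alpha [reward + delta * max_a' Q(s',a')]
        target = reward + self.delta * self.Q[next_state].max()
        self.Q[state, action] = ((1 - self.alpha) * self.Q[state, action]
                                 + self.alpha * target)

In the pricing application studied below, the state would encode the prices posted in the previous period and the action is the current price, so that two or more such agents, each updating its own Q-matrix after every stage of the repeated game, can in principle learn strategies that condition on past play.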
[8] For a thorough treatment of reinforcement learning in computer science, see Sutton and Barto (2018).
[9] These more sophisticated programs might appear themselves to be a natural alternative to Q-learning. However, they require many modeling choices that are somewhat arbitrary from an economic viewpoint. We shall come back to this issue in Section 7.
[10] Reinforcement learning was introduced in economics by Arthur (1991) and later popularized by Roth and Erev (1995), Erev and Roth (1998) and Ho, Camerer and Chong (2007), among others.