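Summary: This paper examines whether autonomous coordination can arise when artificial intelligence, in particular reinforcement-learning algorithms such as Q-learning, is used to price goods and services. The researchers build an oligopoly model of repeated price competition and let Q-learning agents interact in it many times. They find that, without communicating, the algorithms learn to charge supra-competitive prices, enforce punishments of finite duration, and then gradually return to cooperation. The collusion shows up as prices above the Bertrand equilibrium but below the full monopoly level, which indicates that the algorithms can sustain stable tacit coordination that softens competition. This behavior challenges existing antitrust rules and may call for new policy adjustments. Intended audience: scholars, graduate students and professionals in economics, finance and algorithms; the paper is especially valuable for researchers interested in algorithmic game theory. Use cases and goals: the study aims to provide a theoretical basis and technical support for understanding dynamic pricing in modern digital markets. A deeper understanding of algorithmic collusion can help policymakers respond to the resulting risk of distorted competition and keep markets fair and efficient. The paper also offers guidance for future work in machine learning and artificial intelligence. Other notes: although the study shows that spontaneous coordination among algorithms can arise under certain conditions, practical applications call for more research on algorithm diversity and on more complex market conditions. Speeding up the algorithms' learning is another important direction for future work.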
Emilio Calvano, Giacomo Calzolari, Vincenzo Denicolò, and Sergio Pastorello
December 2019
Increasingly, algorithms are supplanting human decision-makers in
pricing goods and services. To analyze the possible consequences, we
study experimentally the behavior of algorithms powered by Artificial
Intelligence (Q-learning) in a workhorse oligopoly model of repeated
price competition. We find that the algorithms consistently learn to
charge supra-competitive prices, without communicating with one
another. The high prices are sustained by collusive strategies with a finite
phase of punishment followed by a gradual return to cooperation. This
finding is robust to asymmetries in cost or demand, changes in the
number of players, and various forms of uncertainty.
Keywords: Artificial Intelligence, Pricing-Algorithms, Collusion, Reinforcement Learning, Q-Learning.
J.E.L. codes: L41, L13, D43, D83.
We are grateful to the Editor, Jeffrey Ely, and three anonymous referees for many detailed instructions
on revising the paper. We also thank, without implicating, Susan Athey, Ariel Ezrachi, Joshua
Gans, Joe Harrington, Bruno Jullien, Timo Klein, Kai-Uwe Kühn, Patrick Legros, David Levine, Wally
Mullin, Yossi Spiegel, Steve Tadelis, Emanuele Tarantino and participants at numerous conferences and
seminars for useful comments. Financial support from the Digital Chair initiative at the Toulouse School
of Economics is gratefully acknowledged.
Corresponding author: Giacomo Calzolari, giacomo.calzolari@eui.eu
Bologna University;
Toulouse School of Economics;
European University Institute;
Software programs are increasingly being adopted by firms to price their goods and ser-
vices, and this tendency is likely to continue.
In this paper, we ask whether pricing
algorithms may “autonomously” learn to collude. The possibility arises because of the
recent evolution of the software, from rule-based to reinforcement learning programs. The
new programs, powered by Artificial Intelligence (AI), are indeed much more autonomous
than their precursors. They can develop their pricing strategies from scratch, engaging in
active experimentation and adapting to changing environments. In this learning process,
they require little or no external guidance.
In the light of these developments, concerns have been voiced, by scholars and policy-
makers alike, that AI pricing algorithms may raise their prices above the competitive
level in a coordinated fashion, even if they have not been specifically instructed to do
so and even if they do not communicate with one another.
This form of tacit collusion
would defy current antitrust policy, which typically targets only explicit agreements among
would-be competitors (Harrington, 2018).
But how real is the risk of tacit collusion among algorithms? That is a difficult question to
answer, both empirically and theoretically. On the empirical side, collusion is notoriously
hard to detect from market outcomes,
and firms typically do not disclose details of the
pricing software they use. On the theoretical side, the interaction among reinforcement-
learning algorithms in pricing games generates stochastic dynamic systems so complex
that analytical results seem currently out of reach.
While revenue management programs have been used for decades in such industries as hotels and
airlines, the diffusion of pricing software has boomed with the advent of online marketplaces. For example,
in a sample of over 1,600 best-selling items listed on Amazon, Chen, Mislove and Wilson (2016) find that
in 2015 more than a third of the vendors had already automated their pricing. Since then, a repricing-software
industry has arisen, which supplies turnkey pricing systems to smaller vendors and customizes
software for the larger ones. But pricing software is increasingly used also in traditional off-line sectors
such as gas stations: see e.g. “Why do gas station prices constantly change? Blame the algorithms,” The
Wall Street Journal, May 8, 2017.
For the scholarly debate see, for instance, Ezrachi and Stucke (2016, 2017), Harrington (2018), Kühn
and Tadelis (2018) and Schwalbe (2019). As for policy, the possibility of algorithmic collusion has been
extensively discussed, for instance, at the 7th session of the FTC Hearings on competition and consumer
protection (November 2018) and has been the subject of white papers independently issued in 2018 by
the Canadian Competition Bureau and the British Competition and Markets Authority.
With very rich data, however, the problem may not be insurmountable (Byrne and De Roos, 2019).
One notable theoretical contribution is Salcedo (2015), who argues that optimized algorithms will
inevitably reach a collusive outcome. But this claim hinges crucially on the assumption that each algorithm
can periodically observe and “decode” the others, which in the meantime stay unchanged. The practical
relevance of Salcedo’s result thus remains controversial.
To make some progress, this paper takes an experimental approach. We construct AI
pricing agents and let them interact repeatedly in computer-simulated marketplaces. The
challenge of this approach is to choose realistic economic environments, and algorithms
representative of those employed in practice. We discuss in detail how we address these
challenges as we proceed. Any conclusions are necessarily tentative at this stage, but our
findings do suggest that algorithmic collusion is more than a remote theoretical possibility.
The results indicate that, indeed, relatively simple pricing algorithms systematically learn
to play collusive strategies. The algorithms typically coordinate on prices that are some-
what below the monopoly level but substantially above the static Bertrand equilibrium.
The strategies that support these outcomes crucially involve punishments of defections.
Such punishments are finite in duration, with a gradual return to the pre-deviation prices.
The algorithms learn these strategies purely by trial and error. They are not designed or
instructed to collude, they do not communicate with one another, and they have no prior
knowledge of the environment in which they operate.
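The two benchmarks mentioned above, the static Bertrand equilibrium and the monopoly level, can be made concrete with a minimal sketch. It assumes a symmetric duopoly with linear differentiated demand; the demand form and the parameter values `A`, `B`, `D`, `C` are illustrative assumptions, not the environment used in the paper.

```python
# Illustrative linear-demand duopoly. All values below are assumptions
# chosen for the example, not parameters taken from the paper.
A, B, D, C = 10.0, 2.0, 1.0, 1.0  # intercept, own-price slope, cross-price slope, marginal cost


def demand(p_own, p_rival):
    """Quantity sold by a firm charging p_own against a rival charging p_rival."""
    return A - B * p_own + D * p_rival


def profit(p_own, p_rival):
    """Per-period profit of the firm charging p_own."""
    return (p_own - C) * demand(p_own, p_rival)


def best_response(p_rival):
    """Static best reply, from the first-order condition
    A - 2*B*p_own + D*p_rival + B*C = 0."""
    return (A + D * p_rival + B * C) / (2 * B)


def bertrand_price(iterations=200):
    """Static Bertrand-Nash price, found by iterating best replies to a fixed point."""
    p = C
    for _ in range(iterations):
        p = best_response(p)
    return p


def monopoly_price():
    """Common price maximizing joint profit (p - C) * (A - (B - D) * p)."""
    return (A + (B - D) * C) / (2 * (B - D))


if __name__ == "__main__":
    p_n, p_m = bertrand_price(), monopoly_price()
    print("Bertrand price:", p_n, "profit per firm:", profit(p_n, p_n))
    print("Monopoly price:", p_m, "profit per firm:", profit(p_m, p_m))
    print("Profit from undercutting the monopoly price:", profit(best_response(p_m), p_m))
```

In this toy market each firm earns more at the monopoly price than at the Bertrand price, yet a one-shot best reply to the monopoly price earns more still. That temptation to undercut is why supra-competitive prices need the threat of punishment to be sustained.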
Our baseline model is a symmetric duopoly with deterministic demand, but we conduct
an extensive robustness analysis. The degree of collusion decreases as the number of
competitors rises. However, substantial collusion continues to prevail when the active firms
are three or four in number. The algorithms display a stubborn propensity to collude even
when they are asymmetric, and when they operate in stochastic environments.
Other papers have simulated reinforcement-learning algorithms in oligopoly, but ours is
the first to clearly document the emergence of collusive strategies among autonomous pric-
ing agents. The previous literature in both computer science and economics has focused on
outcomes rather than strategies.
But the observation of supra-competitive prices is not,
per se, genuine proof of collusion. To us economists, collusion is not simply a synonym of
high prices but crucially involves “a reward-punishment scheme designed to provide the
incentives for firms to consistently price above the competitive level” (Harrington (2018),
p. 336). The reward-punishment scheme ensures that the supra-competitive outcomes may
be obtained in equilibrium and do not result from a failure to optimize.
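As a textbook illustration of such a reward-punishment scheme (a sketch under stated assumptions, not the strategies the algorithms actually learn), suppose each firm earns $\pi^C$ per period while cooperating, a deviator earns $\pi^D > \pi^C$ for one period, and a deviation triggers $T$ periods with profit $\pi^P < \pi^C$ followed by a return to cooperation. The symbols $\pi^C$, $\pi^D$, $\pi^P$ and $T$ are introduced here for illustration only; $\delta$ is the discount factor. Deviating from the collusive path is unprofitable when

```latex
\sum_{t=0}^{\infty} \delta^{t} \pi^{C}
\;\ge\;
\pi^{D} + \sum_{t=1}^{T} \delta^{t} \pi^{P} + \sum_{t=T+1}^{\infty} \delta^{t} \pi^{C}
\quad\Longleftrightarrow\quad
\pi^{D} - \pi^{C} \;\le\; \frac{\delta \left(1 - \delta^{T}\right)}{1 - \delta} \left(\pi^{C} - \pi^{P}\right).
```

The left-hand side of the second inequality is the one-period gain from defecting and the right-hand side is the discounted loss during the punishment phase. This condition covers only deviations from the collusive path; a full equilibrium check would also require that firms be willing to carry out the punishment.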
Moreover, the vast majority of the literature does not use the canonical model of collusion, where firms
play an infinitely repeated game, pricing simultaneously in each stage and conditioning their prices on
past history. Rather, it uses frameworks similar to the Maskin and Tirole (1988) model of staggered pricing.
In this model, two firms alternate in moving, commit to a price level for two periods, and condition
their pricing only on the rival’s current price. The postulate of price commitment is however controversial,
as software algorithms can adjust prices very quickly. And probably the postulate is not innocuous.
Commitment may indeed facilitate coordination, as argued theoretically by Maskin and Tirole (1988)
and experimentally by Leufkens and Peeters (2011). At any rate, the best executed paper in this line of
research is probably Klein (2018), which also provides a survey of the earlier literature.
The difference is critical. For example, in their pioneering study of repeated Cournot
competition among Q-learning algorithms, computer scientists Waltman and Kaymak
(2008) find that the algorithms reduce output, and hence raise prices, with respect to
the Nash equilibrium of the one-shot game.
They refer to this as collusion. When the
algorithms are far-sighted and are able to condition their current choices on past actions,
so that defections can be punished, their findings could indeed be consistent with collusive
behavior according to economists’ usage of the term. But Waltman and Kaymak consider
also the case where algorithms are myopic and have no memory of past actions – conditions
under which collusion is either unfeasible or cannot emerge in equilibrium – and find that
in these cases the output reduction is even larger. This suggests that what they observe
may not be collusion but a failure to learn an optimal strategy.
Verifying whether the high prices are supported by equilibrium strategies is not just a
theoretical curiosity. Algorithms that grossly fail to optimize would, in all likelihood,
be dismissed quickly and thus could hardly become a matter of antitrust concern. The
implications are instead very different if, as we show, the supra-competitive prices are set
by optimizing, or quasi-optimizing, programs.
Yet, there is an important caveat to keep in mind. To present a proof-of-concept demon-
stration of algorithmic collusion, in this paper we concentrate on what the algorithms
eventually learn and pay less attention to the speed of learning. Thus, we focus on al-
gorithms that by design learn slowly, in a completely unsupervised fashion, and in our
simulations we allow them to explore widely and interact as many times as is needed to
stabilize their behavior. As a result, the number of repetitions required for completing the
learning is typically high, on the order of hundreds of thousands. In fact, the algorithms
start to raise their prices much earlier. However, the time scale still remains an open issue;
it will be discussed further below.
The rest of the paper is organized as follows. The next section provides a self-contained
description of the class of Q-learning algorithms, which we use in our simulations. Section 3
describes the economic environments where the algorithms operate. Section 4 shows that
collusive outcomes are common and are generated by optimizing, or quasi-optimizing,
behavior. Section 5 then provides a more in-depth analysis of the collusive strategies that
support these outcomes. Section 6 reports on a number of robustness checks. Section 7
discusses the issue of the speed of learning. Section 8 concludes with a brief discussion of
the possible implications for policy.
Other papers that study reinforcement learning algorithms in a Cournot oligopoly include Kimbrough
and Murphy (2009) and Siallagan et al. (2013).
According to Cooper, Homem-de-Mello and Kleywegt (2015) such “collusion by mistake” may some-
times emerge also among revenue management systems that do not condition their current prices on
rivals’ past prices. This may happen in particular when the programs disregard competitors altogether
in the process of demand estimation, which biases the estimated elasticity downwards.
Following Waltman and Kaymak (2008), we concentrate on Q-learning algorithms. Even
if reinforcement learning comes in many different varieties,
there are several reasons
for this choice. First, one would like to experiment with algorithms that are commonly
adopted in practice, and although little is known on the specific software that firms
actually use, Q-learning is certainly highly popular among computer scientists. Second,
Q-learning algorithms are simple and can be fully characterized by just a few parameters,
the economic interpretation of which is clear. This makes it possible to keep possibly
arbitrary modeling choices to a minimum, and to conduct a comprehensive comparative
statics analysis with respect to the characteristics of the algorithms. Third, Q-learning
algorithms share the same architecture as the more sophisticated programs that have
recently obtained spectacular successes, achieving superhuman performances in such tasks
as playing the ancient board game Go (Silver et al., 2016), the Atari video-games (Mnih
et al., 2015), and, more recently, chess (Silver et al., 2018).
The downside of Q-learning
is that the learning process is slow, for reasons that will become clear in a moment.
In the rest of this section, we provide a brief introduction to Q-learning. Readers familiar
with this model may proceed directly to section 3.
2.1. Single agent problems
Like all reinforcement-learning algorithms, Q-learning programs adapt their behavior to
past experience, taking actions that have proven successful more frequently and unsuc-
cessful ones less frequently. In this way, they may learn an optimal policy, or a policy that
approximates the optimum, with no prior knowledge of the particular problem at hand.
Originally, Q-learning was proposed by Watkins (1989) to tackle Markov decision processes.
In a stationary Markov decision process, in each period $t = 0, 1, 2, \ldots$ an agent
observes a state variable $s_t \in S$ and then chooses an action $a_t \in A(s_t)$. For any $s_t$
and $a_t$, the agent obtains a reward $\pi_t$, and the system moves on to the next state $s_{t+1}$,
according to a time-invariant (and possibly degenerate) probability distribution
$F(\pi_t, s_{t+1} \mid s_t, a_t)$.
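To make this setting concrete, here is a minimal sketch of tabular Q-learning on a finite Markov decision process. It uses the standard textbook update rule and epsilon-greedy exploration. The learning rate `alpha`, the exploration rate `epsilon`, the discount factor `delta`, and the two-state problem `toy_step` are illustrative assumptions, not the specification used in the paper's simulations.

```python
import random


def q_learning(states, actions, step, alpha=0.2, delta=0.9,
               epsilon=0.2, periods=50000, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(s, a)` returns (reward, next_state). All parameter defaults are
    assumptions made for this example.
    """
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    s = states[0]
    for _ in range(periods):
        # Explore with probability epsilon, otherwise take the greedy action.
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[(s, x)])
        reward, s_next = step(s, a)
        # Move Q(s, a) toward the reward plus the discounted value of the
        # best action available in the next state.
        target = reward + delta * max(Q[(s_next, x)] for x in actions)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
        s = s_next
    return Q


def toy_step(s, a):
    """Hypothetical deterministic two-state problem.

    Action 0 pays 1 and leads to state 0. Action 1 leads to state 1 and pays
    2 if the agent is already in state 1, and 0 otherwise.
    """
    if a == 1:
        return (2.0 if s == 1 else 0.0), 1
    return 1.0, 0


if __name__ == "__main__":
    Q = q_learning([0, 1], [0, 1], toy_step)
    for s in (0, 1):
        print("state", s, "greedy action:", max((0, 1), key=lambda a: Q[(s, a)]))
```

In this toy problem the short-run tempting action (0) is not the optimal one: the agent has to give up one period of reward to reach the state where action 1 pays more. The agent is never told this and discovers it only by experimenting, which is the sense in which such algorithms learn with no prior knowledge of the problem.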
For a thorough treatment of reinforcement learning in computer science, see Sutton and Barto (2018).
These more sophisticated programs might appear themselves to be a natural alternative to Q-learning.
However, they require many modeling choices that are somewhat arbitrary from an economic viewpoint.
We shall come back to this issue in Section 7.
Reinforcement learning was introduced in economics by Arthur (1991) and later popularized by Roth
and Erev (1995), Erev and Roth (1998) and Ho, Camerer and Chong (2007), among others.
Q-learning deals with the version of this model where S and A are finite, and A is not
