Content overview: This paper examines whether autonomous coordination can arise when goods and services are priced by artificial intelligence, in particular reinforcement-learning algorithms such as Q-learning. The researchers design a repeated price-competition oligopoly model and let Q-learning algorithms interact in this environment many times. The study shows that, without communicating, these algorithms learn to charge monopoly-like prices, to impose punishments of finite duration, and to gradually return to cooperation. The collusive behavior chiefly takes the form of prices above the Bertrand equilibrium but below the full monopoly level, indicating that the algorithms can sustain stable, tacit coordination that circumvents market competition. Such behavior challenges existing antitrust rules and may call for new policy responses.
Intended audience: scholars, graduate students, and practitioners in economics, finance, and algorithms; especially valuable for researchers interested in algorithmic game theory.
Use cases and goals: the research provides theoretical grounding and technical support for understanding dynamic pricing in modern digital markets. A deeper understanding of algorithmic collusion can help policymakers address the resulting risks of distorted competition and safeguard market fairness and efficiency. The paper also offers guidance for future work in machine learning and artificial intelligence.
Other notes: although the study shows that spontaneous coordination among algorithms can emerge under certain conditions, more research on algorithm heterogeneity and market complexity is needed before drawing conclusions for practice; speeding up the algorithms' learning is another important direction for future work.
ARTIFICIAL INTELLIGENCE, ALGORITHMIC PRICING AND COLLUSION†
Emilio Calvano∗‡, Giacomo Calzolari&‡§, Vincenzo Denicolò∗§, and Sergio Pastorello∗
December 2019
Increasingly, algorithms are supplanting human decision-makers in pricing goods and services. To analyze the possible consequences, we study experimentally the behavior of algorithms powered by Artificial Intelligence (Q-learning) in a workhorse oligopoly model of repeated price competition. We find that the algorithms consistently learn to charge supra-competitive prices, without communicating with one another. The high prices are sustained by collusive strategies with a finite phase of punishment followed by a gradual return to cooperation. This finding is robust to asymmetries in cost or demand, changes in the number of players, and various forms of uncertainty.
Keywords: Artificial Intelligence, Pricing-Algorithms, Collusion, Reinforcement Learning, Q-Learning.
J.E.L. codes: L41, L13, D43, D83.
† We are grateful to the Editor, Jeffrey Ely, and three anonymous referees for many detailed instructions for revising the paper. We also thank, without implicating, Susan Athey, Ariel Ezrachi, Joshua Gans, Joe Harrington, Bruno Jullien, Timo Klein, Kai-Uwe Kühn, Patrick Legros, David Levine, Wally Mullin, Yossi Spiegel, Steve Tadelis, Emanuele Tarantino and participants at numerous conferences and seminars for useful comments. Financial support from the Digital Chair initiative at the Toulouse School of Economics is gratefully acknowledged.
Corresponding author: Giacomo Calzolari, giacomo.calzolari@eui.eu
∗ Bologna University; ‡ Toulouse School of Economics; & European University Institute; § CEPR
Electronic copy available at: https://ssrn.com/abstract=3304991
1. INTRODUCTION
Software programs are increasingly being adopted by firms to price their goods and services, and this tendency is likely to continue.[1] In this paper, we ask whether pricing algorithms may “autonomously” learn to collude. The possibility arises because of the recent evolution of the software, from rule-based to reinforcement learning programs. The new programs, powered by Artificial Intelligence (AI), are indeed much more autonomous than their precursors. They can develop their pricing strategies from scratch, engaging in active experimentation and adapting to changing environments. In this learning process, they require little or no external guidance.
In the light of these developments, concerns have been voiced, by scholars and policymakers alike, that AI pricing algorithms may raise their prices above the competitive level in a coordinated fashion, even if they have not been specifically instructed to do so and even if they do not communicate with one another.[2] This form of tacit collusion would defy current antitrust policy, which typically targets only explicit agreements among would-be competitors (Harrington, 2018).
But how real is the risk of tacit collusion among algorithms? That is a difficult question to answer, both empirically and theoretically. On the empirical side, collusion is notoriously hard to detect from market outcomes,[3] and firms typically do not disclose details of the pricing software they use. On the theoretical side, the interaction among reinforcement-learning algorithms in pricing games generates stochastic dynamic systems so complex that analytical results seem currently out of reach.[4]
To make some progress, this paper takes an experimental approach. We construct AI pricing agents and let them interact repeatedly in computer-simulated marketplaces. The challenge of this approach is to choose realistic economic environments, and algorithms representative of those employed in practice. We discuss in detail how we address these challenges as we proceed. Any conclusions are necessarily tentative at this stage, but our findings do suggest that algorithmic collusion is more than a remote theoretical possibility.

[1] While revenue management programs have been used for decades in such industries as hotels and airlines, the diffusion of pricing software has boomed with the advent of online marketplaces. For example, in a sample of over 1,600 best-selling items listed on Amazon, Chen, Mislove and Wilson (2016) find that in 2015 more than a third of the vendors had already automated their pricing. Since then, a repricing-software industry has arisen, which supplies turnkey pricing systems to smaller vendors and customizes software for the larger ones. But pricing software is increasingly used also in traditional off-line sectors such as gas stations: see e.g. “Why do gas station prices constantly change? Blame the algorithms,” The Wall Street Journal, May 8, 2017.
[2] For the scholarly debate see, for instance, Ezrachi and Stucke (2016, 2017), Harrington (2018), Kühn and Tadelis (2018) and Schwalbe (2019). As for policy, the possibility of algorithmic collusion has been extensively discussed, for instance, at the 7th session of the FTC Hearings on competition and consumer protection (November 2018) and has been the subject of white papers independently issued in 2018 by the Canadian Competition Bureau and the British Competition and Markets Authority.
[3] With very rich data, however, the problem may not be insurmountable (Byrne and De Roos, 2019).
[4] One notable theoretical contribution is Salcedo (2015), who argues that optimized algorithms will inevitably reach a collusive outcome. But this claim hinges crucially on the assumption that each algorithm can periodically observe and “decode” the others, which in the meantime stay unchanged. The practical relevance of Salcedo's result thus remains controversial.
The results indicate that, indeed, relatively simple pricing algorithms systematically learn to play collusive strategies. The algorithms typically coordinate on prices that are somewhat below the monopoly level but substantially above the static Bertrand equilibrium. The strategies that support these outcomes crucially involve punishments of defections. Such punishments are finite in duration, with a gradual return to the pre-deviation prices. The algorithms learn these strategies purely by trial and error. They are not designed or instructed to collude, they do not communicate with one another, and they have no prior knowledge of the environment in which they operate.
Our baseline model is a symmetric duopoly with deterministic demand, but we conduct
an extensive robustness analysis. The degree of collusion decreases as the number of
competitors rises. However, substantial collusion continues to prevail when the active firms
are three or four in number. The algorithms display a stubborn propensity to collude even
when they are asymmetric, and when they operate in stochastic environments.
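For concreteness, a stylized version of such a marketplace can be sketched in a few lines of code. The sketch below is illustrative only: it assumes a logit demand system, a common marginal cost, and a discretized price grid, and all function names and parameter values are ours rather than the specification actually used in the simulations, which is described in Section 3.

import numpy as np

def logit_demand(prices, a=2.0, a0=0.0, mu=0.25):
    # Market shares of differentiated products under logit demand
    # (an assumption made here purely for illustration).
    p = np.asarray(prices, dtype=float)
    u = np.exp((a - p) / mu)
    return u / (np.exp(a0 / mu) + u.sum())

def profits(prices, cost=1.0):
    # Per-period profit of each firm, given the vector of posted prices.
    p = np.asarray(prices, dtype=float)
    return (p - cost) * logit_demand(p)

# A coarse grid of feasible prices over which the algorithms experiment.
price_grid = np.linspace(1.0, 2.0, 15)

# One-shot payoff matrix of firm 1 (rows: own price, columns: rival's price).
payoff_1 = np.array([[profits((p1, p2))[0] for p2 in price_grid]
                     for p1 in price_grid])

From such a one-shot payoff matrix one can compute the static Bertrand-Nash and joint-monopoly benchmark prices against which the prices learned by the algorithms are compared.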
Other papers have simulated reinforcement-learning algorithms in oligopoly, but ours is the first to clearly document the emergence of collusive strategies among autonomous pricing agents. The previous literature in both computer science and economics has focused on outcomes rather than strategies.[5] But the observation of supra-competitive prices is not, per se, genuine proof of collusion. To us economists, collusion is not simply a synonym of high prices but crucially involves “a reward-punishment scheme designed to provide the incentives for firms to consistently price above the competitive level” (Harrington (2018), p. 336). The reward-punishment scheme ensures that the supra-competitive outcomes may be obtained in equilibrium and do not result from a failure to optimize.
The difference is critical. For example, in their pioneering study of repeated Cournot competition among Q-learning algorithms, computer scientists Waltman and Kaymak (2008) find that the algorithms reduce output, and hence raise prices, with respect to the Nash equilibrium of the one-shot game.[6] They refer to this as collusion. When the algorithms are far-sighted and are able to condition their current choices on past actions, so that defections can be punished, their findings could indeed be consistent with collusive behavior according to economists' usage of the term. But Waltman and Kaymak also consider the case where the algorithms are myopic and have no memory of past actions – conditions under which collusion is either infeasible or cannot emerge in equilibrium – and find that in these cases the output reduction is even larger. This suggests that what they observe may not be collusion but a failure to learn an optimal strategy.[7]

[5] Moreover, the vast majority of the literature does not use the canonical model of collusion, where firms play an infinitely repeated game, pricing simultaneously in each stage and conditioning their prices on past history. Rather, it uses frameworks similar to Maskin and Tirole's (1988) model of staggered pricing. In this model, two firms alternate in moving, commit to a price level for two periods, and condition their pricing only on the rival's current price. The postulate of price commitment is however controversial, as software algorithms can adjust prices very quickly. And probably the postulate is not innocuous. Commitment may indeed facilitate coordination, as argued theoretically by Maskin and Tirole (1988) and experimentally by Leufkens and Peeters (2011). At any rate, the best-executed paper in this line of research is probably Klein (2018), which also provides a survey of the earlier literature.
Verifying whether the high prices are supported by equilibrium strategies is not just a
theoretical curiosity. Algorithms that grossly fail to optimize would, in all likelihood,
be dismissed quickly and thus could hardly become a matter of antitrust concern. The
implications are instead very different if, as we show, the supra-competitive prices are set
by optimizing, or quasi-optimizing, programs.
Yet, there is an important caveat to keep in mind. To present a proof-of-concept demonstration of algorithmic collusion, in this paper we concentrate on what the algorithms eventually learn and pay less attention to the speed of learning. Thus, we focus on algorithms that by design learn slowly, in a completely unsupervised fashion, and in our simulations we allow them to explore widely and interact as many times as is needed to stabilize their behavior. As a result, the number of repetitions required for completing the learning is typically high, on the order of hundreds of thousands. In fact, the algorithms start to raise their prices much earlier. However, the time scale still remains an open issue; it will be discussed further below.
The rest of the paper is organized as follows. The next section provides a self-contained
description of the class of Q-learning algorithms, which we use in our simulations. Section 3
describes the economic environments where the algorithms operate. Section 4 shows that
collusive outcomes are common and are generated by optimizing, or quasi-optimizing,
behavior. Section 5 then provides a more in-depth analysis of the collusive strategies that
support these outcomes. Section 6 reports on a number of robustness checks. Section 7
discusses the issue of the speed of learning. Section 8 concludes with a brief discussion of
the possible implications for policy.
[6] Other papers that study reinforcement learning algorithms in a Cournot oligopoly include Kimbrough and Murphy (2009) and Siallagan et al. (2013).
[7] According to Cooper, Homem-de-Mello and Kleywegt (2015), such “collusion by mistake” may sometimes emerge also among revenue management systems that do not condition their current prices on rivals' past prices. This may happen in particular when the programs disregard competitors altogether in the process of demand estimation, which biases the estimated elasticity downwards.
2. Q-LEARNING
Following Waltman and Kaymak (2008), we concentrate on Q-learning algorithms. Even if reinforcement learning comes in many different varieties,[8] there are several reasons for this choice. First, one would like to experiment with algorithms that are commonly adopted in practice, and although little is known about the specific software that firms actually use, Q-learning is certainly highly popular among computer scientists. Second, Q-learning algorithms are simple and can be fully characterized by just a few parameters, the economic interpretation of which is clear. This makes it possible to keep possibly arbitrary modeling choices to a minimum, and to conduct a comprehensive comparative statics analysis with respect to the characteristics of the algorithms. Third, Q-learning algorithms share the same architecture as the more sophisticated programs that have recently obtained spectacular successes, achieving superhuman performance in such tasks as playing the ancient board game Go (Silver et al., 2016), the Atari video-games (Mnih et al., 2015), and, more recently, chess (Silver et al., 2018).[9] The downside of Q-learning is that the learning process is slow, for reasons that will become clear in a moment.

In the rest of this section, we provide a brief introduction to Q-learning. Readers familiar with this model may proceed directly to Section 3.
2.1. Single agent problems
Like all reinforcement-learning algorithms, Q-learning programs adapt their behavior to past experience, taking actions that have proven successful more frequently and unsuccessful ones less frequently. In this way, they may learn an optimal policy, or a policy that approximates the optimum, with no prior knowledge of the particular problem at hand.[10]
Originally, Q-learning was proposed by Watkins (1989) to tackle Markov decision processes. In a stationary Markov decision process, in each period t = 0, 1, 2, ... an agent observes a state variable s_t ∈ S and then chooses an action a_t ∈ A(s_t). For any s_t and a_t, the agent obtains a reward π_t, and the system moves on to the next state s_{t+1}, according to a time-invariant (and possibly degenerate) probability distribution F(π_t, s_{t+1} | s_t, a_t). Q-learning deals with the version of this model where S and A are finite, and A is not state-dependent.
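The logic can be illustrated with a minimal tabular Q-learning agent. The sketch below applies the standard one-step Q-learning update of Watkins (1989); the class name, the ε-greedy exploration scheme, and the parameter values are our own illustrative assumptions and are not meant to reproduce the exact design used in the simulations.

import numpy as np

class QLearner:
    # Minimal tabular Q-learning agent (Watkins, 1989). Illustrative sketch only:
    # exploration scheme and parameter values are assumptions, not the exact
    # design used in the paper's simulations.
    def __init__(self, n_states, n_actions, alpha=0.15, delta=0.95, seed=0):
        self.alpha = alpha                        # learning rate
        self.delta = delta                        # discount factor
        self.Q = np.zeros((n_states, n_actions))  # Q-matrix over states and actions
        self.rng = np.random.default_rng(seed)

    def choose(self, state, epsilon):
        # Explore with probability epsilon, otherwise play the greedy action.
        if self.rng.random() < epsilon:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(self.Q[state].argmax())

    def update(self, state, action, reward, next_state):
        # One-step Q-learning update:
        # Q(s,a) <- (1-alpha) Q(s,a) + alpha [reward + delta * max_a' Q(s',a')]
        target = reward + self.delta * self.Q[next_state].max()
        self.Q[state, action] = ((1 - self.alpha) * self.Q[state, action]
                                 + self.alpha * target)

In the pricing application studied below, the state would encode the prices posted in the previous period and the action is the current price, so that two or more such agents, each updating its own Q-matrix after every stage of the repeated game, can in principle learn strategies that condition on past play.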
[8] For a thorough treatment of reinforcement learning in computer science, see Sutton and Barto (2018).
[9] These more sophisticated programs might appear themselves to be a natural alternative to Q-learning. However, they require many modeling choices that are somewhat arbitrary from an economic viewpoint. We shall come back to this issue in Section 7.
[10] Reinforcement learning was introduced in economics by Arthur (1991) and later popularized by Roth and Erev (1995), Erev and Roth (1998) and Ho, Camerer and Chong (2007), among others.