Agent-Based Modeling in Electricity Market Using
Deep Deterministic Policy Gradient Algorithm
Yanchang Liang, Student Member, IEEE, Chunlin Guo, Zhaohao Ding, Member, IEEE, and Huichun Hua
Abstract—Game theoretic methods and simulations based on reinforcement learning (RL) are often used to analyze electricity market equilibrium. However, the former is limited to simple market environments with complete information and cannot intuitively reflect tacit collusion, while conventional RL algorithms are limited to low-dimensional discrete state and action spaces and converge unstably. To address these problems, this paper adopts the deep deterministic policy gradient (DDPG) algorithm to model the bidding strategies of generation companies (GenCos). Simulation experiments covering different GenCo, load, and network settings demonstrate that the proposed method is more accurate than conventional RL algorithms and converges to the Nash equilibrium of the complete-information game even in an incomplete-information environment. Moreover, the proposed method can intuitively reflect different levels of tacit collusion by quantitatively adjusting GenCos' patience parameter, making it an effective tool for analyzing market strategies.
Index Terms—Electricity market, Nash equilibrium, deep
reinforcement learning (DRL), deep deterministic policy gradient
(DDPG), game theory, tacit collusion.
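To make the setup concrete before the nomenclature, the sketch below shows how a single GenCo's DDPG agent could be structured: a deterministic actor μ(s|θ^μ) that maps the observed market state to a bid, and a critic Q(s, a|θ^Q) that scores state-action pairs. This is a minimal PyTorch illustration, not the authors' implementation; the layer sizes, the bid range [a_min, a_max], and the state encoding are assumptions.

# Minimal actor/critic sketch for one GenCo bidding agent (PyTorch).
# Layer sizes, the bid range [a_min, a_max], and the state encoding are
# assumptions for illustration, not the paper's implementation.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s | theta_mu): market state -> bid (alpha_gt)."""
    def __init__(self, state_dim, action_dim, a_min, a_max):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )
        self.a_min, self.a_max = a_min, a_max

    def forward(self, s):
        # Map the tanh output in [-1, 1] onto the feasible bid range.
        return self.a_min + 0.5 * (self.net(s) + 1.0) * (self.a_max - self.a_min)

class Critic(nn.Module):
    """Action-value Q(s, a | theta_Q): estimates the GenCo's expected payoff."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

During training, exploration noise n_gt would be added to the actor's output before the bid is submitted to market clearing.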
NOMENCLATURE
A. Indices and Sets
D     Set of loads indexed by d.
D_i   Set of loads at bus i, D_i ⊆ D.
G     Set of GenCos {1, 2, ..., G} indexed by g.
G_i   Set of GenCos at bus i, G_i ⊆ G.
I     Set of buses {1, 2, ..., I} indexed by i.
t     Index for operation intervals.
B. Parameters
α_g^m   Intercept of the marginal cost function of GenCo g.
β_g^m   Slope of the marginal cost function of GenCo g.
F       Vector of maximum line flow limits.
γ        Discount factor.
PTDF     Matrix of power transfer distribution factors.
τ        Soft update rate.
θ^μ      Parameters (weights and biases) of the network μ.
θ^Q      Parameters (weights and biases) of the network Q.
f_d      Slope of the demand curve of load d.
lr_a     Learning rate of the actor network.
lr_c     Learning rate of the critic network.
N        Size of a mini-batch.
p_g^max  Maximum power generation of GenCo g.
p_g^min  Minimum power generation of GenCo g.
T        Total number of operation intervals.
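Since the nomenclature specifies an intercept α_g^m and a slope β_g^m, the marginal cost function is linear in output. A reconstruction consistent with these definitions (with the cost function obtained by integration, up to a possible fixed-cost constant) is:

$$\rho_g^m(p_{gt}) = \alpha_g^m + \beta_g^m\,p_{gt}, \qquad C_g^m(p_{gt}) = \alpha_g^m\,p_{gt} + \tfrac{1}{2}\,\beta_g^m\,p_{gt}^2.$$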
C. Variables
α_gt      Intercept of the strategic supply function of GenCo g at t, also called the strategic variable.
α_gt^*    GenCo g's strategic variable in the NE of the static game Γ_t.
λ̄_t       Average nodal price at t.
p_t       Power generation vector for all buses at t.
q_t       Power demand vector for all buses at t.
Γ_t       Static game at t.
λ_it      Nodal price of bus i at t.
a_gt      Action variable of GenCo g at t.
D_dt^max  Maximum load demand of load d at t.
D_t^Σ     Total maximum load demand at t.
n_gt      Action noise of GenCo g at t.
q_dt      Power demand of load d at t.
R_g       Cumulative payoff of GenCo g.
r_gt      Payoff of GenCo g at t.
s_gt      State variable of GenCo g at t.
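Given the discount factor γ and the per-interval payoff r_gt, the cumulative payoff R_g presumably takes the standard discounted-sum form over the T operation intervals (a reconstruction assuming the conventional RL definition, not copied from the paper body):

$$R_g = \sum_{t=1}^{T} \gamma^{\,t-1}\, r_{gt}.$$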
D. Functions
μ(s)         Policy function.
μ(s|θ^μ)     Policy function approximated by the neural network with parameters θ^μ.
ρ_g^m(p_gt)  Marginal cost function of GenCo g.
ρ_dt(q_dt)   Demand function of load d at t.
ρ_gt(p_gt)   Strategic supply function of GenCo g at t.
C_g^m(p_gt)  Cost function of GenCo g.
J(θ^μ)       Objective function of parameter θ^μ.
L(θ^Q)       Loss function of parameter θ^Q.
Q(s, a|θ^Q)  Action-value function approximated by the neural network with parameters θ^Q.
Q(s, a)      Action-value function.
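The loss L(θ^Q) and objective J(θ^μ) listed above drive DDPG's two gradient updates. The sketch below shows one update step on a sampled mini-batch of N transitions, following the standard DDPG update rather than any paper-specific variant; the batch layout and the actor/critic objects (as in the earlier sketch) are assumptions.

# One DDPG update step on a mini-batch of N transitions (illustrative sketch,
# not the authors' code). Tensors: s, s_next of shape (N, state_dim),
# a of shape (N, action_dim), r of shape (N, 1).
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_tgt, critic_tgt,
                opt_actor, opt_critic, gamma, tau):
    s, a, r, s_next = batch

    # Critic step: minimize L(theta_Q), the mean squared TD error against
    # targets built from the slowly-moving target networks.
    with torch.no_grad():
        y = r + gamma * critic_tgt(s_next, actor_tgt(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Actor step: maximize J(theta_mu) = E[Q(s, mu(s))] by minimizing its negative.
    actor_loss = -critic(s, actor(s)).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()

    # Soft target updates with rate tau: theta' <- tau*theta + (1 - tau)*theta'.
    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for tp, sp in zip(tgt.parameters(), src.parameters()):
            tp.data.copy_(tau * sp.data + (1.0 - tau) * tp.data)

The soft update at the end is where the rate τ from the nomenclature enters: the target networks track the learned networks slowly, which stabilizes the critic's bootstrapped targets.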