Agent-Based Modeling in Electricity Market Using
Deep Deterministic Policy Gradient Algorithm
Yanchang Liang, Student Member, IEEE, Chunlin Guo, Zhaohao Ding, Member, IEEE, and Huichun Hua
Abstract—Game theoretic methods and simulations based on reinforcement learning (RL) are often used to analyze electricity market equilibrium. However, the former is limited to simple market environments with complete information and cannot intuitively reflect tacit collusion, while conventional RL algorithms are limited to low-dimensional discrete state and action spaces and converge unstably. To address these problems, this paper adopts the deep deterministic policy gradient (DDPG) algorithm to model the bidding strategies of generation companies (GenCos). Simulation experiments covering different GenCo, load, and network settings demonstrate that the proposed method is more accurate than conventional RL algorithms and converges to the Nash equilibrium of the complete-information game even in an incomplete-information environment. Moreover, the proposed method can intuitively reflect different levels of tacit collusion by quantitatively adjusting GenCos' patience parameter, making it an effective tool for analyzing market strategies.
Index Terms—Electricity market, Nash equilibrium, deep
reinforcement learning (DRL), deep deterministic policy gradient
(DDPG), game theory, tacit collusion.
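To make the setup concrete before the nomenclature, the sketch below shows how a single GenCo's DDPG agent could be structured: a deterministic actor μ(s|θ^μ) that maps the observed market state to a bid, and a critic Q(s, a|θ^Q) that scores state-action pairs. This is a minimal PyTorch illustration, not the authors' implementation; the layer sizes, the bid range [a_min, a_max], and the state encoding are assumptions.

# Minimal actor/critic sketch for one GenCo bidding agent (PyTorch).
# Layer sizes, the bid range [a_min, a_max], and the state encoding are
# assumptions for illustration, not the paper's implementation.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s | theta_mu): market state -> bid (alpha_gt)."""
    def __init__(self, state_dim, action_dim, a_min, a_max):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )
        self.a_min, self.a_max = a_min, a_max

    def forward(self, s):
        # Map the tanh output in [-1, 1] onto the feasible bid range.
        return self.a_min + 0.5 * (self.net(s) + 1.0) * (self.a_max - self.a_min)

class Critic(nn.Module):
    """Action-value Q(s, a | theta_Q): estimates the GenCo's expected payoff."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

During training, exploration noise n_gt would be added to the actor's output before the bid is submitted to market clearing.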
NOMENCLATURE
A. Indices and Sets
D     Set of loads indexed by d.
D_i   Set of loads at bus i, D_i ⊆ D.
G     Set of GenCos {1, 2, ..., G} indexed by g.
G_i   Set of GenCos at bus i, G_i ⊆ G.
I     Set of buses {1, 2, ..., I} indexed by i.
t     Index for operation intervals.
B. Parameters
α_g^m   Intercept of the marginal cost function of GenCo g.
β_g^m   Slope of the marginal cost function of GenCo g.
F       Vector of maximum line flow limits.
γ        Discount factor.
PTDF     Matrix of power transfer distribution factors.
τ        Soft update rate.
θ^μ      Parameters (weights and biases) of the network μ.
θ^Q      Parameters (weights and biases) of the network Q.
f_d      Slope of the demand curve of load d.
lr_a     Learning rate of the actor network.
lr_c     Learning rate of the critic network.
N        Size of a mini-batch.
p_g^max  Maximum power generation of GenCo g.
p_g^min  Minimum power generation of GenCo g.
T        Total number of operation intervals.
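Since the nomenclature specifies an intercept α_g^m and a slope β_g^m, the marginal cost function is linear in output. A reconstruction consistent with these definitions (with the cost function obtained by integration, up to a possible fixed-cost constant) is:

$$\rho_g^m(p_{gt}) = \alpha_g^m + \beta_g^m\,p_{gt}, \qquad C_g^m(p_{gt}) = \alpha_g^m\,p_{gt} + \tfrac{1}{2}\,\beta_g^m\,p_{gt}^2.$$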
C. Variables
α_gt      Intercept of the strategic supply function of GenCo g at t, also called the strategic variable.
α_gt^*    GenCo g's strategic variable in the NE of the static game Γ_t.
λ̄_t       Average nodal price at t.
p_t       Power generation vector for all buses at t.
q_t       Power demand vector for all buses at t.
Γ_t       Static game at t.
λ_it      Nodal price of bus i at t.
a_gt      Action variable of GenCo g at t.
D_dt^max  Maximum load demand of load d at t.
D_t^Σ     Total maximum load demand at t.
n_gt      Action noise of GenCo g at t.
q_dt      Power demand of load d at t.
R_g       Cumulative payoff of GenCo g.
r_gt      Payoff of GenCo g at t.
s_gt      State variable of GenCo g at t.
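Given the discount factor γ and the per-interval payoff r_gt, the cumulative payoff R_g presumably takes the standard discounted-sum form over the T operation intervals (a reconstruction assuming the conventional RL definition, not copied from the paper body):

$$R_g = \sum_{t=1}^{T} \gamma^{\,t-1}\, r_{gt}.$$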
D. Functions
μ(s)         Policy function.
μ(s|θ^μ)     Policy function approximated by the neural network with parameters θ^μ.
ρ_g^m(p_gt)  Marginal cost function of GenCo g.
ρ_dt(q_dt)   Demand function of load d at t.
ρ_gt(p_gt)   Strategic supply function of GenCo g at t.
C_g^m(p_gt)  Cost function of GenCo g.
J(θ^μ)       Objective function of parameter θ^μ.
L(θ^Q)       Loss function of parameter θ^Q.
Q(s, a|θ^Q)  Action-value function approximated by the neural network with parameters θ^Q.
Q(s, a)      Action-value function.
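The loss L(θ^Q) and objective J(θ^μ) listed above drive DDPG's two gradient updates. The sketch below shows one update step on a sampled mini-batch of N transitions, following the standard DDPG update rather than any paper-specific variant; the batch layout and the actor/critic objects (as in the earlier sketch) are assumptions.

# One DDPG update step on a mini-batch of N transitions (illustrative sketch,
# not the authors' code). Tensors: s, s_next of shape (N, state_dim),
# a of shape (N, action_dim), r of shape (N, 1).
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_tgt, critic_tgt,
                opt_actor, opt_critic, gamma, tau):
    s, a, r, s_next = batch

    # Critic step: minimize L(theta_Q), the mean squared TD error against
    # targets built from the slowly-moving target networks.
    with torch.no_grad():
        y = r + gamma * critic_tgt(s_next, actor_tgt(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Actor step: maximize J(theta_mu) = E[Q(s, mu(s))] by minimizing its negative.
    actor_loss = -critic(s, actor(s)).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()

    # Soft target updates with rate tau: theta' <- tau*theta + (1 - tau)*theta'.
    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for tp, sp in zip(tgt.parameters(), src.parameters()):
            tp.data.copy_(tau * sp.data + (1.0 - tau) * tp.data)

The soft update at the end is where the rate τ from the nomenclature enters: the target networks track the learned networks slowly, which stabilizes the critic's bootstrapped targets.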