Tencent Honor of Kings AI Open Research Environment (paper)


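The Honor of Kings AI open research environment is a game test environment based on the MOBA game Honor of Kings, whose complexity makes it an ideal testbed for game AI. After signing a commitment letter, applying for a license, and passing review, AI researchers can download the Gamecore Server, interact with the test environment, and explore and conduct research locally. For usage details, see the open-source code and the paper. Source: https://aiarena.tencent.com/aiarena/zh/open-gamecore

### Overview of the Tencent Honor of Kings AI Open Research Environment

#### 1. Introduction

In recent years, with the rapid development of artificial intelligence, and in particular breakthroughs in deep learning and reinforcement learning, games have become one of the main platforms for measuring AI capability. From chess to Atari games and Go, each advance has marked a major step for the field. In 2016, AlphaGo's victory over a world Go champion brought this trend to a peak. Against this background, Honor of Kings, one of the most popular games in the world, has become a new testing ground for AI research.

#### 2. About the environment

The **Honor of Kings AI open research environment** (Honor of Kings Arena, or HoK Arena) is a game test platform released by Tencent AI Lab and based on Honor of Kings. It aims to give researchers a complex and varied game environment to advance research on competitive reinforcement learning.

#### 3. Environment features

1. **Multi-agent problem**: HoK Arena is a multi-agent problem in which each agent competes against an opponent.
2. **Generalization challenge**: The heroes in the game are diverse and each has its own characteristics, so an agent needs generalization ability and must adapt its strategy to different situations.
3. **Observation, action, and reward**: HoK Arena defines specific observation spaces, action spaces, and reward mechanisms so that agents can learn effective policies in a complex environment.
4. **Diverse tasks**: Twenty different heroes are provided, each with its own tasks and challenges, which adds to the diversity and difficulty of the environment.

#### 4. Usage and resources

- **License application**: Researchers must first sign a commitment letter and obtain authorization before downloading the Gamecore Server and the related test environment.
- **Local setup**: Once licensed, researchers can install the required software locally and interact with the game engine through the Python interface.
- **Open-source code**: All related software, including the environment class, is publicly released on GitHub.

#### 5. Results and challenges

- **Baseline results**: The platform provides several RL-based methods as baselines, which achieve initial success with feasible computing resources.
- **Generalization challenge**: HoK Arena poses a new generalization challenge: how to make an agent perform well across different types of heroes and opponents. This involves the generalization ability and adaptability of the model.
- **Possible solutions**: The paper also discusses possible solutions and techniques, such as transfer learning and meta-learning, for addressing these challenges.

#### 6. Conclusion

The Honor of Kings AI open research environment provides important support for research on competitive reinforcement learning. It advances the theory of multi-agent systems and offers new ideas for practical problems such as policy optimization. The platform also gives researchers an open, shared community in which to exchange experience and share results. As more researchers join and the technology matures, HoK Arena is expected to play a growing role in AI research.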


Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning

Hua Wei∗†[, Jingxiao Chen∗‡[, Xiyang Ji∗§, Hongyang Qin, Minwen Deng, Siqin Li, Liang Wang, Weinan Zhang‡, Yong Yu‡, Lin Liu, Lanxiao Huang, Deheng Ye§B, Qiang Fu, Wei Yang

Tencent AI Lab, Tencent Timi Studio, †New Jersey Institute of Technology, ‡Shanghai Jiao Tong University

hua.wei@njit.edu, timemachine@sjtu.edu.cn, wnzhang@sjtu.edu.cn, yyu@apex.sjtu.edu.cn, {xiyangji, hongyangqin, danierdeng, gracesqli, enginewang, lincliu, jackiehuang, dericye, leonfu, willyang}@tencent.com

Abstract

This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent, and it requires generalization ability because it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment class, is publicly available at https://github.com/tencent-ailab/hok_env. The documentation is available at https://aiarena.tencent.com/hok/doc/.

1 Introduction

Games have been used as testbeds to measure AI capabilities in the past few decades, from backgammon to chess and Atari games. In 2016, AlphaGo defeated the world Go champion through deep reinforcement learning and Monte Carlo tree search. In recent years, reinforcement learning models have brought huge advancements in robot control, autonomous driving, and video games like StarCraft [23], Dota [1], Minecraft [7] and Honor of Kings [26, 28, 29].

Following these earlier AI milestones, the research focus of game AI has shifted from board games to more complex games, such as imperfect-information poker games and real-time strategy games. As a sub-genre of real-time strategy games, Multi-player Online Battle Arena (MOBA) games have attracted much attention recently. The unique playing mechanics of MOBA involve role/hero play and multiple players. In particular, since MOBA games have different roles/heroes and each role has different actions, a good AI model needs to perform consistently well in controlling the actions of different heroes against different opponent heroes. This makes MOBA 1v1 games, which focus on hero control [29], a perfect testbed to test the generality of models under different tasks.

Authors contributed equally; [ work done at Tencent; Corresponding author

36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks.

arXiv:2209.08483v3 [cs.LG] 18 Oct 2022

Figure 1: Game user interface (UI) and the change of opponents and targets in one match of Honor of Kings. (a) User interface: in the main screen, there are four sub-parts: a mini-map on the top-left, a dashboard that records the number of KDAs (kill/death/assist) on the top-right, a movement controller on the bottom-left, and skill controller buttons on the bottom-right. (b) Change of opponents: the environment changes with different opponent heroes. (c) Change of targets: the action space changes with different target heroes.

Existing benchmark environments for RL generality mainly focus on relatively narrow tasks for a single agent. For example, MetaWorld and RLBench present benchmarks of simulated manipulation tasks in a shared, table-top environment with a simulated arm, whose goal is to train the arm controller to complete tasks like opening a door or fetching balls. Because the agent's action space remains that of an arm, it is hard to tell how well the learned RL policy generalizes to more diverse tasks, such as controlling simulated legs.

In this paper, we provide Honor of Kings Arena, a MOBA 1v1 game environment, authorized by the original game Honor of Kings. Honor of Kings was reported to be one of the world's most popular and highest-grossing games of all time, as well as the most downloaded app worldwide. As of November 2020, the game was reported to have over 100 million daily active players. There are two camps in MOBA 1v1, each with one agent, and each agent controls a hero character. As shown in Figure 1(a), an Honor of Kings player uses the bottom-left steer button to control the movements of a hero and uses the bottom-right set of buttons to control the hero's skills. To win a game, agents must combine planning, attacking, defending, and skill combos while accounting for the opponent in a partially observable environment.

Specifically, the Honor of Kings Arena imposes the following challenges regarding generalization:

• Generalization across opponents. When controlling one target hero, its opponent hero varies across different matches. There are over 20 possible opponent heroes in Honor of Kings Arena (in the original game there are over 100 heroes), each having a different influence on the game environment. If we keep the same target hero and vary the opponent hero as in Figure 1(b), Honor of Kings Arena can be treated as an environment similar to MetaWorld: both provide a variety of tasks for the same agent with the same action space.

• Generalization across targets. The generality challenge of RL rises to a different dimension when it comes to the competitive setting. In a MOBA game like Honor of Kings or DOTA, players also need to master different target heroes. Playing a different MOBA hero is like playing a different game, since different heroes have various attacking and healing skills, and the action control can completely change from hero to hero, as shown in Figure 1(c). With over 20 heroes to control, Honor of Kings Arena calls for robust and generalized modeling in RL.
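The two generalization settings above can be illustrated as two ways of holding out heroes. The following is only a sketch under stated assumptions: the hero names are placeholders, and the helper functions, the held-out count, and the seeding are hypothetical choices for illustration, not the evaluation protocol of the paper.

```python
import random

# Placeholder hero names (assumption); the real roster is defined by the environment.
HEROES = [f"hero_{i:02d}" for i in range(20)]


def split_heroes(heroes, n_held_out, seed=0):
    """Randomly split heroes into a seen set and a held-out set."""
    rng = random.Random(seed)
    held_out = set(rng.sample(heroes, n_held_out))
    seen = [h for h in heroes if h not in held_out]
    return seen, sorted(held_out)


def opponent_generalization(target, heroes, n_held_out=5, seed=0):
    """Fixed target hero: train against seen opponents, test on unseen ones."""
    seen, unseen = split_heroes(heroes, n_held_out, seed)
    train = [(target, opp) for opp in seen]
    test = [(target, opp) for opp in unseen]
    return train, test


def target_generalization(opponent, heroes, n_held_out=5, seed=0):
    """Fixed opponent hero: train on seen targets, test on unseen targets."""
    seen, unseen = split_heroes(heroes, n_held_out, seed)
    train = [(tgt, opponent) for tgt in seen]
    test = [(tgt, opponent) for tgt in unseen]
    return train, test


train, test = opponent_generalization("hero_00", HEROES)
print(len(train), len(test))
```

Each task is written as a (target hero, opponent hero) pair, so the first split varies only the second element and the second split varies only the first.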

Contributions:

As we will show in this paper, the above-mentioned challenges are not well solved by existing RL methods under Honor of Kings Arena. In summary, our contributions are as follows:

• We provide the Honor of Kings Arena, a highly optimized game engine that simulates the popular MOBA game, Honor of Kings. It supports 20 heroes in the competitive mode.

• We introduce simple and standardized APIs to make RL in Honor of Kings straightforward: the complex observations and actions are defined in terms of low-resolution grids of features; configurable rewards are provided combining factors like the score from the game engine.

• We evaluate RL algorithms under Honor of Kings Arena, providing an extensive set of benchmarking results for future comparison.

• The generality challenges in competitive RL settings are proposed with preliminary experiments showing that existing RL methods cannot cope well under Honor of Kings Arena.

https://en.wikipedia.org/wiki/Honor_of_Kings

What Honor of Kings Arena isn’t:

Honor of Kings Arena currently only supports the 1v1 mode of Honor of Kings, where there is only one hero in each camp. Though we acknowledge that some game modes of Honor of Kings (e.g., the 3v3 or 5v5 mode, where there are multiple heroes in each camp) are more popular than the 1v1 mode, they might complicate the generalization challenge in competitive RL with the cooperation ability between heroes (which is out of the scope of this paper). The cooperation ability also makes it hard to evaluate the generalization ability of different models. Thus, Honor of Kings Arena leaves the 3v3 and 5v5 modes of Honor of Kings out of the current implementation.

2 Motivations and Related Work

The key motivation behind Honor of Kings Arena is to address a variety of needs that existing environments do not meet:

Diversity in controlled targets

A unique feature of Honor of Kings Arena is that it has 20 heroes for the agents to control, each with its own unique skills. In most existing open environments like Google Research Football, StarCraft2 AI Arena, Blood Bowl, MetaWorld and RLBench, the meaning of actions remains the same when the agent controls different target units. The change of action control across a large number of heroes in Honor of Kings Arena provides appealing scenarios for testing the generalization ability of agents. RoboSuite provides seven different robotic arms across five single-agent tasks and three cooperative tasks between two arms. The HoK environment differs from RoboSuite by providing a large number of competitive settings.

Free and open accessibility

DotA2 shares a similar game setting with Honor of Kings (both are representative MOBA games with large state/action spaces), including multi-agent competition and coordination, an unknown environment model, partial observability, and diverse target heroes to control. However, the environment used by OpenAI is not open, with only an overview posted. Other MOBA environments like Derk's Gym lack free accessibility because they require commercial licenses. XLand also focuses on the generalization capability of agents and supports multi-agent scenarios, but it is not open-source.

Existing Interest

This environment has been used as a testbed for RL in research competitions, and many researchers have conducted experiments in the Honor of Kings environment. Though some of them verified the feasibility of reinforcement learning in tackling the game, they focused more on methodological novelty in planning, tree search, etc. Unlike these papers, this paper focuses on making the environment openly accessible and providing benchmarking results, which could serve as a reference and foundation for future research. Moreover, this paper shows that former methods lack model generalization across multiple heroes.

3 Honor of Kings Arena Environment

Honor of Kings Arena is open-sourced under Apache License V2.0 and accessible to all individuals for any non-commercial activity. The encrypted game engine and game replay tools follow Tencent's Honor of Kings AI And Machine Learning License and can be downloaded from: https://aiarena.tencent.com/hok/download. The code for agent training and evaluation is built with official authorization from Honor of Kings and is available at: https://github.com/tencent-ailab/hok_env. Any non-commercial users are free to download our game engine and tools after registration.

3.1 Tasks

We use the term "task" to refer to specific configurations of an environment (e.g., game setting, specific heroes, number of agents, etc.). The general task for agents in Honor of Kings Arena is as follows: when the match starts, each player controls a hero, sets out from the base, and gains gold and experience by killing or destroying other game units (e.g., enemy heroes, creeps, turrets). The goal is to destroy the opponent's turrets and base crystal while protecting one's own turrets and base crystal. A detailed description of the game units and heroes can be found in Appendix B.

https://aiarena.tencent.com/aiarena/en

https://github.com/tencent-ailab/hok_env/blob/master/GAMECORE.LICENSE

Figure 2: The tasks in Honor of Kings Arena. Each row represents the same target hero with different opponent heroes. Each column represents different target heroes with the same opponent hero. There are 20 heroes in Honor of Kings Arena, making 20 × 20 = 400 tasks in total.

Though the general goal is the same across matches, every match differs from the others. Before the match starts, each player chooses one hero to control; each hero has unique skills that influence the environment differently. Any change in the chosen hero makes the task different. As shown in Figure 2, the current Honor of Kings Arena offers 20 heroes to choose from, which makes 400 tasks in total.
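The task count can be checked by enumerating the grid of Figure 2 directly. This is a small illustrative sketch: the hero names are placeholders (an assumption), and the helper functions are hypothetical, not part of the hok_env interface.

```python
from itertools import product

# Placeholder hero names (assumption); the actual heroes are described in Appendix B.
HEROES = [f"hero_{i:02d}" for i in range(20)]

# A task is a (target hero, opponent hero) pair, as in Figure 2.
tasks = list(product(HEROES, HEROES))


def tasks_for_target(target):
    """One row of the grid: fixed target hero, varying opponent hero."""
    return [(t, o) for t, o in tasks if t == target]


def tasks_for_opponent(opponent):
    """One column of the grid: varying target hero, fixed opponent hero."""
    return [(t, o) for t, o in tasks if o == opponent]


print(len(tasks))                         # 400
print(len(tasks_for_target("hero_00")))   # 20
```

Mirror matches, where both camps pick the same hero, are counted here because 20 × 20 includes the diagonal of the grid.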

3.2 Agents

Honor of Kings Arena provides recognizable and configurable observation spaces, action spaces, and reward functions. In this section, we provide a general description of these functions; details can be found in the Appendix.

Observation Space

The observations often carry spatial and status cues and suggest meaningful actions to perform in a given state. In Honor of Kings Arena, the observation space is designed to be the same across all heroes, creating the opportunity to generalize across tasks. Specifically, the observation space of Honor of Kings Arena consists of five main components, whose dimensions depend on the number of heroes in the game (for the full description, please see Appendix D): HeroStatePublic, which describes the hero's status; HeroStatePrivate, which includes the specific skill information for all the heroes in the game; VecCreeps, describing the status of soldiers in the troops; VecTurrets, describing the status of turrets and crystals; and VecCampsWholeInfo, which indicates the period of the match.
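One way to picture a hero-independent observation layout is as a fixed-order concatenation of the five components named above. The sketch below is an assumption for illustration only: the component sizes are made-up placeholders (the real dimensions are given in Appendix D), and `flatten_observation` is a hypothetical helper rather than part of the released interface.

```python
# Component names come from the text above; the sizes are placeholders (assumption).
COMPONENT_SIZES = {
    "HeroStatePublic": 8,
    "HeroStatePrivate": 6,
    "VecCreeps": 4,
    "VecTurrets": 4,
    "VecCampsWholeInfo": 2,
}


def flatten_observation(components):
    """Concatenate the components in a fixed order into one flat feature list."""
    flat = []
    for name, size in COMPONENT_SIZES.items():
        values = components[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        flat.extend(values)
    return flat


# The same layout is used regardless of which hero is controlled.
dummy = {name: [0.0] * size for name, size in COMPONENT_SIZES.items()}
print(len(flatten_observation(dummy)))  # 24
```

Because the order and sizes are fixed, a single policy network can consume observations from any hero without changing its input layer.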

Action Space

The native action space of the environment is a triplet that covers all the possible actions of the hero hierarchically: 1) which action button to take; 2) who to target, e.g., a turret, an enemy hero, or a soldier in the troop; 3) how to act, e.g., the discretized direction to move and release skills. Note that different heroes have different prohibited skill offsets since they have different skills.
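The triplet structure can be sketched as a small record plus a legality check. Everything concrete here is an assumption made for illustration: the button and target vocabularies, the number of discretized offsets, and the per-hero table of prohibited offsets are hypothetical and do not reproduce the environment's actual encoding.

```python
from typing import NamedTuple

# Hypothetical vocabularies (assumption); the real ones are defined by the environment.
BUTTONS = ("move", "attack", "skill_1", "skill_2", "skill_3")
TARGETS = ("none", "enemy_hero", "creep", "turret")
N_OFFSETS = 16  # assumed number of discretized directions


class Action(NamedTuple):
    button: str  # 1) which action button to take
    target: str  # 2) who to target
    offset: int  # 3) how to act: index of a discretized direction


def is_legal(action, prohibited_offsets):
    """Check an action against a per-hero table of prohibited offsets.

    `prohibited_offsets` maps a button name to the set of offsets the
    controlled hero cannot use with that button (an assumed representation).
    """
    if action.button not in BUTTONS or action.target not in TARGETS:
        return False
    if not 0 <= action.offset < N_OFFSETS:
        return False
    return action.offset not in prohibited_offsets.get(action.button, set())


# Example: a hypothetical hero whose first skill cannot use offsets 0 and 1.
hero_rules = {"skill_1": {0, 1}}
print(is_legal(Action("skill_1", "enemy_hero", 3), hero_rules))  # True
print(is_legal(Action("skill_1", "enemy_hero", 1), hero_rules))  # False
```

Keeping the prohibited set per hero is what lets the same triplet format serve heroes with different skills.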

Reward Information

Honor of Kings has both sparse and dense reward configurations in five categories: farming related, kill-death-assist (KDA) related, damage related, pushing related, and win-lose related (for the full description, please see Appendix F).
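A configurable reward over these five categories could be combined as a weighted sum. This is a sketch under assumptions: the text does not state how the categories are combined, and the weight values, category keys, and `total_reward` helper are hypothetical.

```python
# Hypothetical weights (assumption); real values are a matter of configuration.
DENSE_WEIGHTS = {
    "farming": 0.5,
    "kda": 1.0,
    "damage": 0.25,
    "pushing": 0.5,
    "win_lose": 2.0,
}

# A sparse configuration keeps only the win-lose signal.
SPARSE_WEIGHTS = {name: 0.0 for name in DENSE_WEIGHTS}
SPARSE_WEIGHTS["win_lose"] = 1.0


def total_reward(signals, weights):
    """Weighted sum of per-category reward signals; missing categories count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


step_signals = {"farming": 2.0, "win_lose": 1.0}
print(total_reward(step_signals, DENSE_WEIGHTS))   # 3.0
print(total_reward(step_signals, SPARSE_WEIGHTS))  # 1.0
```

Switching between dense and sparse training then only requires swapping the weight table, not the reward code.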

Episode Dynamics

An episode of an Honor of Kings Arena task terminates when the crystal of one camp is destroyed. In practice, training imposes a time limit, though an actual round of Honor of Kings has no time limit. The timer is set at the beginning of the episode. Actions in Honor of Kings Arena are executed every 133 ms by default to match the response time of top amateur players, and the action interval is configurable. The constraints of the game are expressed in system
