
Adaptive Simulation-based Training of AI Decision-makers using Bayesian Optimization

Brett Israelsen and Nisar Ahmed

University of Colorado, Boulder, CO, 80309, USA

Kenneth Center and Roderick Green

Orbit Logic Incorporated, Greenbelt, MD, 20770, USA

Winston Bennett Jr.

Wright Patterson AFB, OH, 45433, USA

This work studies how an AI-controlled dog-fighting agent with tunable decision-making parameters can learn to optimize performance against an intelligent adversary, as measured by a stochastic objective function evaluated on simulated combat engagements. Gaussian process Bayesian optimization (GPBO) techniques are developed to automatically learn global Gaussian Process (GP) surrogate models, which provide statistical performance predictions in both explored and unexplored areas of the parameter space. This allows a learning engine to sample full-combat simulations at parameter values that are most likely to optimize performance and also provide highly informative data points for improving future predictions. However, standard GPBO methods do not provide a reliable surrogate model for the highly volatile objective functions found in aerial combat, and thus do not reliably identify global maxima. These issues are addressed by novel Repeat Sampling (RS) and Hybrid Repeat/Multi-point Sampling (HRMS) techniques. Simulation studies show that HRMS improves the accuracy of GP surrogate models, allowing AI decision-makers to more accurately predict performance and efficiently tune parameters.

Graduate Researcher, Computer Science, AIAA Student Member

Assistant Professor, Aerospace Engineering Sciences, AIAA Member

Director, Orbit Logic Incorporated, AIAA Professional Member

Sr. Software Engineer, Orbit Logic Incorporated, MD

Division Technical Advisor – Air Force Research Laboratory Warfighter Readiness Research Division

arXiv:1703.09310v2 [cs.LG] 28 Jul 2017

I. Introduction

Rapid advancement in the capabilities of Artificial Intelligence (AI) has the potential to completely revolutionize the way that U.S. armed forces train for battlespace dominance. As AI becomes more sophisticated, there are many opportunities to insert AI into domain training environments. One use of AI is as the basis of agents that stand in for human vehicle/platform controllers in Live-Virtual-Constructive (LVC) simulations. Humans participating in training exercises in these environments need to be challenged at an appropriate level relative to their skills. What is needed are agents that can assess the skill level of human participants and adapt accordingly, serving as credible and adaptable adversaries that are indistinguishable from experienced humans. This work investigates how an AI agent with tunable parameters governing its overall behavior can be adapted to optimize an objective function for engagement outcomes.

Several challenges make it difficult to meet these objectives:

1. Simulating an engagement can be costly. Beyond the financial expense of operating the simulation environment, contributions to the cost may also include the involvement of skilled personnel with limited availability, and the wall-clock duration of the simulation itself.

2. The engagement metrics that need to be optimized cannot be described analytically, but can only be evaluated by running simulations. When performance evaluations are sampled, they are generally highly nonlinear functions of environmental parameters and decision-maker states. Consequently, many traditional optimization methods are not applicable.

3. The realistic nature of engagement simulations makes virtually all performance objective functions of interest extremely volatile and uncertain (e.g. due to combined random effects of weather, terrain, sensor noise, psycho-motor time delays, etc.).
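Challenge 3 is what makes a single simulation run an unreliable measurement. The sketch below assumes a hypothetical one-parameter objective with additive Gaussian noise (not the paper's combat simulator; `noisy_objective`, `repeat_average`, and `empirical_sd` are illustrative names) and shows how much one evaluation can scatter, and how averaging `n` repeated runs at the same parameter value shrinks that scatter roughly as 1/√n. It is a generic variance-reduction illustration, not the authors' Repeat Sampling procedure.

```python
import random
import statistics

def noisy_objective(theta, noise_sd, rng):
    """Hypothetical stand-in for one simulated engagement: a smooth true score
    plus additive Gaussian noise (an assumption; real outcomes need not be Gaussian)."""
    true_score = -(theta - 0.3) ** 2
    return true_score + rng.gauss(0.0, noise_sd)

def repeat_average(theta, n_repeats, noise_sd, rng):
    """Average n_repeats independent evaluations at the same parameter value."""
    samples = [noisy_objective(theta, noise_sd, rng) for _ in range(n_repeats)]
    return statistics.fmean(samples)

def empirical_sd(n_repeats, trials=2000, noise_sd=1.0, seed=0):
    """Spread of the averaged estimate across many independent trials."""
    rng = random.Random(seed)
    estimates = [repeat_average(0.3, n_repeats, noise_sd, rng) for _ in range(trials)]
    return statistics.stdev(estimates)

if __name__ == "__main__":
    for n in (1, 4, 16):
        # Theory for i.i.d. noise: sd of the mean = noise_sd / sqrt(n).
        print(f"n={n:2d}  empirical sd={empirical_sd(n):.3f}  theory={1.0 / n ** 0.5:.3f}")
```

The 1/√n rate holds only for independent, identically distributed noise, and it also exposes the cost side of challenge 1: halving the scatter at one parameter value takes four times as many simulations.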

Bayesian optimization with Gaussian Process surrogate models (abbreviated as GPBO) is well suited to directly addressing points 1 and 2. In this formulation, the Gaussian Process (GP) serves as a tractable surrogate model that approximates the true (non-closed-form/intractable) objective function. This surrogate model estimates a nonparametric probability distribution over the objective function values at all locations in the solution space. The Bayesian optimization algorithm uses this surrogate model to intelligently search the solution space for the optimum based on a number of sampled function evaluations, i.e. using ‘explore/exploit’ strategies to locate the optimum as quickly as possible while also using sparse function evaluations to build the surrogate model. Since the GP surrogate model is cheaper to evaluate than the true objective function, global nonlinear optimization methods can be used on the GP model to efficiently search the decision parameter space while also accounting for uncertainty in the underlying objective function.
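The surrogate-guided search described above can be sketched in a few dozen lines. This sketch assumes a one-dimensional decision parameter on [0, 1], a zero-mean GP with a squared-exponential kernel and fixed, hand-picked hyperparameters, an upper confidence bound (UCB) acquisition rule, and a grid search standing in for the global optimizer run over the surrogate. All names (`fit_gp`, `ucb`, `gpbo`, `toy_objective`) and the toy objective are hypothetical; this is a generic GP-UCB illustration, not the paper's learning engine.

```python
import math
import random

def rbf(a, b, length=0.1, var=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    """Lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(X, Y, noise_var):
    """Condition a zero-mean GP on noisy observations (X, Y); return a predictor."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward(L, forward(L, Y))  # alpha = K^{-1} y

    def predict(x_star):
        k_star = [rbf(x_star, x) for x in X]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = forward(L, k_star)  # v.v = k_*^T K^{-1} k_*
        var = max(rbf(x_star, x_star) - sum(t * t for t in v), 1e-12)
        return mean, var

    return predict

def ucb(predict, x, beta):
    """Upper confidence bound: mean rewards exploitation, std rewards exploration."""
    mean, var = predict(x)
    return mean + beta * math.sqrt(var)

def gpbo(objective, n_iter=15, noise_var=0.05, beta=2.0, seed=0):
    """Minimal GP-UCB loop over a grid on [0, 1]."""
    rng = random.Random(seed)
    grid = [i / 200 for i in range(201)]
    X = [rng.random() for _ in range(3)]  # a few random initial samples
    Y = [objective(x, rng) for x in X]
    for _ in range(n_iter):
        predict = fit_gp(X, Y, noise_var)  # refit surrogate to all data so far
        x_next = max(grid, key=lambda x: ucb(predict, x, beta))
        X.append(x_next)
        Y.append(objective(x_next, rng))  # one expensive "simulation"
    predict = fit_gp(X, Y, noise_var)
    # Report the best posterior mean, not the best (possibly lucky) noisy sample.
    best = max(grid, key=lambda x: predict(x)[0])
    return best, X, Y

def toy_objective(x, rng):
    """Hypothetical noisy 1-D performance score whose true peak is at x = 0.3."""
    return math.exp(-((x - 0.3) / 0.1) ** 2) + rng.gauss(0.0, 0.2)

if __name__ == "__main__":
    best, X, Y = gpbo(toy_objective)
    print(f"estimated optimum: {best:.3f} after {len(X)} evaluations")
```

Reporting the maximizer of the posterior mean, rather than the single best noisy observation, is one common way to keep a lucky draw from being mistaken for the optimum. A practical implementation would also fit the kernel hyperparameters and noise level from data instead of fixing them by hand.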

However, standard GPBO methods are not well suited to address point 3. This work develops a novel approach for implementing GPBO, called Hybrid Repeat/Multi-point Sampling (HRMS), to address these issues. In the setting of simulated one-on-one aerial dog-fighting engagements, GPBO with HRMS is able not only to identify the optimum more reliably than standard GPBO, but also to yield a more accurate and consistent surrogate representation of the objective surface, using no more total function evaluations than traditional GPBO techniques.

The remainder of this paper is outlined as follows. Section II discusses relevant prior work and provides a formal definition of the adaptive AI agent problem; relevant details regarding the application of GPBO to aerial dog-fighting performance optimization are also provided. Details of the proposed GPBO ‘learning engine’ framework used to train the decision-making AI are given in Section III, which also includes some discussion regarding practical implementation of GPBO. Section IV reviews existing sampling strategies for GPBO and introduces our novel sampling strategy: HRMS. Finally, Section V shows by simulated experiments that the proposed implementation of GPBO with HRMS is useful for optimizing highly volatile performance metrics for the dog-fighting application. It is shown that HRMS can significantly outperform traditional GPBO sampling techniques when dealing with highly volatile objective functions, and yields valuable insights about AI decision-maker performance through the global GP surrogate model.

II. Preliminaries and Problem Description

A. Problem Domain and Previous Work

Logistical and fiscal constraints have led to a recent surge in interest in simulation-based methods for training warfighters. Efforts such as the Air Force’s ‘Not-So-Grand Challenge’ were developed with the specific goal of investigating solutions for current and future simulation training systems. As part of this effort, different autonomous decision-making AI agents have been developed and evaluated based on their ability to effectively mimic human pilots in different situations [1]. Although such AIs can mimic human behavior with varying degrees of success in different circumstances, the question remains whether they can adapt based on their adversaries’ responses. The specific problem considered here is how a single autonomous agent with certain behavioral parameters can adapt to improve performance in response to other agents (human or AI) in a combat simulation featuring stochastic uncertainties and highly volatile outcomes.

The application focus is on simulations for air-to-air combat (dog-fighting) training, which have been studied extensively. For instance, McManus and Goodrich [2] discuss the integration of an AI-based tactical decision generator (TDG) system into two separate simulators to study and evaluate air combat environments. One of the simulation modes included an interface for human pilots to participate in training against the TDG. More recently, the ‘Not-So-Grand Challenge’ produced multiple ‘human-like’ AI systems to train against human pilots [1]. Several different teams participated and were each rated on several metrics of performance and effectiveness against human pilots. State-of-the-art techniques, such as inverse reinforcement learning, hierarchical logic, and other proprietary approaches, were used to develop the AI pilots. One crucial factor for performance evaluation in this case is the degree to which the AI can mimic realistic human decision-making. While some AI systems were found to be more effective than others in this regard for specific scenarios (e.g. execution of evasive maneuvers, formation maintenance, target engagement, etc.), the question of how to efficiently ‘retune’ and adapt any particular AI based on adversarial responses remained open.

This leads to the consideration of another critical component in the evaluation process for pilots (human or AI): the development of suitable metrics that quantify performance during an engagement. For instance, Moore et al. discuss formal methods for measuring pilot dog-fighting performance and validate them in simulated combat scenarios [3]. These measures of pilot performance are meant to be less subjective than traditional ratings given by instructors/expert observers (as in the Not-So-Grand Challenge), as they are based on data recorded during an engagement. This touches on another motivation for developing AI pilots, i.e. automation of instructor functions and expert training resources, which are costly to provide and maintain. Ideally, if an autonomous AI can control a complex, optionally manned vehicle such as a fighter plane, then it should also be able to evaluate and advise human pilots based on that same expertise (in much the same way human instructors are able to do).

Performance metrics of aircraft engagement scenarios have evolved considerably since the inception of engagement debriefings. Many of these metrics are well accepted in the community. In contemporary development of newer metrics, expert evaluations are still utilized for validation. Kelly reviews and summarizes much of this work [4], and specifically mentions metrics that include variables such as relative aircraft position, throttle and speedbrake manipulation, and overall engagement outcomes, to name a few. Identification of meaningful metrics that operate on time-series and summary data from engagements is still an active field of research [2, 4–8]. The work reported in this paper utilizes several metrics developed in the fighter pilot training literature as objective functions for automated learning and tuning of decision-making AI (although the approach is not exclusive to any particular set of metrics, or to dog-fighting training applications, per se).

A large segment of work on optimization of aircraft engagement focuses on optimal teaming strategies. Mulgund et al. examined ‘large-scale’ air combat tactics (formations, etc.) and were able to demonstrate promising results in that area [5, 7]. Wu et al. addressed the problem of optimizing cooperative multiple target attack using genetic algorithms (GAs) [9]. Also applying GAs, Gonsalves and Burge investigated how mission plans could be optimized [10]. While these are interesting and important application areas, the present work is focused on one-on-one engagements between autonomous adversaries (where at least one agent is an AI decision-maker), and on providing an adversary that can adapt to the skills of the other pilot (human/AI). In other words, as
