Inferring Objectives in Continuous Dynamic Games
from Noise-Corrupted Partial State Observations
Lasse Peters∗, David Fridovich-Keil†, Vicenç Rubies-Royo‡, Claire J. Tomlin‡ and Cyrill Stachniss∗
∗University of Bonn, Germany
†University of Texas at Austin, USA
‡University of California, Berkeley, USA
Email: {lasse.peters, cyrill.stachniss}@igg.uni-bonn.de, dfk@utexas.edu, {vrubies, tomlin}@berkeley.edu
Abstract—Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desired behavior. In this paper, we propose a method for inferring parametric objective models of multiple agents based on observed interactions. Our inverse game solver jointly optimizes player objectives and continuous-state estimates by coupling them through Nash equilibrium constraints. Hence, our method can directly maximize the observation likelihood rather than a non-probabilistic surrogate criterion. Our method does not require full observations of game states or player strategies to identify player objectives. Instead, it robustly recovers this information from noisy, partial state observations. As a byproduct of estimating player objectives, our method computes a Nash equilibrium trajectory corresponding to those objectives. Thus, it is suitable for downstream trajectory forecasting tasks. We demonstrate our method in several simulated traffic scenarios. Results show that it reliably estimates player objectives from a short sequence of noise-corrupted partial state observations. Furthermore, using the estimated objectives, our method makes accurate predictions of each player's trajectory.
I. INTRODUCTION
Most robots use motion planning and optimal control methods to select and execute actions when operating in the real world. Commonly used approaches require specifying the objective to optimize. In many real-world applications, however, designing optimal control objectives is challenging. For example, tuning cost parameters, even in the case of a linear-quadratic regulator (LQR), can be a tedious heuristic process when performed manually. As a result, it can be desirable to learn optimal control objectives automatically from demonstrations. To this end, researchers have investigated learning from demonstration and inverse optimal control (IOC). Recent work shows promising results, even for complex problems with large state and observation spaces [10, 23].
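As a concrete illustration of why such tuning is tedious, the short sketch below is our own example: the double-integrator system, the weight values, and the use of SciPy are illustrative choices, not taken from the cited works. It computes LQR gains for two different position weights and shows how strongly the resulting feedback, and thus the closed-loop behavior, depends on this hand-picked parameter.

```python
# Illustrative sketch (not from the paper): LQR gains are sensitive to
# hand-tuned cost weights, which is what makes manual tuning tedious.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # double-integrator dynamics: x' = v, v' = u
B = np.array([[0.0],
              [1.0]])
R = np.array([[1.0]])        # control-effort weight

for q_pos in (0.1, 10.0):    # two candidate position weights
    Q = np.diag([q_pos, 1.0])
    P = solve_continuous_are(A, B, Q, R)  # solve the Riccati equation
    K = np.linalg.solve(R, B.T @ P)      # optimal feedback gain, u = -K x
    print(f"q_pos={q_pos}: K = {K.ravel()}")
```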
[Fig. 1 graphic omitted: trajectory plots over Position x [m] and Position y [m] with panels for Ground Truth, Observations, Player Objectives, and Player Strategies ("Inverse Game" / "Forward Game"), plus a normalized cost scale from 0.0 to 1.0.]

Fig. 1. Inverse and forward versions of a dynamic game modeling a 5-player highway driving scenario. The solution of the forward problem maps the player objectives (left) to the players' optimal strategies (right). Our method solves the inverse problem: it takes noisy, partial state observations of multi-agent interaction as input to recover an objective model for each player that explains the observed behavior. The visualized slice of the cost landscape shows one important aspect of the recovered objective model, namely, each player's preference to keep a safe distance from others. The inferred objectives define an abstract game-theoretic behavior model that can be used to predict player strategies for arbitrary agent configurations.

Optimal control methods, however, are not directly suitable for interactive settings with multiple agents. For example, consider multiple vehicles engaged in lane changes on a crowded highway. In this setting, each agent has its own objective that naturally depends upon the behavior of others. For instance, agents may wish to maintain a safe distance from others while traveling at a preferred speed. Thus, their interaction is more accurately characterized as a noncooperative game-theoretic equilibrium rather than as the solution to a joint optimal control problem. Despite the added complexity of these noncooperative interactions, recent developments enable computationally efficient solutions to the dynamic games which arise in multi-agent robotic settings [8, 11, 12, 19].
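To make this distinction concrete, the following is a standard statement of the Nash equilibrium condition, written here in our own notation as a sketch; the paper's precise formulation may differ. In an $N$-player game where each player $i$ chooses a strategy $\gamma^i$ and incurs a cost $J^i$ that depends on all players' strategies, a profile $(\gamma^{1*}, \dots, \gamma^{N*})$ is a Nash equilibrium if no player can lower its own cost by unilaterally deviating:
\[
J^i\bigl(\gamma^{1*}, \dots, \gamma^{i*}, \dots, \gamma^{N*}\bigr) \;\le\; J^i\bigl(\gamma^{1*}, \dots, \gamma^{i}, \dots, \gamma^{N*}\bigr) \quad \text{for all admissible } \gamma^i,\; i \in \{1, \dots, N\}.
\]
A joint optimal control formulation, by contrast, would minimize a single shared cost over all players' inputs at once; in general, its solutions do not satisfy these per-player conditions.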
Designing objectives for dynamic games poses similar challenges to those encountered in single-player optimal control. As in the single-player case, automatic cost learning promises to circumvent this difficulty. However, cost inference takes on an even more important role in multi-agent settings: to interact effectively, any individual player must also understand the objectives of the other players. In this paper, we study