Inferring Objectives in Continuous Dynamic Games
from Noise-Corrupted Partial State Observations
Lasse Peters∗, David Fridovich-Keil†, Vicenç Rubies-Royo‡, Claire J. Tomlin‡ and Cyrill Stachniss∗
∗University of Bonn, Germany
†University of Texas at Austin, USA
‡University of California, Berkeley, USA
Email: {lasse.peters, cyrill.stachniss}@igg.uni-bonn.de, dfk@utexas.edu, {vrubies, tomlin}@berkeley.edu
Abstract—Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desired behavior. In this paper, we propose a method for inferring parametric objective models of multiple agents based on observed interactions. Our inverse game solver jointly optimizes player objectives and continuous-state estimates by coupling them through Nash equilibrium constraints. Hence, our method can directly maximize the observation likelihood rather than a non-probabilistic surrogate criterion. Our method does not require full observations of game states or player strategies to identify player objectives. Instead, it robustly recovers this information from noisy, partial state observations. As a byproduct of estimating player objectives, our method computes a Nash equilibrium trajectory corresponding to those objectives. Thus, it is suitable for downstream trajectory forecasting tasks. We demonstrate our method in several simulated traffic scenarios. Results show that it reliably estimates player objectives from a short sequence of noise-corrupted partial state observations. Furthermore, using the estimated objectives, our method makes accurate predictions of each player's trajectory.
I. INTRODUCTION
Most robots use motion planning and optimal control methods to select and execute actions when operating in the real world. Commonly used approaches require specifying the objective to optimize. In many real-world applications, however, designing optimal control objectives is challenging. For example, tuning cost parameters, even in the case of a linear-quadratic regulator (LQR), can be a tedious heuristic process when performed manually. As a result, it can be desirable to learn optimal control objectives automatically from demonstrations. To this end, researchers have investigated learning from demonstration and inverse optimal control (IOC). Recent work shows promising results, even for complex problems with large state and observation spaces [10, 23].
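As a concrete illustration of why such tuning is tedious, the short sketch below is our own example: the double-integrator system, the weight values, and the use of SciPy are illustrative choices, not taken from the cited works. It computes LQR gains for two different position weights and shows how strongly the resulting feedback, and thus the closed-loop behavior, depends on this hand-picked parameter.

```python
# Illustrative sketch (not from the paper): LQR gains are sensitive to
# hand-tuned cost weights, which is what makes manual tuning tedious.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])   # double-integrator dynamics: x' = v, v' = u
B = np.array([[0.0],
              [1.0]])
R = np.array([[1.0]])        # control-effort weight

for q_pos in (0.1, 10.0):    # two candidate position weights
    Q = np.diag([q_pos, 1.0])
    P = solve_continuous_are(A, B, Q, R)  # solve the Riccati equation
    K = np.linalg.solve(R, B.T @ P)      # optimal feedback gain, u = -K x
    print(f"q_pos={q_pos}: K = {K.ravel()}")
```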
[Fig. 1 graphic omitted: trajectory plots over Position x [m] and Position y [m] with panels for Ground Truth, Observations, Player Objectives, and Player Strategies ("Inverse Game" / "Forward Game"), plus a normalized cost scale from 0.0 to 1.0.]

Fig. 1. Inverse and forward versions of a dynamic game modeling a 5-player highway driving scenario. The solution of the forward problem maps the player objectives (left) to the players' optimal strategies (right). Our method solves the inverse problem: it takes noisy, partial state observations of multi-agent interaction as input to recover an objective model for each player that explains the observed behavior. The visualized slice of the cost landscape shows one important aspect of the recovered objective model, namely, each player's preference to keep a safe distance from others. The inferred objectives define an abstract game-theoretic behavior model that can be used to predict player strategies for arbitrary agent configurations.

Optimal control methods, however, are not directly suitable for interactive settings with multiple agents. For example, consider multiple vehicles engaged in lane changes on a crowded highway. In this setting, each agent has its own objective that naturally depends upon the behavior of others. For instance, agents may wish to maintain a safe distance from others while traveling at a preferred speed. Thus, their interaction is more accurately characterized as a noncooperative game-theoretic equilibrium rather than as the solution to a joint optimal control problem. Despite the added complexity of these noncooperative interactions, recent developments enable computationally efficient solutions to the dynamic games which arise in multi-agent robotic settings [8, 11, 12, 19].
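To make this distinction concrete, the following is a standard statement of the Nash equilibrium condition, written here in our own notation as a sketch; the paper's precise formulation may differ. In an $N$-player game where each player $i$ chooses a strategy $\gamma^i$ and incurs a cost $J^i$ that depends on all players' strategies, a profile $(\gamma^{1*}, \dots, \gamma^{N*})$ is a Nash equilibrium if no player can lower its own cost by unilaterally deviating:
\[
J^i\bigl(\gamma^{1*}, \dots, \gamma^{i*}, \dots, \gamma^{N*}\bigr) \;\le\; J^i\bigl(\gamma^{1*}, \dots, \gamma^{i}, \dots, \gamma^{N*}\bigr) \quad \text{for all admissible } \gamma^i,\; i \in \{1, \dots, N\}.
\]
A joint optimal control formulation, by contrast, would minimize a single shared cost over all players' inputs at once; in general, its solutions do not satisfy these per-player conditions.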
Designing objectives for dynamic games poses similar challenges to those encountered in single-player optimal control. As in the single-player case, automatic cost learning promises to circumvent this difficulty. However, cost inference takes on an even more important role in multi-agent settings: to interact effectively, any individual player must also understand the objectives of the other players. In this paper, we study