A tool for solving POMDP problems in an online fashion: the Adaptive Belief Tree (ABT) algorithm. This should be the earliest paper proposing the method. The link to this paper in my post at https://blog.csdn.net/qq_34241498/article/details/105288427 is broken, so the paper is provided here for anyone who needs it.
An Online POMDP Solver for Uncertainty Planning in Dynamic Environment
Hanna Kurniawati and Vinay Yadav
Abstract Motion planning under uncertainty is important for reliable robot operation in uncertain and dynamic environments. The Partially Observable Markov Decision Process (POMDP) is a general and systematic framework for motion planning under uncertainty. To cope well with dynamic environments, we often need to modify the POMDP model during runtime. However, despite recent tremendous advances in POMDP planning, most solvers are not fast enough to generate a good solution when the POMDP model changes during runtime. Recent progress in online POMDP solvers has shown promising results. However, most online solvers are based on replanning, which recomputes a solution from scratch at each step, discarding any solution computed so far and hence wasting valuable computational resources. In this paper, we propose a new online POMDP solver, called Adaptive Belief Tree (ABT), that can reuse and improve the existing solution, updating it as needed whenever the POMDP model changes. Given enough time, ABT converges in probability to the optimal solution of the current POMDP model. Preliminary results on three distinct robotics tasks in dynamic environments are promising. In all test scenarios, ABT generates similar or better solutions faster than the fastest online POMDP solver today, using an average of less than 50 ms of computation time per step.
Vinay Yadav—All work was done while the author was an intern at the University of Queensland.
H. Kurniawati (✉)
School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Australia
e-mail: hannakur@uq.edu.au
V. Yadav
Department of Electrical Engineering, Indian Institute of Technology, Kharagpur, India
e-mail: vinayyadav.iitkgp@gmail.com
© Springer International Publishing Switzerland 2016
M. Inaba and P. Corke (eds.), Robotics Research, Springer Tracts
in Advanced Robotics 114, DOI 10.1007/978-3-319-28872-7_35
1 Introduction
Motion planning under uncertainty is important for reliable robot operation in imperfectly known and dynamic environments. The Partially Observable Markov Decision Process (POMDP) is a general and systematic framework for planning under uncertainty, and motion planning under uncertainty problems can be modelled as POMDPs quite naturally. However, solving a POMDP problem exactly is computationally intractable [14]. A lot of effort and tremendous progress have been made in developing efficient approximate POMDP solvers [5, 6, 11, 15, 16, 19, 21, 24], such that today we have POMDP solvers that can solve simple to moderately difficult motion planning problems within seconds to minutes [7, 8, 12]. However, many of these solvers are not efficient enough to recompute or update their solutions when the POMDP model changes during runtime. Such changes in the POMDP model are often required when a robot operates in a dynamic environment. This paper proposes a new approximate POMDP solver that improves and updates its solution online, following changes in the environment.
In imperfectly known and dynamic environments, a robot rarely knows its exact state due to errors in control and sensing. POMDP provides a systematic way to reason about the best action to perform when perfect state information is unavailable. It finds the best action with respect to the set of states that are consistent with the information available so far. This set of states is represented as a probability distribution, called a belief b, and the set of all possible beliefs is called the belief space B. A POMDP solver calculates an optimal policy π* : B → A that maps a belief in B to an action in the set A of all possible actions the robot can perform, so as to maximize a given objective function. An offline POMDP solver computes this mapping prior to execution, while an online POMDP solver computes the mapping during runtime.
Methods that use the POMDP framework to plan in imperfectly known and dynamic environments can be classified into two approaches. The first approach embeds all possible environments and their dynamics as part of the POMDP model. It uses an offline POMDP solver to find a good policy prior to execution. When the environment and its dynamics are largely unknown, this approach constructs POMDP models too huge to be solved by even the best offline solvers today.
The second approach models only the known part of the environment and its
dynamics (both stochastic and deterministic), and allows the model to change during
execution when more information about the environment becomes available. Key to
the success of this approach is an efficient online POMDP solver that can compute
a good policy during runtime, following changes in the POMDP model.
Online POMDP solvers have advanced significantly over the past few years [5, 6, 18, 19]. However, most of these solvers [5, 6, 18] are based on replanning, which recomputes the best action to perform from scratch at each step, discarding any policy computed so far. As a result, these solvers often waste significant computational resources when changes happen gradually or only in some part of the environment, which is often the case in robotics tasks.
This paper proposes a new online POMDP solver, called Adaptive Belief Tree (ABT), that reuses and improves the existing policy at each time step, and updates the policy as needed whenever the POMDP model changes. To enable fast policy updates, ABT uses the following two observations. First, a change in the POMDP model is directly reflected as a change in the robot's behaviour at a particular set of states. Second, a change in one optimal mapping π*(b) from a belief b may affect the optimal policy π*(·) at other beliefs that can reach b. Using insight from these observations, ABT represents the policy as pairs of belief and action, and explicitly represents the relation between beliefs, states, and their reachability information, so that it can quickly identify the subset of the policy affected by changes in the POMDP model and update the policy fast whenever necessary. To quickly generate a good policy, ABT plans with respect to only a set of representative sampled beliefs. It represents each belief as a set of state particles, and samples a belief b by sampling a set of state trajectories from a particle of the given initial belief b₀. An effective strategy for sampling state trajectories enables ABT to converge in probability to an optimal policy, and to quickly generate a good policy. Preliminary results on three distinct robotics tasks in dynamic environments indicate that ABT can generate similar or better motion strategies faster than the best online POMDP solver today [19]. In all three test scenarios, ABT requires an average of less than 400 ms of preprocessing time, and an average of less than 50 ms of online computation time per step.
Furthermore, ABT is designed for POMDP problems with continuous state spaces and uses a generative model. A generative model is a black-box simulator that enables us to generate experiences about the system's dynamics and behaviour at various different states. By using a generative model, ABT does not need an explicit model of control error, observation error, and uncertainty about the system dynamics, which are often difficult to obtain in complex robotics tasks.
2 Related Work
2.1 POMDP Background
A POMDP is defined as a tuple ⟨S, A, O, T, Z, R, b₀, γ⟩, where S is the set of states, A is the set of actions, and O is the set of observations. At each step, the agent is in a state s ∈ S, takes an action a ∈ A, moves from s to an end state s′ ∈ S, and perceives an observation o ∈ O. Due to action uncertainty, the system dynamics from s to s′ are represented as a conditional probability function T(s, a, s′) = f(s′ | s, a). Furthermore, due to sensing uncertainty, after the agent performs action a and ends at state s′, the observation it may perceive is represented as a conditional probability function Z(s′, a, o) = f(o | s′, a). At each step, the agent receives a reward R(s, a) if it takes action a from state s. The agent's goal is to choose a suitable sequence of actions that will maximize its expected total reward, where the agent's initial belief is denoted as b₀. When the sequence of actions has infinite length, we specify a discount factor γ ∈ (0, 1) so that the total reward is finite and the problem is well defined.
In many problems with large state spaces, an explicit representation of the conditional probability functions T and Z may not be available. However, one can use a generative model, which is a black-box simulator that outputs the observation perceived, the reward received, and the next state visited when the agent performs the input action from the input state.
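As a rough sketch of this interface, a generative model can be as simple as a single function; the toy dynamics and all names below are our assumptions, not part of the paper.

```python
import random

def generative_model(state, action):
    """Black-box simulator: sample one step of the system.

    Given the current state and an action, return the next state visited,
    the observation perceived, and the reward received, without exposing
    the conditional probability functions T and Z explicitly. This toy
    instance is a 1-D world with noisy motion and noisy sensing.
    """
    next_state = state + action + random.choice([-1, 0, 1])  # implicit T
    observation = next_state + random.choice([-1, 0, 1])     # implicit Z
    reward = -abs(next_state)                                # e.g., distance penalty
    return next_state, observation, reward
```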
A POMDP planner computes an optimal policy that maximizes the agent's expected total reward. A POMDP policy π : B → A assigns an action a to each belief b ∈ B. A policy π induces a value function V(b, π), which specifies the expected total reward of executing policy π from belief b and is computed as

$$V(b, \pi) = E\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, b, \pi\right].$$

A policy admits various representations, e.g., a policy graph [3] or pairs of belief and action [23]. However, most online solvers do not maintain an explicit representation of the policy. Instead, they calculate the mapping from beliefs to actions on the fly. In contrast, our proposed online solver maintains an explicit representation of a subset of the policy, and improves and updates this policy on the fly.
To execute a policy π, the agent repeatedly performs action selection and belief update. For example, if the agent's current belief is b, it selects the action given by a = π(b). After the agent performs action a and receives an observation o according to the observation function Z, it updates b to a new belief b′ given by

$$b'(s') = \tau(b, a, o) = \eta\, Z(s', a, o) \int_{s \in S} T(s, a, s')\, b(s)\, ds,$$

where η is a normalization constant. We use a generative model and represent each belief with a set of particles; the belief update is approximated using a particle filter.
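A minimal sketch of such a particle-filter update, here in a simple rejection-based form for a discrete observation space (the interface and constants are our assumptions, not the paper's):

```python
import random

def update_belief(particles, action, real_obs, model, n_target=100, max_tries=10_000):
    """Approximate b' = tau(b, a, o) with particles: repeatedly sample a
    state from the current belief, simulate one step with the generative
    model, and keep the resulting state only if its simulated observation
    matches the observation actually received."""
    new_particles = []
    tries = 0
    while len(new_particles) < n_target and tries < max_tries:
        s = random.choice(particles)      # sample s ~ b
        s_next, o, _ = model(s, action)   # one step of the black-box simulator
        if o == real_obs:                 # rejection step
            new_particles.append(s_next)
        tries += 1
    return new_particles
```

Policy execution then simply alternates selecting a = π(b) with this update.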
2.2 Related POMDP Solvers
POMDP is a systematic and general approach for planning under uncertainty. Although solving a POMDP exactly is computationally intractable [14], the past few years have seen a tremendous increase in the capability of both offline and online POMDP solvers [11, 19, 21], such that the POMDP approach is now practical for solving simple to moderately difficult motion planning problems.
The fastest offline POMDP solvers today are based on the point-based approach [11, 15, 20, 21]. This approach reduces the complexity of planning in the belief space B by representing B as a set of sampled beliefs and planning with respect to this set only. To generate a policy, most point-based POMDP solvers use value iteration, utilizing the fact that the optimal value function satisfies the Bellman equation. They start from an initial policy, represented as a value function V, and iteratively perform Bellman backups on V at the sampled beliefs, i.e.,

$$V(b) = \max_{a \in A}\left( R(b, a) + \gamma \sum_{o \in O} p(o \mid b, a)\, \hat{V}^{*}(\tau(b, a, o)) \right),$$

where V̂*(b′) is the current best value of b′. The iteration is performed until it converges.
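For concreteness, one such point-based backup can be sketched as follows, with R(b, a), p(o | b, a), τ(b, a, o), and the current value estimate supplied as functions; everything here is an illustrative assumption rather than any specific solver's API.

```python
def bellman_backup(b, actions, observations, R, p_obs, tau, V_hat, gamma=0.95):
    """One point-based Bellman backup at a sampled belief b:
    V(b) = max_a [ R(b, a) + gamma * sum_o p(o|b,a) * V_hat(tau(b, a, o)) ]."""
    return max(
        R(b, a) + gamma * sum(p_obs(b, a, o) * V_hat(tau(b, a, o))
                              for o in observations)
        for a in actions
    )
```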
Over the past few years, impressive progress has been made by improving the strategy for sampling