A tool for solving POMDP problems in an online fashion: the Adaptive Belief Tree (ABT) algorithm. This should be the earliest paper proposing the method. The link to this paper in my post at https://blog.csdn.net/qq_34241498/article/details/105288427 is broken, so the paper is provided here for anyone who needs it.
An Online POMDP Solver for Uncertainty Planning in Dynamic Environment
Hanna Kurniawati and Vinay Yadav
Abstract Motion planning under uncertainty is important for reliable robot operation in uncertain and dynamic environments. The Partially Observable Markov Decision Process (POMDP) is a general and systematic framework for motion planning under uncertainty. To cope well with dynamic environments, we often need to modify the POMDP model during runtime. However, despite recent tremendous advances in POMDP planning, most solvers are not fast enough to generate a good solution when the POMDP model changes during runtime. Recent progress in online POMDP solvers has shown promising results. However, most online solvers are based on replanning, which recomputes a solution from scratch at each step, discarding any solution computed so far and hence wasting valuable computational resources. In this paper, we propose a new online POMDP solver, called Adaptive Belief Tree (ABT), that can reuse and improve the existing solution, updating it as needed whenever the POMDP model changes. Given enough time, ABT converges in probability to the optimal solution of the current POMDP model. Preliminary results on three distinct robotics tasks in dynamic environments are promising. In all test scenarios, ABT generates similar or better solutions faster than the fastest online POMDP solver today, using an average of less than 50 ms of computation time per step.
Vinay Yadav—All work was done while the author was an intern at the University of Queensland.
H. Kurniawati (✉)
School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Australia
e-mail: hannakur@uq.edu.au
V. Yadav
Department of Electrical Engineering, Indian Institute of Technology, Kharagpur, India
e-mail: vinayyadav.iitkgp@gmail.com
© Springer International Publishing Switzerland 2016
M. Inaba and P. Corke (eds.), Robotics Research, Springer Tracts
in Advanced Robotics 114, DOI 10.1007/978-3-319-28872-7_35
1 Introduction
Motion planning under uncertainty is important for reliable robot operation in imperfectly known and dynamic environments. The Partially Observable Markov Decision Process (POMDP) is a general and systematic framework for planning under uncertainty, and motion planning under uncertainty problems can be modelled as POMDPs quite naturally. However, solving a POMDP problem exactly is computationally intractable [14]. A lot of effort and tremendous progress have been made in developing efficient approximate POMDP solvers [5, 6, 11, 15, 16, 19, 21, 24], such that today we have POMDP solvers that can solve simple to moderately difficult motion planning problems within seconds to minutes [7, 8, 12]. However, many of these solvers are not efficient enough to recompute or update their solutions when the POMDP model changes during runtime. Such changes in the POMDP model are often required when a robot operates in a dynamic environment. This paper proposes a new approximate POMDP solver that improves and updates its solution online, following changes in the environment.
In imperfectly known and dynamic environments, a robot rarely knows its exact state due to errors in control and sensing. POMDP provides a systematic way to reason about the best action to perform when perfect state information is unavailable. It finds the best action with respect to the set of states that are consistent with the information available so far. This set of states is represented as a probability distribution, called a belief b, and the set of all possible beliefs is called the belief space B. A POMDP solver calculates an optimal policy π* : B → A that maps a belief in B to an action in the set A of all possible actions the robot can perform, so as to maximize a given objective function. An offline POMDP solver computes this mapping prior to execution, while an online POMDP solver computes the mapping during runtime.
Methods that use the POMDP framework to plan in imperfectly known and dynamic environments can be classified into two approaches. The first approach embeds all possible environments and their dynamics as part of the POMDP model. It uses an offline POMDP solver to find a good policy prior to execution. When the environment and its dynamics are largely unknown, this approach constructs POMDP models too huge to be solved by even the best offline solvers today.
The second approach models only the known part of the environment and its
dynamics (both stochastic and deterministic), and allows the model to change during
execution when more information about the environment becomes available. Key to
the success of this approach is an efficient online POMDP solver that can compute
a good policy during runtime, following changes in the POMDP model.
Online POMDP solvers have advanced significantly over the past few years [5, 6, 18, 19]. However, most of these solvers [5, 6, 18] are based on replanning, which recomputes the best action to perform from scratch at each step, discarding any policy computed so far. As a result, these solvers often waste significant computational resources when changes happen gradually or only in some part of the environment, which is often the case in robotics tasks.
This paper proposes a new online POMDP solver, called Adaptive Belief Tree (ABT), that reuses and improves the existing policy at each time step, and updates the policy as needed whenever the POMDP model changes. To enable fast policy updates, ABT uses the following two observations. First, a change in the POMDP model is directly reflected as a change in the robot's behaviour at a particular set of states. Second, a change in one optimal mapping π*(b) from a belief b may affect the optimal policy π*(·) at other beliefs that can reach b. Using insight from these observations, ABT represents the policy as pairs of belief and action, and explicitly represents the relation between beliefs, states, and their reachability information, so that it can quickly identify the subset of the policy affected by changes in the POMDP model and update the policy fast whenever necessary. To quickly generate a good policy, ABT plans with respect to only a set of representative sampled beliefs. It represents each belief as a set of state particles, and samples a belief b by sampling a set of state trajectories from a particle of the given initial belief b₀. An effective strategy for sampling state trajectories enables ABT to converge in probability to an optimal policy, and to quickly generate a good policy. Preliminary results on three distinct robotics tasks in dynamic environments indicate that ABT can generate similar or better motion strategies faster than the best online POMDP solver today [19]. In all three test scenarios, ABT requires an average of less than 400 ms of preprocessing time, and an average of less than 50 ms of online computation time per step.
Furthermore, ABT is designed for POMDP problems with continuous state spaces and uses a generative model. A generative model is a black-box simulator that enables us to generate experiences about the system's dynamics and behaviour at various different states. By using a generative model, ABT does not need an explicit model of control error, observation error, and uncertainty about the system dynamics, which are often difficult to obtain in complex robotics tasks.
2 Related Work
2.1 POMDP Background
A POMDP is defined as a tuple ⟨S, A, O, T, Z, R, b₀, γ⟩, where S is the set of states, A is the set of actions, and O is the set of observations. At each step, the agent is in a state s ∈ S, takes an action a ∈ A, moves from s to an end state s′ ∈ S, and perceives an observation o ∈ O. Due to action uncertainty, the system dynamics from s to s′ are represented as a conditional probability function T(s, a, s′) = f(s′ | s, a). Furthermore, due to sensing uncertainty, after the agent performs action a and ends at state s′, the observation it may perceive is represented as a conditional probability function Z(s′, a, o) = f(o | s′, a). At each step, the agent receives a reward R(s, a) if it takes action a from state s. The agent's goal is to choose a suitable sequence of actions that will maximize its expected total reward, where the agent's initial belief is denoted as b₀. When the sequence of actions has infinite length, we specify a discount factor γ ∈ (0, 1) so that the total reward is finite and the problem is well defined.
In many problems with large state spaces, an explicit representation of the conditional probability functions T and Z may not be available. However, one can use a generative model, which is a black-box simulator that outputs the observation perceived, the reward received, and the next state visited when the agent performs the input action from the input state.
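As a rough sketch of this interface, a generative model can be as simple as a single function; the toy dynamics and all names below are our assumptions, not part of the paper.

```python
import random

def generative_model(state, action):
    """Black-box simulator: sample one step of the system.

    Given the current state and an action, return the next state visited,
    the observation perceived, and the reward received, without exposing
    the conditional probability functions T and Z explicitly. This toy
    instance is a 1-D world with noisy motion and noisy sensing.
    """
    next_state = state + action + random.choice([-1, 0, 1])  # implicit T
    observation = next_state + random.choice([-1, 0, 1])     # implicit Z
    reward = -abs(next_state)                                # e.g., distance penalty
    return next_state, observation, reward
```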
A POMDP planner computes an optimal policy that maximizes the agent's expected total reward. A POMDP policy π : B → A assigns an action a to each belief b ∈ B. A policy π induces a value function V(b, π), which specifies the expected total reward of executing policy π from belief b and is computed as

$$V(b, \pi) = E\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \,\middle|\, b, \pi\right].$$

A policy admits various representations, e.g., a policy graph [3] or pairs of belief and action [23]. However, most online solvers do not maintain an explicit representation of the policy. Instead, they calculate the mapping from beliefs to actions on the fly. In contrast, our proposed online solver maintains an explicit representation of a subset of the policy, and improves and updates this policy on the fly.
To execute a policy π, the agent repeatedly performs action selection and belief update. For example, if the agent's current belief is b, it selects the action given by a = π(b). After the agent performs action a and receives an observation o according to the observation function Z, it updates b to a new belief b′ given by

$$b'(s') = \tau(b, a, o) = \eta\, Z(s', a, o) \int_{s \in S} T(s, a, s')\, b(s)\, ds,$$

where η is a normalization constant. We use a generative model and represent each belief with a set of particles; the belief update is approximated using a particle filter.
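A minimal sketch of such a particle-filter update, here in a simple rejection-based form for a discrete observation space (the interface and constants are our assumptions, not the paper's):

```python
import random

def update_belief(particles, action, real_obs, model, n_target=100, max_tries=10_000):
    """Approximate b' = tau(b, a, o) with particles: repeatedly sample a
    state from the current belief, simulate one step with the generative
    model, and keep the resulting state only if its simulated observation
    matches the observation actually received."""
    new_particles = []
    tries = 0
    while len(new_particles) < n_target and tries < max_tries:
        s = random.choice(particles)      # sample s ~ b
        s_next, o, _ = model(s, action)   # one step of the black-box simulator
        if o == real_obs:                 # rejection step
            new_particles.append(s_next)
        tries += 1
    return new_particles
```

Policy execution then simply alternates selecting a = π(b) with this update.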
2.2 Related POMDP Solvers
POMDP is a systematic and general approach for planning under uncertainty. Although solving a POMDP exactly is computationally intractable [14], the past few years have seen a tremendous increase in the capability of both offline and online POMDP solvers [11, 19, 21], such that the POMDP approach is now practical for solving simple to moderately difficult motion planning problems.
The fastest offline POMDP solvers today are based on the point-based approach [11, 15, 20, 21]. This approach reduces the complexity of planning in the belief space B by representing B as a set of sampled beliefs and planning with respect to this set only. To generate a policy, most point-based POMDP solvers use value iteration, utilizing the fact that the optimal value function satisfies the Bellman equation. They start from an initial policy, represented as a value function V, and iteratively perform Bellman backups on V at the sampled beliefs, i.e.,

$$V(b) = \max_{a \in A}\left( R(b, a) + \gamma \sum_{o \in O} p(o \mid b, a)\, \hat{V}^{*}(\tau(b, a, o)) \right),$$

where V̂*(b′) is the current best value of b′. The iteration is performed until it converges.
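For concreteness, one such point-based backup can be sketched as follows, with R(b, a), p(o | b, a), τ(b, a, o), and the current value estimate supplied as functions; everything here is an illustrative assumption rather than any specific solver's API.

```python
def bellman_backup(b, actions, observations, R, p_obs, tau, V_hat, gamma=0.95):
    """One point-based Bellman backup at a sampled belief b:
    V(b) = max_a [ R(b, a) + gamma * sum_o p(o|b,a) * V_hat(tau(b, a, o)) ]."""
    return max(
        R(b, a) + gamma * sum(p_obs(b, a, o) * V_hat(tau(b, a, o))
                              for o in observations)
        for a in actions
    )
```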
Over the past few years, impressive progress has been made by improving the strategy for sampling