
Probabilistic Model Checking and Autonomy

Marta Kwiatkowska, Gethin Norman and David Parker

Department of Computer Science, University of Oxford, UK, OX1 3QD; email: marta.kwiatkowska@cs.ox.ac.uk

School of Computing Science, University of Glasgow, UK, G12 8RZ; email: gethin.norman@glasgow.ac.uk

School of Computer Science, University of Birmingham, UK; email: d.a.parker@cs.bham.ac.uk

Posted with permission from the Annual Review of Control, Robotics, and Autonomous Systems, Volume 5; https://www.annualreviews.org/

Keywords

probabilistic modelling, temporal logic, model checking, strategy synthesis, stochastic games, equilibria

Abstract

Design and control of autonomous systems that operate in uncertain or adversarial environments can be facilitated by formal modelling and analysis. Probabilistic model checking is a technique to automatically verify, for a given temporal logic specification, that a system model satisfies the specification, as well as to synthesise an optimal strategy for its control. This method has recently been extended to multi-agent systems that exhibit competitive or cooperative behaviour modelled via stochastic games and synthesis of equilibria strategies. In this paper, we provide an overview of probabilistic model checking, focusing on models supported by the PRISM and PRISM-games model checkers. This includes fully observable and partially observable Markov decision processes, as well as turn-based and concurrent stochastic games, together with associated probabilistic temporal logics. We demonstrate the applicability of the framework through illustrative examples from autonomous systems. Finally, we highlight research challenges and suggest directions for future work in this area.

arXiv:2111.10630v1 [cs.LO] 20 Nov 2021

1. INTRODUCTION

As autonomous systems become embedded within computing infrastructure, from information systems through to security and robotics, there is a growing need for methodologies that ensure their safe, secure, reliable, timely and resource-efficient execution. Design of computer systems can be facilitated by formal modelling and verification, and in particular model checking, which aims to automatically check if a system model satisfies given requirements typically expressed in temporal logic. Autonomy, however, creates additional demands of controllability, since autonomous systems operate in uncertain or adversarial environments, and strategic reasoning, to ensure effective coordination of cooperative or competitive behaviour of system components (agents).

Probabilistic model checking is a collection of techniques for the modelling of systems that exhibit probabilistic and non-deterministic behaviour, which supports not only their model checking against temporal logic, but also synthesis of optimal controllers (strategies) from temporal logic specifications. Probability is used to quantify environmental uncertainty and stochasticity, while non-determinism represents model decisions. Markov decision processes (MDPs) are typically employed to model and reason about the strategic behaviour of an agent against a stochastic environment, where specifications are expressed in probabilistic extensions of the temporal logics CTL or LTL. Partially observable Markov decision processes (POMDPs) permit similar modelling and analysis, but for contexts where the agent has limited power to observe its environment.

MDPs and POMDPs, however, are unable to faithfully represent the behaviour of multiple players competing or cooperating to achieve their individual goals. To this end, we employ multi-agent systems modelled via stochastic games and reason about their strategic behaviour for both zero-sum and nonzero-sum (equilibria) properties. For zero-sum properties, the utility of one agent is the negation of the utility of its opponent, whereas for nonzero-sum properties each agent pursues its own quantitative objective. Probabilistic model checking has been recently extended to encompass both turn-based and concurrent stochastic games, together with an extension of the temporal logic that inherits the coalition operator from ATL, as well as synthesis of optimal Nash equilibria strategies (more precisely, subgame-perfect social welfare optimal strategies).

In this paper, we provide an overview of recent advances in probabilistic model checking, focusing on the model checking and strategic reasoning methods implemented in the PRISM (1) and PRISM-games (2) tools for discrete probabilistic models. The review covers fully observable and partially observable Markov decision processes (Sections 2 and 3 respectively), as well as turn-based and concurrent stochastic games (Sections 4 and 5 respectively), together with associated probabilistic temporal logics. We discuss the core types of quantitative analyses available for each model, as well as extensions such as multi-objective analysis and continuous-time models, also called real-time models in the model checking literature (Section 6). We demonstrate the applicability of the framework through illustrative examples, with emphasis on the areas of robotics and autonomy. Finally, we highlight challenges and suggest directions for future work in this area (Section 7).

2. MARKOV DECISION PROCESSES

We begin with Markov decision processes (MDPs) (3), which are a classic model for decision making under uncertainty. This is a discrete-time model, with discrete sets of states and actions, that allows both non-determinism, e.g., to represent the choices made by the controller of a robot or vehicle, and discrete probabilistic choice, to model environmental uncertainty arising due to, for instance, the presence of humans, noisy sensors, unreliable communication media or faulty hardware.

Figure 1
Left: A simple MDP representing a robot navigating through a grid; a (deterministic, memoryless) optimal policy for the property $\mathrm{P}_{\max=?}[\,\neg\mathit{hazard}\ \mathrm{U}\ \mathit{goal}\,]$ is marked in bold. Right: A topological map from (4) used to build a similar style MDP modelling a mobile robot exploring a building.

We give a formal definition of MDPs below. Here, and in the remainder of the paper, $\mathit{Dist}(X)$ denotes the set of (discrete) probability distributions over a finite set $X$, i.e., functions $\mu : X \to [0, 1]$ such that $\sum_{x \in X} \mu(x) = 1$.

Definition 1 (Markov decision process). A Markov decision process (MDP) is a tuple $M = (S, \bar{s}, A, \delta, \mathit{AP}, L)$ where:

• $S$ is a finite set of states and $\bar{s} \in S$ is an initial state;
• $A$ is a finite set of actions;
• $\delta : (S \times A) \to \mathit{Dist}(S)$ is a (partial) probabilistic transition function, mapping state-action pairs to probability distributions over $S$;
• $\mathit{AP}$ is a set of atomic propositions and $L : S \to 2^{\mathit{AP}}$ is a state labelling function.

The execution of an MDP $M$ proceeds as follows. When in a state $s$, there is a non-deterministic choice over the actions that are available in the state, defined as the actions $a \in A$ such that $\delta(s, a)$ is defined and denoted $A(s)$. It is assumed that the set of available actions is non-empty for every state. After an action $a \in A(s)$ has been chosen in $s$, it is performed and the probability of transitioning to state $s' \in S$ equals $\delta(s, a)(s')$.
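The tuple in Definition 1 and the execution rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-state MDP, the state and action names and the helper functions `available`, `is_distribution` and `step` are hypothetical and do not reproduce the model in Figure 1.

```python
import random

# Hypothetical toy MDP (not the one in Figure 1).
# delta maps (state, action) to a distribution {successor: probability};
# pairs that are absent correspond to delta(s, a) being undefined.
delta = {
    ("s0", "east"): {"s1": 0.6, "s0": 0.4},
    ("s0", "south"): {"s2": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s2", "stay"): {"s2": 1.0},
}
# Labelling function L : S -> 2^AP.
labels = {"s0": set(), "s1": {"goal"}, "s2": {"hazard"}}


def available(s):
    """A(s): the actions a for which delta(s, a) is defined."""
    return sorted(a for (t, a) in delta if t == s)


def is_distribution(mu, tol=1e-9):
    """Check that mu assigns values in [0, 1] that sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in mu.values())
    return in_range and abs(sum(mu.values()) - 1.0) <= tol


def step(s, a, rng):
    """Sample a successor of s after action a, according to delta(s, a)."""
    mu = delta[(s, a)]
    successors = list(mu)
    weights = [mu[t] for t in successors]
    return rng.choices(successors, weights=weights)[0]
```

Representing the partial function δ as a dictionary keyed by state-action pairs keeps A(s) implicit in the keys, which mirrors the definition of available actions in the text.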

Example 1. A simple example of an MDP is shown in Figure 1 (left); it models the movement of a robot through locations in a 3 × 2 grid. Each state ($s_i$) represents a location and actions taken in states result in probabilistic transitions to other locations. For example, in state s there is a choice between moving east and southeast; if east is chosen, then with probability 0.6 the robot moves east and with probability 0.4 the robot remains in its current location. Also shown are atomic propositions (goal and hazard) needed for property specification. Figure 1 (right) shows a topological map used to build a larger, similar-style MDP modelling a mobile robot traversing locations within an office building (4).

A path of $M$ is defined by an alternating sequence of action choices and transitions. More formally, a path is a finite or infinite sequence $\pi = s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} \cdots$ such that $s_0 = \bar{s}$, $a_i \in A(s_i)$ and $\delta(s_i, a_i)(s_{i+1}) > 0$ for all $i \geq 0$. $\mathit{FPaths}_M$ and $\mathit{IPaths}_M$ denote the sets of finite and infinite paths of $M$, respectively.
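As a small illustration of the path conditions, the following sketch checks whether a finite sequence of states and actions is a path of a toy MDP. The MDP, the name `initial` and the function `is_finite_path` are assumptions made for this example only.

```python
# Hypothetical toy MDP (not the one in Figure 1).
delta = {
    ("s0", "east"): {"s1": 0.6, "s0": 0.4},
    ("s0", "south"): {"s2": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s2", "stay"): {"s2": 1.0},
}
initial = "s0"


def is_finite_path(states, actions):
    """Check s_0 = initial, a_i in A(s_i) and delta(s_i, a_i)(s_{i+1}) > 0.

    states is [s_0, ..., s_n] and actions is [a_0, ..., a_{n-1}].
    """
    if not states or states[0] != initial:
        return False
    if len(actions) != len(states) - 1:
        return False
    for i, a in enumerate(actions):
        mu = delta.get((states[i], a))  # None if a is not available in s_i
        if mu is None or mu.get(states[i + 1], 0.0) <= 0.0:
            return False
    return True
```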


We next introduce the notion of a strategy (often also called a policy) of an MDP $M$, which resolves the non-determinism present in $M$. In particular, strategies decide which actions to take in states of the MDP, depending on its execution to date.

Definition 2 (MDP strategy). A strategy of an MDP $M$ is a function $\sigma : \mathit{FPaths}_M \to \mathit{Dist}(A)$ such that, if $\sigma(\pi)(a) > 0$, then $a \in A(\mathit{last}(\pi))$ where $\mathit{last}(\pi)$ is the final state of $\pi$.

The set of all strategies of $M$ is denoted $\Sigma_M$. We classify a strategy $\sigma \in \Sigma_M$ in terms of its use of randomisation and memory.

• Randomisation: $\sigma$ is deterministic (or pure) if $\sigma(\pi)$ picks a single action with probability 1 for all finite paths $\pi$, and randomised otherwise.
• Memory: $\sigma$ is memoryless if $\sigma(\pi)$ depends only on $\mathit{last}(\pi)$ for all finite paths $\pi$, and finite-memory if there are finitely many modes such that, for any $\pi$, $\sigma(\pi)$ depends only on $\mathit{last}(\pi)$ and the current mode, which is updated each time an action is performed; otherwise, it is infinite-memory.
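The classification above can be made concrete with a short sketch. It encodes a finite path as a Python list alternating states and actions, and gives one deterministic memoryless strategy and one deterministic finite-memory strategy. The action sets `A`, the encoding of paths and all function names are hypothetical choices for this illustration.

```python
# Hypothetical available-action sets A(s) for a toy MDP (not Figure 1).
A = {"s0": {"east", "south"}, "s1": {"stay"}, "s2": {"stay"}}


def last(path):
    """A finite path is encoded as [s0, a0, s1, a1, ..., sn]; return sn."""
    return path[-1]


def memoryless_det(path):
    """Deterministic and memoryless: depends only on last(path)."""
    choice = {"s0": "east", "s1": "stay", "s2": "stay"}
    return {choice[last(path)]: 1.0}


def give_up_after_two(path):
    """Deterministic and finite-memory: try east at most twice, then south.

    The mode is the number of east actions taken so far, capped at 2,
    so three modes suffice.
    """
    if last(path) != "s0":
        return {"stay": 1.0}
    mode = min(path[1::2].count("east"), 2)  # path[1::2] lists the actions
    return {"east": 1.0} if mode < 2 else {"south": 1.0}


def is_valid_choice(sigma, path):
    """Definition 2: sigma(path) is a distribution supported on A(last(path))."""
    mu = sigma(path)
    sums_to_one = abs(sum(mu.values()) - 1.0) < 1e-9
    return sums_to_one and all(a in A[last(path)] for a, p in mu.items() if p > 0)
```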

Under a particular strategy, the behaviour of MDP $M$ is fully probabilistic and we can reason about the probability of different events. For a strategy $\sigma$ of $M$, we denote by $\mathit{FPaths}^{\sigma}_M$ and $\mathit{IPaths}^{\sigma}_M$ the sets of finite and infinite paths that correspond to the choices of $\sigma$. Following (5), we can define a probability measure $\mathit{Prob}^{\sigma}_M$ over $\mathit{IPaths}^{\sigma}_M$ that corresponds to the behaviour of the MDP under $\sigma$. Using this probability measure we can then also define, for a random variable $X : \mathit{IPaths}_M \to \mathbb{R}$, the expected value $\mathbb{E}^{\sigma}_M(X)$ of $X$ under $\sigma$.
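For a memoryless deterministic strategy, the fully probabilistic behaviour is a finite Markov chain, and the probability of an event such as reaching a target set can be approximated by fixed-point iteration. The sketch below assumes a hypothetical toy MDP and strategy; the functions `induced_chain` and `reach_probability` are illustrative names, and a fixed iteration count stands in for a proper convergence test.

```python
# Hypothetical toy MDP (not the one in Figure 1).
delta = {
    ("s0", "east"): {"s1": 0.6, "s0": 0.3, "s2": 0.1},
    ("s0", "south"): {"s2": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s2", "stay"): {"s2": 1.0},
}
# A memoryless deterministic strategy: one action per state.
sigma = {"s0": "east", "s1": "stay", "s2": "stay"}


def induced_chain(delta, sigma):
    """Markov chain obtained by fixing the action sigma[s] in every state s."""
    return {s: delta[(s, a)] for s, a in sigma.items()}


def reach_probability(chain, target, iterations=1000):
    """Approximate, from below, the probability of eventually reaching target."""
    x = {s: 0.0 for s in chain}
    for _ in range(iterations):
        x = {
            s: 1.0 if s in target else sum(p * x[t] for t, p in chain[s].items())
            for s in chain
        }
    return x


probs = reach_probability(induced_chain(delta, sigma), {"s1"})
```

In this toy model the value for `s0` solves x = 0.6 + 0.3x, i.e. x = 6/7.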

Random variables can be used to introduce a variety of quantitative properties of MDPs. This is often achieved by augmenting an MDP with reward structures (these can in some cases represent costs, but for consistency we will use the term rewards). Example applications of rewards include: the energy consumption of a device, the number of tasks completed by a robot or the number of packets lost by a communication protocol.

Definition 3 (MDP reward structure). A reward structure for an MDP $M$ is a tuple $r = (r_S, r_A)$, where $r_S : S \to \mathbb{R}_{\geq 0}$ is a state reward function and $r_A : (S \times A) \to \mathbb{R}_{\geq 0}$ is an action reward function.

2.1. Property Specifications for MDPs

In order to formally specify the required behaviour of a system modelled as an MDP, we use quantitative extensions of temporal logic. Below, we show a fragment of the logic used as the property specification language for the PRISM model checker (1), which we refer to here as the PRISM logic. This is based on the logics PCTL (probabilistic computation tree logic) (6) and LTL (linear temporal logic) (7), and also incorporates operators to specify expected reward properties (8).

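Definition 4 (Property syntax). The syntax for a core fragment of the PRISM logic is:

$$
\begin{aligned}
\Phi &:= \mathrm{P}_{\bowtie p}[\,\psi\,] \mid \mathrm{R}^{r}_{\bowtie q}[\,\rho\,] \\
\psi &:= \phi \mid \neg\psi \mid \psi \wedge \psi \mid \mathrm{X}\,\psi \mid \psi\ \mathrm{U}^{\leq k}\ \psi \mid \psi\ \mathrm{U}\ \psi \\
\rho &:= \mathrm{I}^{=k} \mid \mathrm{C}^{\leq k} \mid \mathrm{F}\,\phi \\
\phi &:= \mathrm{true} \mid a \mid \neg\phi \mid \phi \wedge \phi
\end{aligned}
$$

where $a \in \mathit{AP}$ is an atomic proposition, $\bowtie\ \in \{<, \leq, \geq, >\}$, $p \in [0, 1]$, $r$ is a reward structure, $q \in \mathbb{R}_{\geq 0}$ and $k \in \mathbb{N}$.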


Above, we assume that a property $\Phi$ for an MDP comprises a single probabilistic (P) or reward (R) operator. The syntax also includes path ($\psi$) and reward ($\rho$) formulae, both evaluated over paths, and propositional logic ($\phi$) formulae, evaluated over states. The intuitive meaning of the P and R operators, from the initial state of an MDP, is:

• $\mathrm{P}_{\bowtie p}[\,\psi\,]$ – the probability of a path satisfying path formula $\psi$ satisfies the bound $\bowtie p$;
• $\mathrm{R}^{r}_{\bowtie q}[\,\rho\,]$ – the expected value of reward formula $\rho$, under reward structure $r$, satisfies the bound $\bowtie q$.
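The numerical query form used in the caption of Figure 1, P max=? [ ¬hazard U goal ], asks for the maximum probability, over all strategies, of reaching a goal state while avoiding hazard states. One standard way to approximate this value is value iteration; the sketch below applies it to a hypothetical four-state MDP. The model, the function names `pmax_until` and `best_action`, and the fixed iteration count are assumptions for illustration, not a description of how PRISM computes the result.

```python
# Hypothetical toy MDP (not the one in Figure 1).
delta = {
    ("s0", "east"): {"s1": 0.6, "s0": 0.3, "s2": 0.1},
    ("s0", "south"): {"s3": 1.0},
    ("s3", "east"): {"s1": 0.8, "s2": 0.2},
    ("s1", "stay"): {"s1": 1.0},
    ("s2", "stay"): {"s2": 1.0},
}
labels = {"s0": set(), "s1": {"goal"}, "s2": {"hazard"}, "s3": set()}


def expected(mu, x):
    """Expected value of x under the distribution mu."""
    return sum(p * x[t] for t, p in mu.items())


def pmax_until(delta, labels, avoid, target, iterations=1000):
    """Approximate the maximum probability of (not avoid) U target per state."""

    def backup(s, x):
        if target in labels[s]:
            return 1.0  # the until formula is already satisfied
        if avoid in labels[s]:
            return 0.0  # the until formula is violated
        return max(expected(mu, x) for (u, _), mu in delta.items() if u == s)

    x = {s: 0.0 for s in labels}
    for _ in range(iterations):
        x = {s: backup(s, x) for s in labels}
    return x


def best_action(delta, s, x):
    """An action in s that attains the maximum for the values x."""
    actions = [a for (u, a) in delta if u == s]
    return max(actions, key=lambda a: expected(delta[(s, a)], x))


values = pmax_until(delta, labels, "hazard", "goal")
```

Reading off `best_action` in every state yields a deterministic, memoryless strategy, which matches the kind of optimal policy mentioned in the figure caption.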

A propositional formula $\phi$ is satisfied (or holds) in a state $s$ if it evaluates to true in that state, where an atomic proposition $a$ is true if $s$ is labelled with $a$ (i.e., $a \in L(s)$) and the logical connectives ($\neg$, $\wedge$) are interpreted in the usual way.

For path formulae $\psi$, the core temporal operators are:

• $\mathrm{X}\,\psi$ (next) – $\psi$ is satisfied in the next state;
• $\psi_1\ \mathrm{U}^{\leq k}\ \psi_2$ (bounded until) – $\psi_2$ is satisfied within $k$ steps, and $\psi_1$ is satisfied until that point;
• $\psi_1\ \mathrm{U}\ \psi_2$ (until) – $\psi_2$ is eventually satisfied, and $\psi_1$ is satisfied until then.

As is standard in model checking, we use the equivalences $\mathrm{F}\,\psi \equiv \mathrm{true}\ \mathrm{U}\ \psi$ (eventually) and $\mathrm{G}\,\psi \equiv \neg\mathrm{F}\,\neg\psi$ (always). If we restrict the sub-formulae of a path formula to be atomic propositions, then we get the following common property classes:

• $\mathrm{F}\,a$ (reachability) – eventually a state labelled with $a$ is reached;
• $\mathrm{G}\,a$ (invariance) – $a$ labels all states;
• $\mathrm{F}^{\leq k}\,a$ (step-bounded reachability) – $a$ labels a state within the first $k$ steps;
• $\mathrm{G}^{\leq k}\,a$ (step-bounded invariance) – $a$ labels states for at least the first $k$ steps.
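Step-bounded reachability lends itself to a simple backward computation: the value for bound k is obtained from the value for bound k-1 by one maximisation step. The following sketch computes the maximum probability of satisfying F≤k goal on a hypothetical toy MDP; the model and the function name `pmax_bounded_reach` are assumptions for this example.

```python
# Hypothetical toy MDP (not the one in Figure 1).
delta = {
    ("s0", "east"): {"s1": 0.6, "s0": 0.4},
    ("s0", "south"): {"s2": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s2", "stay"): {"s2": 1.0},
}
labels = {"s0": set(), "s1": {"goal"}, "s2": {"hazard"}}


def pmax_bounded_reach(delta, labels, target, k):
    """Maximum probability of reaching a target-labelled state within k steps."""
    # Bound 0: only states already labelled with target satisfy the formula.
    x = {s: 1.0 if target in labels[s] else 0.0 for s in labels}
    for _ in range(k):
        x = {
            s: 1.0
            if target in labels[s]
            else max(
                sum(p * x[t] for t, p in mu.items())
                for (u, _), mu in delta.items()
                if u == s
            )
            for s in labels
        }
    return x
```

For this model the values from `s0` are 0 for k = 0, 0.6 for k = 1 and 0.6 + 0.4 · 0.6 = 0.84 for k = 2.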

Without this restriction, path formulae allow temporal operators to be nested. In fact the syntax of path formulae given in Definition 4 is that of linear temporal logic (LTL) (7). LTL can express a range of useful property classes, including:

• $\mathrm{G}\,\mathrm{F}\,\psi$ (recurrence) – $\psi$ is satisfied infinitely often;
• $\mathrm{F}\,\mathrm{G}\,\psi$ (persistence) – eventually $\psi$ is always satisfied;
• $\mathrm{G}\,(\psi_1 \to \mathrm{X}\,\psi_2)$ – whenever $\psi_1$ is satisfied, $\psi_2$ is satisfied in the next state;
• $\mathrm{G}\,(\psi_1 \to \mathrm{F}\,\psi_2)$ – whenever $\psi_1$ is satisfied, $\psi_2$ is satisfied in the future.
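To build intuition for these property classes, the sketch below evaluates three of them on a single ultimately periodic path, given as a finite prefix followed by a non-empty loop that repeats forever. It is restricted to atomic propositions as sub-formulae, and the path encoding and function names are hypothetical; it illustrates the semantics on one path and is not a model checking procedure for MDPs.

```python
def recurrence(prefix, loop, a):
    """G F a on prefix . loop^omega: a must occur somewhere in the loop."""
    return any(a in lab for lab in loop)


def persistence(prefix, loop, a):
    """F G a on prefix . loop^omega: a must hold in every loop position."""
    return all(a in lab for lab in loop)


def response(prefix, loop, a1, a2):
    """G (a1 -> F a2) on prefix . loop^omega."""
    if any(a2 in lab for lab in loop):
        return True  # a2 recurs forever, so F a2 holds at every position
    if any(a1 in lab for lab in loop):
        return False  # a1 occurs in the loop but a2 never follows it
    # Otherwise every a1 in the prefix needs an a2 at the same or a later
    # prefix position.
    return all(
        any(a2 in later for later in prefix[i:])
        for i, lab in enumerate(prefix)
        if a1 in lab
    )


# Each position of the path is the set of atomic propositions labelling it.
prefix = [{"request"}, set(), {"grant"}]
loop = [set(), {"idle"}]
```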

Finally, considering reward formulae $\rho$, the three key operators are:

• $\mathrm{I}^{=k}$ (instantaneous reward) – state reward at time step $k$;
• $\mathrm{C}^{\leq k}$ (bounded cumulative reward) – reward accumulated over $k$ steps;
• $\mathrm{F}\,\phi$ (reachability reward) – reward accumulated until a state satisfying $\phi$ is reached.

Although omitted from the syntax here for simplicity, it is also common to generalise the third case and consider the expected reward accumulated until some co-safe LTL formula is satisfied. Intuitively, these are path formulae $\psi$ whose satisfaction occurs within finite time; examples include $(\mathrm{F}\,a_1) \wedge (\mathrm{F}\,a_2)$ and $\mathrm{F}\,(a_1 \wedge \mathrm{F}\,a_2)$, which require states labelled with $a_1$ and $a_2$ to be reached, either in any order (first case) or in a specified order (second case).
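The three reward operators can be illustrated on a Markov chain, such as the one induced by fixing a memoryless strategy in an MDP. The sketch below uses a hypothetical two-state chain with state rewards only (action rewards are omitted for brevity), and assumes the target of the reachability reward is reached with probability 1, so that the expected value is finite. All names are illustrative.

```python
# Hypothetical two-state Markov chain with state rewards only.
chain = {"s0": {"s1": 0.6, "s0": 0.4}, "s1": {"s1": 1.0}}
r_state = {"s0": 1.0, "s1": 0.0}


def one_step(f):
    """Expected value of f in the successor state, for every state."""
    return {s: sum(p * f[t] for t, p in chain[s].items()) for s in chain}


def instantaneous(k):
    """Expected state reward at time step k (operator I=k)."""
    x = dict(r_state)
    for _ in range(k):
        x = one_step(x)
    return x


def cumulative(k):
    """Expected reward accumulated over the first k steps (operator C<=k)."""
    x = {s: 0.0 for s in chain}
    for _ in range(k):
        nxt = one_step(x)
        x = {s: r_state[s] + nxt[s] for s in chain}
    return x


def reach_reward(target, iterations=1000):
    """Expected reward accumulated before reaching target (operator F phi).

    Assumes target is reached with probability 1 from every state.
    """
    x = {s: 0.0 for s in chain}
    for _ in range(iterations):
        nxt = one_step(x)
        x = {s: 0.0 if s in target else r_state[s] + nxt[s] for s in chain}
    return x
```

With a reward of 1 in `s0`, the reachability reward from `s0` is the expected number of steps needed to leave it, 1/0.6 = 5/3.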
