ISyE8843A, Brani Vidakovic Handout 10
1 MCMC Methodology.
Independence of $X_1, \ldots, X_n$ is not critical for an approximation of the form $E_{\theta|x} h(X) \approx \frac{1}{n} \sum_{i=1}^{n} h(X_i)$, $X_i \sim \pi(\theta|x)$. In fact, when the $X$'s are dependent, the ergodic theorems describe the approximation.
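As a quick illustration (not from the handout), the following Python sketch averages a function of a dependent AR(1) chain, whose stationary distribution $N(0, 1/(1-\phi^2))$ is known in closed form, so the ergodic average can be checked against the exact moment:

    import numpy as np

    rng = np.random.default_rng(0)

    # AR(1) chain: X_{n+1} = phi * X_n + Z_n, Z_n ~ N(0, 1).
    # Stationary distribution: N(0, 1 / (1 - phi^2)).
    phi, n = 0.8, 200_000
    x = np.empty(n)
    x[0] = 0.0
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.standard_normal()

    # Despite the dependence, the ergodic average of h(X) = X^2
    # approximates the stationary second moment 1/(1 - phi^2) = 2.7778.
    print((x**2).mean(), 1.0 / (1.0 - phi**2))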
An easy and convenient form of dependence is Markov chain dependence. Markov dependence is well suited to computer simulation, since producing a future realization of the chain requires only the current state.
1.1 Theoretical Background and Notation
Random variables $X_1, X_2, \ldots, X_n, \ldots$ constitute a Markov chain on a continuous state space if they possess the Markov property,
$$P(X_{n+1} \in A \mid X_1, \ldots, X_n) = P(X_{n+1} \in A \mid X_n) = Q(X_n, A) = Q(A \mid X_n),$$
for some probability distribution $Q$. Typically $Q$ is assumed time-homogeneous, i.e., independent of $n$ ("time"). The transition kernel (from state $n$ to state $n+1$) defines a probability measure on the state space, and we will assume that the density $q$ exists, i.e.,
$$Q(A \mid X_n = x) = \int_A q(x, y)\,dy = \int_A q(y \mid x)\,dy.$$
A distribution $\Pi$ is invariant if for all measurable sets $A$,
$$\Pi(A) = \int Q(A \mid x)\,\Pi(dx).$$
If the density $\pi$ of $\Pi$ exists, then $\Pi$ is stationary whenever $q(x \mid y)\pi(y) = q(y \mid x)\pi(x)$. Here and in the sequel we assume that the density of $\Pi$ exists, $\Pi(A) = \int_A \pi(x)\,dx$.
A distribution $\Pi$ is an equilibrium distribution if, for $Q^n(A \mid x) = P(X_n \in A \mid X_0 = x)$,
$$\lim_{n \to \infty} Q^n(A \mid x) = \Pi(A).$$
In plain terms, the Markov chain will forget the initial distribution and will converge to the stationary distri-
bution.
The Markov chain is irreducible if for each $A$ for which $\Pi(A) > 0$, and for each $x$, one can find $n$ so that $Q^n(A \mid x) > 0$.
The Markov chain $X_1, \ldots, X_n, \ldots$ is recurrent if for each $B$ such that $\Pi(B) > 0$,
$$P(X_n \in B \text{ i.o.} \mid X_0 = x) = 1 \quad \text{a.s. (in the distribution of } X_0).$$
It is Harris recurrent if $P(X_n \in B \text{ i.o.} \mid X_0 = x) = 1$ for all $x$. The acronym i.o. stands for infinitely often.
Figure 1: Nicholas Constantine Metropolis, 1915-1999
1.2 Metropolis Algorithm
The Metropolis algorithm is fundamental to MCMC development.
Assume that the target distribution is known up to a normalizing constant. We would like to construct a
chain with π as its stationary distribution.
As in the acceptance/rejection method (ARM), we take a proposal distribution $q(x, y) = q(y \mid x)$, where $y$ is the proposed new value of the chain, given that the chain is at value $x$.
Thus $q$ defines a transition kernel $Q(A \mid x) = \int_A q(y \mid x)\,dy$, which is the probability of a transition to some $y \in A$.
Detailed Balance Equation. A Markov chain with transition density $q(x, y) = q(y \mid x)$ satisfies the detailed balance equation if there exists a distribution $f$ such that
$$q(y \mid x) f(x) = q(x \mid y) f(y). \qquad (1)$$
The distribution $f$ is stationary (invariant) and the chain is reversible.
Indeed, if (1) holds, $\int q(x \mid y) f(y)\,dy = \int q(y \mid x) f(x)\,dy = f(x) \int q(y \mid x)\,dy = f(x)$, which is the definition of an invariant distribution.
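The discrete analogue is easy to check numerically. Below is a minimal sketch with a hypothetical 3-state chain whose transition matrix was constructed to satisfy detailed balance with $f = (0.2, 0.3, 0.5)$:

    import numpy as np

    f = np.array([0.2, 0.3, 0.5])
    Q = np.array([[0.70, 0.15, 0.15],
                  [0.10, 0.70, 0.20],
                  [0.06, 0.12, 0.82]])

    # Detailed balance: f_i Q_ij = f_j Q_ji, i.e., the matrix of
    # stationary flows f_i Q_ij is symmetric.
    flows = f[:, None] * Q
    assert np.allclose(flows, flows.T)

    # Detailed balance implies invariance: f Q = f.
    print(f @ Q)   # [0.2 0.3 0.5]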
For a given target distribution $\pi$, the proposal $q$ is admissible if
$$\operatorname{supp} \pi \subset \bigcup_x \operatorname{supp} q(\cdot \mid x).$$
The Metropolis-Hastings algorithm is universal: one can select an arbitrary proposal distribution that is admissible. Of course, such an arbitrary distribution/kernel cannot be expected to satisfy the detailed balance equation (1) for the target distribution $\pi$, i.e.,
$$q(y \mid x)\pi(x) \neq q(x \mid y)\pi(y).$$
Suppose (wlog)
$$q(y \mid x)\pi(x) > q(x \mid y)\pi(y).$$
Then there is a factor $\rho(x, y) \leq 1$ that balances the above inequality,
$$q(y \mid x) \cdot \rho(x, y) \cdot \pi(x) = q(x \mid y)\pi(y) \cdot 1.$$
Solving with respect to $\rho(x, y)$, one obtains
$$\rho(x, y) = \frac{q(x \mid y)\pi(y)}{q(y \mid x)\pi(x)} \wedge 1,$$
where $a \wedge b$ denotes $\min\{a, b\}$. What is the transition kernel corresponding to the modified equation? It is
$$q_M(y \mid x) = q(y \mid x)\rho(x, y) + \mathbf{1}(y = x)\left(1 - \int q(y \mid x)\rho(x, y)\,dy\right).$$
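A one-line check, spelled out here for completeness, confirms that $q_M$ satisfies the detailed balance equation (1) with $f = \pi$: for $y \neq x$,
$$q_M(y \mid x)\pi(x) = q(y \mid x)\rho(x, y)\pi(x) = q(y \mid x)\pi(x) \wedge q(x \mid y)\pi(y),$$
which is symmetric in $x$ and $y$, so $q_M(y \mid x)\pi(x) = q_M(x \mid y)\pi(y)$; the atom at $y = x$ satisfies the equation trivially.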
Metropolis-Hastings Algorithm.
Assume that the target distribution $\pi$ is known up to the normalizing constant. This is the case for posteriors, which are always known up to a proportionality constant as products of the likelihood and a prior.
STEP 1 Start with an arbitrary $x_0$ from the support of the target distribution.
STEP 2 At stage $n$, generate a proposal $y$ from $q(y \mid x_n)$.
STEP 3 Take $x_{n+1} = y$ with probability
$$\rho(x_n, y) = \frac{q(x_n \mid y)\pi(y)}{q(y \mid x_n)\pi(x_n)} \wedge 1.$$
Otherwise, take $x_{n+1} = x_n$. This random acceptance is done by generating a uniform $(0, 1)$ random variable $U$ and accepting the proposal $y$ if $U \leq \rho(x_n, y)$.
STEP 4 Increase $n$ and return to STEP 2.
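The four steps translate directly into code. Below is a minimal sketch in Python/NumPy (the names metropolis_hastings, log_target, propose, and log_q are ours, not the handout's); since $\pi$ is known only up to a constant, it works with unnormalized log densities for numerical stability:

    import numpy as np

    def metropolis_hastings(log_target, propose, log_q, x0, n_iter, rng):
        """Generic Metropolis-Hastings sampler (illustrative sketch).

        log_target(x): log target density, up to an additive constant.
        propose(x, rng): draws a proposal y ~ q(. | x).
        log_q(y, x): log q(y | x), the proposal density.
        """
        x = x0
        chain = np.empty(n_iter)
        for n in range(n_iter):
            y = propose(x, rng)
            # rho(x, y) = min{ q(x|y) pi(y) / [q(y|x) pi(x)], 1 }, in logs.
            log_rho = (log_target(y) + log_q(x, y)
                       - log_target(x) - log_q(y, x))
            if np.log(rng.uniform()) <= min(log_rho, 0.0):
                x = y            # STEP 3: accept the proposal ...
            chain[n] = x         # ... otherwise the current state repeats
        return chain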
Some Common Choices for $q$.
If $q(x \mid y) = q(y \mid x)$, i.e., if the kernel is symmetric, the acceptance ratio $\rho(x, y)$ simplifies to
$$\frac{\pi(y)}{\pi(x)} \wedge 1,$$
since the proposal kernels in the numerator and denominator cancel. If in addition $q$ depends on $(x, y)$ via $|y - x|$, i.e., $q(x, y) = q^*(|y - x|)$ for some distribution $q^*$, the algorithm is called the Metropolis random walk. A symmetric kernel is the original proposal from Metropolis et al. (1953).
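Reusing the metropolis_hastings sketch above, a Metropolis random walk requires only a symmetric proposal; because $q(y \mid x) = q(x \mid y)$, the log_q terms cancel and may be set to any constant. The $N(0, 1)$ step and the standard normal target below are illustrative choices:

    import numpy as np

    rng = np.random.default_rng(1)

    # Symmetric random-walk proposal: y = x + Z, Z ~ N(0, 1), so
    # rho reduces to min{pi(y)/pi(x), 1}.
    propose = lambda x, rng: x + rng.standard_normal()
    log_q = lambda y, x: 0.0   # symmetric: cancels in log_rho

    # Target: standard normal, known up to a constant.
    chain = metropolis_hastings(lambda x: -0.5 * x**2, propose, log_q,
                                x0=0.0, n_iter=50_000, rng=rng)
    print(chain.mean(), chain.var())   # roughly 0 and 1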
If the proposal $q(x, y)$ does not depend on $x$, i.e.,
$$q(y \mid x) = q(y),$$
the algorithm is called the independence Metropolis. It is similar to the acceptance/rejection method (ARM), but unlike the ARM, every step produces a realization from the target distribution. That realization may be repeated many times, which happens when the proposal is not accepted and the current state is repeatedly taken as the new state.
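For the independence Metropolis, the acceptance ratio becomes $\frac{\pi(y) q(x)}{\pi(x) q(y)} \wedge 1$. A sketch with a heavy-tailed Cauchy proposal for the same normal target (an illustrative choice; as with ARM, a proposal dominating the tails of the target works well):

    import numpy as np

    rng = np.random.default_rng(2)

    # Independence proposal: y ~ Cauchy(0, 1), regardless of x.
    propose = lambda x, rng: rng.standard_cauchy()
    log_q = lambda y, x: -np.log(np.pi * (1.0 + y**2))   # log q(y)

    chain = metropolis_hastings(lambda x: -0.5 * x**2, propose, log_q,
                                x0=0.0, n_iter=50_000, rng=rng)
    print(chain.mean(), chain.var())   # again roughly 0 and 1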
1.2.1 Examples
[From Johnson and Albert (1999)] A small company improved a product and wants to make inference about the proportion of potential customers who will buy the product if the new product is preferred to the old one. The company is certain that this proportion will exceed 0.5 and therefore uses the uniform prior on [0.5, 1]. Out of 20 customers surveyed, 12 prefer the new product. Find the posterior for $p$.
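Here is a sketch of how this posterior could be sampled with the random-walk Metropolis above (our illustrative setup, not the book's solution): with the uniform prior on [0.5, 1] and 12 successes in 20 trials, the posterior is proportional to $p^{12}(1-p)^8$ on [0.5, 1], and proposals outside the support get log density $-\infty$, hence are never accepted:

    import numpy as np

    rng = np.random.default_rng(3)

    def log_post(p):
        # Posterior ~ p^12 (1 - p)^8 on [0.5, 1]; zero elsewhere.
        if not (0.5 <= p < 1.0):
            return -np.inf
        return 12.0 * np.log(p) + 8.0 * np.log(1.0 - p)

    # Small random-walk steps keep the acceptance rate reasonable.
    propose = lambda p, rng: p + 0.05 * rng.standard_normal()
    chain = metropolis_hastings(log_post, propose, lambda y, x: 0.0,
                                x0=0.6, n_iter=50_000, rng=rng)
    print(chain.mean())   # posterior mean of p, roughly 0.6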