
Talk given in Bristol, April 2, 2011, at Julian Besag's memorial


A Short History of Markov Chain Monte Carlo:

Subjective Recollections from Incomplete Data

Christian P. Robert and George Casella

Université Paris-Dauphine, IUF, & CREST

and University of Florida

April 2, 2011

In memoriam, Julian Besag, 1945–2010



Introduction

Markov Chain Monte Carlo (MCMC) methods have been around for almost as long as Monte Carlo techniques themselves, even though their impact on Statistics was not truly felt until the late 1980s / early 1990s.

Contents: distinction between Metropolis-Hastings based algorithms and those related to Gibbs sampling, and a brief entry into the "second-generation MCMC revolution".

Several reasons:

lack of computing machinery

lack of background on Markov chains

lack of trust in the practicality of the method



Introduction

A few landmarks

The realization that Markov chains could be used in a wide variety of situations only came to "mainstream statisticians" with Gelfand and Smith (1990), despite earlier publications in the statistical literature like Hastings (1970) and growing awareness in spatial statistics (Besag, 1986)

Several reasons:

lack of computing machinery

lack of background on Markov chains

lack of trust in the practicality of the method

Before the revolution

Los Alamos

Bombs before the revolution

Monte Carlo methods born in Los

Alamos, New Mexico, during WWII,

mostly by physicists working on atomic

bombs and eventually producing the

Metropolis algorithm in the early

1950’s.

[Metropolis, Rosenbluth, Rosenbluth, Teller and Teller, 1953]


Before the revolution

Los Alamos

Monte Carlo genesis

The Monte Carlo method is usually traced to Ulam and von Neumann:

Stanislaw Ulam associated the idea with an intractable combinatorial computation attempted in 1946 about "solitaire"

The idea was enthusiastically adopted by John von Neumann for implementation on neutron diffusion

The name "Monte Carlo" was suggested by Nicholas Metropolis

[Eckhardt, 1987]


Before the revolution

Los Alamos

Monte Carlo with computers

A very close "coincidence" with the appearance of the very first computer, ENIAC, born Feb. 1946, on which von Neumann implemented Monte Carlo in 1947

The same year, Ulam and von Neumann (re)invented the inversion and accept-reject techniques

In 1949, the very first symposium on Monte Carlo and the very first paper appeared

[Metropolis and Ulam, 1949]

Before the revolution

Metropolis et al., 1953

The Metropolis et al. (1953) paper

Very first MCMC algorithm associated

with the second computer, MANIAC,

Los Alamos, early 1952.

Besides Metropolis, Arianna W.

Rosenbluth, Marshall N. Rosenbluth,

Augusta H. Teller, and Edward Teller

contributed to creating the Metropolis algorithm...

Before the revolution

Metropolis et al., 1953

Motivating problem

Computation of integrals of the form

I = ∫ F(p, q) exp{−E(p, q)/kT} dp dq  /  ∫ exp{−E(p, q)/kT} dp dq,

with energy E defined as

E(p, q) = (1/2) ∑_{i=1}^{N} ∑_{j=1, j≠i}^{N} V(d_{ij}),

and N the number of particles, V a potential function and d_{ij} the distance between particles i and j.

Before the revolution

Metropolis et al., 1953

Boltzmann distribution

Boltzmann distribution exp{−E(p, q)/kT} parameterised by temperature T, k being the Boltzmann constant, with a normalisation factor

Z(T) = ∫ exp{−E(p, q)/kT} dp dq

not available in closed form.

Before the revolution

Metropolis et al., 1953

Computational challenge

Since p and q are 2N-dimensional vectors, numerical integration is impossible.

Plus, standard Monte Carlo techniques fail to correctly approximate I: exp{−E(p, q)/kT} is very small for most realizations of random configurations (p, q) of the particle system.
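This failure is easy to see numerically: draws from a flat proposal almost never land where the Boltzmann factor is non-negligible. A minimal one-dimensional sketch (the quadratic energy, the kT value, and the sampling range are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Naive Monte Carlo for a Boltzmann-type integrand: sample configurations
# uniformly and weight them by exp(-E/kT). With E(x) = x**2 and kT = 0.1
# (purely illustrative choices), almost every draw has negligible weight.
rng = np.random.default_rng(0)
x = rng.uniform(-10.0, 10.0, size=10_000)   # "random configurations"
w = np.exp(-x**2 / 0.1)                     # Boltzmann weights

# effective sample size: how many draws actually contribute to the average
ess = w.sum() ** 2 / (w**2).sum()
print(f"effective sample size: {ess:.0f} out of {len(x)}")
```

Only a few percent of the draws carry essentially all the weight, which is exactly the situation that called for a scheme concentrating on high-probability configurations.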

Before the revolution

Metropolis et al., 1953

Metropolis algorithm

Consider a random walk modification of the N particles: for each 1 ≤ i ≤ N, the values

x′_i = x_i + αξ_{1i}  and  y′_i = y_i + αξ_{2i}

are proposed, where both ξ_{1i} and ξ_{2i} are uniform U(−1, 1). The energy difference ∆E between the new and the previous configurations is computed, and the new configuration is accepted with probability

1 ∧ exp{−∆E/kT},

and otherwise the previous configuration is replicated*

*counting one more time in the average of the F(p_t, q_t)'s over the τ moves

of the random walk.

Before the revolution

Metropolis et al., 1953

Convergence

Validity of the algorithm established by proving

1. irreducibility

2. ergodicity, that is convergence to the stationary distribution.

Second part obtained via discretization of the space: Metropolis et

al. note that the proposal is reversible, then establish that

exp{−E/kT } is invariant.

Application to the specific problem of the rigid-sphere collision

model. The number of iterations of the Metropolis algorithm

seems to be limited: 16 steps for burn-in and 48 to 64 subsequent

iterations (that still required four to five hours on the Los Alamos

MANIAC).
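In modern notation, one sweep of this random-walk scheme can be sketched as follows (a sketch, not the MANIAC code; the potential V, the step size α, and the temperature kT are illustrative inputs):

```python
import numpy as np

def energy(xy, V):
    """E = (1/2) * sum over i != j of V(d_ij), for positions xy of shape (N, 2)."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    off_diag = ~np.eye(len(xy), dtype=bool)
    return 0.5 * V(d[off_diag]).sum()

def metropolis_sweep(xy, V, alpha=0.1, kT=1.0, rng=None):
    """One pass over the N particles with the 1953 accept/reject rule."""
    rng = rng or np.random.default_rng(0)
    E = energy(xy, V)
    for i in range(len(xy)):
        prop = xy.copy()
        prop[i] += alpha * rng.uniform(-1.0, 1.0, size=2)  # x' = x + a*xi1, y' = y + a*xi2
        E_new = energy(prop, V)
        dE = E_new - E
        # accept with probability min(1, exp(-dE/kT)); otherwise the previous
        # configuration is replicated (counted once more in the averages)
        if dE <= 0 or rng.uniform() < np.exp(-dE / kT):
            xy, E = prop, E_new
    return xy
```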

Before the revolution

Metropolis et al., 1953

Physics and chemistry

The method of Markov chain Monte Carlo immediately

had wide use in physics and chemistry.

[Geyer & Thompson, 1992]

Hammersley and Handscomb, 1967

Piekaar and Clarenburg, 1967

Kennedy and Kutil, 1985

Sokal, 1989

&tc...

Before the revolution

Metropolis et al., 1953

Physics and chemistry

Statistics has always been fuelled by energetic mining of

the physics literature.

[Clifford, 1993]

Hammersley and Handscomb, 1967

Piekaar and Clarenburg, 1967

Kennedy and Kutil, 1985

Sokal, 1989

&tc...

Before the revolution

Hastings, 1970

A fair generalisation

In Biometrika 1970, Hastings defines MCMC methodology for

finite and reversible Markov chains, the continuous case being

discretised:

Generic acceptance probability for a move from state i to state j is

α_{ij} = s_{ij} / ( 1 + (π_i q_{ij}) / (π_j q_{ji}) ),

where s_{ij} is a symmetric function.
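Concretely, writing t = π_j q_{ji} / (π_i q_{ij}), particular symmetric choices of s_{ij} recover the two classical rules. A small sketch (the function names are mine):

```python
def hastings_alpha(pi_i, pi_j, q_ij, q_ji, s_ij):
    # generic Hastings (1970) acceptance: s_ij / (1 + pi_i q_ij / (pi_j q_ji))
    return s_ij / (1.0 + (pi_i * q_ij) / (pi_j * q_ji))

def barker_alpha(pi_i, pi_j, q_ij, q_ji):
    # Barker (1965): s_ij = 1, giving t / (1 + t)
    return hastings_alpha(pi_i, pi_j, q_ij, q_ji, s_ij=1.0)

def metropolis_alpha(pi_i, pi_j, q_ij, q_ji):
    # Metropolis et al. (1953): s_ij = 1 + min(t, 1/t), giving min(1, t);
    # s_ij is symmetric because swapping i and j turns t into 1/t
    t = (pi_j * q_ji) / (pi_i * q_ij)
    return hastings_alpha(pi_i, pi_j, q_ij, q_ji, s_ij=1.0 + min(t, 1.0 / t))
```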

Before the revolution

Hastings, 1970

State of the art

Note

Generic form that encompasses both Metropolis et al. (1953) and

Barker (1965).

Peskun's ordering not yet discovered: Hastings mentions that little is known about the relative merits of those two choices, even though Metropolis's method may be preferable.

Warning against high rejection rates as indicative of a poor choice of transition matrix, but no mention of the opposite pitfall of low rejection rates.

Before the revolution

Hastings, 1970

What else?!

Items included in the paper are

a Poisson target with a ±1 random walk proposal,

a normal target with a uniform random walk proposal mixed with its

reflection (i.e. centered at −X(t) rather than X(t)),

a multivariate target where Hastings introduces Gibbs sampling, updating one component at a time and defining the composed transition as satisfying the stationarity condition because each component leaves the target invariant,

a reference to Erhman, Fosdick and Handscomb (1960) as a

preliminary if specific instance of this Metropolis-within-Gibbs

sampler

an importance sampling version of MCMC,

some remarks about error assessment,

a Gibbs sampler for random orthogonal matrices

Before the revolution

Hastings, 1970

Three years later

Peskun (1973) compares Metropolis’ and Barker’s acceptance

probabilities and shows (again in a discrete setup) that Metropolis’

is optimal (in terms of the asymptotic variance of any empirical

average).

The proof is a direct consequence of Kemeny and Snell (1960) on asymptotic variance. Peskun also establishes that this variance can improve upon the iid case if and only if the eigenvalues of P − A are all negative, where A is the transition matrix corresponding to the iid simulation and P the transition matrix corresponding to the Metropolis algorithm, but he concludes that the trace of P − A is always positive.


Before the revolution

Julian’s early works

Julian’s early works (1)

In the early 1970s, Hammersley, Clifford, and Besag were working on the specification of joint distributions from conditional distributions and on necessary and sufficient conditions for the conditional distributions to be compatible with a joint distribution.

[Hammersley and Clifford, 1971]


What is the most general form of the conditional probability

functions that define a coherent joint function? And what will the

joint look like?

[Besag, 1972]

Before the revolution

Julian’s early works

Hammersley-Clifford theorem

Theorem (Hammersley-Clifford)

The joint distribution of a vector associated with a dependence graph must be represented as a product of functions over the cliques of the graph, i.e., of functions depending only on the components indexed by the labels in the clique.

[Cressie, 1993; Lauritzen, 1996]

Before the revolution

Julian’s early works

Hammersley-Clifford theorem

Theorem (Hammersley-Clifford)

A probability distribution P with positive and continuous density f satisfies the pairwise Markov property with respect to an undirected graph G if and only if it factorizes according to G, i.e., (F) ≡ (G).

[Cressie, 1993; Lauritzen, 1996]

Before the revolution

Julian’s early works

Hammersley-Clifford theorem

Theorem (Hammersley-Clifford)

Under the positivity condition, the joint distribution g satisfies

g(y_1, ..., y_p) ∝ ∏_{j=1}^{p}  g_{ℓ_j}(y_{ℓ_j} | y_{ℓ_1}, ..., y_{ℓ_{j−1}}, y′_{ℓ_{j+1}}, ..., y′_{ℓ_p}) / g_{ℓ_j}(y′_{ℓ_j} | y_{ℓ_1}, ..., y_{ℓ_{j−1}}, y′_{ℓ_{j+1}}, ..., y′_{ℓ_p})

for every permutation ℓ on {1, 2, ..., p} and every y′ ∈ Y.

[Cressie, 1993; Lauritzen, 1996]
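The factorization is easy to verify numerically on a small discrete joint with p = 2 and the identity permutation (the joint below is an arbitrary positive table, not an example from the talk): the product of conditional ratios, taken against a fixed reference point, recovers the joint after normalisation.

```python
import numpy as np

# a positive 3x4 joint pmf and its two full conditionals
rng = np.random.default_rng(0)
joint = rng.uniform(0.1, 1.0, size=(3, 4))
joint /= joint.sum()
g1 = joint / joint.sum(axis=0, keepdims=True)   # g1(y1 | y2): columns sum to 1
g2 = joint / joint.sum(axis=1, keepdims=True)   # g2(y2 | y1): rows sum to 1

# Besag's factorization for p = 2 with reference point (r1, r2):
#   g(y1, y2) proportional to [g1(y1|r2) / g1(r1|r2)] * [g2(y2|y1) / g2(r2|y1)]
r1, r2 = 0, 0
recon = np.empty_like(joint)
for y1 in range(joint.shape[0]):
    for y2 in range(joint.shape[1]):
        recon[y1, y2] = (g1[y1, r2] / g1[r1, r2]) * (g2[y1, y2] / g2[y1, r2])
recon /= recon.sum()   # after normalisation, recon equals the joint
```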

Before the revolution

Julian’s early works

An apocryphal theorem

The Hammersley-Clifford theorem was never published by its authors, but only through Grimmett (1973), Preston (1973), Sherman (1973), and Besag (1974). The authors were dissatisfied with the positivity constraint: the joint density could only be recovered from the full conditionals when the support of the joint was made of the product of the supports of the full conditionals (with obvious counter-examples). Moussouris' counter-example put a full stop to their endeavors.

[Hammersley, 1974]


Before the revolution

Julian’s early works

To Gibbs or not to Gibbs?

Julian Besag should certainly be credited, to a large extent, with the (re?-)discovery of the Gibbs sampler.

The simulation procedure is to consider the sites

cyclically and, at each stage, to amend or leave unaltered

the particular site value in question, according to a

probability distribution whose elements depend upon the

current value at neighboring sites (...) However, the

technique is unlikely to be particularly helpful in many

other than binary situations and the Markov chain itself

has no practical interpretation.

[Besag, 1974]

Before the revolution

Julian’s early works

Broader perspective

In 1964, Hammersley and Handscomb wrote a (the first?) textbook on Monte Carlo methods. They cover such topics as

"Crude Monte Carlo";

importance sampling;

control variates; and

"Conditional Monte Carlo", which looks surprisingly like a missing-data Gibbs completion approach.

They state in the Preface

We are convinced nevertheless that Monte Carlo methods

will one day reach an impressive maturity.

Before the revolution

Julian’s early works

Clicking in

After Peskun (1973), MCMC lay mostly dormant in the mainstream statistical world for about 10 years, until several papers and books highlighted its usefulness in specific settings:

Geman and Geman (1984)

Besag (1986)

Strauss (1986)

Ripley (Stochastic Simulation, 1987)

Tanner and Wong (1987)

Younes (1988)

Before the revolution

Julian’s early works

Enters the Gibbs sampler

Geman and Geman (1984), building on Metropolis et al. (1953),

Hastings (1970), and Peskun (1973), constructed a Gibbs sampler

for optimisation in a discrete image processing problem without

completion.

Responsible for the name Gibbs sampling, because the method was used for the Bayesian study of Gibbs random fields, linked to the physicist Josiah Willard Gibbs (1839–1903)

Back to Metropolis et al. (1953): the Gibbs sampler is used as a simulated annealing algorithm and ergodicity is proven on the collection of global maxima

Before the revolution

Julian’s early works

Besag (1986) integrates GS for SA...

...easy to construct the transition matrix Q, of a discrete

time Markov chain, with state space Ω and limit

distribution (4). Simulated annealing proceeds by

running an associated time inhomogeneous Markov chain

with transition matrices QT , where T is progressively

decreased according to a prescribed “schedule” to a value

close to zero.

[Besag, 1986]

Before the revolution

Julian’s early works

...and links with Metropolis-Hastings...

There are various related methods of constructing a

manageable QT (Hastings, 1970). Geman and Geman

(1984) adopt the simplest, which they term the "Gibbs sampler" (...) time reversibility, a common ingredient in

this type of problem (see, for example, Besag, 1977a), is

present at individual stages but not over complete cycles,

though Peter Green has pointed out that it returns if QT

is taken over a pair of cycles, the second of which visits

pixels in reverse order

[Besag, 1986]

Before the revolution

Julian’s early works

...seeing the larger picture,...

As Geman and Geman (1984) point out, any property of

the (posterior) distribution P (x|y) can be simulated by

running the Gibbs sampler at “temperature” T = 1.

Thus, if x̂_i maximizes P(x_i|y), then it is the most frequently occurring colour at pixel i in an infinite realization of the Markov chain with transition matrix Q of Section 2.3. The x̂_i's can therefore be simultaneously estimated from a single finite realization of the chain. It

is not yet clear how long the realization needs to be,

particularly for estimation near colour boundaries, but the

amount of computation required is generally prohibitive

for routine purposes

[Besag, 1986]

Before the revolution

Julian’s early works

...seeing the larger picture,...

P (x|y) can be simulated using the Gibbs sampler, as

suggested by Grenander (1983) and by Geman and

Geman (1984). My dismissal of such an approach for

routine applications was somewhat cavalier:

purpose-built array processors could become relatively

inexpensive (...) suppose that, for 100 complete cycles

say, images have been collected from the Gibbs sampler

(or by Metropolis’ method), following a “settling-in”

period of perhaps another 100 cycles, which should cater

for fairly intricate priors (...) These 100 images should

often be adequate for estimating properties of the

posterior (...) and for making approximate associated

confidence statements, as mentioned by Mr Haslett.

[Besag, 1986]


Before the revolution

Julian’s early works

...if not going fully Bayes!

...a neater and more efficient procedure [for parameter

estimation] is to adopt maximum "pseudo-likelihood"

estimation (Besag, 1975)

I have become increasingly enamoured with the Bayesian

paradigm

[Besag, 1986]


The pair (xi, βi) is then a (bivariate) Markov field and

can be reconstructed as a bivariate process by the

methods described in Professor Besag’s paper.

[Clifford, 1986]


The simulation-based estimator E_post Ψ(X) will differ from the m.a.p. estimator Ψ̂(x).

[Silverman, 1986]

Before the revolution

Julian’s early works

Discussants of Besag (1986)

Impressive who's who: D.M. Titterington, P. Clifford, P. Green, P. Brown, B. Silverman, F. Critchley, F. Kelly, K. Mardia, C. Jennison, J. Kent, D. Spiegelhalter, H. Wynn, D. and S. Geman, J. Haslett, J. Kay, H. Künsch, P. Switzer, B. Torsney, &tc

Before the revolution

Julian’s early works

A comment on Besag (1986)

While special purpose algorithms will determine the utility of the Bayesian methods, the general purpose methods (stochastic relaxation and simulation of solutions of the Langevin equation; Grenander, 1983; Geman and Geman, 1984; Gidas, 1985a; Geman and Hwang, 1986) have proven enormously convenient and versatile. We are able to apply a single computer program to every new problem by merely changing the subroutine that computes the energy function in the Gibbs representation of the posterior distribution.

[Geman and McClure, 1986]

Before the revolution

Julian’s early works

Another one

It is easy to compute exact marginal and joint posterior

probabilities of currently unobserved features, conditional

on those clinical findings currently available

(Spiegelhalter, 1986a,b), the updating taking the form of

‘propagating evidence’ through the network (...) it would

be interesting to see if the techniques described tonight,

which are of intermediate complexity, may have any

applications in this new and exciting area [causal

networks].

[Spiegelhalter, 1986]

Before the revolution

Julian’s early works

The candidate’s formula

Representation of the marginal likelihood as

m(x) = π(θ) f(x|θ) / π(θ|x)

or of the marginal predictive as

p_n(y′|y) = f(y′|θ) π_n(θ|y) / π_{n+1}(θ|y, y′)

[Besag, 1989]

Why candidate?

“Equation (2) appeared without explanation in a Durham

University undergraduate final examination script of 1984.

Regrettably, the student's name is no longer known to me."
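The formula is straightforward to check on a conjugate example where every term is available in closed form, e.g. a Binomial likelihood under a uniform Beta(1, 1) prior (the numbers below are illustrative): the ratio is the same for every θ and equals the exact marginal 1/(n + 1).

```python
from math import comb, gamma

n, x = 10, 3
theta = 0.4   # any value in (0, 1) yields the same ratio

prior = 1.0                                            # Beta(1,1) density at theta
like = comb(n, x) * theta**x * (1 - theta)**(n - x)    # f(x | theta)
# the posterior is Beta(x+1, n-x+1); its density at theta:
post = gamma(n + 2) / (gamma(x + 1) * gamma(n - x + 1)) * theta**x * (1 - theta)**(n - x)

# candidate's formula: m(x) = pi(theta) f(x|theta) / pi(theta|x)
m = prior * like / post
print(m)   # 1/(n+1) for the uniform prior, here 1/11
```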

Before the revolution

Julian’s early works

Implications

Newton and Raftery (1994) used this representation to derive the [infamous] harmonic mean approximation to the marginal likelihood

Gelfand and Dey (1994) also relied on this formula for the same purpose in a more general perspective

Geyer and Thompson (1995) derived MLEs by a Monte Carlo approximation to the normalising constant

Chib (1995) uses this representation to build an MCMC approximation to the marginal likelihood

The Revolution

Final steps to

Impact

“This is surely a revolution.”

[Clifford, 1993]

Geman and Geman (1984) is one more spark that led to the

explosion, as it had a clear influence on Gelfand, Green, Smith,

Spiegelhalter and others.

Sparked new interest in Bayesian methods, statistical computing,

algorithms, and stochastic processes through the use of computing

algorithms such as the Gibbs sampler and the Metropolis–Hastings

algorithm.

The Revolution

Final steps to

Impact

“[Gibbs sampler] use seems to have been isolated in the spatial

statistics community until Gelfand and Smith (1990)”

[Geyer, 1990]


The Revolution

Data augmentation

Tanner and Wong (1987) has essentially the same ingredients as Gelfand and Smith (1990): simulating from the conditionals is simulating from the joint.

Lower impact:

emphasis on missing data problems (hence data augmentation)

MCMC approximation to the target at every iteration,

π(θ|x) ≈ (1/K) Σ_{k=1}^{K} π(θ|x, z_{t,k}),   z_{t,k} ∼ π̂_{t−1}(z|x),

too close to Rubin’s (1978) multiple imputation

theoretical backup based on functional analysis (the Markov kernel had to be uniformly bounded and equicontinuous)
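The iteration above can be sketched numerically: a pool of K θ-values represents the current approximation π̂, imputations z are drawn by composition, and the pool is refreshed from the conditionals. The toy model below (observed normals plus a block of genuinely missing replicates, flat prior) is an illustrative assumption, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: y_i ~ N(theta, 1); n_obs values observed, n_mis missing (z).
y_obs = rng.normal(2.0, 1.0, size=50)
n_obs, n_mis = y_obs.size, 10

K, T = 200, 100                              # pool size, DA iterations
theta_pool = rng.normal(0.0, 5.0, size=K)    # crude initial approximation

for t in range(T):
    # Imputation step: z^{t,k} ~ pi-hat_{t-1}(z | x), by composition
    theta_star = rng.choice(theta_pool, size=K)                 # theta ~ pi-hat
    z = rng.normal(theta_star[:, None], 1.0, size=(K, n_mis))   # z | theta
    # Posterior step: theta^{t,k} ~ pi(theta | x, z^{t,k}) (flat prior)
    post_mean = (y_obs.sum() + z.sum(axis=1)) / (n_obs + n_mis)
    post_sd = 1.0 / np.sqrt(n_obs + n_mis)
    theta_pool = rng.normal(post_mean, post_sd)

print(theta_pool.mean())   # settles near the observed-data posterior mean
```

Since the missing replicates carry no information in this toy setting, the pool stabilises around the posterior given the observed data alone, which makes the behaviour easy to check.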

The Revolution

Gelfand and Smith, 1990

Epiphany

In June 1989, at a Bayesian workshop in Sherbrooke, Québec, Adrian Smith exposed for the first time (?) the generic features of the Gibbs sampler, exhibiting a ten-line Fortran program handling the random effect model

Y_ij = θ_i + ε_ij,   i = 1, . . . , K,   j = 1, . . . , J,
θ_i ∼ N(µ, σ²_θ),   ε_ij ∼ N(0, σ²_ε),

by full conditionals on µ, σ_θ, σ_ε...
[Gelfand and Smith, 1990]

This was enough to convince the whole audience!
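A modern analogue of that ten-line Fortran program, cycling through the full conditionals of the random effect model above. The inverse-gamma priors on both variances (and their hyperparameters a, b) are an illustrative assumption, since the slide does not specify the priors:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from Y_ij = theta_i + eps_ij,
# theta_i ~ N(mu, s2t), eps_ij ~ N(0, s2e)
K_grp, J = 15, 20
true_mu, true_s2t, true_s2e = 5.0, 2.0, 1.0
theta_true = rng.normal(true_mu, np.sqrt(true_s2t), K_grp)
Y = rng.normal(theta_true[:, None], np.sqrt(true_s2e), (K_grp, J))

# Gibbs sampler via the full conditionals (flat prior on mu,
# InvGamma(a, b) priors on both variances: an illustrative choice)
a, b = 2.0, 1.0
mu, s2t, s2e = 0.0, 1.0, 1.0
theta = Y.mean(axis=1).copy()
draws = []
for it in range(2000):
    # theta_i | rest: precision-weighted blend of group mean and mu
    prec = J / s2e + 1.0 / s2t
    mean = (J * Y.mean(axis=1) / s2e + mu / s2t) / prec
    theta = rng.normal(mean, 1.0 / np.sqrt(prec))
    # mu | rest
    mu = rng.normal(theta.mean(), np.sqrt(s2t / K_grp))
    # variances | rest (inverse-gamma draws via reciprocal gammas)
    s2t = 1.0 / rng.gamma(a + K_grp / 2, 1.0 / (b + 0.5 * ((theta - mu) ** 2).sum()))
    s2e = 1.0 / rng.gamma(a + K_grp * J / 2, 1.0 / (b + 0.5 * ((Y - theta[:, None]) ** 2).sum()))
    if it >= 500:
        draws.append(mu)

print(np.mean(draws))   # should sit near true_mu
```

Each conditional is a standard distribution, which is exactly the point Smith made: the whole sampler is a handful of lines once the conditionals are written down.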

The Revolution

Gelfand and Smith, 1990

Garden of Eden

In the early 1990s, researchers found that Gibbs and then Metropolis–Hastings algorithms would crack almost any problem!

A flood of papers followed applying MCMC:

linear mixed models (Gelfand et al., 1990; Zeger and Karim, 1991; Wang et al., 1993, 1994)

generalized linear mixed models (Albert and Chib, 1993)

mixture models (Tanner and Wong, 1987; Diebolt and X., 1990, 1994; Escobar and West, 1993)

changepoint analysis (Carlin et al., 1992)

point processes (Grenander and Møller, 1994)

genomics (Stephens and Smith, 1993; Lawrence et al., 1993; Churchill, 1995; Geyer and Thompson, 1995)

ecology (George and X., 1992; Dupuis, 1995)

variable selection in regression (George and McCulloch, 1993)

spatial statistics (Raftery and Banfield, 1991)

longitudinal studies (Lange et al., 1992)

&tc

The Revolution

Gelfand and Smith, 1990

[some of the] early theoretical advances

“It may well be remembered as the afternoon of the 11 Bayesians”
[Clifford, 1993]

Geyer and Thompson, 1992, relied on MCMC methods for ML estimation

Smith and Roberts, 1993, discussed convergence diagnoses and applications, incl. mixtures, for Gibbs and Metropolis–Hastings

Besag and Green, 1993, stated the desiderata for convergence, and connected MCMC with auxiliary and antithetic variables

Tierney, 1994, laid out all of the assumptions needed to analyze the Markov chains and then developed their properties, in particular convergence of ergodic averages and central limit theorems

Liu, Wong and Kong, 1994, 95, analyzed the covariance structure of Gibbs sampling, and formally established the validity of Rao–Blackwellization in Gibbs sampling

Mengersen and Tweedie, 1996, set the tone for the study of the speed of convergence of MCMC algorithms to the target distribution

Gilks, Clayton and Spiegelhalter, 1993

&tc...
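Among these, the Rao–Blackwellization result of Liu, Wong and Kong is easy to illustrate numerically: in a two-component Gibbs sampler, averaging the conditional expectations E[X|y_t] instead of the draws x_t themselves yields a valid, typically lower-variance estimator. A sketch on a standard bivariate normal target (the correlation and run lengths are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
rho, T = 0.9, 5000

# Gibbs sampler for a standard bivariate normal with correlation rho:
# X | Y=y ~ N(rho*y, 1-rho^2), and symmetrically for Y | X=x.
def gibbs_run():
    x, y = 0.0, 0.0
    xs, cond_means = [], []
    for _ in range(T):
        x = rng.normal(rho * y, np.sqrt(1 - rho**2))
        y = rng.normal(rho * x, np.sqrt(1 - rho**2))
        xs.append(x)
        cond_means.append(rho * y)   # E[X | y], the Rao-Blackwellized term
    return np.mean(xs), np.mean(cond_means)

naive, rb = zip(*(gibbs_run() for _ in range(50)))
print(np.var(naive), np.var(rb))   # RB variance is typically smaller
```

Both columns estimate E[X] = 0; repeating the run exposes the variance reduction that Liu, Wong and Kong formalised.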

The Revolution

Convergence diagnoses

Can we really tell when a complicated Markov chain has reached equilibrium? Frankly, I doubt it.
[Clifford, 1993]

Explosion of methods:

Gelman and Rubin (1991)

Besag and Green (1992)

Geyer (1992)

Raftery and Lewis (1992)

Cowles and Carlin (1996) coda

Brooks and Roberts (1998)

&tc
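Gelman and Rubin’s diagnostic, listed among the methods above, compares within-chain and between-chain variability from parallel runs with overdispersed starting points. A sketch of the potential scale reduction factor, on an illustrative random-walk Metropolis target (the N(0,1) target, proposal scale, and run lengths are all assumptions for demonstration):

```python
import numpy as np

rng = np.random.default_rng(6)

# Several independent Metropolis chains targeting a standard normal.
def metropolis_chain(x0, T):
    x, out = x0, []
    for _ in range(T):
        prop = x + rng.normal(0, 1.0)
        if np.log(rng.uniform()) < 0.5 * (x**2 - prop**2):  # N(0,1) target
            x = prop
        out.append(x)
    return np.array(out)

chains = np.array([metropolis_chain(x0, 2000) for x0 in (-5.0, 0.0, 5.0, 10.0)])
chains = chains[:, 1000:]                      # discard burn-in halves
m_chains, n_len = chains.shape
W = chains.var(axis=1, ddof=1).mean()          # within-chain variance
B = n_len * chains.mean(axis=1).var(ddof=1)    # between-chain variance
var_plus = (n_len - 1) / n_len * W + B / n_len
r_hat = np.sqrt(var_plus / W)
print(r_hat)   # values near 1 suggest (but never prove) convergence
```

The caveat in Clifford’s quote applies verbatim: a value near 1 is consistent with convergence, not a certificate of it.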

After the Revolution

Particle systems

Particles, again

Iterating importance sampling is about as old as Monte Carlo methods themselves!
[Hammersley and Morton, 1954; Rosenbluth and Rosenbluth, 1955]

Found in the molecular simulation literature of the 1950s, with self-avoiding random walks, and in signal processing
[Marshall, 1965; Handschin and Mayne, 1969]

Use of the term “particle” dates back to Kitagawa (1996), and Carpenter et al. (1997) coined the term “particle filter”.

After the Revolution

Particle systems

Bootstrap filter and sequential Monte Carlo

Gordon, Salmond and Smith (1993) introduced the bootstrap filter which, while formally connected with importance sampling, involves past simulations and possible MCMC steps (Gilks and Berzuini, 2001).

Sequential imputation was developed in Kong, Liu and Wong (1994), while Liu and Chen (1995) first formally pointed out the importance of resampling in “sequential Monte Carlo”, a term they coined.
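The bootstrap filter reduces to a propagate / weight / resample loop. A sketch on an assumed linear Gaussian state-space model (the model, noise scales, and particle count are illustrative, chosen so the filter’s accuracy can be checked against the simulated states):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy state-space model: x_t = 0.9 x_{t-1} + w_t, w_t ~ N(0, 0.5^2),
#                        y_t = x_t + v_t,        v_t ~ N(0, 1).
T, N = 100, 1000
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal(0, 0.5)
y = x + rng.normal(0, 1.0, T)

# Bootstrap filter: propagate through the prior dynamics,
# weight by the likelihood, then resample (multinomially here).
particles = rng.normal(0, 1, N)
filt_means = []
for t in range(T):
    particles = 0.9 * particles + rng.normal(0, 0.5, N)   # propagate
    logw = -0.5 * (y[t] - particles) ** 2                 # N(y; x, 1), up to consts
    w = np.exp(logw - logw.max())
    w /= w.sum()
    filt_means.append(np.sum(w * particles))
    particles = rng.choice(particles, size=N, p=w)        # resample

print(np.mean((np.array(filt_means) - x) ** 2))   # filtering MSE
```

Resampling at every step is the crude variant; Liu and Chen’s point was precisely that when and how to resample matters for the variance of the weights.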

After the Revolution

Particle systems

pMC versus pMCMC

Recycling of past simulations is legitimate to build better importance sampling functions, as in population Monte Carlo
[Iba, 2000; Cappé et al., 2004; Del Moral et al., 2007]

Recent synthesis by Andrieu, Doucet and Holenstein (2010), using particles to build an evolving MCMC kernel p̂_θ(y_1:T) in state space models p(x_1:T)p(y_1:T|x_1:T), along with Andrieu and Roberts’ (2009) use of approximations in MCMC acceptance steps
[Kennedy and Kuti, 1985]
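The pseudo-marginal idea behind Andrieu and Roberts (2009) is that an unbiased estimate of the likelihood may replace the true likelihood in the Metropolis–Hastings ratio without perturbing the target, provided the current estimate is carried along. A toy sketch (the latent-variable model, prior, and tuning are all illustrative assumptions, chosen so the exact posterior mean is available for comparison):

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy model: theta ~ N(0, 25), latent z ~ N(0,1), y | theta, z ~ N(theta + z, 1),
# so the true likelihood is N(y; theta, 2) and the answer is checkable.
y = 2.0

def lik_hat(theta, N=10):
    # unbiased importance-sampling estimate of p(y | theta)
    z = rng.normal(0, 1, N)
    return np.mean(np.exp(-0.5 * (y - theta - z) ** 2) / np.sqrt(2 * np.pi))

def log_prior(theta):
    return -theta**2 / 50.0

theta, p_hat = 0.0, lik_hat(0.0)
draws = []
for _ in range(20000):
    prop = theta + rng.normal(0, 1.0)
    p_prop = lik_hat(prop)
    # the NOISY estimate enters the MH ratio, and the current estimate
    # is carried along, never recomputed
    if np.log(rng.uniform()) < (np.log(p_prop) + log_prior(prop)
                                - np.log(p_hat) - log_prior(theta)):
        theta, p_hat = prop, p_prop
    draws.append(theta)

exact_mean = y * 25 / (25 + 2)   # conjugate normal posterior mean
print(np.mean(draws[2000:]), exact_mean)
```

Refreshing p_hat outside an accepted move would break exactness; the noise in the estimate costs only mixing speed, not correctness.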

After the Revolution

Reversible jump

Generally considered as the second Revolution.

Formalisation of a Markov chain moving across models and parameter spaces allows for the Bayesian processing of a wide variety of models and contributes to the success of Bayesian model choice.

Definition of a proper balance condition on cross-model Markov kernels gives a generic setup for exploring variable-dimension spaces, even when the number of models under comparison is infinite.

[Green, 1995]
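A minimal reversible jump sketch in the spirit of Green (1995), on an assumed pair of nested models: M1: y_i ~ N(0,1) versus M2: y_i ~ N(θ,1) with θ ~ N(0, τ²) and equal model priors. Drawing the new parameter from its prior makes the dimension-matching and Jacobian terms cancel, which is the simplest valid cross-model kernel; everything here is illustrative, and the exact posterior model probability is computed for comparison:

```python
import numpy as np

rng = np.random.default_rng(7)

n, tau2 = 20, 4.0
y = rng.normal(0.4, 1.0, n)
s, ss = y.sum(), y @ y

def loglik(theta):
    return -0.5 * ((y - theta) ** 2).sum()

model, theta, visits2 = 1, 0.0, 0
T = 20000
for _ in range(T):
    if model == 1:
        # birth move: draw the new parameter from its prior, so the
        # proposal density cancels against the prior in the ratio
        prop = rng.normal(0, np.sqrt(tau2))
        if np.log(rng.uniform()) < loglik(prop) - loglik(0.0):
            model, theta = 2, prop
    else:
        # death move back to the point-null model
        if np.log(rng.uniform()) < loglik(0.0) - loglik(theta):
            model, theta = 1, 0.0
    if model == 2:
        # within-model Gibbs refresh of theta
        v = 1.0 / (n + 1.0 / tau2)
        theta = rng.normal(v * s, np.sqrt(v))
    visits2 += model == 2

# Exact P(M2 | y) from closed-form marginal likelihoods (common
# (2*pi)^(-n/2) factor dropped from both), for comparison
log_m1 = -0.5 * ss
log_m2 = -0.5 * np.log(1 + n * tau2) - 0.5 * (ss - tau2 * s**2 / (1 + n * tau2))
p2 = 1.0 / (1.0 + np.exp(log_m1 - log_m2))
print(visits2 / T, p2)   # the two proportions should roughly agree
```

The fraction of time the chain spends in M2 estimates the posterior model probability, which is how reversible jump output feeds Bayesian model choice.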

After the Revolution

Perfect sampling

Seminal paper of Propp and Wilson (1996) showed how to use MCMC methods to produce an exact (or perfect) simulation from the target.

Outburst of papers, particularly from Jesper Møller and coauthors, but the excitement somehow dried out [except in dedicated areas], as construction of perfect samplers is hard and coalescence times are very high...
[Møller and Waagepetersen, 2003]
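Propp and Wilson’s coupling from the past can be shown on a tiny monotone chain: run the extreme states from ever further in the past with the same reused randomness, and when they coalesce by time 0 the common value is an exact draw from the target. The five-state birth-death chain below is an illustrative choice, whose stationary distribution is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(4)

# Monotone chain on {0,...,4}: move up w.p. 0.3, down w.p. 0.7 (clipped),
# so the stationary law is geometric-like, pi(i) proportional to (3/7)^i.
def update(s, u):
    return min(s + 1, 4) if u < 0.3 else max(s - 1, 0)

def cftp():
    us = []                  # reused randomness; us[0] drives the step into time 0
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.uniform())
        lo, hi = 0, 4        # extreme starting states at time -T
        for t in range(T - 1, -1, -1):   # run from -T up to time 0
            lo, hi = update(lo, us[t]), update(hi, us[t])
        if lo == hi:         # coalescence => exact draw from the target
            return lo
        T *= 2               # otherwise restart further in the past

draws = [cftp() for _ in range(2000)]
print(np.bincount(draws, minlength=5) / 2000)
```

Reusing the same uniforms for each time step across restarts is essential; redrawing them would bias the output. The difficulty Møller and Waagepetersen document is that outside such monotone toy cases, bounding all trajectories and waiting for coalescence becomes expensive.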

After the Revolution

Envoi

To be continued...

...standing on the shoulders of giants