Drift analysis is among the most powerful theoretical tools available for estimating the optimisation time of meta-heuristics. Informally, it shows how the challenging problem of predicting the long-term behaviour of a meta-heuristic can be reduced to the often trivial problem of describing how the state of the heuristic changes during one iteration.

Drift analysis has dramatically simplified the analysis of meta-heuristics. Many of the most important results about the optimisation time of meta-heuristics were obtained with the help of drift analysis.

This tutorial gives a gentle, yet comprehensive, introduction to drift analysis, assuming only basic knowledge of probability theory. We approach the area by examining a few simple drift theorems that are both straightforward to apply, and that yield useful bounds on the expected optimisation time. We then turn to more sophisticated drift theorems that, while needing stronger conditions, allow us to make very precise statements about the success probability of meta-heuristics. Finally, we show how to investigate complex evolutionary algorithms with the aid of a new population-drift theorem that was discovered recently.

Drift Analysis - A Tutorial

Per Kristian Lehre

ASAP Research Group

School of Computer Science

University of Nottingham, UK

PerKristian.Lehre@nottingham.ac.uk

July 6, 2012

What is Drift Analysis?¹

Prediction of the long-term behaviour of a process X
(hitting time, stability, occupancy time, etc.)
from properties of the one-step change ∆.

[Figure: a process Yk on an interval (a, b), with "drift" at least ε0 towards a.]

¹NB! Drift is a different concept than genetic drift in evolutionary biology.

Runtime of (1+1) EA on Linear Functions [3]
Droste, Jansen & Wegener (2002)

Runtime of (1+1) EA on Linear Functions [2]
Doerr, Johannsen, and Winzen (GECCO 2010)

Some history

Origins

Stability of equilibria in ODEs (Lyapunov, 1892)

Stability of Markov Chains (see eg [14])

1982 paper by Hajek [6]

Simulated annealing [19]

Drift Analysis of Evolutionary Algorithms

Introduced to EC in 2001 by He and Yao [7, 8]

(1+1) EA on linear functions: O(n ln n) [7]

(1+1) EA on maximum matching by Giel and Wegener [5]

Simplified drift in 2008 by Oliveto and Witt [18]

Multiplicative drift by Doerr et al [2]

(1+1) EA on linear functions: en ln(n) + O(n) [22]

Variable drift by Johannsen [11] and Mitavskiy et al. [15]

Population drift by Lehre [12]

About this tutorial...

Assumes no or little background in probability theory

Main focus will be on drift theorems and their proofs

Some theorems are presented in a simplified form

full details are available in the references

A few simple applications will be shown

Please feel free to interrupt me with questions!

General Assumptions

[Figure: the process Yk on an interval (a, b).]

Xk a stochastic process2 in some general state space X

Yk := g(Xk), where g : X → R is a “distance function”

Two stopping times τa and τb

τa := min{k ≥ 0 | Yk ≤ a}

τb := min{k ≥ 0 | Yk ≥ b}

where we assume −∞ ≤ a < b < ∞ and Y0 ∈ (a, b).

²Not necessarily Markovian.

Overview of Tutorial

Drift Condition³ | Statement | Note
E [Yk+1 | Fk] ≤ Yk − ε0 | E [τa] ≤ Y0/ε0 | Additive drift [7, 10]
E [Yk+1 | Fk] ≥ Yk − ε0 | Y0/ε0 ≤ E [τa] | Additive drift (lower bound) [7, 9]
E [Yk+1 | Fk] ≤ Yk | E [τa] ≤ Y0(2b − Y0)/σ² | Supermartingale [16]
E [Yk+1 | Fk] ≤ (1 − δ)Yk | E [τa] ≤ ln(Y0/a)/δ | Multiplicative drift [2, 4]
E [Yk+1 | Fk] ≤ (1 − δ)Yk | Pr (τa > B3) ≤ ... | Multiplicative drift with tail bounds [1]
E [Yk+1 | Fk] ≥ (1 − δ)Yk | ... ≤ E [τa] | Multiplicative drift (lower bound) [13]
E [Yk+1 | Fk] ≤ Yk − h(Yk) | E [τa] ≤ ∫_a^{Y0} dz/h(z) | Variable drift [11]
E [Yk+1 − Yk | Fk] ≤ −ε0, plus a jump condition | Pr (τa > B1) ≤ ..., Pr (τb < B2) ≤ ... | Simplified drift [6, 17]
E [e^{κ∆mut} | Fk] < 1/α0 | Pr (τb < B) ≤ ... | Population drift [12]

³Some drift theorems need additional conditions.

Part 1 - Basic Probability Theory
Basic Probability Theory

Probability Triple (Ω, F, Pr)
Ω : sample space
F : σ-algebra (the family of events)
Pr : F → R, a probability function (satisfying the probability axioms)

Events: E ∈ F

Random Variable: X : Ω → R and X⁻¹ : B → F, with
X = y ⇐⇒ {ω ∈ Ω | X(ω) = y}

Expectation: E [X] := Σ_y y · Pr (X = y)

[Figure: a random variable X mapping outcomes in the sample space Ω to values in R.]

Conditional Expectation

E [X | E] := Σ_x x · Pr (X = x | E) = Σ_x x · Pr (X = x ∧ E)/Pr (E)

E [X | Z = z] is defined analogously, and
E [X | Z](ω) = E [X | Z = z], where z = Z(ω).

Definition
Y = E [X | G] if
1. Y is G-measurable, i.e., Y⁻¹(A) ∈ G for all A ∈ B
2. E [|Y|] < ∞
3. E [Y·I_F] = E [X·I_F] for all F ∈ G

Three Properties

E [a1X1 + a2X2 | G ] = a1E [X1 | G ] + a2E [X2 | G ]

If X is G -measurable then E [X | G ] = X

If H ⊂ G, then E [E [X | G] | H] = E [X | H]

Stochastic Processes and Filtration

Definition

A stochastic process is a sequence of rv Y1, Y2, . . .

A filtration is an increasing family of sub-σ-algebras of F

F0 ⊆ F1 ⊆ · · · ⊆ F

A stochastic process Yk is adapted to a filtration Fk

if Yk is Fk-measurable for all k

=⇒ Informally, Fk represents the information that has been

revealed about the process during the first k steps.

Stopping Time

Definition

A rv. τ : Ω → N is called a stopping time if for all k ≥ 0

{τ ≤ k} ∈ Fk

The information obtained until step k is sufficient

to decide whether the event {τ ≤ k} is true or not.

Example

The smallest k such that Yk < a in a stochastic process.

The runtime of an evolutionary algorithm

Martingales

Definition (Supermartingale)
Any process Y such that for all k:
1. Y is adapted to F
2. E [|Yk|] < ∞
3. E [Yk+1 | Fk] ≤ Yk

[Figure: sample paths of a supermartingale over steps k = 0 to 100.]

Example
Let ∆1, ∆2, . . . be rvs with −∞ < E [∆k+1 | Fk] ≤ −ε0 for k ≥ 0.
Then the sequence Yk := ∆1 + · · · + ∆k is a supermartingale:
E [Yk+1 | Fk] = ∆1 + · · · + ∆k + E [∆k+1 | Fk] ≤ ∆1 + · · · + ∆k − ε0 < Yk.
Moreover, Zk := Yk + kε0 is also a supermartingale:
E [Zk+1 | Fk] = ∆1 + · · · + ∆k + E [∆k+1 | Fk] + (k + 1)ε0 ≤ ∆1 + · · · + ∆k − ε0 + (k + 1)ε0 = Zk.

Lemma
If Y is a supermartingale, then E [Yk | F0] ≤ Y0 for all fixed k ≥ 0.

Proof.
By the tower property and induction on k,
E [Yk | F0] = E [E [Yk | Fk−1] | F0] ≤ E [Yk−1 | F0] ≤ · · · ≤ E [Y0 | F0] = Y0.

Example
Where is the process Y in the previous example after k steps? Since Z is a supermartingale,
Y0 = Z0 ≥ E [Zk | F0] = E [Yk + kε0 | F0].
Hence, E [Yk | F0] ≤ Y0 − ε0k, which is not surprising...

Part 2 - Additive Drift
Additive Drift

(C1+) ∀k: E [Yk+1 − Yk | Yk > 0 ∧ Fk] ≤ −ε0
(C1−) ∀k: E [Yk+1 − Yk | Yk > 0 ∧ Fk] ≥ −ε0

Theorem ([7, 9, 10])
Given a sequence (Yk, Fk) over an interval [0, b] ⊂ R.
Define τ := min{k ≥ 0 | Yk = 0}, and assume E [τ | F0] < ∞.
If (C1+) holds for an ε0 > 0, then E [τ | F0] ≤ Y0/ε0 ≤ b/ε0.
If (C1−) holds for an ε0 > 0, then E [τ | F0] ≥ Y0/ε0.
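As a quick sanity check (an illustration, not from the slides), the following Python sketch simulates a ±1 random walk whose one-step drift is exactly −ε0, so both (C1+) and (C1−) hold and the mean hitting time should be close to Y0/ε0. The walk lives on [0, ∞) rather than a bounded interval, so the theorem's interval assumption is only met approximately here.

```python
import random

def additive_drift_walk(y0, eps0):
    """Walk stepping -1 with prob. (1 + eps0)/2 and +1 otherwise,
    so E[Y_{k+1} - Y_k] = -eps0 exactly; returns the hitting time of 0."""
    y, k = y0, 0
    while y > 0:
        y += -1 if random.random() < (1 + eps0) / 2 else 1
        k += 1
    return k

y0, eps0 = 20, 0.2
runs = [additive_drift_walk(y0, eps0) for _ in range(200)]
print(sum(runs) / len(runs), "vs Y0/eps0 =", y0 / eps0)
```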

Obtaining Supermartingales from Drift Conditions

(C1) E [Yk+1 − Yk | Yk > a ∧ Fk] ≤ −ε0

Yk is not necessarily a supermartingale, because (C1) assumes Yk > a.

Definition (Stopped Process)
Let Y be a stochastic process and τ a stopping time. Then
Yk∧τ := Yk if k < τ, and Yk∧τ := Yτ if k ≥ τ.

But the “stopped process” Yk∧τa is a supermartingale, so
∀k: Y0 ≥ E [Yk∧τa | F0]
∀k: Y0 ≥ E [Yk∧τa + (k ∧ τa)ε0 | F0]

Dominated Convergence Theorem

Theorem
Suppose Xk is a sequence of random variables such that, for each outcome in the sample space,
lim_{k→∞} Xk = X.
Let Y ≥ 0 be a random variable with E [Y] < ∞ such that, for each outcome in the sample space and for each k,
|Xk| ≤ Y.
Then it holds that
lim_{k→∞} E [Xk] = E [lim_{k→∞} Xk] = E [X].

Proof of Additive Drift Theorem

(C1+) ∀k: E [Yk+1 − Yk | Yk > 0 ∧ Fk] ≤ −ε0
(C1−) ∀k: E [Yk+1 − Yk | Yk > 0 ∧ Fk] ≥ −ε0

Theorem
Given a sequence (Yk, Fk) over an interval [0, b] ⊂ R.
Define τ := min{k ≥ 0 | Yk = 0}, and assume E [τ | F0] < ∞.
If (C1+) holds for an ε0 > 0, then E [τ | F0] ≤ Y0/ε0.
If (C1−) holds for an ε0 > 0, then E [τ | F0] ≥ Y0/ε0.

Proof (upper bound).
By (C1+), Zk := Yk∧τ + ε0(k ∧ τ) is a supermartingale, so
Y0 = E [Z0 | F0] ≥ E [Zk | F0] for all k.
Since Yk is bounded to [0, b] and τ has finite expectation, the dominated convergence theorem applies, and
Y0 ≥ lim_{k→∞} E [Zk | F0] = E [Yτ + ε0τ | F0] = ε0 E [τ | F0].

Proof (lower bound).
By (C1−), Zk := Yk∧τ + ε0(k ∧ τ) is a submartingale, so
Y0 = E [Z0 | F0] ≤ E [Zk | F0] for all k.
Since Yk is bounded to [0, b] and τ has finite expectation, the dominated convergence theorem applies, and
Y0 ≤ lim_{k→∞} E [Zk | F0] = E [Yτ + ε0τ | F0] = ε0 E [τ | F0].

Examples: (1+1) EA

(1+1) EA
1: Sample x(0) uniformly at random from {0, 1}^n.
2: for k = 0, 1, 2, . . . do
3:   Set y := x(k), and flip each bit of y with probability 1/n.
4:   x(k+1) := y if f(y) ≥ f(x(k)), otherwise x(k+1) := x(k).
5: end for
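For readers who prefer running code, here is a minimal Python sketch of the algorithm above (an illustration, not part of the original slides). The fitness function f (a callable on bit lists) and the problem size n are the inputs; max_iters is a hypothetical budget added so the sketch always terminates.

```python
import random

def one_plus_one_ea(f, n, max_iters=10**6):
    """Minimal (1+1) EA: mutate by flipping each bit with prob. 1/n,
    and keep the offspring whenever it is at least as fit."""
    x = [random.randint(0, 1) for _ in range(n)]  # line 1: uniform initialisation
    for _ in range(max_iters):
        # Line 3: flip each bit of a copy of x independently with prob. 1/n.
        y = [b ^ (random.random() < 1.0 / n) for b in x]
        # Line 4: accept y if its fitness is at least that of x.
        if f(y) >= f(x):
            x = y
    return x
```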

Law of Total Probability

E [X] = Pr (E) E [X | E] + Pr (¬E) E [X | ¬E]

Example 1: (1+1) EA on LeadingOnes

Lo(x) := Σ_{i=1}^{n} Π_{j=1}^{i} xj

x = 1111111111111111 0 ∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗∗
(the leading 1-bits, then the left-most 0-bit, then the remaining bits)

Let Yk := n − Lo(x(k)) be the number of “remaining” bits in step k ≥ 0.
Let E be the event that only the left-most 0-bit flipped in y.
The sequence Yk is non-increasing, so
E [Yk+1 − Yk | Yk > 0 ∧ Fk] ≤ (−1) Pr (E | Yk > 0 ∧ Fk) = (−1)(1/n)(1 − 1/n)^{n−1} ≤ −1/(en).
By the additive drift theorem, E [τ | F0] ≤ enY0 ≤ en².
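Pairing the earlier sketch with LeadingOnes makes this bound easy to check empirically. leading_ones and hitting_time below are hypothetical helpers written for this illustration; the average over a few runs should sit below the e·n² figure.

```python
import math
import random

def leading_ones(x):
    """Lo(x): the number of consecutive 1-bits at the start of x."""
    for i, bit in enumerate(x):
        if bit == 0:
            return i
    return len(x)

def hitting_time(n):
    """Iterations of the (1+1) EA until Lo(x) = n, i.e. until Y_k = 0."""
    x = [random.randint(0, 1) for _ in range(n)]
    k = 0
    while leading_ones(x) < n:
        y = [b ^ (random.random() < 1.0 / n) for b in x]
        if leading_ones(y) >= leading_ones(x):
            x = y
        k += 1
    return k

n = 50
runs = [hitting_time(n) for _ in range(20)]
print(sum(runs) / len(runs), "vs additive-drift bound e*n^2 =", math.e * n * n)
```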

Example 2: (1+1) EA on Linear Functions

Given some constants w1, . . . , wn ∈ [wmin, wmax], define
f(x) := w1x1 + w2x2 + · · · + wnxn

Let Yk be the function value that “remains” at time k, i.e.,
Yk := Σ_{i=1}^{n} wi − Σ_{i=1}^{n} wi xi^{(k)} = Σ_{i=1}^{n} wi (1 − xi^{(k)})

Let Ei be the event that only bit i flipped in y. Then
E [Yk+1 − Yk | Ei ∧ Fk] ≤ wi (xi^{(k)} − 1),
and since no accepted step can increase Yk,
E [Yk+1 − Yk | Fk] ≤ Σ_{i=1}^{n} Pr (Ei | Fk) E [Yk+1 − Yk | Ei ∧ Fk]
  = (1/n)(1 − 1/n)^{n−1} Σ_{i=1}^{n} wi (xi^{(k)} − 1) ≤ −Yk/(en) ≤ −wmin/(en).

By the additive drift theorem, E [τ | F0] ≤ en²(wmax/wmin).

Remarks on Example Applications

Example 1: (1+1) EA on LeadingOnes
The upper bound en² is very accurate.
The exact expression is c(n)n², where c(n) → (e − 1)/2 [20].

Example 2: (1+1) EA on Linear Functions
The upper bound en²(wmax/wmin) is correct, but very loose.
The linear function BinVal has wmax/wmin = 2^{n−1}.
The tightest known bound is en ln(n) + O(n) [22].

=⇒ A poor choice of distance function gives an imprecise bound!

What is a good distance function?

Theorem ([8])

Assume Y is a homogeneous Markov chain, and τ the time to

absorption. Then the function g(x) := E [τ | Y0 = x], satisfies

g(x) = 0 if x is an absorbing state,
E [g(Yk+1) − g(Yk) | Fk] = −1 otherwise.

Distance function g gives exact expected runtime!

But g requires complete knowledge of the expected runtime!

Still provides insight into what is a good distance function:

a good approximation (or guess) for the remaining runtime

Part 3 - Variable Drift
Drift may be Position-Dependent

[Figure: a process with constant drift vs. one with position-dependent (variable) drift.]

Idea: Find a function g : R → R such that the transformed stochastic process g(X1), g(X2), g(X3), . . . has constant drift.

Jensen’s Inequality

Theorem

If g : R → R is concave, then E [g(X) | F ] ≤ g(E [X | F ]).

If g''(x) < 0, then g is concave.

Multiplicative Drift

(M) ∀k: E [Yk+1 | Yk > a ∧ Fk] ≤ (1 − δ)Yk

Theorem ([2, 4])
Given a sequence (Yk, Fk) over an interval [a, b] ⊂ R, a > 0.
Define τa := min{k ≥ 0 | Yk = a}, and assume E [τa | F0] < ∞.
If (M) holds for a δ > 0, then E [τa | F0] ≤ ln(Y0/a)/δ.

Proof.
g(s) := ln(s/a) is concave, so by Jensen’s inequality
E [g(Yk+1) − g(Yk) | Yk > a ∧ Fk] ≤ ln(E [Yk+1 | Yk > a ∧ Fk]) − ln(Yk) ≤ ln(1 − δ) ≤ −δ.
The claim then follows by applying the additive drift theorem to the process g(Yk), which starts at g(Y0) = ln(Y0/a) and has drift at most −δ.
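A concrete instance (an illustration, not from the slides): on OneMax, the linear function with all weights 1, the number of zeros Yk satisfies E [Yk+1 − Yk | Fk] ≤ −Yk/(en), i.e. (M) with δ = 1/(en). Since Yk is integer-valued and stops at 0 rather than at a positive a, the sketch below compares against the integer-valued variant of the theorem from [2], E [τ] ≤ (1 + ln Y0)/δ ≤ en(1 + ln n).

```python
import math
import random

def zeros(x):
    return x.count(0)  # distance Y_k: number of 0-bits left

def hitting_time_onemax(n):
    """(1+1) EA maximising OneMax; returns iterations until Y_k = 0."""
    x = [random.randint(0, 1) for _ in range(n)]
    k = 0
    while zeros(x) > 0:
        y = [b ^ (random.random() < 1.0 / n) for b in x]
        if zeros(y) <= zeros(x):  # equivalent to f(y) >= f(x)
            x = y
        k += 1
    return k

n = 100
runs = [hitting_time_onemax(n) for _ in range(20)]
print(sum(runs) / len(runs), "vs e*n*(1 + ln n) =", math.e * n * (1 + math.log(n)))
```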

Example: Linear Functions Revisited

For any c ∈ (0, 1), define the distance at time k as
Yk := c·wmin + Σ_{i=1}^{n} wi (1 − xi^{(k)})

We have already seen that
E [Yk+1 − Yk | Fk] ≤ (1/(en)) Σ_{i=1}^{n} wi (xi^{(k)} − 1) = −(Yk − c·wmin)/(en) ≤ −Yk(1 − c)/(en)

By the multiplicative drift theorem (with a := c·wmin and δ := (1 − c)/(en)),
E [τa | F0] ≤ (en/(1 − c)) ln(1 + n·wmax/(c·wmin))

Variable Drift Theorem

(V) ∀k: E [Yk+1 − Yk | Yk > a ∧ Fk] ≤ −h(Yk)

Theorem ([15, 11])
Given a sequence (Yk, Fk) over an interval [a, b] ⊂ R, a > 0.
Define τa := min{k ≥ 0 | Yk = a}, and assume E [τa | F0] < ∞.
If there exists a function h : R → R such that
h(x) > 0 and h'(x) > 0 for all x ∈ [a, b],
and drift condition (V) holds, then
E [τa | F0] ≤ ∫_a^{Y0} 1/h(z) dz

=⇒ The multiplicative drift theorem is the special case h(x) = δx, since ∫_a^{Y0} 1/(δz) dz = ln(Y0/a)/δ.

Variable Drift Theorem: Proof

g(x) := ∫_a^x 1/h(z) dz

[Figure: the increasing concave function g, with slope 1/h(x); the points E [Yk+1 | Fk] ≤ Yk − h(Yk) and Yk are marked on the x-axis.]

Proof.
The function g is concave (g'' < 0), so by Jensen’s inequality
E [g(Yk) − g(Yk+1) | Fk] ≥ g(Yk) − g(E [Yk+1 | Fk])
  ≥ ∫_{Yk−h(Yk)}^{Yk} 1/h(z) dz ≥ 1,
where the last step uses that h is increasing. The additive drift theorem applied to g(Yk), with ε0 = 1, now gives E [τa | F0] ≤ g(Y0).

Part 4 - Supermartingale
Supermartingale

(S1) ∀k: E [Yk+1 − Yk | Yk > 0 ∧ Fk] ≤ 0
(S2) ∀k: Var [Yk+1 | Yk > 0 ∧ Fk] ≥ σ²

Theorem (See e.g. [16])
Given a sequence (Yk, Fk) over an interval [0, b] ⊂ R.
Define τ := min{k ≥ 0 | Yk = 0}, and assume E [τ | F0] < ∞.
If (S1) and (S2) hold for a σ > 0, then E [τ | F0] ≤ Y0(2b − Y0)/σ².

Proof.
Let Zk := b² − (b − Yk)², and note that b − Yk ≤ E [b − Yk+1 | Fk]. Then
E [Zk+1 − Zk | Fk] = −E [(b − Yk+1)² | Fk] + (b − Yk)²
  ≤ −E [(b − Yk+1)² | Fk] + E [b − Yk+1 | Fk]²
  = −Var [b − Yk+1 | Fk] = −Var [Yk+1 | Fk] ≤ −σ².
The additive drift theorem applied to Zk, which starts at Z0 = b² − (b − Y0)² = Y0(2b − Y0), gives the claim.
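As an illustration (not from the slides), the sketch below simulates a fair ±1 walk on [0, b] that stays put or steps down at the upper boundary. (S1) holds everywhere; at the boundary the conditional variance drops to 1/4, so the theorem applies with σ² = 1/4 and the empirical mean hitting time should stay below Y0(2b − Y0)/σ².

```python
import random

def supermartingale_hitting_time(y0, b):
    """Fair +/-1 walk on [0, b]; at b it steps down or stays, each w.p. 1/2.
    E[Y_{k+1} - Y_k | .] <= 0 everywhere (S1); Var[Y_{k+1} | .] >= 1/4 (S2)."""
    y, k = y0, 0
    while y > 0:
        if y == b:
            y -= random.choice((0, 1))
        else:
            y += random.choice((-1, 1))
        k += 1
    return k

y0, b, sigma2 = 10, 50, 0.25
runs = [supermartingale_hitting_time(y0, b) for _ in range(300)]
print(sum(runs) / len(runs), "vs bound", y0 * (2 * b - y0) / sigma2)
```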
Part 5 - Hajek's Theorem

Hajek's Theorem⁴

Theorem
If there exist λ, ε0 > 0 and D < ∞ such that for all k ≥ 0
(C1) E [Yk+1 − Yk | Yk > a ∧ Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E [e^{λZ}] = D
then for any δ ∈ (0, 1)
(2.9) Pr (τa > B | F0) ≤ e^{η(Y0−a−B(1−δ)ε0)}
(*) Pr (τb < B | Y0 < a) ≤ (BD/((1 − δ)ηε0)) · e^{η(a−b)}
for some η ≥ min{λ, δε0λ²/D} > 0.

If λ, ε0, D ∈ O(1) and b − a ∈ Ω(n), then there exists a constant c > 0 such that
Pr (τb ≤ e^{cn} | F0) ≤ e^{−Ω(n)}

⁴The theorem presented here is a corollary to Theorem 2.3 in [6].

Stochastic Dominance - (|Yk+1 − Yk| | Fk) ≺ Z

Definition
Y ≺ Z if Pr (Z ≤ c) ≤ Pr (Y ≤ c) for all c ∈ R.

[Figure: the cumulative distribution functions of Y and Z; the cdf of Z lies below that of Y.]

Example
1. If Y ≤ Z, then Y ≺ Z.
2. Let (Ω, d) be a metric space, and V(x) := d(x, x*).
   Then |V(Xk+1) − V(Xk)| ≺ d(Xk+1, Xk).

[Figure: points Xk, Xk+1 and x* in a metric space, with the distances V(Xk), V(Xk+1) and d(Xk, Xk+1).]

Condition (C2) implies that “long jumps” must be rare

Assume that
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E [e^{λZ}] = D.
Then for any j ≥ 0,
Pr (|Yk+1 − Yk| ≥ j) = Pr (e^{λ|Yk+1−Yk|} ≥ e^{λj}) ≤ E [e^{λ|Yk+1−Yk|}] e^{−λj} ≤ E [e^{λZ}] e^{−λj} = De^{−λj}.

Markov’s inequality
If X ≥ 0, then Pr (X ≥ k) ≤ E [X]/k.

Moment Generating Function (mgf) E [e^{λZ}]

Definition
The mgf of a rv X is M_X(λ) := E [e^{λX}] for all λ ∈ R.

The n-th derivative at t = 0 is M_X^{(n)}(0) = E [X^n];
hence M_X provides all moments of X, thus the name.

If X and Y are independent rvs and a, b ∈ R, then
M_{aX+bY}(t) = E [e^{t(aX+bY)}] = E [e^{taX}] E [e^{tbY}] = M_X(at) M_Y(bt).

Example
Let X := Σ_{i=1}^{n} Xi, where the Xi are independent rvs with
Pr (Xi = 1) = p and Pr (Xi = 0) = 1 − p. Then
M_{Xi}(λ) = (1 − p)e^{λ·0} + pe^{λ·1}
M_X(λ) = M_{X1}(λ) M_{X2}(λ) · · · M_{Xn}(λ) = (1 − p + pe^λ)^n.

Moment Generating Functions

Distribution | mgf
Bernoulli: Pr (X = 1) = p | 1 − p + pe^t
Binomial: X ∼ Bin(n, p) | (1 − p + pe^t)^n
Geometric: Pr (X = k) = (1 − p)^{k−1} p | pe^t / (1 − (1 − p)e^t)
Uniform: X ∼ U(a, b) | (e^{tb} − e^{ta}) / (t(b − a))
Normal: X ∼ N(µ, σ²) | exp(tµ + σ²t²/2)

The mgf of X ∼ Bin(n, p) at t = ln(2) is
(1 − p + pe^t)^n = (1 + p)^n ≤ e^{pn}.
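As a quick numerical check of this table (an illustration, not from the slides), the empirical mgf of a Bin(n, p) sample at t = ln(2) can be compared with the closed form and the e^{pn} bound:

```python
import math
import random

def empirical_mgf(sample, t):
    """Monte-Carlo estimate of E[e^{tX}]."""
    return sum(math.exp(t * x) for x in sample) / len(sample)

n, p, t = 20, 0.1, math.log(2)
sample = [sum(random.random() < p for _ in range(n)) for _ in range(100000)]
print(empirical_mgf(sample, t),
      "vs (1 - p + p*e^t)^n =", (1 - p + p * math.exp(t)) ** n,
      "<= e^(pn) =", math.exp(p * n))
```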

Condition (C2) often holds trivially

Example ((1+1) EA)
Choose x uniformly from {0, 1}^n
for k = 0, 1, 2, . . .
  Set x' := x(k), and flip each bit of x' with probability p.
  If f(x') ≥ f(x(k)), then x(k+1) := x', else x(k+1) := x(k).

Assume
the fitness function f has a unique maximum x* ∈ {0, 1}^n, and
the distance function is g(x) = H(x, x*), the Hamming distance to x*.

Then
|g(x(k+1)) − g(x(k))| ≺ Z, where Z := H(x(k), x')
Z ∼ Bin(n, p), so E [e^{λZ}] ≤ e^{np} for λ = ln(2).

Simple Application: (1+1) EA on Needle

(1+1) EA with mutation rate p = 1/n on
Needle(x) := Π_{i=1}^{n} xi
with Yk := H(x(k), 0^n), a := (3/4)n, b := n.

Condition (C2) is satisfied⁵ with D = E [e^{λZ}] ≤ e, where λ = ln(2).
Condition (C1) is satisfied for ε0 := 1/2, because
E [Yk+1 − Yk | Yk > a ∧ Fk] ≤ (n − a)p − ap = −ε0.
Thus η ≥ min{λ, δε0λ²/D} > 1/25 when δ = 1/2, and
Pr (τa > n + k | F0) ≤ e^{(1/25)(Y0−a−(n+k)(1−δ)ε0)} ≤ e^{−k/100}
Pr (τb < e^{n/200} | F0) = e^{−Ω(n)}

⁵See previous slide.
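The drift used for (C1) is easy to verify empirically, since on Needle every offspring is accepted until the needle is found. The sketch below (a hypothetical helper, not from the slides) estimates the one-step drift of Yk = H(x(k), 0^n) at a point above a = (3/4)n:

```python
import random

n, y = 100, 80  # current distance to 0^n, above a = (3/4)n
x = [1] * y + [0] * (n - y)  # any string with y one-bits; positions don't matter
drifts = []
for _ in range(10000):
    z = [b ^ (random.random() < 1.0 / n) for b in x]
    # On Needle, f = 0 off the optimum, so every offspring is accepted,
    # and the new distance is simply the number of one-bits in z.
    drifts.append(sum(z) - y)
print(sum(drifts) / len(drifts), "vs (n - a)p - ap type drift:", (n - 2 * y) / n)
```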

Proof overview

Theorem (2.3 in [6])
Assume that there exist 0 < ρ < 1 and D ≥ 1 such that
(D1) E [e^{ηYk+1} | Yk > a ∧ Fk] ≤ ρe^{ηYk}
(D2) E [e^{ηYk+1} | Yk ≤ a ∧ Fk] ≤ De^{ηa}
Then
(2.6) E [e^{ηYk+1} | F0] ≤ ρ^k e^{ηY0} + De^{ηa}(1 − ρ^k)/(1 − ρ).
(2.8) Pr (Yk ≥ b | F0) ≤ ρ^k e^{η(Y0−b)} + De^{η(a−b)}(1 − ρ^k)/(1 − ρ).
(*) Pr (τb < B | Y0 < a) ≤ e^{η(a−b)} BD/(1 − ρ)
(2.9) Pr (τa > k | F0) ≤ e^{η(Y0−a)} ρ^k

Lemma
Assume that there exists an ε0 > 0 such that
(C1) E [Yk+1 − Yk | Yk > a ∧ Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E [e^{λZ}] = D < ∞ for a λ > 0.
Then (D1) and (D2) hold for some η and ρ < 1.

Theorem
(D1) E [e^{ηYk+1} | Yk > a ∧ Fk] ≤ ρe^{ηYk}
(D2) E [e^{ηYk+1} | Yk ≤ a ∧ Fk] ≤ De^{ηa}
Assume that (D1) and (D2) hold. Then
(2.6) E [e^{ηYk+1} | F0] ≤ ρ^k e^{ηY0} + De^{ηa}(1 − ρ^k)/(1 − ρ).

Proof.
By the law of total probability and the conditions (D1) and (D2),
E [e^{ηYk+1} | Fk] ≤ ρe^{ηYk} + De^{ηa}.   (1)
By the law of total expectation, Ineq. (1), and induction on k,
E [e^{ηYk+1} | F0] = E [E [e^{ηYk+1} | Fk] | F0]
  ≤ ρ E [e^{ηYk} | F0] + De^{ηa}
  ≤ ρ^k e^{ηY0} + (1 + ρ + ρ² + · · · + ρ^{k−1}) De^{ηa}.

Proof of (2.8)

Theorem
(D1) E [e^{η(Yk+1−Yk)} | Yk > a ∧ Fk] ≤ ρ
(D2) E [e^{η(Yk+1−a)} | Yk ≤ a ∧ Fk] ≤ D
Assume that (D1) and (D2) hold. Then
(2.6) E [e^{ηYk+1} | F0] ≤ ρ^k e^{ηY0} + De^{ηa}(1 − ρ^k)/(1 − ρ).
(2.8) Pr (Yk ≥ b | F0) ≤ ρ^k e^{η(Y0−b)} + De^{η(a−b)}(1 − ρ^k)/(1 − ρ).

Proof.
(2.8) follows from Markov’s inequality and (2.6):
Pr (Yk+1 ≥ b | F0) = Pr (e^{ηYk+1} ≥ e^{ηb} | F0) ≤ E [e^{ηYk+1} | F0] e^{−ηb}.

Proof of (*)

Theorem
(D1) E [e^{η(Yk+1−Yk)} | Yk > a ∧ Fk] ≤ ρ
(D2) E [e^{η(Yk+1−a)} | Yk ≤ a ∧ Fk] ≤ D
Assume that (D1) and (D2) hold for D ≥ 1. Then
(2.8) Pr (Yk ≥ b | F0) ≤ ρ^k e^{η(Y0−b)} + De^{η(a−b)}(1 − ρ^k)/(1 − ρ).
(*) Pr (τb < B | Y0 < a) ≤ e^{η(a−b)} BD/(1 − ρ)

Proof.
By the union bound and (2.8),
Pr (τb < B | Y0 < a ∧ F0) ≤ Σ_{k=1}^{B} Pr (Yk ≥ b | Y0 < a ∧ F0)
  ≤ Σ_{k=1}^{B} De^{η(a−b)} (ρ^k + (1 − ρ^k)/(1 − ρ)) ≤ BDe^{η(a−b)}/(1 − ρ).

Union Bound

[Figure: events E1, E2, E3, E4 as overlapping regions of the sample space Ω.]

Pr (E1 ∨ E2 ∨ · · · ∨ Ek) ≤ Pr (E1) + Pr (E2) + · · · + Pr (Ek)

Proof of (2.9)

Theorem
(D1) E [e^{ηYk+1} ρ^{−1} | Yk > a ∧ Fk] ≤ e^{ηYk}
Assume that (D1) holds. Then
(2.9) Pr (τa > k | F0) ≤ e^{η(Y0−a)} ρ^k

Proof.
By (D1), Zk := e^{ηYk∧τ} ρ^{−(k∧τ)} is a supermartingale, so
e^{ηY0} = Z0 ≥ E [Zk | F0] = E [e^{ηYk∧τ} ρ^{−(k∧τ)} | F0].   (2)
By (2) and the law of total probability,
e^{ηY0} ≥ Pr (τa > k | F0) E [e^{ηYk∧τ} ρ^{−(k∧τ)} | τa > k ∧ F0]
  = Pr (τa > k | F0) E [e^{ηYk} ρ^{−k} | τa > k ∧ F0]
  ≥ Pr (τa > k | F0) e^{ηa} ρ^{−k},
and rearranging yields (2.9).

(C1) and (C2) =⇒ (D1)

(C1) E [Yk+1 − Yk | Yk > a ∧ Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E [e^{λZ}] = D < ∞ for a λ > 0.
(D1) E [e^{η(Yk+1−Yk)} | Yk > a ∧ Fk] ≤ ρ

Lemma
Assume (C1) and (C2). Then (D1) holds when ρ ≥ 1 − ηε0 + η²c and
0 < η ≤ min{λ, ε0/c}, where c := Σ_{k=2}^{∞} (λ^{k−2}/k!) E [Z^k].

Proof.
Let X := (Yk+1 − Yk | Yk > a ∧ Fk).
By (C2), |X| ≺ Z, so E [X^k] ≤ E [|X|^k] ≤ E [Z^k].
From e^x = Σ_{k=0}^{∞} x^k/k! and linearity of expectation,
0 < E [e^{ηX}] = 1 + ηE [X] + Σ_{k=2}^{∞} (η^k/k!) E [X^k] ≤ 1 − ηε0 + η²c ≤ ρ.

(C2) =⇒ (D2)

(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E [e^{λZ}] = D < ∞ for a λ > 0.
(D2) E [e^{η(Yk+1−a)} | Yk ≤ a ∧ Fk] ≤ D

Theorem
Assume (C2) and 0 < η ≤ λ. Then (D2) holds.

Proof.
If Yk ≤ a, then Yk+1 − a ≤ Yk+1 − Yk ≤ |Yk+1 − Yk|, so
E [e^{η(Yk+1−a)} | Yk ≤ a ∧ Fk] ≤ E [e^{λ|Yk+1−Yk|} | Yk ≤ a ∧ Fk].
Furthermore, by (C2),
E [e^{λ|Yk+1−Yk|} | Yk ≤ a ∧ Fk] ≤ E [e^{λZ}] = D.

(C1) and (C2) =⇒ (D1) and (D2)

(C1) E [Yk+1 − Yk | Yk > a ∧ Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E [e^{λZ}] = D < ∞ for a λ > 0.
(D1) E [e^{η(Yk+1−Yk)} | Yk > a ∧ Fk] ≤ ρ
(D2) E [e^{η(Yk+1−a)} | Yk ≤ a ∧ Fk] ≤ D

Lemma
Assume (C1) and (C2). Then (D1) and (D2) hold when
ρ ≥ 1 − ηε0 + η²c and 0 < η ≤ min{λ, ε0/c},
where c := Σ_{k=2}^{∞} (λ^{k−2}/k!) E [Z^k] = (D − 1 − λE [Z])λ^{−2} > 0.

Corollary
Assume (C1), (C2), and 0 < δ < 1. Then (D1) and (D2) hold for
η := min{λ, δε0/c} (so that δε0 ≥ ηc), and
ρ := 1 − (1 − δ)ηε0 = 1 − ηε0 + ηδε0 ≥ 1 − ηε0 + η²c.

Reformulation of Hajek's Theorem

Theorem
If there exist λ, ε0 > 0 and 1 < D < ∞ such that for all k ≥ 0
(C1) E [Yk+1 − Yk | Yk > a ∧ Fk] ≤ −ε0
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E [e^{λZ}] = D
then for any δ ∈ (0, 1)
(2.9) Pr (τa > B | F0) ≤ e^{η(Y0−a)} ρ^B
(*) Pr (τb < B | Y0 < a) ≤ BD · e^{η(a−b)}/(1 − ρ)
where η := min{λ, δε0/c} and ρ := 1 − (1 − δ)ηε0.

1. Note that ln(ρ) ≤ ρ − 1 = −(1 − δ)ηε0, so
Pr (τa > B | F0) ≤ e^{η(Y0−a)} ρ^B = e^{η(Y0−a)} e^{B ln(ρ)} ≤ e^{η(Y0−a−B(1−δ)ε0)}.
2. c = (D − 1 − λE [Z])λ^{−2} < D/λ², so η ≥ min{λ, δε0λ²/D}.

Simplified Drift Theorem [17]

We have already seen that
(C2) (|Yk+1 − Yk| | Fk) ≺ Z and E [e^{λZ}] = D
implies Pr (|Yk+1 − Yk| ≥ j) ≤ De^{−λj} for all j ∈ N0.

The simplified drift theorem replaces (C2) with
(S) Pr (Yk+1 − Yk ≥ j | Yk < b) ≤ r(n)(1 + δ)^{−j} for all j ∈ N0,
and, with some additional assumptions, provides a bound of the type⁶
Pr (τb < 2^{c(b−a)}) ≤ 2^{−Ω(b−a)}.   (3)

Until 2008, conditions (D1) and (D2) were used in EC.
(D1) and (D2) can lead to highly tedious calculations.
Oliveto and Witt were the first in EC to point out that the much simpler to verify (C1), along with (S), is sufficient.

⁶See [17] for the exact statement.

Part 6 - Population Drift
Drift Analysis of Population-based Evolutionary Algorithms

Evolutionary algorithms generally use populations.

So far, we have analysed the drift of the (1+1) EA,

ie an evolutionary algorithm with population size one.

The state aggregation problem makes analysis of

population-based EAs with classical drift theorems difficult:

How to define an appropriate distance function?

Should reflect the progress of the algorithm

Often hard to define for single-individual algorithms

Highly non-trivial for population-based algorithms

=⇒ This part of the tutorial focuses on a drift theorem for

populations which alleviates the state aggregation problem.

Population-based Evolutionary Algorithms

[Figure: a population Pt of λ individuals; each offspring is produced by selecting a parent x from Pt and applying variation.]

Require: Finite set X, and initial population P0 ∈ X^λ
Require: Selection mechanism psel : X^λ × X → [0, 1]
Require: Variation operator pmut : X × X → [0, 1]

for t = 0, 1, 2, . . . until termination condition do
  for i = 1 to λ do
    Sample i-th parent x according to psel(Pt, ·)
    Sample i-th offspring Pt+1(i) according to pmut(x, ·)
  end for
end for

Selection and Variation - Example

Tournament Selection (wrt fitness): 10001110111011
Bitwise Mutation: 10010110111001

Population Drift

Central Parameters

Reproductive rate of the selection mechanism psel:
α0 = max_{1≤j≤λ} E [#offspring from parent j]

Random walk process corresponding to the variation operator pmut:
Xk+1 ∼ pmut(Xk)

Population Drift [12]

(C1P) ∀k: E [e^{κ(g(Xk+1)−g(Xk))} | a < g(Xk) < b] < 1/α0

Theorem
Define τb := min{k ≥ 0 | g(Pk(i)) > b for some i ∈ [λ]}.
If there exist constants α0 ≥ 1 and κ > 0 such that
psel has reproductive rate less than α0,
the random walk process corresponding to pmut satisfies (C1P),
and some other conditions hold,⁷
then for some constants c, c' > 0,
Pr (τb ≤ e^{c(b−a)}) = e^{−c'(b−a)}

⁷Some details are omitted. See Theorem 1 in [12] for all details.
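The condition behind (C1P) can be estimated numerically. Below is a hypothetical sketch (not from the slides): it takes g to be the number of ones, starts from an all-zeros parent so that every flip increases g (a pessimistic case), and compares the empirical mgf of ∆mut under bitwise mutation with the closed form from the binomial mgf above; all names are illustrative.

```python
import math
import random

def empirical_mgf_of_mutation_drift(n, chi, kappa, samples=100000):
    """Estimate M_{Delta_mut}(kappa) for bitwise mutation with rate chi/n,
    with g = number of ones and an all-zeros parent, so Delta_mut ~ Bin(n, chi/n)."""
    total = 0.0
    for _ in range(samples):
        delta = sum(random.random() < chi / n for _ in range(n))
        total += math.exp(kappa * delta)
    return total / samples

n, chi, kappa = 100, 1.0, math.log(2)
estimate = empirical_mgf_of_mutation_drift(n, chi, kappa)
exact = (1 - chi / n + (chi / n) * math.exp(kappa)) ** n  # binomial mgf
print(estimate, "vs exact", exact)  # compare against 1/alpha_0 to check (C1P)
```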

Population Drift: Decoupling Selection & Variation

Population drift
If there exists a κ > 0 such that
M_{∆mut}(κ) < 1/α0
where
∆mut = g(Xk+1) − g(Xk), with Xk+1 ∼ pmut(Xk),
and
α0 = max_j E [#offspring from parent j],
then the runtime is exponential.

Classical drift [6]
If there exists a κ > 0 such that
M_∆(κ) < 1
where
∆ = h(Pk+1) − h(Pk),
then the runtime is exponential.

Conclusion

Drift analysis is a powerful tool for analysis of EAs

Mainly used in EC to bound the expected runtime of EAs

Useful when the EA has non-monotonic progress,

eg. when the fitness value is a poor indicator of progress

The “art” consists in finding a good distance function

No simple recipe

A large number of drift theorems are available

Additive, multiplicative, variable, population drift...

Significant related literature from other fields than EC

Not the only tool in the toolbox; see also artificial fitness levels, Markov chain theory, concentration of measure, branching processes, martingale theory, probability generating functions, ...

Acknowledgements

Thanks to

David Hodge

Carsten Witt, and

Daniel Johannsen

for insightful discussions.

References

[1] Benjamin Doerr and Leslie Ann Goldberg. Drift analysis with tail bounds. In Proceedings of the 11th International Conference on Parallel Problem Solving from Nature (PPSN 2010), Part I, pages 174–183, Berlin, Heidelberg, 2010. Springer-Verlag.

[2] Benjamin Doerr, Daniel Johannsen, and Carola Winzen. Multiplicative drift analysis. In GECCO '10: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 1449–1456, New York, NY, USA, 2010. ACM.

[3] Stefan Droste, Thomas Jansen, and Ingo Wegener. On the analysis of the (1+1) Evolutionary Algorithm. Theoretical Computer Science, 276:51–81, 2002.

[4] Simon Fischer, Lars Olbrich, and Berthold Vöcking. Approximating Wardrop equilibria with finitely many agents. Distributed Computing, 21(2):129–139, 2008.

[5] Oliver Giel and Ingo Wegener. Evolutionary algorithms and the maximum matching problem. In Proceedings of the 20th Annual Symposium on Theoretical Aspects of Computer Science (STACS 2003), pages 415–426, 2003.

[6] Bruce Hajek. Hitting-time and occupation-time bounds implied by drift analysis with applications. Advances in Applied Probability, 14(3):502–525, 1982.

[7] Jun He and Xin Yao. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127(1):57–85, March 2001.

[8] Jun He and Xin Yao. A study of drift analysis for estimating computation time of evolutionary algorithms. Natural Computing, 3(1):21–35, 2004.

[9] Jens Jägersküpper. Algorithmic analysis of a basic evolutionary algorithm for continuous optimization. Theoretical Computer Science, 379(3):329–347, 2007.

[10] Jens Jägersküpper. A blend of Markov-chain and drift analysis. In Proceedings of the 10th International Conference on Parallel Problem Solving from Nature (PPSN 2008), 2008.

[11] Daniel Johannsen. Random combinatorial structures and randomized search heuristics. PhD thesis, Universität des Saarlandes, 2010.

[12] Per Kristian Lehre. Negative drift in populations. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, pages 244–253. Springer Berlin / Heidelberg, 2011.

[13] Per Kristian Lehre and Carsten Witt. Black-box search by unbiased variation. Algorithmica, pages 1–20, 2012.

[14] Sean P. Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, 1993.

[15] B. Mitavskiy, J. E. Rowe, and C. Cannings. Theoretical analysis of local search strategies to optimize network communication subject to preserving the total number of links. International Journal of Intelligent Computing and Cybernetics, 2(2):243–284, 2009.

[16] Frank Neumann, Dirk Sudholt, and Carsten Witt. Analysis of different MMAS ACO algorithms on unimodal functions and plateaus. Swarm Intelligence, 3(1):35–68, 2009.

[17] Pietro S. Oliveto and Carsten Witt. Simplified drift analysis for proving lower bounds in evolutionary computation. Algorithmica, pages 1–18, 2010. doi:10.1007/s00453-010-9387-z.

[18] Pietro S. Oliveto and Carsten Witt. Simplified drift analysis for proving lower bounds in evolutionary computation. Technical Report Reihe CI, No. CI-247/08, SFB 531, Technische Universität Dortmund, Germany, 2008.

[19] Galen H. Sasaki and Bruce Hajek. The time complexity of maximum matching by simulated annealing. Journal of the ACM, 35(2):387–403, 1988.

[20] Dirk Sudholt. General lower bounds for the running time of evolutionary algorithms. In Proceedings of Parallel Problem Solving from Nature (PPSN XI), volume 6238 of LNCS, pages 124–133. Springer Berlin / Heidelberg, 2010.

[21] David Williams. Probability with Martingales. Cambridge University Press, 1991.

[22] Carsten Witt. Optimizing linear functions with randomized search heuristics - the robustness of mutation. In Christoph Dürr and Thomas Wilke, editors, 29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012), volume 14 of LIPIcs, pages 420–431, Dagstuhl, Germany, 2012. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.