This page reproduces the slides at https://speakerdeck.com/jakevdp/frequentism-and-bayesianism-whats-the-big-deal-scipy-2014 (uploaded 2014/07/08).

Statistical analysis comes in two main flavors: frequentist and Bayesian. The subtle differences between the two can lead to widely divergent approaches to common data analysis tasks. After a brief discussion of the philosophical distinctions between the views, I’ll utilize well-known Python libraries to demonstrate how this philosophy affects practical approaches to several common analysis tasks.

- Jake VanderPlas

SciPy 2014

What this talk is…
- An introduction to the essential differences between frequentist & Bayesian analyses.
- A brief discussion of tools available in Python to perform these analyses.
- A thinly-veiled argument for the use of Bayesian methods in science.

What this talk is not…
- A complete discussion of frequentist/Bayesian statistics & the associated examples.
(For more detail, see the accompanying SciPy proceedings paper & references within.)

The frequentist/Bayesian divide is fundamentally a question of philosophy: the definition of probability.

What is probability?
- Frequentists: fundamentally related to the frequencies of repeated events.
- Bayesians: fundamentally related to our own certainty or uncertainty of events.

Thus we analyze…
- Frequentists: variation of data & derived quantities in terms of fixed model parameters.
- Bayesians: variation of beliefs about parameters in terms of fixed observed data.

Simple Example: Photon Flux

Given the observed data, what is the best estimate of the true value?

Frequentist Approach: Maximum Likelihood
Model: each observation Fᵢ is drawn from a Gaussian of width eᵢ.
Building the Likelihood…
"Maximum Likelihood" estimate…
Frequentist Point Estimate:
Analytically maximize to find the inverse-variance weighted mean:
F̂ = Σᵢ(Fᵢ/eᵢ²) / Σᵢ(1/eᵢ²),  with uncertainty σ = (Σᵢ 1/eᵢ²)^(−1/2)

In Python:
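The slide's code isn't captured in this transcript. A sketch of the same computation in numpy, using simulated photon counts as a stand-in for the talk's 30 data points (the true flux of 1000 and the Poisson error model are assumptions for illustration):

```python
import numpy as np

# Simulated stand-in data: 30 photon-count measurements of a source
# whose true flux is 1000 (assumption, not the talk's actual dataset)
rng = np.random.default_rng(1)
F_true = 1000
F = rng.poisson(F_true, 30)   # observed fluxes F_i
e = np.sqrt(F)                # errors e_i (Poisson estimate)

# Maximum likelihood estimate: the inverse-variance weighted mean
w = 1.0 / e**2
F_hat = np.sum(w * F) / np.sum(w)
sigma_F = np.sum(w) ** -0.5

print(f"F = {F_hat:.0f} +/- {sigma_F:.0f}")
```

The weighted mean and its error follow directly from maximizing the Gaussian likelihood; the exact numbers depend on the simulated data.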

For our 30 data points, we have 999 ± 4.

Bayesian Approach: Posterior Probability
Compute our knowledge of F given the data, encoded as a probability P(F | D).
To compute this, we use Bayes' Theorem.

Bayes' Theorem

Posterior P(F | D) = Likelihood P(D | F) × Prior P(F) / Model Evidence P(D)
(The evidence is often simply a normalization, but useful for model evaluation, etc.)
Again, we find 999 ± 4.

For very simple problems, frequentist & Bayesian results are often practically indistinguishable.

The difference becomes apparent in

more complicated situations…
- Handling of nuisance parameters
- Interpretation of uncertainty
- Incorporation of prior information
- Comparison & evaluation of models
- etc.

Example 1: Nuisance Parameters
Nuisance Parameters: Bayes' Billiard Game (Bayes 1763; Eddy 2004)
Alice and Bob have a gambling problem… Carol has designed a game for them to play:
- The first ball divides the table into Alice's area and Bob's area (to the players, the division is "a black box").
- Additional balls give a point to Alice or Bob.
- The first person to six points wins.

Question: in a certain game, Alice has 5 points and Bob has 3. What are the odds that Bob will go on to win?

Note: the division of the table is a nuisance parameter: a parameter which affects the problem and must be accounted for, but is not of immediate interest.

A Frequentist Approach

p = probability of Alice winning any roll (the nuisance parameter)
The maximum likelihood estimate gives p̂ = 5/8.
Probability of Bob winning (he needs the next 3 points): P(B) = (1 − p̂)³

P(B) = 0.053; odds of 18 to 1 against.

A Bayesian Approach
Marginalization over the nuisance parameter: P(B | D) = ∫ P(B, p | D) dp, where B = "Bob wins" and D = the observed data (Alice 5, Bob 3).
Some algebraic manipulation…
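Both answers can also be checked numerically. A Monte Carlo sketch (my own, not from the slides): simulate many games with a uniformly random table division, keep those matching the observed 5–3 score, and count how often Bob wins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate many games: p (the hidden division of the table) is uniform,
# then eight balls are rolled.
n_games = 200_000
p = rng.random(n_games)
alice_points = rng.binomial(8, p)      # Alice's score after 8 rolls
observed = alice_points == 5           # games matching the 5-3 score

# Given p, Bob wins iff all of the next 3 balls land on his side
next3_bob = rng.random((n_games, 3)) > p[:, None]
bob_wins = next3_bob.all(axis=1)

# Bayesian answer: average over games consistent with the data
p_bayes = bob_wins[observed].mean()

# Frequentist answer: plug in the maximum likelihood p_hat = 5/8
p_freq = (1 - 5 / 8) ** 3

print(p_freq, p_bayes)   # near 0.053 and 0.091 respectively
```

Conditioning on the observed score automatically marginalizes over p, which is why the simulation reproduces the Bayesian result rather than the plug-in one.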

Find P(B | D) = 0.091; odds of 10 to 1 against.

Bayes' Billiard Game Results:
- Frequentist: 18 to 1 odds
- Bayesian: 10 to 1 odds
Difference: the Bayesian approach allows nuisance parameters to vary, through marginalization.

Conditioning vs. Marginalization

- Conditioning (akin to the frequentist approach here): examine the distribution of B at a single fixed value of p.
- Marginalization (the Bayesian approach here): integrate the distribution of B over all values of p.

Example 2: Uncertainties
Uncertainties: "Confidence" vs. "Credibility"
- Frequentists: "If this experiment is repeated many times, in 95% of these cases the computed confidence interval will contain the true θ." (The interval varies; θ is fixed.)
- Bayesians: "Given our observed data, there is a 95% probability that the value of θ lies within the credible region." (The data are fixed; θ varies.)

Uncertainties: Jaynes' Truncated Exponential

Consider a model: p(x | θ) = exp(θ − x) for x > θ, and zero otherwise.
We observe D = {10, 12, 15}. What are the 95% bounds on θ? (Jaynes 1976)

Common-sense Approach
D = {10, 12, 15}. Each point must be greater than θ, and the smallest observed point is x = 10. Therefore we can immediately write the common-sense bound θ < 10.

Frequentist Approach

The expectation of x is E[x] = θ + 1, so an unbiased estimator is θ̂ = x̄ − 1.
Computing the sampling distribution of the mean for p(x) gives the 95% confidence interval:
10.2 < θ < 12.2

Bayesian Approach

Bayes' Theorem: P(θ | D) ∝ P(D | θ) P(θ)
Likelihood: P(D | θ) ∝ exp(N(θ − x̄)) for θ ≤ min(D), and zero otherwise.
With a flat prior, we get the posterior P(θ | D) ∝ exp(Nθ) for θ ≤ min(D).
95% credible region: 9.0 < θ < 10.0
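The credible region follows in closed form from that posterior: the mass below θ is exp(N(θ − min D)), so the 95% region runs from min(D) + ln(0.05)/N up to min(D). A short numerical check:

```python
import numpy as np

D = np.array([10.0, 12.0, 15.0])
N = len(D)

# Posterior (flat prior): p(theta | D) ∝ exp(N * theta) for theta <= min(D).
# 95% of the mass lies above the point where the posterior CDF equals 5%.
theta_hi = D.min()
theta_lo = theta_hi + np.log(0.05) / N

# Frequentist unbiased point estimate (E[x] = theta + 1), for comparison:
theta_unbiased = D.mean() - 1.0   # already above the common-sense bound!

print(f"{theta_lo:.1f} < theta < {theta_hi:.1f}")   # 9.0 < theta < 10.0
```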

Jaynes' Truncated Exponential Results:

- Common-sense bound: θ < 10
- Frequentist unbiased 95% confidence interval: 10.2 < θ < 12.2
- Bayesian 95% credible region: 9.0 < θ < 10.0

Frequentism is not wrong! It's just answering a different question than we might expect.

Confidence vs. Credibility

- Bayesianism: a probabilistic statement about model parameters, given a fixed credible region.
- Frequentism: a probabilistic statement about a recipe for generating confidence intervals, given a fixed model parameter.

[Figure key: dot = parameter, bar = interval]
- Bayesian credible region: one fixed interval; the parameter varies.
- Frequentist confidence interval: one fixed parameter; an ensemble of intervals, among them our particular interval.

Please Remember This:

In general, a frequentist 95% confidence interval is not 95% likely to contain the true value!
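What the frequentist statement does guarantee is long-run coverage of the recipe. A simulation sketch with a simple Gaussian-mean example (my own illustration, not from the slides): repeat an experiment many times, build the textbook interval each time, and check the coverage.

```python
import numpy as np

rng = np.random.default_rng(2)

# Repeat a simple experiment many times: draw n Gaussian samples and
# build the textbook 95% interval  xbar ± 1.96 * sigma / sqrt(n).
mu, sigma, n, trials = 5.0, 1.0, 25, 20_000
data = rng.normal(mu, sigma, (trials, n))
xbar = data.mean(axis=1)
half = 1.96 * sigma / np.sqrt(n)
covered = (xbar - half < mu) & (mu < xbar + half)

# ~95% of the intervals contain mu: a statement about the recipe.
# Any single computed interval either contains mu or it does not.
print(covered.mean())
```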

This very common mistake is a Bayesian interpretation of a frequentist construct.

Typical Conversation:

Statistician: "95% of such confidence intervals in repeated experiments will contain the true value."
Scientist: "So there's a 95% chance that the value is in this interval?"
Statistician: "No: you see, parameters by definition can't vary, so referring to chance in that context is meaningless. The 95% refers to the interval itself."
Scientist: "Oh, so there's a 95% chance that the value is in this interval?"
Statistician: "No. It's this: the long-term limiting frequency of the procedure for constructing this interval ensures that 95% of the resulting ensemble of intervals contains the value."
Scientist: "Ah, I see: so there's a 95% chance that the value is in this interval, right?"
Statistician: "No… it's that… well… just write down what I said, OK?"
Scientist: "OK, got it. The value is 95% likely to be in the interval."

(Editorial aside…) Non-statisticians naturally understand uncertainty in a Bayesian manner. Wouldn't it be less confusing if we simply used Bayesian methods?

A more practical example…

Final Example: Line of Best Fit

The Model: a straight line y = m x + b with Gaussian errors on y.
The Bayesian approach uses Bayes' Theorem: P(m, b | D) ∝ P(D | m, b) P(m, b).

The Prior: is a flat prior on the slope appropriate? No!
By symmetry arguments, we can motivate an uninformative prior that is flat in the slope angle rather than the slope itself: P(m) ∝ (1 + m²)^(−3/2), or equivalently, a flat prior on the correspondingly transformed parameters.

Frequentist Result: StatsModels
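The StatsModels code from the slide isn't in this transcript. As a dependency-light sketch of the frequentist fit, ordinary least squares with plain numpy (the synthetic dataset is an assumption; the talk's actual data and statsmodels calls are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic data (assumed; not the talk's dataset)
true_slope, true_intercept = 2.0, 1.0
x = np.linspace(0, 10, 50)
y = true_slope * x + true_intercept + rng.normal(0, 1, x.size)

# Ordinary least squares via the design matrix [x, 1]
# (a stand-in for statsmodels' OLS used on the slide)
A = np.vstack([x, np.ones_like(x)]).T
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

print(slope, intercept)
```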
[Comparison figure: the frequentist fit]

Bayesian Result: emcee
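The emcee code itself spans two slides and isn't captured here. As a self-contained stand-in, a plain Metropolis random walk sampling the same posterior, with the symmetry-motivated slope prior (synthetic data and all tuning choices are assumptions for illustration; emcee's ensemble sampler would replace the hand-rolled loop):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data (assumed; not the talk's dataset)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, x.size)
sigma = 1.0  # known error on y

def log_posterior(m, b):
    # Gaussian likelihood for y = m*x + b
    resid = y - (m * x + b)
    log_like = -0.5 * np.sum((resid / sigma) ** 2)
    # Symmetry-motivated prior on the slope: p(m) ∝ (1 + m²)^(-3/2)
    log_prior = -1.5 * np.log1p(m ** 2)
    return log_like + log_prior

# Plain Metropolis random walk (stand-in for emcee's EnsembleSampler)
m, b = 0.0, 0.0
lp = log_posterior(m, b)
trace = []
for step in range(20_000):
    m_new = m + 0.1 * rng.standard_normal()
    b_new = b + 0.3 * rng.standard_normal()
    lp_new = log_posterior(m_new, b_new)
    if np.log(rng.random()) < lp_new - lp:   # accept/reject
        m, b, lp = m_new, b_new, lp_new
    if step >= 5_000:                        # discard burn-in
        trace.append((m, b))

m_post, b_post = np.mean(trace, axis=0)
print(m_post, b_post)
```

With ample data the prior's pull is small, so the posterior means land near the least-squares values; the prior matters most for poorly constrained slopes.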

[The emcee code spans two slides; the comparison figure shows the frequentist and emcee results.]

Bayesian Result: pymc
[The pymc code spans two slides; the comparison figure adds the pyMC result.]

Bayesian Result: PyStan
[The PyStan code spans two slides; the comparison figure adds the pyStan result.]

Conclusion:

- Frequentism & Bayesianism fundamentally differ in their definition of probability.
- Results are similar for simple problems, but often differ for more complicated problems.
- Bayesianism provides a more natural handling of nuisance parameters, and a more natural interpretation of errors.
- Both paradigms are useful in the right situation, but be careful to interpret the results (especially frequentist results) correctly!

Thank You!

jakevdp@cs.washington.edu
@jakevdp
jakevdp
http://jakevdp.github.io

For more details on this topic, see the accompanying proceedings paper, or the blog posts at the above site.