This page reproduces the content of http://www.slideshare.net/EtharAlali/what-abtesting.

by Veolia

Uploaded about 2 years ago (2014/09/28) in Technology

IT’s A/B-testing and lean-startup techniques can learn a lot from experimental design and statistics. For those of you not that confident or familiar with such techniques, here is a little intro to help you on your way :)

- A/B-Testing: An Introduction

What is it? Why Use it? - Prediction in Predictable Environments

Predictable Models Excel in Deterministic Environments

Statics & Dynamics Don’t Change
• ‘Fitness’ for purpose always measured the same
• Frictionless Pendulum swing Very Predictable
– Simple Harmonic Motion
– Time Period given by T = 2π√(L/g) for small swings (L = pendulum length, g = gravitational acceleration)
• Control Systems
– e.g. Anti-lock Braking System

Sacrilege: Learning is pointless (it’s all known), thus Waterfall/Heavy Development Methods Excel! :-O

- Uncertain/Unpredictable Contexts

• Human Interaction is Uncertain.

• Everyone is…

– Different

– [Relatively] fickle

– Growing Older

– Influenced By Other Stuff

– …

• Definition of fitness for purpose changes

• In fact, Everything Changes! - Story of the Foot

• Once upon a time there was a foot which Belonged to the King of a Powerful Kingdom
• He Reigned Supreme because All Swords Had to be 7 ft Long
• King dies naturally and a new King is Crowned
• But he has a Big Ego and Really Small Feet
– Half the length of the Previous King’s
• He Ordains All Swords Now not Fit for Purpose
• So they’re Melted & Remade to 7 of his feet
• Along comes an Evil Army with swords now Twice as Long
• Nobody in the Kingdom Lived Happily Ever After! :-(

- Q: HOW CAN WE EVER BE PREDICTABLE?

- Pick Your Tool: Certainty v Uncertainty

Predictable Environments
• Lots known up front
• ‘Variables/factors’ can all be identified…
• …So can predict with high certainty where whole systems will be in t time-steps
– seconds, minutes, hours, days, weeks, months, years…
• Little Need to Adapt
• Most appropriate for Standards Models
– SI Units
– HTTP/SMTP/POP3…
• ‘Dictate works’, not nice, but true
• e.g. ‘7ft’ Swords will have continued to exist
– Even if the heads of the blacksmiths didn’t.

Uncertain Environments
• Very little known up front
• Variable levels of traffic, experience etc.
• ‘Fitness function’ itself changes
– e.g. King changes = Foot changes
• Continual need to check the fitness function…
– e.g. Customer reviews, performance metrics
• Implies Continual Need to Change/Improve Systems

- EXAMPLE: Running a Bath (Uncertain)

Predictable Models
• Don’t know the water temperature
• Never done it before
1. Put hot tap on for 5 minutes
2. Cold Tap on for 2 minutes
3. Get in
Risks: Scalding your Jewels and More!

Uncertainty Models
• Don’t know the water temperature
• Never done it before
1. Put hot tap on for 5 seconds
2. Put cold tap on for 2 seconds
3. Dip toe in
4. If
• Too hot add cold water
• Too cold add hot water
• Else get in & relax
5. Go to 1 (Rinse, Repeat)
Risks: Slightly more time to get to ideal temperature, but gets there with much less risk of burning crucial elements and potentially less water waste.

- EXAMPLE: Running a Bath Cycle
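The ‘Uncertainty Models’ bath procedure (short bursts of hot and cold, dip a toe, correct, repeat) can be sketched as a small feedback loop. This is only a toy illustration: the `Bath` class, the tap temperatures, the flow rate, the target range and the minimum fill volume are all assumptions made up for the example, not figures from the slides.

```python
from dataclasses import dataclass

HOT_C, COLD_C = 60.0, 10.0   # assumed tap temperatures (deg C)
FLOW_L_PER_S = 0.2           # assumed flow rate of either tap


@dataclass
class Bath:
    """Toy model: litres of water in the bath and their mixed temperature."""
    litres: float = 0.0
    temp_c: float = 0.0

    def run_tap(self, seconds: float, tap_temp_c: float) -> None:
        """Add water from one tap and update the mixed temperature."""
        added = seconds * FLOW_L_PER_S
        total = self.litres + added
        self.temp_c = (self.litres * self.temp_c + added * tap_temp_c) / total
        self.litres = total


def run_bath(bath, target_c=38.0, tolerance_c=1.5, min_litres=60.0, max_cycles=1000):
    """Repeat small build-measure-learn cycles; return the number of cycles used."""
    for cycle in range(1, max_cycles + 1):
        bath.run_tap(5, HOT_C)                 # 1. hot tap on for 5 seconds
        bath.run_tap(2, COLD_C)                # 2. cold tap on for 2 seconds
        toe = bath.temp_c                      # 3. dip toe in (measure)
        if toe > target_c + tolerance_c:       # 4. too hot: add cold water
            bath.run_tap(2, COLD_C)
        elif toe < target_c - tolerance_c:     #    too cold: add hot water
            bath.run_tap(2, HOT_C)
        elif bath.litres >= min_litres:        #    else get in and relax
            return cycle
        # 5. go to 1 (rinse, repeat)
    raise RuntimeError("bath never reached the target range")
```

Each pass changes the bath only slightly before measuring again, which is the point of the slide: many cheap measurements instead of one large, unchecked commitment.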

[Diagram: Build-Measure-Learn cycle]
• Build: Run Water (Hot and Cold)
• Measure: Test with ‘Toe’
• Learn: Evaluate Temperature
Speech bubbles: “Best test this with my toe, so I don’t scald myself…”, “Ahh, F@#*!!! THAT’S HOT!”, “I burnt my toe! Not doing that again!”

- Dealing with Uncertainty

• More variables than equations to solve them…

• …Hence optimisation problem (no unique solution)

• Like it or not, iterative cycles work best

– Build-Measure-Learn; DMAIC

• Frequent Experiments & Actionable Change

• Control by Experimental Design Principles

– Test one change in isolation

– Compare against a control group/result

– Randomise Groupings

– Double Blind

• Plus, smaller tasks = smaller variance = greater certainty

Gold Standard: Randomised Double Blind Controlled Trial - Definition: Randomised

• Two groups

• Randomly Assign Subjects to Each Group

- Definition: Double Blind
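For the ‘Randomised’ definition (two groups, subjects randomly assigned to each), a minimal sketch of the assignment step might look like this. The function name `assign_groups` and the optional `seed` parameter are illustrative choices, not something the slides prescribe.

```python
import random


def assign_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two near-equal groups."""
    rng = random.Random(seed)   # a fixed seed makes the split reproducible
    shuffled = list(subjects)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}


groups = assign_groups(["ann", "bob", "cat", "dan", "eve", "fay"], seed=42)
print(len(groups["A"]), len(groups["B"]))
```

Shuffling and then splitting guarantees balanced group sizes; flipping a coin per subject would also be random but can leave the groups lopsided when numbers are small.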

Both Researcher & Subject Don’t know which group they are assigned to.
So researcher and subject behave the same for A and B tests.
TIP: Automated allocation
Image via ’John the Math Guy’

- Definition: Controlled
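The ‘Automated allocation’ tip from the Double Blind slide can be illustrated with a hash-based allocator, so that software, not a person, decides who sees A or B. This is a sketch under assumptions: the function name `allocate`, the `salt` value and the experiment label are hypothetical, and hashing gives a deterministic pseudo-random split rather than true randomisation.

```python
import hashlib


def allocate(subject_id: str, experiment: str, salt: str = "change-me") -> str:
    """Map a subject to 'A' or 'B' from a hash, with no human choosing."""
    key = f"{salt}:{experiment}:{subject_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # The first byte of a SHA-256 digest is close to uniform, so parity splits ~50/50.
    return "A" if digest[0] % 2 == 0 else "B"


print(allocate("user-123", "new-checkout-button"))
```

Because the same subject always lands in the same group for a given experiment, nobody needs to keep (or peek at) an allocation list while the test is running.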

Every potential factor is fixed aside from the factor under test.
Minimises ‘confounding variables’
e.g. If someone goes outside and gets wet, does it mean it’s raining?
Image via ‘Not the average’ blog

- Designing Experiments

• Start with Hypothesis

– Include theory if analytical

• Experiment AGAINST a control group!

– Control Group = Baseline to compare against (B-test)

– Experimental Group is A-test

• Randomly Allocate Control & Experimental Group

– Ideally Researcher & Subject Can’t Know

• Analyse Results, Conclude AND Act! - Caution

• Change only one thing at once!
– Can do A/B/n tests, but have to be linearly independent variables
• Statistically, not a certainty!
• Objective: Make sure results aren’t by chance (e.g. against placebo)!
• Analyse against ‘Null’ Hypothesis
– Opposite of what you are trying to prove
• Factor in type 1 & 2 statistical errors
– False Positives and Negatives
• Your test is the alternate hypothesis
• If the probability of getting your result under the Null hypothesis (Chance) is very, very small, accept the Alternate hypothesis, which you are trying to prove!
– ‘Small-p’ = probability of seeing a result at least this extreme if the null hypothesis is true
• Otherwise, no choice but to retain the null hypothesis (you have failed to reject it)

- Q: Where Can A/B-testing Be Used?
A: EVERYWHERE!

- Where Can A/B-Tests Be Used?
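As a worked example of the null-hypothesis check from the Caution slide, here is a sketch of a two-sided two-proportion z-test on conversion counts. The slides do not name a specific test; this one is swapped in as a common choice for A/B conversion data, and the visitor and conversion numbers are invented for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both groups share one underlying rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical data: 120/1000 conversions for A against 100/1000 for B.
z, p = two_proportion_z_test(120, 1000, 100, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
if p < 0.05:
    print("Reject the null hypothesis at the 5% level")
else:
    print("Cannot reject the null hypothesis")
```

The 5% threshold is a convention chosen by the researcher, and the normal approximation behind the test needs reasonably large samples in both groups.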

• Guerrilla testing

• Lean-Startup A/B-Tests (tech, marketing etc.)

• Pilots

• Experiments

• Proof of Concepts

• Software Development Team Retrospectives

• Manufacturing Processes

• Change Programmes

• Departmental Effectiveness

• … - Q: What tools can we use?

A: STATISTICS - Toolbox: Normal Distribution

Data that is normally distributed shown as a continuous line.
Fixed width histogram = Same (right)
Pros:
1. Incredibly diverse
2. Tables/Excel Functions exist
Cons:
3. Needs many samples (25+)
– Errors significantly impact result & need other ways (e.g. t-test)
4. Can’t Always Force Normality
– But story point estimates can!
Source: Critical Numbers Group Sheffield University

- Toolbox: Confidence Intervals

Indicates reliability of estimate, given data = Likelihood that result falls within values of x-standard deviations of the mean.
Answers “How sure are you that this result was expected?”
Pros:
1. Easy to do
2. Excel Functions/Libraries exist
Cons:
3. Same weakness as normal distribution
4. Arbitrary confidence levels
– Researcher chooses, but 95% is the de facto standard (roughly 2 sigma)
Source: Moz.com

- Toolbox: Correlation Matrix
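For the Confidence Intervals toolbox entry, a minimal sketch of a 95% interval for a sample mean using the normal approximation. The sample values are made up, and as the Normal Distribution slide warns, small samples call for other methods (e.g. a t-based interval) instead.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of the samples."""
    # z is about 1.96 for 95%, which is where the 'roughly 2 sigma' rule comes from.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(samples)
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return centre - half_width, centre + half_width


# Hypothetical page-load times in seconds.
load_times = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3, 2.1, 2.0, 2.2]
low, high = confidence_interval(load_times)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

Raising `confidence` widens the interval and lowering it narrows it, which is why the slide calls the choice of level arbitrary.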

Matrix of elements. Each is correlation coefficient of data v data.
“How strongly does this relate to that?”
High correlation -> dig deeper
Pros:
1. Excel Functions/Libraries exist
Cons:
2. Correlation isn’t Causation!
3. More of a ‘faff’ in Excel
– Prone to human error in analysis
Source: Genome biology

- Toolbox: Factor Analysis
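For the Correlation Matrix toolbox entry, a small pure-Python sketch that builds the matrix of Pearson correlation coefficients between named data columns. The column names and values are invented for illustration, and the slides do not say which correlation coefficient they have in mind; Pearson is assumed here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Correlate every named column with every other (and with itself)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical per-sprint figures.
data = {
    "team_size": [3, 4, 5, 6, 7],
    "story_points": [20, 26, 31, 33, 40],
    "defects": [9, 7, 8, 4, 3],
}
for name, row in correlation_matrix(data).items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

A value near +1 or -1 is the ‘dig deeper’ signal from the slide, not proof that one column causes the other.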

Using correlation matrix to identify factors, determine independent variables for dependent variables.
Pros:
1. Linear Algebra tools to help
2. Identifies combinations of factors
Cons:
3. Excel doesn’t support it natively
4. ‘Cancelling’ factors or confounding factors problematic
5. Have to understand linear algebra
6. Basically an approximation (so what’s good enough?)
Source: Kovach Computing Services

- Definitions

TERM: DESCRIPTION
Dependent Variable: A variable that depends on one or more other variables (y = x + 2, y is dependent, x is independent)
Independent Variable: A variable that does not depend on the value of any other variable.
Confounding Variable: A variable that could independently present the same result as some other variable. This reduces the credibility and certainty of a result (e.g. if I go outside and I get wet, is it because it was raining?)
Distribution: The ‘shape’ of the graph of a random variable
Type 1 Error (False Positive): Declaring a result as confirmed when it’s not, usually through experimental error.
Type 2 Error (False Negative): Declaring a result as false when it’s true. Usually by experimental or interpretive error.

- Thanks for Viewing

Further Reading

Random Variables and Probability Distributions (Khan Academy)
https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/random-variables
Confidence Intervals
http://en.wikipedia.org/wiki/Confidence_interval
Normal Distribution
http://en.wikipedia.org/wiki/Normal_distribution
“Correlation & Dependence” Wikipedia
http://en.wikipedia.org/wiki/Correlation_and_dependence
Factor Analysis
http://en.wikipedia.org/wiki/Factor_analysis
Genome Biology (publishes research, software and new methods)
http://genomebiology.com/

About Us
Ethar Alali, Managing Director & Chief Architect
@EtharUK @Dynacognetics
Blog: GoadingtheITGeek.blogspot.co.uk
Specialist ICT Strategists & Advisors. Member of HiveMind Network for some of the biggest household and corporate multi-nationals.
Polymath - Math Mo. Programming since 9 years old. TOGAF 9 Certified, change agent.
Accreditations & Associations
Accredited Growth Voucher Advisors certified to deliver IT & Web Growth Consultancy as part of the government’s Growth Voucher Scheme.