TokyoR #35 Multi-Armed-Bandit Problem Masayuki Isobe Adfive, Inc.
Who am I ? • Masayuki Isobe – MS in computer science. • Graduation research is modeling statistical model for game player in logic programing with EM-learning. – Now I’m a head of own company - Adfive Inc. • dedicated in ad-tech consulting, related system development. – Interested in “scientific and business”. • Twitter: @chiral, Facebook: masayuki.isobe.14 – Anyone in this room is welcome to touch me in these SNS.
What is multi-armed-bundit? • A problem in search-theory. – Finite targets of investment (like arms of slot machine in casino) which have own reward probabilities respectively, finite resources to be bet, and limited number of trials are given. – Problem is: “what is the best bets in each trials?” • Typical dilemma: – explore vs harvest: difficulty in evaluation for partial knowledge possibly converged to rewards future. - Detailed in wikipedia:en “Multi-armed_bandit” entry
formulation • M arms, N units of res in each trial, T trials. – M, N, T are natural number. • P(Reward): Each arm’s reward probability – Binomial, poisson, etc. • In usual formulation, arms don’t have mutual relations. – Parameter is allowed to vary in trials for a complex model. • Goal: maximize the sum of rewards for all trials.
Betting strategy • Strategies can be put into four typical categories. (by wikipedia) – Semi-uniform • epsilon-greedy and its sophisticated versions. • sample source in an O’reilly book. – https://github.com/johnmyleswhite/BanditsBook – Probability matching • Suppose some distributions and bet infered parameters • Packages ‘bandit’ in CRAN – Pricing • evaluate knowleges for future rewards. – Like ethical constraints • avoid arms found out to be inferior.
Simple random strategy epsilon-greedy, with epsilon parametes from 0.1 to 0.5 e-greedy seems to be worthless to apply..
conclusion & future works • Compared to simple random betting, simple e- greedy strategy seems to have little difference. – More sophisticated strategy is needed. – Some useful algorithms are implemented in Google Analytics. • High performing algorithm for bandit problem could be applied to programmatic marketing such as RTB. – So I’ll continue exploring in R.