webinar register page

IMA Data Science Seminar - Vlad Kobzar
Speaker: Vlad Kobzar
Title: A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit
Abstract: The multi-armed bandit is a classic sequential prediction problem. At
each round, the predictor (player) selects a probability distribution
from a finite collection of distributions (arms) with the goal of
minimizing the difference (regret) between the player’s rewards
sampled from the selected arms and the rewards of the arm with the
highest expected reward. The player’s choice of the arm and the reward
sampled from that arm are revealed to the player, and this prediction
process is repeated until the final round. Our work addresses a
version of the two-armed bandit problem where the arms are distributed
independently according to Bernoulli distributions and the sum of the
means of the arms is one (the symmetric two-armed Bernoulli bandit).
In a regime where the gap between these means goes to zero and the
number of prediction periods approaches infinity, we obtain the
leading order terms of the expected regret for this problem by
associating it with a solution of a linear parabolic partial
differential equation. Our results improve upon the previously known
results; specifically we explicitly compute the leading order term of
the optimal regret in three different scaling regimes for the gap.
Additionally, we obtain new non-asymptotic bounds for any given time
horizon. This is joint work with Robert Kohn available at

Nov 15, 2022 01:25 PM in Central Time (US and Canada)

Webinar is over, you cannot register now. If you have any questions, please contact Webinar host: Georgia M Kroll.