Call for Abstracts


Firms often have to decide sequentially how to allocate a finite budget among competing actions. For example, advertisers must choose which of several ad copies to show the next website visitor, and online retailers must decide which products to recommend next. Learning in such settings is not trivial: choosing one alternative improves the estimate of its success probability but diverts resources from a potentially better alternative. When the firm is simultaneously interested in learning and in revenue, i.e., in ‘learning while earning’, these problems are referred to as Multi-Armed Bandit (MAB) problems. Algorithms for solving them draw on tools ranging from statistical inference and predictive analytics to general learning methods to address the exploration-exploitation trade-off inherent in such problems.
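
To make the trade-off concrete, below is a minimal Thompson-sampling sketch for a Bernoulli bandit, in the spirit of the ad-selection example above; the click-through rates, number of arms, and horizon are illustrative assumptions, not data from any study.

    import random

    # Illustrative (assumed) true click-through rates for three ad copies.
    TRUE_RATES = [0.04, 0.05, 0.07]
    HORIZON = 10_000

    # Beta(1, 1) priors: one (successes, failures) count pair per arm.
    successes = [1] * len(TRUE_RATES)
    failures = [1] * len(TRUE_RATES)

    clicks = 0
    for _ in range(HORIZON):
        # Thompson sampling: draw a plausible rate for each arm from its
        # posterior, then show the ad whose draw is largest.
        draws = [random.betavariate(s, f) for s, f in zip(successes, failures)]
        arm = max(range(len(draws)), key=draws.__getitem__)

        # Observe a simulated click and update the chosen arm's posterior.
        clicked = random.random() < TRUE_RATES[arm]
        clicks += clicked
        if clicked:
            successes[arm] += 1
        else:
            failures[arm] += 1

    print(f"Clicks earned: {clicks} of {HORIZON}")
    print("Posterior means:",
          [round(s / (s + f), 3) for s, f in zip(successes, failures)])

The sampler naturally shifts traffic toward the best-performing ad while it is still learning about the others, which is exactly the ‘learning while earning’ behaviour described above.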

Topics & Domains 

The topics of the workshop include, but are not limited to, the following:

  • Applications to online advertising, consumer search, news and product recommendation, energy markets, clinical trials, experimental design, portfolio management, website design, and many other domains
  • Theoretical aspects of the exploration-exploitation trade-off
  • Adaptive learning algorithms, broadly defined
  • Novel statistical machine-learning methods for informing sequential decision making
  • POMDPs with different types of rewards (e.g., terminal or cumulative), state (in)dependence, and policies 
  • Optimality and convergence
  • New approaches to address thorny practical challenges, such as the curse of dimensionality, scalability, or latency (in online problems), when computing, approximating, and learning optimal and near-optimal policies
  • Diverse approaches, including regret minimization, dynamic allocation indices, confidence bounds (see the sketch below this list), dynamic programming, sequential experimentation, look-ahead, and others
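
As one illustration of the confidence-bound approaches mentioned in the last bullet, the following sketch implements UCB1 for the same kind of Bernoulli bandit; the reward probabilities and horizon are again assumed purely for illustration.

    import math
    import random

    # Illustrative (assumed) Bernoulli reward probabilities per arm.
    TRUE_RATES = [0.04, 0.05, 0.07]
    HORIZON = 10_000

    counts = [0] * len(TRUE_RATES)    # times each arm was played
    totals = [0.0] * len(TRUE_RATES)  # cumulative reward per arm

    for t in range(1, HORIZON + 1):
        if t <= len(TRUE_RATES):
            arm = t - 1  # play every arm once to initialize estimates
        else:
            # UCB1: empirical mean plus an exploration bonus that shrinks
            # as an arm accumulates observations.
            arm = max(
                range(len(TRUE_RATES)),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = float(random.random() < TRUE_RATES[arm])
        counts[arm] += 1
        totals[arm] += reward

    print("Plays per arm:", counts)

The exploration bonus guarantees that every arm is revisited occasionally, which is what underlies the logarithmic regret bounds studied in the theoretical literature.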