IEEE transactions on neural networks and learning systems | 2021

Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs.

 
 
 

Abstract


We study a family of adversarial (a.k.a. nonstochastic) multi-armed bandit (MAB) problems, wherein not only the player cannot observe the reward on the played arm (self-unaware player) but also it incurs switching costs when shifting to another arm. We study two cases: In Case 1, at each round, the player is able to either play or observe the chosen arm, but not both. In Case 2, the player can choose an arm to play and, at the same round, choose another arm to observe. In both cases, the player incurs a cost for consecutive arm switching due to playing or observing the arms. We propose two novel online learning-based algorithms each addressing one of the aforementioned MAB problems. We theoretically prove that the proposed algorithms for Case 1 and Case 2 achieve sublinear regret of O(√⁴KT³ln K) and O(√³(K-1)T²ln K), respectively, where the latter regret bound is order-optimal in time, K is the number of arms, and T is the total number of rounds. In Case 2, we extend the player s capability to multiple m>1 observations and show that more observations do not necessarily improve the regret bound due to incurring switching costs. However, we derive an upper bound for switching cost as c ≤ 1/√³m² for which the regret bound is improved as the number of observations increases. Finally, through this study, we found that a generalized version of our approach gives an interesting sublinear regret upper bound result of Õ(Ts+1/s+2) for any self-unaware bandit player with s number of binary decision dilemma before taking the action. To further validate and complement the theoretical findings, we conduct extensive performance evaluations over synthetic data constructed by nonstochastic MAB environment simulations and wireless spectrum measurement data collected in a real-world experiment.

Volume PP
Pages None
DOI 10.1109/TNNLS.2021.3110194
Language English
Journal IEEE transactions on neural networks and learning systems

Full Text