Publication


Featured research published by Sattar Vakili.


IEEE Journal of Selected Topics in Signal Processing | 2013

Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

Sattar Vakili; Keqin Liu; Qing Zhao

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined as the total expected reward loss against the ideal case with known reward models. For heavy-tailed reward distributions, DSEE achieves O(T^{1/p}) regret when the moments of the reward distributions exist up to the pth order for 1 < p ≤ 2, and O(T^{1/(1+p/2)}) for p > 2. With the knowledge of an upper bound on a finite moment of the heavy-tailed reward distributions, DSEE offers the optimal logarithmic regret order. The proposed DSEE approach complements existing work on MAB by providing corresponding results for general reward distributions. Furthermore, with a clearly defined tunable parameter, the cardinality of the exploration sequence, the DSEE approach is easily extendable to variations of MAB, including MAB with various objectives, decentralized MAB with multiple players and incomplete reward observations under collisions, restless MAB with unknown dynamics, and combinatorial MAB with dependent arms that often arise in network optimization problems such as the shortest path, the minimum spanning tree, and the dominating set problems under unknown random weights.
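As a concrete illustration of the DSEE schedule, here is a minimal Python sketch: exploration slots are placed deterministically (about c log t of them by time t, matching the logarithmic order that suffices for light-tailed rewards), and every remaining slot exploits the best empirical mean. The constant c and the round-robin exploration order are illustrative assumptions, not the paper's exact construction.

```python
import math
import random

def dsee(arms, horizon, c=2.0):
    """Deterministic Sequencing of Exploration and Exploitation (sketch).

    arms: list of zero-argument callables returning a random reward.
    horizon: number of rounds T.
    c: exploration density; an illustrative stand-in for the paper's
       tunable parameter (the cardinality of the exploration sequence).
    """
    k = len(arms)
    counts = [0] * k          # plays per arm
    sums = [0.0] * k          # cumulative reward per arm
    explored = 0              # exploration slots used so far
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1         # initialization: play every arm once
            explored += 1
        elif explored < c * math.log(t):
            i = explored % k  # exploration slot: round-robin over arms
            explored += 1
        else:
            # Exploitation slot: play the best empirical mean so far.
            i = max(range(k), key=lambda j: sums[j] / counts[j])
        reward = arms[i]()
        counts[i] += 1
        sums[i] += reward
        total += reward
    return total

# Example: three Bernoulli arms with means 0.3, 0.5, 0.7.
arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.7)]
print(dsee(arms, horizon=10_000))
```

Note that the explore/exploit decision depends only on t and the count of past exploration slots, never on the observed rewards; this determinism is what makes the sequencing easy to retune for heavy tails or for the decentralized and combinatorial variants listed above.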


IEEE Journal of Selected Topics in Signal Processing | 2016

Risk-Averse Multi-Armed Bandit Problems Under Mean-Variance Measure

Sattar Vakili; Qing Zhao

Multi-armed bandit (MAB) problems have been studied mainly under the measure of expected total reward accrued over a horizon of length T. In this paper, we address the issue of risk in MAB problems and develop parallel results under the measure of mean-variance, a commonly adopted risk measure in economics and mathematical finance. We show that the model-specific regret and the model-independent regret in terms of the mean-variance of the reward process are lower bounded by Ω(log T) and Ω(T^{2/3}), respectively. We then show that variations of the UCB policy and the DSEE policy developed for the classic risk-neutral MAB achieve these lower bounds.
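A common formalization in the mean-variance bandit literature (assumed here for illustration) scores arm i by MV_i = σ_i² − ρμ_i for a risk tolerance ρ, lower being better, and adapts UCB by acting optimistically on a confidence-adjusted estimate of MV. A minimal sketch of such an arm-selection rule, with illustrative constants rather than the paper's exact policy:

```python
import math

def mv_arm_choice(counts, means, variances, t, rho=1.0, b=2.0):
    """Pick an arm under the mean-variance measure MV = variance - rho * mean
    (lower is better). Assumes every arm has been played at least once.
    The width b * sqrt(log t / n) and the constants rho, b are illustrative
    assumptions, not the exact policy analyzed in the paper.
    """
    def optimistic_mv(i):
        width = b * math.sqrt(math.log(t) / counts[i])
        # Optimism for a minimization objective: subtract the width.
        return (variances[i] - rho * means[i]) - width
    return min(range(len(counts)), key=optimistic_mv)
```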


Allerton Conference on Communication, Control, and Computing | 2015

Mean-variance and value at risk in multi-armed bandit problems

Sattar Vakili; Qing Zhao

We study risk-averse multi-armed bandit problems under different risk measures. We consider three risk mitigation models. In the first model, the variations in the reward values obtained at different times are considered as risk and the objective is to minimize the mean-variance of the observed rewards. In the second and the third models, the quantity of interest is the total reward at the end of the time horizon, and the objective is to minimize the mean-variance and maximize the value at risk of the total reward, respectively. We develop risk-averse online learning policies and analyze their regret performance. We also provide tight lower bounds on regret under the model of mean-variance of observations.
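For the third model, value at risk of the total reward can be made concrete with an empirical quantile. A small helper under an assumed convention (VaR at level α as the α-quantile of the total-reward distribution, so the total reward exceeds it with probability about 1 − α; quantile conventions for VaR vary):

```python
def empirical_var(samples, alpha=0.05):
    """Empirical value at risk at level alpha: the alpha-quantile of the
    total-reward samples. Maximizing this quantity means seeking the arm
    whose total reward is large even in the worst alpha fraction of runs.
    (Sign/quantile conventions are an assumption for illustration.)
    """
    s = sorted(samples)
    idx = max(0, int(alpha * len(s)) - 1)
    return s[idx]

# Example: VaR_{0.05} over 10,000 simulated total rewards:
# print(empirical_var(simulated_totals, alpha=0.05))
```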


Asilomar Conference on Signals, Systems and Computers | 2014

Time-varying stochastic multi-armed bandit problems

Sattar Vakili; Qing Zhao; Yuan Zhou

In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.
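For the piece-wise stationary setting, a standard device from the non-stationary bandit literature is the sliding-window UCB estimate, which bases statistics only on the most recent w observations so the policy can track change points. This is the standard technique, not necessarily the algorithm analyzed in the paper; a minimal sketch:

```python
from collections import deque
import math

class SlidingWindowArm:
    """Statistics for one arm computed over the last w observations only,
    so the estimate can track a reward distribution that changes over time.
    The window size w is a tuning assumption (in sliding-window UCB it is
    chosen from the horizon and the number of change points).
    """
    def __init__(self, w):
        self.window = deque(maxlen=w)   # old samples fall out automatically

    def update(self, reward):
        self.window.append(reward)

    def ucb(self, t, b=2.0):
        n = len(self.window)
        if n == 0:
            return float("inf")         # force initial exploration
        mean = sum(self.window) / n
        return mean + b * math.sqrt(math.log(t) / n)

# At each round t, play the arm with the largest arm.ucb(t), then call
# update() on that arm with the observed reward.
```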


Asilomar Conference on Signals, Systems and Computers | 2013

Achieving complete learning in Multi-Armed Bandit problems

Sattar Vakili; Qing Zhao

In the classic Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward distributions. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. It is known that the minimum growth rate of regret (defined as the total expected loss with respect to the ideal scenario of known reward models of all arms) is logarithmic in T. In other words, mistakes in selecting suboptimal arms occur infinitely often, and the player never converges to the arm with the largest reward mean. In this paper, we ask whether side information on the reward model can lead to bounded regret, and thus complete learning, and what the minimum side information needed to achieve complete learning is. We show that knowledge of a value η between the largest and the second largest reward means (among all arms) leads to complete learning, by constructing an online learning policy with bounded regret. This result applies to both light-tailed and heavy-tailed reward distributions.
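The role of the threshold η can be seen in a simple elimination scheme: keep sampling an arm only while the confidence interval around its empirical mean could still reach η; once the interval falls below η, the arm is provably suboptimal and is dropped forever, so only finitely many mistakes accumulate. A minimal sketch in that spirit (the confidence width and round-robin order are assumptions, not the paper's exact policy):

```python
import math

def eta_policy(arms, eta, horizon, b=2.0):
    """Play given a threshold eta known to separate the best mean from the
    second best. Illustrative sketch: an arm whose upper confidence bound
    drops below eta is eliminated permanently.
    """
    k = len(arms)
    counts, sums = [0] * k, [0.0] * k
    active = set(range(k))
    for t in range(1, horizon + 1):
        # Round-robin over the arms not yet ruled out.
        i = sorted(active)[t % len(active)]
        reward = arms[i]()
        counts[i] += 1
        sums[i] += reward
        ucb = sums[i] / counts[i] + b * math.sqrt(math.log(t + 1) / counts[i])
        if ucb < eta and len(active) > 1:
            active.discard(i)   # provably below eta, hence suboptimal
    # Report the empirical best among the surviving arms.
    return max(active, key=lambda j: sums[j] / counts[j] if counts[j] else 0.0)
```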


International Conference on Acoustics, Speech, and Signal Processing | 2015

Risk-averse online learning under mean-variance measures

Sattar Vakili; Qing Zhao

We study risk-averse multi-armed bandit problems under mean-variance measures. We consider two risk mitigation models. In the first model, the variations in the reward values obtained at different times are considered as risk, and the objective is to minimize the mean-variance of the observed rewards. In the second model, the quantity of interest is the total reward at the end of the time horizon, and the objective is to minimize the mean-variance of the total reward. Under both models, we establish asymptotic as well as finite-time lower bounds on regret and develop online learning algorithms that achieve these lower bounds.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Quickest detection of short-term voltage instability with PMU measurements

Sattar Vakili; Qing Zhao; Lang Tong

The quickest detection of short-term voltage instability in a smart grid is considered. The problem is formulated as a binary sequential composite hypothesis test, where the null hypothesis is a non-stationary process with an unknown exponentially decaying mean and the alternative is a non-stationary process with an unknown exponentially increasing mean. A sequential generalized likelihood ratio test (SGLRT) is proposed and analyzed. It is shown that the proposed SGLRT is asymptotically optimal.
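In symbols, the formulation can be written as follows; the additive-noise model and the parameterization of the means are assumptions made here to paraphrase the stated setup:

```latex
% Composite hypotheses for the measurements x_t
% (noise model and parameterization assumed for illustration):
%   H0 (stable):    x_t = a e^{-b t} + w_t,  with a, b > 0 unknown
%   H1 (unstable):  x_t = a e^{+b t} + w_t,  with a, b > 0 unknown
% The SGLRT stops and declares instability once the generalized
% likelihood ratio, maximized over the unknown parameters, exceeds
% a threshold gamma chosen for the desired false-alarm rate:
\[
    \Lambda_n \;=\; \frac{\sup_{a,b>0} \prod_{t=1}^{n} f_{1}(x_t \mid a, b)}
                         {\sup_{a,b>0} \prod_{t=1}^{n} f_{0}(x_t \mid a, b)}
    \;\ge\; \gamma .
\]
```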


Military Communications Conference | 2017

Online learning with side information

Xiao Xu; Sattar Vakili; Qing Zhao; Ananthram Swami

An online learning problem with side information is considered. The problem is formulated as a graph structured stochastic Multi-Armed Bandit (MAB). Each node in the graph represents an arm in the bandit problem and an edge between two arms indicates closeness in their mean rewards. It is shown that such side information induces a Unit Interval Graph and several graph properties can be leveraged to achieve a sublinear regret in the number of arms while preserving the optimal logarithmic regret in time. A lower bound on regret is established and a hierarchical learning policy that is order optimal in terms of both the number of arms and the learning horizon is developed.
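The side-information graph is easy to picture: connecting two arms whenever their mean rewards are within some closeness threshold places each arm on the real line with an interval of fixed length, and intersecting intervals are exactly the close pairs, which is what makes the graph a unit interval graph. A toy constructor (the threshold eps and this particular closeness rule are illustrative assumptions):

```python
def side_information_graph(means, eps):
    """Build the edge set of the side-information graph: arms i and j are
    connected when their mean rewards are close (|mu_i - mu_j| <= eps).
    Each vertex corresponds to an interval of length eps centered at its
    mean, and intervals intersect iff the means are close, so the result
    is a unit interval graph. eps is an illustrative closeness parameter.
    """
    k = len(means)
    return {
        (i, j)
        for i in range(k)
        for j in range(i + 1, k)
        if abs(means[i] - means[j]) <= eps
    }
```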


Conference on Decision and Control | 2015

Bayesian quickest short-term voltage instability detection in power systems

Sattar Vakili; Qing Zhao; Lang Tong

The quickest detection of short-term voltage instability in power systems is considered. The problem is formulated as a Bayesian quickest change-point detection where the pre-change and post-change measurements are non-stationary processes with exponentially decaying and exponentially increasing expectations, respectively. Quickest change detection schemes are proposed and analyzed under both known and unknown post-change models. It is shown that the proposed tests are asymptotically optimal. The results also find applications in instability detection of a general linear system with distinct real eigenvalues.
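For the known-model case, the classical Shiryaev recursion is the textbook Bayesian quickest change detector and makes the setup concrete. This is the standard procedure, not necessarily the test proposed in the paper; the geometric change-point prior rho and the stopping threshold are assumptions:

```python
def shiryaev(observations, lr, rho=0.01, threshold=0.99):
    """Classical Shiryaev procedure for Bayesian quickest change detection.
    lr(x): likelihood ratio f1(x)/f0(x) of post- vs pre-change densities.
    rho: geometric prior on the change point.
    Returns the 1-indexed stopping time, or None if it never stops.
    """
    p = 0.0  # posterior probability that the change has already occurred
    for t, x in enumerate(observations, start=1):
        prior = p + (1.0 - p) * rho      # the change may occur at this step
        num = prior * lr(x)
        p = num / (num + (1.0 - prior))  # Bayes update with observation x
        if p >= threshold:
            return t
    return None

# Example: pre-change N(0,1) vs post-change N(1,1) gives lr(x) = exp(x - 0.5):
# import math
# stop = shiryaev(stream, lr=lambda x: math.exp(x - 0.5))
```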


Asilomar Conference on Signals, Systems and Computers | 2013

Distributed node-weighted connected dominating set problems

Sattar Vakili; Qing Zhao

The Minimum Connected Dominating Set (MCDS) problem is to find a subset of vertices in a given graph G such that the set is connected and every vertex of G is either in the set or adjacent to a node in the set. This problem is known to be NP-hard, and the best polynomial-time approximation ratio is O(log n), where n is the number of vertices. The MCDS problem and its variants are of interest in many network applications, such as finding a minimum-size virtual backbone for routing and broadcasting in ad hoc networks. In this paper, we consider the node-weighted CDS problem, where positive real-valued weights are assigned to the vertices and the objective is to find a CDS with minimum weight. We propose the first distributed algorithm for the problem and demonstrate its optimal O(log n) approximation ratio. We then consider the case where the node weights are random variables with unknown distributions and develop a distributed learning algorithm based on multi-armed bandit theory. We show that the distributed learning algorithm offers the optimal logarithmic regret order with respect to the time horizon length.
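As a concrete anchor for the definitions, the following checker verifies that a candidate set is a connected dominating set of a graph given as an adjacency map of vertex sets (plain BFS, no libraries). It validates a solution rather than reproducing the paper's distributed algorithm:

```python
from collections import deque

def is_connected_dominating_set(adj, cds):
    """Check that cds is a connected dominating set of the graph adj
    (a dict mapping each vertex to the set of its neighbors).
    Dominating: every vertex is in cds or adjacent to a vertex in cds.
    Connected: the subgraph induced by cds is connected.
    """
    cds = set(cds)
    # Domination check.
    for v, nbrs in adj.items():
        if v not in cds and not (nbrs & cds):
            return False
    # Connectivity check: BFS restricted to the induced subgraph.
    if not cds:
        return not adj
    start = next(iter(cds))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] & cds:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == cds

# Example: on the path 0-1-2-3, the set {1, 2} is a connected dominating set.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_connected_dominating_set(adj, {1, 2}))  # True
```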

Collaboration


Dive into Sattar Vakili's collaborations.

Top Co-Authors

Chang Liu, University of California
Chen-Nee Chuah, University of California
Keqin Liu, University of California
Yuan Zhou, University of California