Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Yusen Zhan is active.

Publication


Featured research published by Yusen Zhan.


Autonomous Agents and Multi-Agent Systems | 2017

Efficiently detecting switches against non-stationary opponents

Pablo Hernandez-Leal; Yusen Zhan; Matthew E. Taylor; L. Enrique Sucar; Enrique Munoz de Cote

Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions on how to act in multiagent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real-world scenarios, where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases by learning the appropriate response, without any prior policies on how to act. Thus, we focus on the problem in which another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, posing a challenge for most learning algorithms. This paper introduces DriftER, an algorithm that (1) learns a model of the opponent, (2) uses that model to obtain an optimal policy, and then (3) determines when it must re-learn due to an opponent strategy change. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability, and empirical results showing that our approach outperforms state-of-the-art algorithms, in normal-form games such as the prisoner's dilemma and in a more realistic scenario, the Power TAC simulator.
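
The switch-detection loop lends itself to a compact illustration. The sketch below is a simplified, hypothetical rendering of the three steps in the abstract, not the authors' DriftER implementation: it keeps an empirical model of the opponent's action frequencies, best-responds to that model, and restarts learning when the model's recent prediction error exceeds a hand-picked threshold (DriftER itself uses a principled error-rate test). The payoff matrix and the switching opponent are toy examples.

```python
# Illustrative sketch of a DriftER-style loop (hypothetical, simplified):
# (1) model the opponent, (2) best-respond to the model, (3) restart learning
# when the model's prediction quality drops, signalling a strategy switch.
from collections import Counter

def best_response(payoff, predicted_opp_action):
    """Pick the row action maximizing payoff against the predicted column action."""
    return max(range(len(payoff)), key=lambda a: payoff[a][predicted_opp_action])

def play(opponent, payoff, rounds=2000, window=50, error_threshold=0.4):
    counts = Counter()            # empirical opponent model
    recent_errors = []            # sliding window of prediction errors
    total = 0.0
    for t in range(rounds):
        predicted = counts.most_common(1)[0][0] if counts else 0
        my_action = best_response(payoff, predicted)
        opp_action = opponent(t)
        total += payoff[my_action][opp_action]
        recent_errors = (recent_errors + [int(predicted != opp_action)])[-window:]
        counts[opp_action] += 1
        # Switch detection: if the model misses too often, assume the
        # opponent changed strategy and re-learn from scratch.
        if len(recent_errors) == window and sum(recent_errors) / window > error_threshold:
            counts.clear()
            recent_errors = []
    return total

# Prisoner's dilemma payoffs for the row player (0 = cooperate, 1 = defect).
PD = [[3, 0], [5, 1]]
# Toy opponent that switches from always-cooperate to always-defect halfway through.
switching_opponent = lambda t: 0 if t < 1000 else 1
print(play(switching_opponent, PD))
```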


Pattern Recognition | 2017

Scalable lifelong reinforcement learning

Yusen Zhan; Haitham Bou Ammar; Matthew E. Taylor

Lifelong reinforcement learning provides a successful framework for agents to learn multiple consecutive tasks sequentially. Current methods, however, suffer from scalability issues when the agent has to solve a large number of tasks. In this paper, we remedy these drawbacks and propose a novel scalable technique for lifelong reinforcement learning. We derive an algorithm that assumes the availability of multiple processing units and computes shared repositories and local policies using only local information exchange. We then show an improvement to a linear convergence rate compared with current lifelong policy search methods. Finally, we evaluate our technique on a set of benchmark dynamical systems and demonstrate learning speed-ups and reduced running times.
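
A rough sense of the distributed setup can be given with a small sketch. The following is an illustrative toy, not the paper's algorithm: it assumes task policies factor as theta_t = L @ s_t with a shared basis L and task coefficients s_t (in the spirit of lifelong policy search), and several workers refine local copies of L using only averaging with ring neighbours; the local policy estimates and curvature terms are random stand-ins.

```python
# Illustrative sketch (not the paper's algorithm): each worker fits task
# coefficients against its local copy of the shared basis, nudges the basis
# toward the local task policy, then averages the basis with its neighbours.
import numpy as np

def update_task_coefficients(L, hessian, theta_hat, lam=0.1):
    """Least-squares fit of s_t so that L @ s_t approximates the local policy."""
    A = L.T @ hessian @ L + lam * np.eye(L.shape[1])
    return np.linalg.solve(A, L.T @ hessian @ theta_hat)

def consensus_step(local_bases, neighbours):
    """Each worker averages its shared repository with its neighbours' copies."""
    return [np.mean([local_bases[j] for j in ([i] + neighbours[i])], axis=0)
            for i in range(len(local_bases))]

rng = np.random.default_rng(0)
d, k, workers = 8, 3, 4                      # policy dim, basis size, number of workers
local_L = [rng.normal(size=(d, k)) for _ in range(workers)]
ring = {i: [(i - 1) % workers, (i + 1) % workers] for i in range(workers)}

for _ in range(20):
    for i in range(workers):
        theta_hat = rng.normal(size=d)       # stand-in for a locally learned policy
        hessian = np.eye(d)                  # stand-in for local curvature information
        s = update_task_coefficients(local_L[i], hessian, theta_hat)
        # Gradient-style refinement of the local shared basis toward the task policy.
        local_L[i] -= 0.1 * np.outer(local_L[i] @ s - theta_hat, s)
    local_L = consensus_step(local_L, ring)
```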


Autonomous Agents and Multi-Agent Systems | 2017

An exploration strategy for non-stationary opponents

Pablo Hernandez-Leal; Yusen Zhan; Matthew E. Taylor; L. Enrique Sucar; Enrique Munoz de Cote

The success or failure of any learning algorithm is partially due to the exploration strategy it employs. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration (DE) is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, in which the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent's objective is to learn a model of the opponent's strategy in order to act optimally. Our contribution is twofold. First, we present DE as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: (1) to maximize utilities in the short term while learning and (2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model with finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient DE, to deal with the non-stationary nature of the opponent. We show experimentally how using DE outperforms state-of-the-art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
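
The idea of re-exploring regions that may have changed can be illustrated with a small R-max-style sketch. The code below is a simplified stand-in, not the paper's R-max#: visit counts decay over time, so state-action pairs that have not been tried recently become "unknown" again and receive the optimistic value, which pulls the agent back to them. The decay constant, thresholds and the toy environment are all invented for illustration.

```python
# Illustrative drift-exploration sketch (simplified, not the authors' R-max#):
# optimism in the face of uncertainty, with decaying visit counts so that old
# knowledge expires and the agent periodically re-verifies it.
from collections import defaultdict

R_MAX, KNOWN, DECAY = 1.0, 5, 0.999   # optimistic reward, known-ness threshold, count decay

counts = defaultdict(float)            # (state, action) -> decayed visit count
values = defaultdict(float)            # (state, action) -> estimated value

def choose_action(state, actions):
    # Unknown pairs get the optimistic R_MAX value, pulling the agent toward them.
    def optimistic_value(a):
        return values[(state, a)] if counts[(state, a)] >= KNOWN else R_MAX
    return max(actions, key=optimistic_value)

def update(state, action, reward, lr=0.2):
    counts[(state, action)] += 1.0
    values[(state, action)] += lr * (reward - values[(state, action)])
    # Decay all counts so stale knowledge slowly becomes "unknown" again.
    for key in counts:
        counts[key] *= DECAY

# Toy usage against an environment whose reward pattern shifts halfway through.
for t in range(3000):
    a = choose_action("s0", [0, 1])
    r = (1.0 if a == 0 else 0.0) if t < 1500 else (1.0 if a == 1 else 0.0)
    update("s0", a, r)
```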


International Conference on Tools with Artificial Intelligence | 2012

On the Complexity and Algorithms of Coalition Structure Generation in Overlapping Coalition Formation Games

Yusen Zhan; Jun Wu; Chongjun Wang; Junyuan Xie

The issues of coalition formation have been investigated from many aspects, and recently more and more attention has been paid to overlapping coalition formation. The (optimal) coalition structure generation (CSG) problem is one of the essential problems in coalition formation, which is an important topic of cooperation in multiagent systems. In this paper, we formally define the coalition structure generation problem for overlapping extensions. Based on these definitions, we systematically prove several computational complexity results for overlapping coalition formation (OCF) games and threshold task games (TTGs). Moreover, dynamic programming and greedy approaches are adopted to solve the CSG problem for TTGs.
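
To make the greedy approach concrete, here is a minimal sketch under a deliberately simplified, non-overlapping threshold task game: each agent has a single resource weight, each task has a threshold and a value, and a coalition earns a task's value when its pooled resources reach the threshold. This illustrates a greedy heuristic only, not the paper's algorithms or its overlapping formulation.

```python
# Illustrative greedy coalition structure generation for a toy threshold task
# game (hypothetical formulation): fill the most profitable-per-threshold tasks
# first, assigning the largest remaining agents until the threshold is met.
def greedy_csg(agent_weights, tasks):
    """agent_weights: {agent: weight}; tasks: list of (threshold, value)."""
    free_agents = sorted(agent_weights, key=agent_weights.get, reverse=True)
    structure, total_value = [], 0.0
    for threshold, value in sorted(tasks, key=lambda t: t[1] / t[0], reverse=True):
        coalition, pooled = [], 0.0
        while free_agents and pooled < threshold:
            agent = free_agents.pop(0)        # largest remaining agent first
            coalition.append(agent)
            pooled += agent_weights[agent]
        if pooled >= threshold:
            structure.append((coalition, (threshold, value)))
            total_value += value
        else:                                 # cannot fill the task: release agents
            free_agents = coalition + free_agents
    return structure, total_value

agents = {"a": 3, "b": 2, "c": 2, "d": 1}
tasks = [(4, 10.0), (3, 9.0), (2, 2.0)]
print(greedy_csg(agents, tasks))
```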


Neural Computation | 2017

Nonconvex Policy Search Using Variational Inequalities

Yusen Zhan; Haitham Bou Ammar; Matthew E. Taylor

Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that could potentially damage hardware. Motivated by such constraints, projection-based methods have been proposed for safe policies. These methods, however, can handle only convex policy constraints. In this letter, we propose the first safe policy search reinforcement learner capable of operating under nonconvex policy constraints. This is achieved by observing, for the first time, a connection between nonconvex variational inequalities and policy search problems. We provide two algorithms, Mann iteration and two-step iteration, to solve the above problems and prove convergence in the nonconvex stochastic setting. Finally, we demonstrate the performance of the algorithms on six benchmark dynamical systems and show that our new method is capable of outperforming previous methods under a variety of settings.
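
The Mann-iteration scheme can be sketched in a few lines. The example below is an illustrative toy, not the letter's implementation: F stands in for a stochastic negative policy gradient, the safe set is a sphere (a simple nonconvex constraint), and each Mann step averages the current iterate with a projected gradient step. The step sizes, operator, and constraint set are all assumptions made for the example.

```python
# Illustrative Mann iteration for a variational-inequality view of constrained
# policy search (hypothetical setup): theta_{k+1} = (1 - a_k) theta_k
#                                                   + a_k * P_C(theta_k - gamma * F(theta_k)).
import numpy as np

def project_onto_sphere(theta, radius=1.0):
    """Projection onto the sphere ||theta|| = radius, a nonconvex set."""
    norm = np.linalg.norm(theta)
    return theta * (radius / norm) if norm > 0 else np.full_like(theta, radius / np.sqrt(theta.size))

def F(theta, rng):
    """Stand-in stochastic operator, e.g. a noisy negative policy gradient."""
    target = np.array([0.6, 0.8, 0.0])           # pretend optimum (lies on the sphere)
    return (theta - target) + 0.05 * rng.normal(size=theta.shape)

def mann_iteration(theta0, steps=500, gamma=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    theta = theta0
    for k in range(1, steps + 1):
        alpha = 1.0 / (k + 1)                    # diminishing averaging weight
        candidate = project_onto_sphere(theta - gamma * F(theta, rng))
        theta = (1 - alpha) * theta + alpha * candidate
    return theta

print(mann_iteration(np.array([1.0, 0.0, 0.0])))
```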


International Joint Conference on Artificial Intelligence | 2016

Theoretically-grounded policy advice from multiple teachers in reinforcement learning settings with applications to negative transfer

Yusen Zhan; Haitham Bou Ammar; Matthew E. Taylor


arXiv: Artificial Intelligence | 2015

Online Transfer Learning in Reinforcement Learning Domains

Yusen Zhan; Matthew E. Taylor


Adaptive Agents and Multi-Agent Systems | 2013

On the complexity of undominated core and farsighted solution concepts in coalitional games

Yusen Zhan; Jun Wu; Chongjun Wang; Meilin Liu; Junyuan Xie


Adaptive Agents and Multi-Agent Systems | 2017

An Exploration Strategy Facing Non-Stationary Agents

Pablo Hernandez-Leal; Yusen Zhan; Matthew E. Taylor; L. Enrique Sucar; Enrique Munoz de Cote


Adaptive Agents and Multi-Agent Systems | 2017

Detecting Switches Against Non-Stationary Opponents

Pablo Hernandez-Leal; Yusen Zhan; Matthew E. Taylor; L. Enrique Sucar; Enrique Munoz de Cote

Collaboration


Dive into Yusen Zhan's collaborations.

Top Co-Authors

Matthew E. Taylor, Washington State University

L. Enrique Sucar, National Institute of Astrophysics

Haitham Bou Ammar, University of Pennsylvania

Meilin Liu, Wright State University