Jia Yuan Yu
Concordia University
Publications
Featured research published by Jia Yuan Yu.
Mathematics of Operations Research | 2009
Jia Yuan Yu; Shie Mannor; Nahum Shimkin
We consider a learning problem where the decision maker interacts with a standard Markov decision process, with the exception that the reward functions vary arbitrarily over time. We show that, against every possible realization of the reward process, the agent can perform as well---in hindsight---as every stationary policy. This generalizes the classical no-regret result for repeated games. Specifically, we present an efficient online algorithm---in the spirit of reinforcement learning---that ensures that the agent's average performance loss vanishes over time, provided that the environment is oblivious to the agent's actions. Moreover, it is possible to modify the basic algorithm to cope with instances where reward observations are limited to the agent's trajectory. We present further modifications that reduce the computational cost by using function approximation and that track the optimal policy through infrequent changes.
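The following is a minimal sketch of the expert-style idea described above: treat each deterministic stationary policy as an expert, track the reward each would have earned on its own trajectory, and play exponential weights over them. It is an illustration on a toy two-state MDP, not the paper's algorithm, which additionally handles mixing times, trajectory-only feedback, and function approximation.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n_states, n_actions, T = 2, 2, 5000

def step(state, action):          # deterministic toy dynamics
    return action                 # action a moves the chain to state a

# The comparator class: all deterministic stationary policies.
policies = list(product(range(n_actions), repeat=n_states))
cum = np.zeros(len(policies))     # hindsight reward of each policy
pol_state = np.zeros(len(policies), dtype=int)  # each expert's own state
eta = np.sqrt(8 * np.log(len(policies)) / T)

agent_state, agent_total = 0, 0.0
for t in range(T):
    # Reward table may vary arbitrarily over time (here: slow drift).
    r = 0.5 + 0.5 * np.sin(0.01 * t + np.arange(4)).reshape(2, 2)

    w = np.exp(eta * (cum - cum.max()))
    pi = policies[rng.choice(len(policies), p=w / w.sum())]
    a = pi[agent_state]
    agent_total += r[agent_state, a]
    agent_state = step(agent_state, a)

    for i, p in enumerate(policies):          # full-information update
        a_i = p[pol_state[i]]
        cum[i] += r[pol_state[i], a_i]
        pol_state[i] = step(pol_state[i], a_i)

print(f"agent avg reward {agent_total / T:.3f}, "
      f"best stationary policy {cum.max() / T:.3f}")
```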
international conference on machine learning | 2009
Jia Yuan Yu; Shie Mannor
We consider a sequential decision problem where the rewards are generated by a piecewise-stationary distribution. However, the different reward distributions are unknown and may change at unknown instants. Our approach uses a limited number of side observations on past rewards, but does not require prior knowledge of the frequency of changes. In spite of the adversarial nature of the reward process, we provide an algorithm whose regret, with respect to the baseline with perfect knowledge of the distributions and the changes, is O(k log(T)), where k is the number of changes up to time T. This is in contrast to the case where side observations are not available, and where the regret is at least Ω(√T).
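As a rough illustration of the setting, the sketch below plays the empirically best arm, spends one side observation per round on a random arm, and resets all estimates when an arm's windowed mean drifts away from its long-run mean. The detector, window, and threshold are assumptions for illustration, not the paper's algorithm or its O(k log T) guarantee.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, WINDOW, THRESH = 3, 6000, 100, 0.25

def true_means(t):        # piecewise-stationary environment with one change
    return np.array([0.9, 0.5, 0.2]) if t < T // 2 else np.array([0.2, 0.5, 0.9])

counts, sums = np.zeros(K), np.zeros(K)
recent = [[] for _ in range(K)]          # sliding window per arm
total = 0.0

for t in range(T):
    mu = true_means(t)
    if counts.min() == 0:                # play unexplored arms first
        arm = int(np.argmin(counts))
    else:
        arm = int(np.argmax(sums / counts))
    x = float(rng.random() < mu[arm])    # Bernoulli reward, earned
    total += x

    side = int(rng.integers(K))          # side observation, not earned
    for a, obs in ((arm, x), (side, float(rng.random() < mu[side]))):
        counts[a] += 1; sums[a] += obs
        recent[a] = (recent[a] + [obs])[-WINDOW:]
        # Reset if the windowed mean disagrees with the running mean.
        if counts[a] >= 2 * WINDOW and \
           abs(np.mean(recent[a]) - sums[a] / counts[a]) > THRESH:
            counts[:], sums[:] = 0, 0    # change detected: restart
            recent = [[] for _ in range(K)]
            break

print("average reward:", round(total / T, 3))
```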
algorithmic learning theory | 2011
Sébastien Bubeck; Gilles Stoltz; Jia Yuan Yu
We consider the setting of stochastic bandit problems with a continuum of arms indexed by [0, 1]^d. We first point out that the strategies considered so far in the literature only provided theoretical guarantees of the form: given some tuning parameters, the regret is small with respect to a class of environments that depends on these parameters. This is however not the right perspective, as it is the strategy that should adapt to the specific bandit environment at hand, and not the other way round. Put differently, an adaptation issue is raised. We solve it for the special case of environments whose mean-payoff functions are globally Lipschitz. More precisely, we show that the minimax optimal orders of magnitude L^{d/(d+2)} T^{(d+1)/(d+2)} of the regret bound over T time instances against an environment whose mean-payoff function f is Lipschitz with constant L can be achieved without knowing L or T in advance. This is in contrast to all previously known strategies, which require to some extent the knowledge of L to achieve this performance guarantee.
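For contrast, here is the kind of non-adaptive baseline the paper improves on: discretize [0, 1] (the d = 1 case) into a uniform grid sized using guesses of L and T, and run UCB1 on the grid points. The point of the paper is precisely that such guesses can be avoided; this sketch assumes them.

```python
import numpy as np

rng = np.random.default_rng(2)
T, L_guess = 20000, 1.0
f = lambda x: 0.6 - abs(x - 0.3)          # unknown Lipschitz mean-payoff

K = max(2, int(np.ceil((L_guess ** 2 * T) ** (1 / 3))))  # grid size, d = 1
grid = (np.arange(K) + 0.5) / K
counts, sums = np.zeros(K), np.zeros(K)

for t in range(1, T + 1):
    if t <= K:
        i = t - 1                          # play every grid arm once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        i = int(np.argmax(ucb))
    x = f(grid[i]) + 0.1 * rng.standard_normal()   # noisy payoff
    counts[i] += 1; sums[i] += x

est = grid[int(np.argmax(sums / counts))]
print(f"estimated maximizer x = {est:.3f} (true optimum at 0.3)")
```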
conference on decision and control | 2009
Jia Yuan Yu; Shie Mannor
We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., nonstationary) fashion. We propose an online Q-learning style algorithm and give a guarantee on its performance evaluated in retrospect against alternative policies. Unlike previous works, the guarantee depends critically on the variability of the uncertainty in the transition probabilities, but holds regardless of arbitrary changes in rewards and transition probabilities over time. Besides its intrinsic computational efficiency, this approach requires neither prior knowledge nor estimation of the transition probabilities.
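For reference, the template being modified is standard tabular Q-learning, sketched below on a toy chain with a fixed MDP; the paper's contribution is the modification of this template that retains guarantees when rewards and transitions vary arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma, alpha, eps, T = 4, 2, 0.9, 0.1, 0.1, 20000
Q = np.zeros((nS, nA))

def env(s, a):
    # Toy dynamics: action 1 moves right (reward at the end), 0 moves left.
    s2 = min(s + 1, nS - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == nS - 1)

s = 0
for t in range(T):
    a = int(rng.integers(nA)) if rng.random() < eps else int(np.argmax(Q[s]))
    s2, r = env(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # TD update
    s = s2

print(np.argmax(Q, axis=1))   # learned greedy policy (should be all 1s)
```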
international conference on game theory for networks | 2009
Jia Yuan Yu; Shie Mannor
We consider decision-making problems in Markov decision processes where both the rewards and the transition probabilities vary in an arbitrary (e.g., non-stationary) fashion. We present algorithms that combine online learning and robust control, and establish guarantees on their performance evaluated in retrospect against alternative policies—i.e., their regret. These guarantees depend critically on the range of uncertainty in the transition probabilities, but hold regardless of the changes in rewards and transition probabilities over time. We present a version of the main algorithm in the setting where the decision-maker's observations are limited to its trajectory, and another version that allows a trade-off between performance and computational complexity.
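The robust-control ingredient can be illustrated by a robust Bellman backup: each update takes the worst case over an uncertainty set of transition vectors. The sketch below uses an L1-ball around a nominal model on a random toy MDP; the uncertainty set and its worst-case computation are illustrative assumptions, not the paper's construction.

```python
import numpy as np

nS, nA, gamma, radius = 3, 2, 0.9, 0.1
rng = np.random.default_rng(4)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # nominal transitions
R = rng.random((nS, nA))                          # rewards

def worst_case(p_nom, V, r):
    # Worst transition vector in an L1-ball of radius r around the nominal:
    # shift mass from the highest-value state to the lowest-value state.
    p = p_nom.copy()
    lo, hi = int(np.argmin(V)), int(np.argmax(V))
    shift = min(r / 2, p[hi])
    p[hi] -= shift; p[lo] += shift
    return p

V = np.zeros(nS)
for _ in range(200):                              # robust value iteration
    Q = np.empty((nS, nA))
    for s in range(nS):
        for a in range(nA):
            p = worst_case(P[s, a], V, radius)
            Q[s, a] = R[s, a] + gamma * p @ V
    V = Q.max(axis=1)

print("robust values:", np.round(V, 3))
```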
IEEE Transactions on Intelligent Transportation Systems | 2016
Wynita M. Griggs; Jia Yuan Yu; Fabian Wirth; Florian Hausler; Robert Shorten
Parking spaces are resources that can be pooled together and shared, particularly when there exist complementary daytime and nighttime users. We provide solutions to two design questions. First, given a quality of service requirement, how many spaces should be set aside as contingency during the day for nighttime users? Next, how can we replace the first-come-first-served access method by one that aims for optimal efficiency while keeping user preferences private?
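A back-of-the-envelope version of the first design question: if overnight demand for the shared spaces is modeled as Poisson with a known mean (an assumption made here for illustration, not taken from the paper), the smallest contingency c meeting a service-level target q solves P(demand <= c) >= q.

```python
from math import exp, factorial

def contingency_spaces(mean_demand: float, qos: float) -> int:
    """Smallest c with P(Poisson(mean_demand) <= c) >= qos."""
    c, cdf = 0, exp(-mean_demand)
    while cdf < qos:
        c += 1
        cdf += exp(-mean_demand) * mean_demand ** c / factorial(c)
    return c

# e.g. a mean of 40 overnight arrivals at a 99% service level:
print(contingency_spaces(40.0, 0.99))   # smallest c for the QoS target
```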
IEEE Journal on Selected Areas in Communications | 2007
Jia Yuan Yu; Shie Mannor
Market mechanisms have been suggested in the last few years as a tool for allocating shared network resources among several competing users. In this paper, we consider the efficiency loss of such mechanisms in the presence of a large number of users. We model the user interactions as a game with a heterogeneous population of players characterized by random utility functions. If the utility functions are bounded, then the non-cooperative equilibria are nearly as efficient as the social optimum with high probability when the number of users is large. This efficiency result holds for a single link with a fixed or an increasing capacity. Using a standard probabilistic analysis, we show that the efficiency loss incurred by the market mechanism decreases almost exponentially in the number of users. If, however, the utility functions are not bounded, then the loss of efficiency does not converge to zero. We also provide results for networks by sampling the users at random based on their paths.
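The efficiency claim can be illustrated numerically for a single link under the Kelly (proportional allocation) mechanism with linear random utilities u_i(x) = a_i x, a simplifying assumption for this sketch. Best-response dynamics find the equilibrium bids, and efficiency is equilibrium welfare divided by the social optimum; it approaches 1 as the number of users grows.

```python
import numpy as np

rng = np.random.default_rng(5)
C = 1.0                                  # link capacity

def equilibrium_efficiency(a):
    w = np.full_like(a, 0.1)             # initial bids
    for _ in range(500):                 # best-response iteration
        for i in range(len(a)):
            W_other = w.sum() - w[i]
            # Maximizer of a[i] * C * w_i / (w_i + W_other) - w_i:
            w[i] = max(np.sqrt(a[i] * C * W_other) - W_other, 0.0)
    x = C * w / w.sum()                  # proportional allocation
    return (a @ x) / (a.max() * C)       # welfare over social optimum

for n in (2, 5, 20, 100):
    a = rng.uniform(0.5, 1.5, size=n)    # bounded random marginal utilities
    print(n, round(equilibrium_efficiency(a), 3))
```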
IEEE Transactions on Vehicular Technology | 2017
Ribal Atallah; Chadi Assi; Jia Yuan Yu
In a vehicular network where roadside units (RSUs) are deprived of a permanent grid-power connection, vehicle-to-infrastructure (V2I) communications are disrupted once the RSU's battery is completely drained. These batteries are recharged regularly, either by human intervention or using energy harvesting techniques such as solar or wind energy. As such, it becomes particularly crucial to conserve battery power until the next recharge cycle in order to maintain network operation and connectivity. This paper examines a vehicular network whose RSU lacks a permanent power source but is instead equipped with a large battery that is periodically recharged. A reinforcement learning technique, the protocol for energy-efficient adaptive scheduling using reinforcement learning (PEARL), is proposed for optimizing the RSU's downlink traffic scheduling during a discharge period. PEARL's objective is to equip the RSU with the required artificial intelligence to realize and, hence, exploit an optimal scheduling policy that guarantees the operation of the vehicular network during the discharge cycle while fulfilling the largest number of service requests. The simulation input parameters were chosen in a way that guarantees the convergence of PEARL, which outperformed three heuristic benchmark scheduling algorithms in terms of a vehicle's quality of experience and the RSU's throughput. For instance, the deployment of the well-trained PEARL agent resulted in at least 50% improved performance over the best heuristic algorithm in terms of the percentage of vehicles departing with incomplete service requests.
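A stripped-down sketch of the underlying RL formulation (not PEARL itself): during a fixed discharge period, the RSU decides slot by slot whether to serve a pending request, which earns reward but drains more battery, or to stay idle; exhausting the battery before the recharge incurs a large penalty. All states, costs, and rewards below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
B, T_SLOTS, alpha, gamma, eps = 20, 15, 0.1, 0.99, 0.1
SERVE_COST, IDLE_COST = 2, 1            # battery units per slot
Q = np.zeros((B + 1, T_SLOTS, 2))       # (battery, slots remaining - 1, action)

for episode in range(20000):
    b = B                               # fully recharged at cycle start
    for t in range(T_SLOTS - 1, -1, -1):
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[b, t]))
        b2 = max(b - (SERVE_COST if a else IDLE_COST), 0)
        r = 1.0 if a else 0.0           # one request served per active slot
        if b2 == 0 and t > 0:           # battery died before the recharge
            Q[b, t, a] += alpha * (r - 10.0 - Q[b, t, a])
            break
        target = 0.0 if t == 0 else gamma * Q[b2, t - 1].max()
        Q[b, t, a] += alpha * (r + target - Q[b, t, a])
        b = b2

print(np.argmax(Q[B, T_SLOTS - 1]))     # first-slot decision on a full battery
```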
Theoretical Computer Science | 2014
Sébastien Gerchinovitz; Jia Yuan Yu
We consider the problem of online linear regression on individual sequences. The goal in this paper is for the forecaster to output sequential predictions which are, after T time rounds, almost as good as the ones output by the best linear predictor in a given ℓ1-ball in R^d.
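A natural reference point for this setting is online gradient descent for the square loss, which competes with a fixed linear predictor on an arbitrary sequence. The sketch below is that baseline, not the adaptive algorithm studied in the paper; the data-generating process is only a stand-in for an individual sequence.

```python
import numpy as np

rng = np.random.default_rng(7)
d, T, eta = 5, 10000, 0.01
u = rng.standard_normal(d)              # comparator (used for data gen only)
w = np.zeros(d)
loss_alg, loss_cmp = 0.0, 0.0

for t in range(T):
    x = rng.standard_normal(d)
    y = u @ x + 0.1 * rng.standard_normal()   # stand-in for an arbitrary
    pred = w @ x                              # individual sequence
    loss_alg += (pred - y) ** 2
    loss_cmp += (u @ x - y) ** 2
    w -= eta * 2 * (pred - y) * x             # gradient step on square loss

print(f"avg loss: OGD {loss_alg / T:.4f}, comparator {loss_cmp / T:.4f}")
```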
International Journal of Control | 2015
Jakub Marecek; Robert Shorten; Jia Yuan Yu