Ishai Menache

Massachusetts Institute of Technology

Publication

Featured research published by Ishai Menache.

European Conference on Machine Learning | 2002

Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning

Ishai Menache; Shie Mannor; Nahum Shimkin

We present the Q-Cut algorithm, a graph-theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used to accelerate the Q-Learning algorithm. The learning agent creates an on-line map of the process history and uses an efficient Max-Flow/Min-Cut algorithm to identify bottlenecks. The policies for reaching bottlenecks are learned separately and added to the model in the form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks for state-space partitioning, a step necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particularly in the initial learning phase.
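
To make the bottleneck idea concrete, the sketch below builds a transition graph from observed state pairs and runs a plain source-sink minimum cut (Edmonds-Karp) on it. The two-room layout, the choice of capacities as transition counts, and the names `build_graph` and `min_cut` are illustrative assumptions; the paper's own cut criterion and graph construction may differ.

```python
from collections import defaultdict, deque


def build_graph(transitions):
    """Build an undirected transition graph; capacity = number of observed transitions."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in transitions:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    return cap, adj


def min_cut(cap, adj, source, sink):
    """Edmonds-Karp max flow; returns (flow value, source-side states, cut edges)."""
    flow = defaultdict(int)

    def residual(u, v):
        return cap[(u, v)] - flow[(u, v)]

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Collect the path, find its bottleneck capacity, then push that much flow.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual(a, b) for a, b in path)
        for a, b in path:
            flow[(a, b)] += push
            flow[(b, a)] -= push
        total += push
    # The final BFS explored everything reachable from the source: that is the source side.
    source_side = set(parent)
    cut = sorted((u, v) for u in source_side for v in adj[u] if v not in source_side)
    return total, source_side, cut


# Hypothetical two-room environment: states 0-3 and 4-7, joined by a single door 3-4.
ROOM_A = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
ROOM_B = [(4, 5), (5, 6), (6, 7), (7, 4), (4, 6)]
DOOR = [(3, 4)]

cap, adj = build_graph(ROOM_A + DOOR + ROOM_B)
total, source_side, cut = min_cut(cap, adj, source=0, sink=7)
print(total, cut)  # 1 [(3, 4)]
```

The only cut edge is the door, which is exactly the kind of bottleneck state a sub-goal option would target.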

Annals of Operations Research | 2005

Basis Function Adaptation in Temporal Difference Reinforcement Learning

Ishai Menache; Shie Mannor; Nahum Shimkin

Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorithms and convergence guarantees. Nonetheless, an appropriate choice of basis functions is important for the success of the algorithm. In the present paper we examine methods for adapting the basis functions during the learning process in the context of evaluating the value function under a fixed control policy. Using the Bellman approximation error as an optimization criterion, we optimize the weights of the basis functions while simultaneously adapting the (non-linear) basis function parameters. We present two algorithms for this problem: the first uses a gradient-based approach, and the second applies the Cross Entropy method. The performance of the proposed algorithms is evaluated and compared in simulations.
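
A minimal sketch of the gradient-based idea, under stated assumptions: a hypothetical five-state chain evaluated under a fixed policy with a known model, two Gaussian radial basis functions whose centers are the adapted non-linear parameters, and the mean squared Bellman residual as the criterion. The gradient is taken numerically (central differences) with a backtracking step, a simplification chosen for brevity rather than the paper's update rule.

```python
import math

# Hypothetical 5-state chain under a fixed policy: each state moves to the next one,
# and the last state is absorbing and yields reward 1 per step.
N_STATES = 5
GAMMA = 0.9
WIDTH = 1.0  # fixed RBF width (assumption); only the centers are adapted


def next_state(s):
    return min(s + 1, N_STATES - 1)


def reward(s):
    return 1.0 if s == N_STATES - 1 else 0.0


def value(s, weights, centers):
    """Linear architecture: weighted sum of Gaussian basis functions."""
    return sum(w * math.exp(-((s - c) ** 2) / (2 * WIDTH ** 2)) for w, c in zip(weights, centers))


def bellman_error(params):
    """Mean squared Bellman residual; params = weights followed by centers."""
    k = len(params) // 2
    weights, centers = params[:k], params[k:]
    total = 0.0
    for s in range(N_STATES):
        target = reward(s) + GAMMA * value(next_state(s), weights, centers)
        total += (target - value(s, weights, centers)) ** 2
    return total / N_STATES


def numeric_grad(f, params, eps=1e-6):
    """Central-difference gradient (a stand-in for an analytic gradient)."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def adapt(weights, centers, steps=300, lr=1.0):
    """Jointly descend on the weights and the (non-linear) basis centers."""
    params = list(weights) + list(centers)
    for _ in range(steps):
        g = numeric_grad(bellman_error, params)
        current, step = bellman_error(params), lr
        # Backtracking: halve the step until the Bellman error does not increase.
        while step > 1e-10:
            candidate = [p - step * gi for p, gi in zip(params, g)]
            if bellman_error(candidate) <= current:
                params = candidate
                break
            step /= 2
    k = len(weights)
    return params[:k], params[k:]


w0, c0 = [0.0, 0.0], [1.0, 3.0]
before = bellman_error(w0 + c0)
w, c = adapt(w0, c0)
after = bellman_error(w + c)
print(f"Bellman error: {before:.4f} -> {after:.4f}")
```

Because the centers move together with the weights, the basis itself is reshaped to fit the value function, which is the point of adapting rather than fixing the features.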

International Conference on Machine Learning | 2004

Dynamic abstraction in reinforcement learning via clustering

Shie Mannor; Ishai Menache; Amit Hoze; Uri Klein

We consider a graph-theoretic approach for automatic construction of options in a dynamic environment. A map of the environment is generated on-line by the learning agent, representing the topological structure of the state transitions. A clustering algorithm is then used to partition the state space into different regions. Policies for reaching the different parts of the space are learned separately and added to the model in the form of options (macro-actions). The options are used to accelerate the Q-Learning algorithm. We extend the basic algorithm and consider building a map that includes a preliminary indication of the location of interesting regions of the state space, where the value gradient is significant and additional exploration might be beneficial. Experiments indicate significant speedups, especially in the initial learning phase.
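
The abstract does not fix a particular clustering algorithm. As one possible choice, the sketch below uses spectral bisection (the sign pattern of the graph Laplacian's second eigenvector, found by power iteration) to split a hypothetical two-room transition graph. The graph, the iteration count, and the function name `spectral_bisection` are assumptions for illustration only.

```python
import math


def spectral_bisection(n, edges, iters=200):
    """Split states by the sign of the graph Laplacian's second eigenvector (Fiedler vector)."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] += 1.0
        adj[v][u] += 1.0
    deg = [sum(row) for row in adj]
    shift = 2.0 * max(deg)  # shift*I - L is positive semidefinite, so power iteration is safe
    x = [float(i) for i in range(n)]  # arbitrary, non-constant starting vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant eigenvector (eigenvalue 0 of L)
        # y = (shift*I - L) x, with L = D - A
        y = [(shift - deg[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    side_a = sorted(i for i in range(n) if x[i] < 0)
    side_b = sorted(i for i in range(n) if x[i] >= 0)
    return side_a, side_b


# Hypothetical map: two "rooms" {0, 1, 2} and {3, 4, 5}, connected only through edge 2-3.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(spectral_bisection(6, EDGES))  # ([0, 1, 2], [3, 4, 5])
```

Each resulting region would then get its own option for reaching it; a larger map would call for a k-way clustering rather than a single bisection.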

International Conference on Computer Communications | 2010

Near-Optimal Power Control in Wireless Networks: A Potential Game Approach

Utku Ozan Candogan; Ishai Menache; Asuman E. Ozdaglar; Pablo A. Parrilo

We study power control in a multi-cell CDMA wireless system in which self-interested users share a common spectrum and interfere with each other. Our objective is to design a power control scheme that achieves a (near-)optimal power allocation with respect to any predetermined network objective (such as sum-rate maximization or some fairness criterion). To this end, we introduce a potential-game approach that relies on approximating the underlying noncooperative game with a close potential game, for which prices that induce an optimal power allocation can be derived. We use the proximity of the original game to the approximate game to establish, through Lyapunov-based analysis, that natural user-update schemes (applied to the original game) converge within a neighborhood of the desired operating point, thereby inducing near-optimal performance in a dynamical sense. Additionally, we demonstrate through simulations that the actual performance can in practice be very close to optimal, even when the approximation is inaccurate. As a concrete example, we focus on the sum-rate objective and evaluate our approach both theoretically and empirically.

IEEE Journal on Selected Areas in Communications | 2008

Rate-Based Equilibria in Collision Channels with Fading

Ishai Menache; Nahum Shimkin

We consider a wireless collision channel shared by a finite number of users who transmit to a common base station. Each user wishes to minimize its average transmission rate (or power investment), subject to a minimum throughput demand. The channel quality between each user and the base station is randomly time-varying and partially observed by the user through Channel State Information (CSI) signals. Assuming that all users employ stationary, CSI-dependent transmission policies, we investigate the properties of the Nash equilibrium of the resulting game between users. We characterize the feasible region of users' throughput demands and provide lower bounds on the channel capacity that hold for both symmetric and non-symmetric users. Our equilibrium analysis reveals that, when the throughput demands are feasible, there exist exactly two Nash equilibrium points, with one strictly better than the other (in terms of power investment) for each user. We further demonstrate that the performance gap between the two equilibria may be arbitrarily large. This motivates the need for distributed mechanisms that lead to the better equilibrium. To that end, we suggest a simple greedy (best-response) mechanism and prove its convergence to the better equilibrium. We also derive some important stability properties of this mechanism in the face of a changing user population.

International Conference on Computer Communications | 2008

Efficient Rate-Constrained Nash Equilibrium in Collision Channels with State Information

Ishai Menache; Nahum Shimkin

We consider a wireless collision channel shared by a finite number of users who transmit to a common base station. Users are self-optimizing, and each wishes to minimize its average transmission rate (or power investment), subject to a minimum-throughput demand. The channel quality between each user and the base station is time-varying and partially observed by the user in the form of channel state information (CSI) signals. We assume that each user can transmit at a fixed power level and that its transmission decision at each time slot is stationary, in the sense that it can depend only on the current CSI. We are interested in the properties of the Nash equilibrium of the resulting game between users. We define the feasible region of users' throughput demands and show that when the demands are within this region, there exist exactly two Nash equilibrium points, with one strictly better than the other (in terms of invested power) for all users. We further provide lower bounds on the channel capacity that can be obtained, in both the symmetric and non-symmetric cases. Finally, we show that a simple greedy mechanism converges to the better equilibrium point without requiring any coordination between the users.

IEEE Transactions on Automatic Control | 2009

Dynamic Discrete Power Control in Cellular Networks

Eitan Altman; Konstantin Avrachenkov; Ishai Menache; Gregory B. Miller; Balakrishna Prabhu; Adam Shwartz

We consider an uplink power control problem where each mobile wishes to maximize its throughput (which depends on the transmission powers of all mobiles) but has a constraint on its average power consumption. A finite number of power levels is available to each mobile. The decision of a mobile to select a particular power level may depend on its channel state. We consider two frameworks concerning the state information of the channels of other mobiles: (i) full state information and (ii) local state information. In each of the two frameworks, we consider both cooperative and non-cooperative power control. We characterize the structure of equilibrium policies and, more generally, of best-response policies in the non-cooperative case. We present an algorithm to compute equilibrium policies in the case of two non-cooperative players. Finally, we study the case where a malicious mobile, which also has average power constraints, tries to jam the communication of another mobile. Our results are illustrated and validated through various numerical examples.

Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks | 2008

Battery-state dependent power control as a dynamic game

Ishai Menache; Eitan Altman

Consider an uplink cellular network shared by a finite number of mobile users with limited batteries. Whenever the battery drains out, the user pays a fixed price to recharge it. Users, who are assumed to always have traffic to send, control their transmission power in a noncooperative way. The novelty of our model lies in considering a dynamic game in which the transmission power of a player may depend on the amount of energy left in its battery. We consider various models and types of constraints and derive the structure of the equilibrium for each. A particularly interesting structure is obtained when there are constraints on the maximum transmission power that become tighter as the battery drains out. Using Schur convexity and majorization, we identify an equilibrium in which each mobile distributes the power of each battery along the battery's lifetime in a way equivalent to a max-min assignment.

International Conference on Computer Communications | 2011

A state action frequency approach to throughput maximization over uncertain wireless channels

Krishna P. Jagannathan; Shie Mannor; Ishai Menache; Eytan Modiano

We consider scheduling over a wireless system where the channel state information is not available a priori to the scheduler but can be inferred from the past. Specifically, the wireless system is modeled as a network of parallel queues. We assume that the channel state of each queue evolves stochastically as an ON/OFF Markov chain. The scheduler, which is aware of the queue lengths but oblivious to the channel states, has to choose one queue at a time for transmission. Although it has no information regarding the current channel states, the scheduler can estimate them using the acknowledgment history. We first characterize the capacity region of the system using tools from Markov Decision Process (MDP) theory. Specifically, we prove that the capacity region boundary is the uniform limit of a sequence of Linear Programming (LP) solutions. Next, we combine the LP solution with a queue-length-based scheduling mechanism that operates over long ‘frames’ to obtain a throughput-optimal policy for the system. By incorporating results from MDP theory within the Lyapunov-stability framework, we show that our frame-based policy stabilizes the system for all arrival rates that lie in the interior of the capacity region.
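
The estimation step can be illustrated with a two-state Markov channel: k slots after a queue's channel was last observed (for example, through an acknowledgment), the probability that it is ON follows the chain's k-step transition probability. The sketch below computes that belief. The transition probabilities are hypothetical, and `choose_queue` (serve the largest queue-length-times-belief product) is a simple myopic heuristic for illustration, not the frame-based, LP-driven policy the paper proves throughput optimal.

```python
def belief_on(k, last_on, p11, p01):
    """P(channel ON) k slots after it was last observed ON (last_on=True) or OFF.

    p11 = P(ON -> ON) and p01 = P(OFF -> ON) for a two-state Markov channel.
    """
    b = 1.0 if last_on else 0.0
    for _ in range(k):
        b = b * p11 + (1.0 - b) * p01  # one step of the Markov chain
    return b


def stationary_on(p11, p01):
    """Long-run fraction of time the channel is ON."""
    return p01 / (p01 + 1.0 - p11)


def choose_queue(lengths, ages, last_on, p11, p01):
    """Hypothetical myopic rule: serve the queue maximizing length * P(ON)."""
    scores = [n * belief_on(a, on, p11, p01) for n, a, on in zip(lengths, ages, last_on)]
    return max(range(len(scores)), key=scores.__getitem__)


# With p11 = 0.8 and p01 = 0.3 the belief decays from 1.0 toward 0.6.
print([round(belief_on(k, True, 0.8, 0.3), 4) for k in range(4)])  # [1.0, 0.8, 0.7, 0.65]
print(choose_queue([10, 4], [1, 1], [False, True], 0.8, 0.3))  # 1
```

The belief drifts toward the stationary ON probability as an observation ages, which is why stale acknowledgment information becomes less useful the longer a queue goes unserved.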

NET-COOP'07 Proceedings of the 1st EuroFGI international conference on Network control and optimization | 2007

Fixed-rate equilibrium in wireless collision channels

Ishai Menache; Nahum Shimkin

We consider a collision channel shared by a finite number of self-interested users with heterogeneous throughput demands. It is assumed that each user transmits with a fixed probability at each time slot, and a transmission is successful if no other user transmits simultaneously. Each user is interested in adjusting its transmission rate so that its throughput demand is met. When the throughput requirements are feasible, we show that there exist two equilibrium points at which users satisfy their respective demands. In one equilibrium all users transmit at lower rates than at the other. This is significant in wireless systems, where lower transmission rates translate into power savings. Subsequently, we propose a distributed scheme that ensures convergence to the lower-rate equilibrium point. We also provide lower bounds on the channel throughput obtained with self-interested users, in both the symmetric and non-symmetric cases.
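
Under the model described above, user i succeeds with probability p_i * prod_{j != i} (1 - p_j), so an equilibrium satisfies p_i * prod_{j != i} (1 - p_j) = rho_i for every demand rho_i. The sketch below is a minimal best-response iteration started from zero rates, assuming the demands are feasible; it is one plausible instance of a distributed scheme, not necessarily the one proposed in the paper. For two symmetric users with rho = 0.16, p(1 - p) = 0.16 has roots 0.2 and 0.8, and the iteration settles at the lower-rate point 0.2.

```python
import math


def success_prob(p, i):
    """Throughput of user i: it transmits and every other user stays silent."""
    return p[i] * math.prod(1.0 - pj for j, pj in enumerate(p) if j != i)


def best_response_rates(demands, iters=1000, tol=1e-12):
    """Iterate p_i <- rho_i / prod_{j != i}(1 - p_j), starting from all-zero rates."""
    p = [0.0] * len(demands)
    for _ in range(iters):
        new_p = []
        for i, rho in enumerate(demands):
            silence = math.prod(1.0 - pj for j, pj in enumerate(p) if j != i)
            # Smallest rate that meets the demand given the others; capped at 1.
            new_p.append(min(1.0, rho / silence) if silence > 0 else 1.0)
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


rates = best_response_rates([0.16, 0.16])
print([round(r, 6) for r in rates])  # [0.2, 0.2]; the other equilibrium is [0.8, 0.8]
```

Starting from zero, each update raises rates only as much as needed, so the iteration approaches the lower-rate equilibrium from below; the higher-rate point is unstable under this update.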
