
Shie Mannor

Technion – Israel Institute of Technology

Publication

Featured research published by Shie Mannor.

IEEE Transactions on Signal Processing | 2004

The kernel recursive least-squares algorithm

Yaakov Engel; Shie Mannor; Ron Meir

We present a nonlinear version of the recursive least squares (RLS) algorithm. Our algorithm performs linear regression in a high-dimensional feature space induced by a Mercer kernel and can therefore be used to recursively construct minimum mean-squared-error solutions to nonlinear least-squares problems that are frequently encountered in signal processing applications. In order to regularize solutions and keep the complexity of the algorithm bounded, we use a sequential sparsification process that admits into the kernel representation a new input sample only if its feature space image cannot be sufficiently well approximated by combining the images of previously admitted samples. This sparsification procedure allows the algorithm to operate online, often in real time. We analyze the behavior of the algorithm, compare its scaling properties to those of support vector machines, and demonstrate its utility in solving two signal processing problems: time-series prediction and channel equalization.
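
The sparsification rule described in this abstract can be illustrated with a small sketch. It assumes a Gaussian kernel and recomputes the linear solve from scratch for every sample. The abstract does not specify the kernel, the threshold, or how the test is computed efficiently, so the function names, the `threshold` and `width` parameters, and the toy data below are all illustrative assumptions rather than the authors' implementation.

```python
import math

def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel; an assumed choice of Mercer kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * width ** 2))

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def build_dictionary(samples, threshold=0.1, width=1.0):
    """Admit a sample only if its feature-space image is poorly
    approximated by a linear combination of already admitted images."""
    dictionary = []
    for x in samples:
        if not dictionary:
            dictionary.append(x)
            continue
        K = [[gaussian_kernel(a, b, width) for b in dictionary] for a in dictionary]
        k = [gaussian_kernel(a, x, width) for a in dictionary]
        coeffs = solve(K, k)  # best combination of admitted images
        # Squared approximation error in feature space
        residual = gaussian_kernel(x, x, width) - sum(ki * ci for ki, ci in zip(k, coeffs))
        if residual > threshold:
            dictionary.append(x)
    return dictionary

samples = [(0.0,), (0.01,), (3.0,), (3.02,), (6.0,)]
dictionary = build_dictionary(samples)
```

With these toy inputs, the near-duplicate points `(0.01,)` and `(3.02,)` are rejected because the existing dictionary already approximates them well. An online implementation would update the inverse kernel matrix recursively instead of re-solving the system at each step.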

International Conference on Machine Learning | 2005

Reinforcement learning with Gaussian processes

Yaakov Engel; Shie Mannor; Ron Meir

Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing two pressing issues, which were not adequately treated in the original GPTD paper (Engel et al., 2003). The first is the issue of stochasticity in the state transitions, and the second is concerned with action selection and policy improvement. We present a new generative model for the value function, deduced from its relation with the discounted return. We derive a corresponding on-line algorithm for learning the posterior moments of the value Gaussian process. We also present a SARSA based extension of GPTD, termed GPSARSA, that allows the selection of actions and the gradual improvement of policies without requiring a world-model.

IEEE Transactions on Signal Processing | 2008

Fully Parallel Stochastic LDPC Decoders

Saeed Sharifi Tehrani; Shie Mannor; Warren J. Gross

Stochastic decoding is a new approach to iterative decoding on graphs. This paper presents a hardware architecture for fully parallel stochastic low-density parity-check (LDPC) decoders. To obtain the characteristics of the proposed architecture, we apply this architecture to decode an irregular state-of-the-art (1056,528) LDPC code on a Xilinx Virtex-4 LX200 field-programmable gate-array (FPGA) device. The implemented decoder achieves a clock frequency of 222 MHz and a throughput of about 1.66 Gb/s at Eb/N0 = 4.25 dB (a bit error rate of 10^-8). It provides decoding performance within 0.5 and 0.25 dB of the floating-point sum-product algorithm with 32 and 16 iterations, respectively, and similar error-floor behavior. The decoder uses less than 40% of the lookup tables, flip-flops, and IO ports available on the FPGA device. The results provided in this paper validate the potential of stochastic LDPC decoding as a practical and competitive fully parallel decoding approach.

Conference on Learning Theory | 2002

PAC Bounds for Multi-armed Bandit and Markov Decision Processes

Eyal Even-Dar; Shie Mannor; Yishay Mansour

The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O((n/ε²) log(1/δ)) times to find an ε-optimal arm with probability of at least 1 - δ. This is in contrast to the naive bound of O((n/ε²) log(n/δ)). We derive another algorithm whose complexity depends on the specific setting of the rewards, rather than the worst case setting. We also provide a matching lower bound. We show how given an algorithm for the PAC model Multi-armed Bandit problem, one can derive a batch learning algorithm for Markov Decision Processes. This is done essentially by simulating Value Iteration, and in each iteration invoking the multi-armed bandit algorithm. Using our PAC algorithm for the multi-armed bandit problem we improve the dependence on the number of actions.
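
For orientation, the naive bound mentioned in this abstract corresponds to sampling every arm equally often and returning the empirically best one. The sketch below shows only that baseline, not the improved algorithm of the paper. It assumes Bernoulli rewards in [0, 1] and takes the per-arm sample size from a Hoeffding bound with a union bound over arms. The constant 2 is one valid choice under those assumptions, and the function names are hypothetical.

```python
import math
import random

def pulls_per_arm(n_arms, epsilon, delta):
    """Per-arm sample size so that every empirical mean is within
    epsilon/2 of its true mean with probability at least 1 - delta
    (Hoeffding's inequality plus a union bound over the arms)."""
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n_arms / delta))

def naive_pac_arm(arm_means, epsilon, delta, rng):
    """Pull every arm the same number of times and return the index of
    the arm with the highest empirical mean, plus the total pull count."""
    n_arms = len(arm_means)
    pulls = pulls_per_arm(n_arms, epsilon, delta)
    best_arm, best_mean = 0, -1.0
    for arm, p in enumerate(arm_means):
        # Simulated Bernoulli rewards; arm_means stands in for the unknown arms
        mean = sum(1 for _ in range(pulls) if rng.random() < p) / pulls
        if mean > best_mean:
            best_arm, best_mean = arm, mean
    return best_arm, n_arms * pulls

arm, total_pulls = naive_pac_arm([0.2, 0.5, 0.9], epsilon=0.1, delta=0.05,
                                 rng=random.Random(0))
```

The total number of pulls grows as (n/ε²) log(n/δ), which is the naive rate that the abstract contrasts with O((n/ε²) log(1/δ)).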

European Conference on Machine Learning | 2002

Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning

Ishai Menache; Shie Mannor; Nahum Shimkin

We present the Q-Cut algorithm, a graph theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used for acceleration of the Q-Learning algorithm. The learning agent creates an on-line map of the process history, and uses an efficient Max-Flow/Min-Cut algorithm for identifying bottlenecks. The policies for reaching bottlenecks are separately learned and added to the model in a form of options (macro-actions). We then extend the basic Q-Cut algorithm to the Segmented Q-Cut algorithm, which uses previously identified bottlenecks for state space partitioning, necessary for finding additional bottlenecks in complex environments. Experiments show significant performance improvements, particularly in the initial learning phase.
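
The bottleneck-finding step can be illustrated on a toy transition graph. The sketch below uses a generic Edmonds-Karp max-flow routine and reads the minimum cut off the final residual graph. The state names, the edge capacities (treated here as hypothetical transition counts), and the choice of Edmonds-Karp are assumptions made for illustration. The abstract does not say which Max-Flow/Min-Cut algorithm or which cut-quality test is used.

```python
from collections import deque

def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow. Returns the flow value and the edges that
    cross from the source side to the sink side of a minimum cut."""
    # Residual capacities, with zero-capacity reverse edges added
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push
    reachable = set(parent)  # states still reachable from the source
    cut = sorted((u, v) for u in capacity for v in capacity[u]
                 if u in reachable and v not in reachable)
    return flow, cut

# Two well-connected regions joined by a single narrow transition
graph = {
    "s": {"a": 5, "b": 5},
    "a": {"door": 4},
    "b": {"door": 4},
    "door": {"c": 2},
    "c": {"t": 5, "e": 5},
    "e": {"t": 5},
}
flow, cut = min_cut(graph, "s", "t")
```

In this toy graph the minimum cut isolates the single `door` to `c` transition, the kind of bottleneck state that would be proposed as a sub-goal.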

Annals of Operations Research | 2005

Basis Function Adaptation in Temporal Difference Reinforcement Learning

Ishai Menache; Shie Mannor; Nahum Shimkin

Reinforcement Learning (RL) is an approach for solving complex multi-stage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters. Function approximation is essential for problems with a large state space, as it facilitates compact representation and enables generalization. Linear approximation architectures (where the adjustable parameters are the weights of pre-fixed basis functions) have recently gained prominence due to efficient algorithms and convergence guarantees. Nonetheless, an appropriate choice of basis function is important for the success of the algorithm. In the present paper we examine methods for adapting the basis function during the learning process in the context of evaluating the value function under a fixed control policy. Using the Bellman approximation error as an optimization criterion, we optimize the weights of the basis function while simultaneously adapting the (non-linear) basis function parameters. We present two algorithms for this problem. The first uses a gradient-based approach and the second applies the Cross Entropy method. The performance of the proposed algorithms is evaluated and compared in simulations.

International Conference on Machine Learning | 2006

Automatic basis function construction for approximate dynamic programming and reinforcement learning

Philipp W. Keller; Shie Mannor; Doina Precup

We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castañon (1989) who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.

IEEE Communications Letters | 2006

Stochastic decoding of LDPC codes

Saeed Sharifi Tehrani; Warren J. Gross; Shie Mannor

This letter presents the first successful method for iterative stochastic decoding of state-of-the-art low-density parity-check (LDPC) codes. The proposed method shows the viability of the stochastic approach for decoding LDPC codes on factor graphs. In addition, simulation results for a 200 and a 1024 length LDPC code demonstrate the near-optimal performance of this method with respect to sum-product decoding. The proposed method has a significant potential for high-throughput and/or low complexity iterative decoding.
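
The abstract does not describe the node circuits, so the following is only a sketch of the general idea behind stochastic computation: a probability is represented by the fraction of ones in a random bit stream, and simple logic gates then operate on probabilities. It assumes the commonly described mapping in which a two-input parity-check operation is an XOR of independent streams, giving an output probability of Pa(1 - Pb) + Pb(1 - Pa). The function names and stream length are illustrative.

```python
import random

def bit_stream(probability, length, rng):
    """Encode a probability as a Bernoulli bit stream."""
    return [1 if rng.random() < probability else 0 for _ in range(length)]

def parity_check_xor(stream_a, stream_b):
    """Bitwise XOR of two streams (assumed parity-check node operation)."""
    return [a ^ b for a, b in zip(stream_a, stream_b)]

def estimate(stream):
    """Recover the encoded probability as the fraction of ones."""
    return sum(stream) / len(stream)

rng = random.Random(0)
pa, pb = 0.8, 0.3
out = parity_check_xor(bit_stream(pa, 100_000, rng), bit_stream(pb, 100_000, rng))
expected = pa * (1 - pb) + pb * (1 - pa)  # probability that exactly one input bit is 1
observed = estimate(out)
```

The estimate converges to the expected value as the streams get longer. That trade of per-bit precision for very simple logic is what makes the approach attractive for high-throughput hardware.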

Operations Research | 2010

Percentile Optimization for Markov Decision Processes with Parameter Uncertainty

Erick Delage; Shie Mannor

Markov decision processes are an effective tool in modeling decision making in uncertain dynamic environments. Because the parameters of these models typically are estimated from data or learned from experience, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable modeling ambiguity. In this paper, we present a set of percentile criteria that are conceptually natural and representative of the trade-off between optimistic and pessimistic views of the question. We study the use of these criteria under different forms of uncertainty for both the rewards and the transitions. Some forms are shown to be efficiently solvable and others highly intractable. In each case, we outline solution concepts that take parametric uncertainty into account in the process of decision making.

International Conference on Communications | 2009

Stochastic Decoding of LDPC Codes over GF(q)

Gabi Sarkis; Shie Mannor; Warren J. Gross

Nonbinary LDPC codes have been shown to outperform currently used codes for magnetic recording and several other channels. Currently proposed nonbinary decoder architectures have very high complexity for high-throughput implementations and sacrifice error-correction performance to maintain realizable complexity. In this paper, we present an alternative decoding algorithm based on stochastic computation that has a very simple implementation and minimal performance loss when compared to the sum-product algorithm. We demonstrate the performance of the algorithm when applied to a GF(16) code and provide details of the hardware resources required for an implementation.
