Bikramjit Banerjee
University of Southern Mississippi
Publications
Featured research published by Bikramjit Banerjee.
Workshop on Parallel and Distributed Simulation | 2008
Bikramjit Banerjee; Ahmed Abukmail; Landon Kraemer
We adapt a scalable layered intelligence technique from the game industry for agent-based crowd simulation. We extend this approach for planned movements, pursuance of assignable goals, and avoidance of dynamically introduced obstacles/threats, while keeping the system scalable with the number of agents. We exploit parallel processing to expedite the pre-processing step that generates the path plans offline. We demonstrate the various behaviors in a hall-evacuation scenario, and experimentally establish the scalability of the frame rates with an increasing number of agents.
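The runtime scalability described above typically comes from moving path planning into an offline step, so that each agent's per-frame work is a lookup. The snippet below is a minimal illustrative sketch of that idea, not the paper's code: a BFS-generated "flow field" per goal on a grid, where the per-goal loop is the part that could be farmed out to parallel workers. The grid, exits, and function names are hypothetical.

```python
# Illustrative sketch (not the paper's code): precompute per-goal "flow fields"
# so that online agents only perform O(1) lookups per step.
from collections import deque

def flow_field(grid, goal):
    """BFS from the goal; each free cell stores the next cell on a shortest path."""
    rows, cols = len(grid), len(grid[0])
    next_step = {goal: goal}
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 \
                    and (nr, nc) not in next_step:
                next_step[(nr, nc)] = (r, c)   # step toward the goal
                frontier.append((nr, nc))
    return next_step

# Offline pre-processing: one field per exit (this loop is the parallelizable part).
grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]          # 1 = obstacle
exits = [(0, 3), (2, 0)]
fields = {g: flow_field(grid, g) for g in exits}

# Online: each agent simply follows the precomputed field for its assigned goal.
agent_pos, agent_goal = (2, 3), (0, 3)
while agent_pos != agent_goal:
    agent_pos = fields[agent_goal][agent_pos]
```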
Adaptive Agents and Multi-Agent Systems | 2003
Bikramjit Banerjee; Jing Peng
Inspired by recent results in policy gradient learning in general-sum games, in the form of two algorithms, IGA and WoLF-IGA, we explore an alternative version of WoLF. We show that our new WoLF criterion (PDWoLF) is also accurate in 2 × 2 games, while remaining accurately computable even in games with more than 2 actions, unlike WoLF, which relies on estimation. In particular, we show that this difference in accuracy in games with more than 2 actions translates to faster convergence (to Nash equilibrium policies in self-play) for PDWoLF in conjunction with the general Policy Hill Climbing algorithm. Interestingly, this speed-up becomes more pronounced with an increasing learning rate ratio, for which we also offer an explanation. We also show experimentally that learning faster with PDWoLF can entail learning better policies earlier in self-play. Finally, we present a scalable version of PDWoLF and show that even in domains requiring generalization and approximation, PDWoLF can dominate WoLF in performance.
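For context, a minimal sketch of the Policy Hill Climbing (PHC) framework with a WoLF-style variable learning rate is given below. PDWoLF, per the abstract, replaces the win/lose test with one derived from the learner's own policy dynamics, which is not reproduced here; all constants and names are illustrative, not the paper's code.

```python
# Sketch (assumed details) of WoLF-PHC in a 2-action repeated matrix game:
# hill-climb toward the greedy action, stepping slowly while "winning".
import random

ACTIONS = [0, 1]
ALPHA, DELTA_WIN, DELTA_LOSE = 0.1, 0.01, 0.04

Q = {a: 0.0 for a in ACTIONS}        # action values (single-state repeated game)
pi = {a: 0.5 for a in ACTIONS}       # current mixed policy
avg_pi = {a: 0.5 for a in ACTIONS}   # running average policy
updates = 0

def act():
    return 0 if random.random() < pi[0] else 1

def step(my_action, reward):
    """One PHC update after observing the reward for my_action."""
    global updates
    Q[my_action] += ALPHA * (reward - Q[my_action])

    # Maintain the average policy and apply the WoLF test: the learner is
    # "winning" when its current policy outperforms its average policy under Q.
    updates += 1
    for a in ACTIONS:
        avg_pi[a] += (pi[a] - avg_pi[a]) / updates
    winning = sum(pi[a] * Q[a] for a in ACTIONS) >= \
              sum(avg_pi[a] * Q[a] for a in ACTIONS)
    delta = DELTA_WIN if winning else DELTA_LOSE

    # Hill-climb: shift probability mass toward the greedy action.
    best = max(ACTIONS, key=lambda a: Q[a])
    for a in ACTIONS:
        pi[a] = min(1.0, max(0.0, pi[a] + (delta if a == best else -delta)))
    total = sum(pi.values())
    for a in ACTIONS:
        pi[a] /= total
```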
Autonomous Agents and Multi-Agent Systems | 2000
Rajatish Mukherjee; Bikramjit Banerjee; Sandip Sen
The multiagent learning literature has looked at iterated two-player games to develop mechanisms that allow agents to learn to converge on Nash equilibrium strategy profiles. An equilibrium configuration implies that there is no motivation for one player to change its strategy if the other does not. Often, in general-sum games, a higher payoff can be obtained by both players if one chooses not to respond optimally to the other player. By developing mutual trust, agents can avoid iterated best responses that would lead to a lower-payoff Nash equilibrium. In this paper we consider 1-level agents (modelers) who select actions based on expected utility, considering probability distributions over the actions of the opponent(s). We show that in certain situations such stochastically greedy agents can perform better (by developing mutually trusting behavior) than those that explicitly attempt to converge to a Nash equilibrium. We also experiment with an interesting action-revelation strategy that can give the revealer a better payoff on convergence than a non-revealing approach. By revealing, the revealer enables the opponent to agree to a more trusted equilibrium.
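A toy sketch of the 1-level modeler described above, under assumed payoffs: the agent maintains an empirical distribution over the opponent's actions and best-responds to it in expectation rather than computing an equilibrium.

```python
# Toy sketch (hypothetical payoffs and names) of a 1-level "modeler" in an
# iterated general-sum game.
from collections import Counter

# Row player's payoff matrix for a 2x2 general-sum game (illustrative numbers).
PAYOFF = [[3, 0],
          [5, 1]]

opponent_counts = Counter({0: 1, 1: 1})   # Laplace-smoothed model of the opponent

def modeler_action():
    """Pick the action maximizing expected utility under the opponent model."""
    total = sum(opponent_counts.values())
    belief = [opponent_counts[c] / total for c in (0, 1)]
    expected = [sum(PAYOFF[a][c] * belief[c] for c in (0, 1)) for a in (0, 1)]
    return max((0, 1), key=lambda a: expected[a])

def observe_opponent(action):
    opponent_counts[action] += 1
```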
Adaptive Agents and Multi-Agent Systems | 2000
Bikramjit Banerjee; Sandip Sen
In an open environment the goal of a self-interested agent would be to interact with, or enter into partnership with, those agents that will produce maximal local utility for this agent [5]. We assume that an agent can get one of several payoffs (utilities) for joining a particular coalition, which is determined by a static probability distribution over the payoffs for each coalition. We consider choosing a single coalition to interact with repeatedly when the total number of interactions is known ahead of time.
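The setting described above (a known number of repeated interactions, payoffs drawn from fixed but unknown distributions per coalition) resembles a finite-horizon multi-armed bandit. The snippet below is a hedged illustration of that framing with a naive sample-then-commit rule; it is not the paper's procedure, and the coalitions and horizon are hypothetical.

```python
# Hedged illustration: coalition choice as a finite-horizon bandit.
import random

def choose_coalition(payoff_samplers, horizon, trials_each=3):
    """payoff_samplers: list of zero-argument functions returning one payoff draw."""
    estimates, total, spent = [], 0.0, 0
    for sampler in payoff_samplers:
        draws = [sampler() for _ in range(trials_each)]   # brief exploration
        spent += trials_each
        total += sum(draws)
        estimates.append(sum(draws) / trials_each)
    best = max(range(len(payoff_samplers)), key=lambda i: estimates[i])
    for _ in range(horizon - spent):                       # commit to the best
        total += payoff_samplers[best]()
    return best, total

# Example with two hypothetical coalitions and 50 known interactions.
coalitions = [lambda: random.choice([2, 4]), lambda: random.choice([1, 8])]
best, payoff = choose_coalition(coalitions, horizon=50)
```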
Adaptive Agents and Multi-Agent Systems | 2005
Bikramjit Banerjee; Jing Peng
We provide a uniform framework for learning against a recent-history adversary in arbitrary repeated bimatrix games, by modeling such an agent as a Markov decision process (MDP). We focus on learning an optimal non-stationary policy in such an MDP over a finite horizon, and adapt an existing efficient Monte Carlo based algorithm for learning optimal policies in such MDPs. We show that this new efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Though this improvement comes at the cost of increased domain knowledge, a simple experiment in the Prisoner's Dilemma game shows that even when no extra domain knowledge (besides the opponent's memory size) is assumed, the error can still be small.
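To illustrate the MDP construction (with assumed details, not the paper's Monte Carlo algorithm): against a memory-1 opponent in the iterated Prisoner's Dilemma, the learner's state can be taken to be the previous joint action, and a finite-horizon, non-stationary optimal policy then follows from backward induction.

```python
# Sketch under assumptions: a bounded-memory opponent turns the repeated game
# into an MDP over recent joint-action histories. Opponent model and payoffs
# are hypothetical.
from itertools import product

C, D = 0, 1
MY_PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}   # Prisoner's Dilemma

def tit_for_tat(last_joint):           # assumed known opponent model (memory 1)
    return last_joint[0]               # opponent repeats my last action

def optimal_policy(horizon):
    states = list(product((C, D), repeat=2))      # last (my action, opp action)
    V = {s: 0.0 for s in states}                  # value with 0 steps remaining
    policy = []                                   # policy[t][state] -> action
    for _ in range(horizon):
        new_V, step_policy = {}, {}
        for s in states:
            opp = tit_for_tat(s)
            best = max((C, D), key=lambda a: MY_PAYOFF[(a, opp)] + V[(a, opp)])
            step_policy[s] = best
            new_V[s] = MY_PAYOFF[(best, opp)] + V[(best, opp)]
        V = new_V
        policy.insert(0, step_policy)             # earlier decision rules in front
    return policy

plan = optimal_policy(horizon=5)       # non-stationary: one decision rule per step
```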
Simulation | 2009
Bikramjit Banerjee; Ahmed Abukmail; Landon Kraemer
We adapt a scalable layered intelligence technique from the game industry for agent-based crowd simulation. We extend this approach for planned movements, pursuance of assignable goals, and avoidance of dynamically introduced obstacles/threats as well as congestion, while keeping the system scalable with the number of agents. We demonstrate the various behaviors in hall-evacuation scenarios, and experimentally establish the scalability of the frame rates with increasing numbers of agents.
European Conference on Machine Learning | 2002
Bikramjit Banerjee; Jing Peng
In this work we examine recent results in policy gradient learning in general-sum games, in the form of two algorithms, IGA and WoLF-IGA. We address drawbacks in the convergence properties of these algorithms, and propose a more accurate version of WoLF-IGA that is guaranteed to converge to Nash equilibrium policies in self-play (or against an IGA learner). We also present a control-theoretic interpretation of the variable learning rate, which not only justifies WoLF-IGA but also shows that it achieves the fastest convergence under some constraints. Finally, we derive optimal learning rates for fastest convergence in practical simulations.
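For reference, a small sketch of a WoLF-IGA style update in a 2x2 game under assumed payoff matrices: gradient ascent on the probability of the first action, with a smaller step when the learner is "winning" relative to its equilibrium strategy.

```python
# Illustrative sketch (not the paper's code) of one WoLF-IGA update for the
# row player of a 2x2 game.
def expected(R, p, q):
    """Row player's expected payoff: row plays action 0 w.p. p, column w.p. q."""
    return (R[0][0]*p*q + R[0][1]*p*(1-q)
            + R[1][0]*(1-p)*q + R[1][1]*(1-p)*(1-q))

def wolf_iga_step(R, p, q, p_nash, eta_win=0.01, eta_lose=0.04):
    # d/dp of the expected payoff, given the opponent's current policy q.
    grad = q*(R[0][0] + R[1][1] - R[0][1] - R[1][0]) + (R[0][1] - R[1][1])
    # WoLF: step cautiously when doing at least as well as the equilibrium strategy.
    winning = expected(R, p, q) > expected(R, p_nash, q)
    eta = eta_win if winning else eta_lose
    return min(1.0, max(0.0, p + eta*grad))      # keep p a valid probability

# One illustrative update for the row player of matching pennies.
R = [[1, -1], [-1, 1]]
p = wolf_iga_step(R, p=0.7, q=0.4, p_nash=0.5)
```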
Applied Artificial Intelligence | 2000
Bikramjit Banerjee; Anish Biswas; Manisha Mundhe; Sandip Debnath; Sandip Sen
An agent society of the future is envisioned to be as complex as a human society. Just like human societies, such multiagent systems (MAS) deserve an in-depth study of the dynamics, relationships, and interactions of the constituent agents. An agent in a MAS may have only approximate a priori estimates of the trustworthiness of another agent. But it can learn from interactions with other agents, resulting in more accurate models of these agents and their dependencies, together with the influences of other environmental factors. Such models are proposed to be represented as Bayesian or belief networks. An objective mechanism is presented to enable an agent to elicit crucial information from the environment regarding the true nature of the other agents. This mechanism allows the modeling agent to choose actions that will produce a guaranteed minimal improvement of model accuracy. The working of the proposed maximin entropy procedure is demonstrated in a multiagent scenario.
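A toy illustration of a maximin-entropy style choice (assumed structures, not the paper's algorithm): for each probing action, compute the worst-case posterior entropy over its possible outcomes and pick the action with the largest guaranteed information gain about the other agent's type.

```python
# Toy sketch: pick the probe whose worst-case outcome still guarantees the
# largest drop in uncertainty about the other agent's "type". All names,
# types, and probabilities are hypothetical.
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, likelihoods, action, outcome):
    """Bayes update of P(type) given the observed outcome of an action."""
    unnorm = {t: prior[t] * likelihoods[t][action][outcome] for t in prior}
    z = sum(unnorm.values()) or 1.0
    return {t: v / z for t, v in unnorm.items()}

def maximin_info_action(prior, likelihoods, actions, outcomes):
    h0 = entropy(prior)
    def guaranteed_gain(a):
        # worst case over outcomes that are possible under the current belief
        gains = [h0 - entropy(posterior(prior, likelihoods, a, o))
                 for o in outcomes
                 if sum(prior[t] * likelihoods[t][a][o] for t in prior) > 0]
        return min(gains)
    return max(actions, key=guaranteed_gain)

# Hypothetical example: two agent types, two probe actions, binary outcomes.
prior = {"helpful": 0.5, "selfish": 0.5}
likelihoods = {
    "helpful": {"share": {"accept": 0.9, "refuse": 0.1},
                "demand": {"accept": 0.6, "refuse": 0.4}},
    "selfish": {"share": {"accept": 0.2, "refuse": 0.8},
                "demand": {"accept": 0.5, "refuse": 0.5}},
}
probe = maximin_info_action(prior, likelihoods, ["share", "demand"], ["accept", "refuse"])
```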
Adaptive Agents and Multi-Agent Systems | 2006
Bikramjit Banerjee; Jing Peng
We present a new multiagent learning algorithm, RVσ(t), that can guarantee both no-regret performance (in all games) and policy convergence (in some games of arbitrary size). Unlike its predecessor ReDVaLeR, it (1) does not need to distinguish whether its opponents are themselves in self-play or are otherwise non-stationary, and (2) is allowed to know its portion of any equilibrium, which, we argue, leads to convergence in some games in addition to no-regret. Although the regret of RVσ(t) is analyzed in continuous time, we show that it grows more slowly than in other no-regret techniques such as GIGA and GIGA-WoLF. We show that RVσ(t) can converge to coordinated behavior in coordination games, while GIGA and GIGA-WoLF may converge to poorly coordinated (mixed) behaviors.
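As a reminder of what the no-regret guarantee refers to, the sketch below computes external regret for a play trace in a repeated matrix game: the gap between the best fixed action in hindsight and the payoff actually obtained. The game and traces are illustrative, not from the paper.

```python
# Illustrative only: external regret over a finite play trace. A no-regret
# learner keeps this quantity sublinear in the number of rounds.
def external_regret(payoff_matrix, my_actions, opp_actions):
    earned = sum(payoff_matrix[a][b] for a, b in zip(my_actions, opp_actions))
    best_fixed = max(
        sum(payoff_matrix[a][b] for b in opp_actions)
        for a in range(len(payoff_matrix))
    )
    return best_fixed - earned

# Example in a 2x2 coordination game (hypothetical play traces).
G = [[2, 0],
     [0, 1]]
print(external_regret(G, my_actions=[0, 1, 0, 0], opp_actions=[0, 0, 0, 1]))
```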
Neurocomputing | 2016
Landon Kraemer; Bikramjit Banerjee
Decentralized partially observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Multi-agent reinforcement learning (MARL) based approaches have recently been proposed for distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and policy execution are identical. In some practical scenarios this may not be the case. We propose a novel MARL approach in which agents are allowed to rehearse with information that will not be available during policy execution. The key is for the agents to learn policies that do not explicitly rely on these rehearsal features. We also establish a weak convergence result for our algorithm, RLaR, demonstrating that RLaR converges in probability when certain conditions are met. We show experimentally that incorporating rehearsal features can enhance the learning rate compared to non-rehearsal-based learners, and demonstrate fast, (near-)optimal performance on many existing benchmark Dec-POMDP problems. We also compare RLaR against an existing approximate Dec-POMDP solver which, like RLaR, does not assume a priori knowledge of the model. While RLaR's policy representation is not as scalable, we show that RLaR produces higher-quality policies for most problems and horizons studied.
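A conceptual sketch of the rehearsal idea (not RLaR itself, which maintains distinct value functions and a more principled exploration scheme): during training the learner may consult extra "rehearsal" features such as hidden state, but the table it ultimately executes from is indexed only by information available at execution time.

```python
# Conceptual sketch under assumptions: rehearsal features guide exploration
# during learning, yet the executable policy depends only on observations.
import random
from collections import defaultdict

ALPHA, EPS, ACTIONS = 0.1, 0.2, [0, 1]
Q_exec = defaultdict(float)       # keyed by (observation_history, action) only
Q_rehearsal = defaultdict(float)  # keyed by (hidden_state, action); training aid

def choose_action(obs_history, hidden_state, rehearsing):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    if rehearsing:
        # During rehearsal, the informed estimate may guide exploration.
        return max(ACTIONS, key=lambda a: Q_rehearsal[(hidden_state, a)])
    # At execution time, only the observation-history table is available.
    return max(ACTIONS, key=lambda a: Q_exec[(obs_history, a)])

def update(obs_history, hidden_state, action, reward):
    # Both tables see the same experience; only Q_exec is usable at execution.
    Q_rehearsal[(hidden_state, action)] += ALPHA * (reward - Q_rehearsal[(hidden_state, action)])
    Q_exec[(obs_history, action)] += ALPHA * (reward - Q_exec[(obs_history, action)])
```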