Publications


Featured research published by Joni Pajarinen.


International Joint Conference on Artificial Intelligence | 2011

Efficient planning for factored infinite-horizon DEC-POMDPs

Joni Pajarinen; Jaakko Peltonen

Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is restricted by costs or physical limits. DEC-POMDPs are a promising solution, but optimizing policies quickly becomes computationally intractable as problem size grows. Factored DEC-POMDPs allow large problems to be described in compact form, but have the same worst-case complexity as non-factored DEC-POMDPs. We propose an efficient optimization algorithm for large factored infinite-horizon DEC-POMDPs. We formulate expectation-maximization based optimization into a new form, where complexity can be kept tractable by factored approximations. Our method performs well, and it can solve problems with more agents and larger state spaces than state-of-the-art DEC-POMDP methods. We give results for factored infinite-horizon DEC-POMDP problems with up to 10 agents.
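
As a concrete illustration of the policy representation behind infinite-horizon DEC-POMDP planning, the sketch below runs decentralized finite-state controllers, one per agent: each controller node emits an action and transitions on the agent's local observation only. The controller sizes, random parameters, and stand-in environment are illustrative assumptions, not the paper's factored EM algorithm itself.

```python
# Minimal sketch of decentralized execution with finite-state controllers
# (FSCs), the policy class used in infinite-horizon DEC-POMDP planning.
# All sizes and probabilities below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

class FiniteStateController:
    """Stochastic FSC: node -> action, (node, observation) -> next node."""
    def __init__(self, n_nodes, n_actions, n_obs):
        # Action distribution per controller node.
        self.action_probs = rng.dirichlet(np.ones(n_actions), size=n_nodes)
        # Node transition distribution per (node, observation) pair.
        self.node_probs = rng.dirichlet(np.ones(n_nodes), size=(n_nodes, n_obs))
        self.node = 0

    def act(self):
        return rng.choice(len(self.action_probs[self.node]),
                          p=self.action_probs[self.node])

    def update(self, obs):
        self.node = rng.choice(len(self.node_probs[self.node, obs]),
                               p=self.node_probs[self.node, obs])

# Decentralized execution: each agent acts on its own observations only.
agents = [FiniteStateController(n_nodes=3, n_actions=2, n_obs=2)
          for _ in range(10)]
for step in range(5):
    joint_action = [a.act() for a in agents]
    observations = rng.integers(0, 2, size=len(agents))  # stand-in for env
    for agent, obs in zip(agents, observations):
        agent.update(obs)
```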


IEEE Transactions on Mobile Computing | 2014

Optimizing Spatial and Temporal Reuse in Wireless Networks by Decentralized Partially Observable Markov Decision Processes

Joni Pajarinen; Ari Hottinen; Jaakko Peltonen

The performance of medium access control (MAC) depends on both the spatial locations and traffic patterns of wireless agents. In contrast to conventional MAC policies, we propose a MAC solution that adapts to the prevailing spatial and temporal opportunities. The proposed solution is based on a decentralized partially observable Markov decision process (DEC-POMDP), which is able to handle wireless network dynamics described by a Markov model. A DEC-POMDP takes both sensor noise and partial observations into account, and yields MAC policies that are optimal for the network dynamics model. The DEC-POMDP MAC policies can be optimized for a freely chosen goal, such as maximal throughput or minimal latency, with the same algorithm. We make approximate optimization efficient by exploiting problem structure: the policies are optimized by a factored DEC-POMDP method, yielding highly compact state machine representations for MAC policies. Experiments show that our approach yields higher throughput and lower latency than CSMA/CA-based comparison methods adapted to the current wireless network configuration.
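
To make the throughput objective concrete, here is a minimal slotted-access simulation in which a slot succeeds when exactly one agent transmits. The fixed transmit-probability policy is a toy stand-in for a learned DEC-POMDP MAC state machine; the agent count and probabilities are assumptions.

```python
# Illustrative throughput evaluation in a slotted-access setting. The
# fixed transmit-probability policy below is a toy stand-in for a learned
# DEC-POMDP MAC policy; all numbers are assumptions.
import numpy as np

rng = np.random.default_rng(1)

def throughput(transmit_prob, n_agents=4, n_slots=10_000):
    """Fraction of slots with exactly one transmission (a success)."""
    tx = rng.random((n_slots, n_agents)) < transmit_prob
    return np.mean(tx.sum(axis=1) == 1)

# With symmetric agents, throughput peaks near transmit_prob = 1/n_agents;
# a DEC-POMDP policy can instead adapt to spatial and temporal structure.
for p in (0.1, 0.25, 0.5):
    print(f"p={p:.2f}: throughput={throughput(p):.3f}")
```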


Artificial Intelligence | 2017

Robotic manipulation of multiple objects as a POMDP

Joni Pajarinen; Ville Kyrki

This paper investigates manipulation of multiple unknown objects in a crowded environment. Because of incomplete knowledge due to unknown objects and occlusions in visual observations, object observations are imperfect and action success is uncertain, making planning challenging. We model the problem as a partially observable Markov decision process (POMDP), which allows a general reward-based optimization objective and takes uncertainty in temporal evolution and partial observations into account. In addition to occlusion-dependent observation and action success probabilities, our POMDP model also automatically adapts object-specific action success probabilities. To cope with the changing system dynamics and performance constraints, we present a new online POMDP method based on particle filtering that produces compact policies. The approach is validated both in simulation and in physical experiments in a scenario of moving dirty dishes into a dishwasher. The results indicate that: 1) a greedy heuristic manipulation approach is not sufficient; multi-object manipulation requires multi-step POMDP planning; and 2) online planning is beneficial, since it allows adapting the system dynamics model based on actual experience.
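
The particle-filtering core of such an online POMDP planner can be sketched as a standard propagate-reweight-resample belief update. The 1-D dynamics and observation models below are toy assumptions, not the paper's manipulation models.

```python
# Minimal particle-filter belief update: particles are hypothesized
# states, reweighted by observation likelihood and then resampled.
import numpy as np

rng = np.random.default_rng(2)

def pf_update(particles, weights, action, obs, transition, obs_likelihood):
    # Propagate each particle through the (stochastic) dynamics model.
    particles = np.array([transition(s, action) for s in particles])
    # Reweight by how well each particle explains the observation.
    weights = weights * np.array([obs_likelihood(obs, s) for s in particles])
    weights /= weights.sum()
    # Resample to concentrate particles on likely states.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy 1-D example: state drifts by the action, observation is noisy state.
transition = lambda s, a: s + a + rng.normal(0.0, 0.1)
obs_likelihood = lambda o, s: np.exp(-0.5 * ((o - s) / 0.5) ** 2)

particles = rng.normal(0.0, 1.0, size=100)
weights = np.full(100, 0.01)
particles, weights = pf_update(particles, weights, action=1.0, obs=1.2,
                               transition=transition,
                               obs_likelihood=obs_likelihood)
print("posterior mean:", particles.mean())
```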


Intelligent Robots and Systems | 2014

Real-Time Recognition of Pointing Gestures for Robot to Robot Interaction

Polychronis Kondaxakis; Joni Pajarinen; Ville Kyrki

This paper addresses the idea of establishing symbolic communication between mobile robots through gesturing. Humans communicate using body language and gestures in addition to other linguistic modalities such as prosody and text or dialog structure. This research aims to develop a pointing gesture detection system for robot-to-robot communication scenarios, granting robots the ability to convey object identity information without global localization of the agents. The detection is based on RGB-D data, and a NAO humanoid robot is used as the pointing agent in the experiments. The presented algorithms are built on the PCL library. The results indicate that real-time detection of pointing gestures can be performed with little information about the embodiment of the pointing agent, and that an observing agent can use the gesture detection to perform actions on the pointed targets.
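
A hedged sketch of the geometric core of interpreting a pointing gesture: fit the principal axis of detected arm points and intersect that ray with a ground plane to localize the pointed target. The arm points and plane are made-up inputs; the paper's actual pipeline builds on RGB-D data and the PCL library.

```python
# Toy pointing-target estimation: principal axis of 3-D arm points
# (via SVD) intersected with a plane. Inputs are illustrative assumptions.
import numpy as np

def pointing_target(arm_points, plane_point, plane_normal):
    """Intersect the principal axis of arm_points with a plane."""
    centroid = arm_points.mean(axis=0)
    # Principal direction = first right singular vector of centered points.
    _, _, vt = np.linalg.svd(arm_points - centroid)
    direction = vt[0]
    denom = direction @ plane_normal
    if abs(denom) < 1e-9:
        return None  # pointing ray parallel to the plane
    t = ((plane_point - centroid) @ plane_normal) / denom
    return centroid + t * direction

# Toy arm pointing down and forward, intersected with the floor (z = 0).
arm = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [0.2, 0.0, 0.8]])
print(pointing_target(arm, plane_point=np.zeros(3),
                      plane_normal=np.array([0.0, 0.0, 1.0])))
```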


Personal, Indoor and Mobile Radio Communications | 2009

Latent state models of primary user behavior for opportunistic spectrum access

Joni Pajarinen; Jaakko Peltonen; Mikko A. Uusitalo; Ari Hottinen

Opportunistic spectrum access, where cognitive radio devices detect available unused radio channels and exploit them for communication, avoiding collisions with existing users of the channels, is a central topic of research for future wireless communication. When each device has limited resources to sense which channels are available, the task becomes a reinforcement learning problem that has been studied with partially observable Markov decision processes (POMDPs). However, current POMDP solutions are based on simplistic representations where channels are simply on/off (transmitting or idle). We show that richer Markov models, where on/off states are part of the complex behavior of the channel owner (primary user), yield better POMDP policies, achieving more successful transmissions and fewer collisions.
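
To illustrate the difference from a plain on/off channel model, the sketch below draws channel occupancy from a latent primary-user mode (quiet vs. bursty) that modulates the busy probability. All transition probabilities are invented for illustration.

```python
# Latent-state channel model sketch: a hidden primary-user mode governs
# how often the channel is busy. All probabilities are assumptions.
import numpy as np

rng = np.random.default_rng(3)

# Hidden primary-user mode: 0 = quiet, 1 = bursty traffic.
mode_transition = np.array([[0.95, 0.05],
                            [0.10, 0.90]])
# Channel-busy probability given the hidden mode.
busy_prob = np.array([0.05, 0.80])

mode, history = 0, []
for _ in range(20):
    mode = rng.choice(2, p=mode_transition[mode])
    history.append(int(rng.random() < busy_prob[mode]))
print("busy/idle trace:", history)
```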


Intelligent Robots and Systems | 2014

Robotic manipulation in object composition space

Joni Pajarinen; Ville Kyrki

Manipulating unknown objects in a cluttered environment is difficult because object composition is uncertain. Because of this uncertainty, earlier work has concentrated on finding the “best” object composition and decided on manipulation actions based on that composition. Contrary to earlier work, we 1) utilize different possible object compositions in decision making, 2) take advantage of object composition information provided by robot actions, and 3) take into account the effect of different competing object hypotheses on the actual task to be performed. We cast the manipulation planning problem as a partially observable Markov decision process (POMDP) which plans over possible hypotheses of object compositions. The POMDP model chooses the action that maximizes the long-term expected task-specific utility, and while doing so, considers the value of informative actions and the effect of different object hypotheses on the completion of the task. In experiments with a physical robot arm and an RGB-D sensor, our approach outperforms an approach that only considers the most likely object composition.
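
The decision-theoretic point can be shown in a few lines: rather than committing to the single most likely composition, choose the action with the highest expected utility across all hypotheses. The hypotheses, probabilities, and utilities below are toy assumptions.

```python
# Expected-utility decision over competing object-composition hypotheses.
# All values are made-up assumptions for illustration.
hypotheses = {  # composition -> posterior probability
    "one large object": 0.6,
    "two stacked objects": 0.4,
}
utility = {  # utility of each action under each hypothesis
    ("grasp top", "one large object"): -1.0,
    ("grasp top", "two stacked objects"): 2.0,
    ("grasp whole", "one large object"): 2.0,
    ("grasp whole", "two stacked objects"): -5.0,
}

def expected_utility(action):
    return sum(p * utility[(action, h)] for h, p in hypotheses.items())

actions = ["grasp top", "grasp whole"]
best = max(actions, key=expected_utility)
# Acting on the most likely hypothesis alone would pick "grasp whole",
# but expected utility over both hypotheses prefers "grasp top".
print(best, {a: expected_utility(a) for a in actions})
```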


Foundations and Trends in Robotics | 2018

An Algorithmic Perspective on Imitation Learning

Takayuki Osa; Joni Pajarinen; Gerhard Neumann; J. Andrew Bagnell; Pieter Abbeel; Jan Peters

As robots and other intelligent agents move from simple environments and problems to more complex, unstructured settings, manually programming their behavior has become increasingly challenging and expensive. Often, it is easier for a teacher to demonstrate a desired behavior rather than attempt to manually engineer it. This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning. This work provides an introduction to imitation learning. It covers the underlying assumptions, approaches, and how they relate; the rich set of algorithms developed to tackle the problem; and advice on effective tools and implementation.

We intend this paper to serve two audiences. First, we want to familiarize machine learning experts with the challenges of imitation learning, particularly those arising in robotics, and the interesting theoretical and practical distinctions between it and more familiar frameworks like statistical supervised learning theory and reinforcement learning. Second, we want to give roboticists and experts in applied artificial intelligence a broader appreciation for the frameworks and tools available for imitation learning. We pay particular attention to the intimate connection between imitation learning approaches and those of structured prediction [Daumé III et al., 2009].

To structure this discussion, we categorize imitation learning techniques based on the following key criteria, which drive algorithmic decisions: 1) The structure of the policy space. Is the learned policy a time-indexed trajectory (trajectory learning), a mapping from observations to actions (so-called behavioral cloning [Bain and Sammut, 1996]), or the result of a complex optimization or planning problem solved at each execution, as is common in inverse optimal control methods [Kalman, 1964; Moylan and Anderson, 1973]? 2) The information available during training and testing. In particular, is the learning algorithm privy to the full state that the teacher possesses? Is the learner able to interact with the teacher and gather corrections or more data? Does the learner have a (typically a priori) model of the system with which it interacts? Does the learner have access to the reward (cost) function that the teacher is attempting to optimize? 3) The notion of success. Different algorithmic approaches provide varying guarantees on the resulting learned behavior. These guarantees range from weaker (e.g., measuring disagreement with the agent's decision) to stronger (e.g., providing guarantees on the performance of the learner with respect to a true cost function, either known or unknown).

We organize our work by paying particular attention to distinction (1): dividing imitation learning into directly replicating desired behavior (sometimes called behavioral cloning) and learning the hidden objectives of the desired behavior from demonstrations (called inverse optimal control or inverse reinforcement learning [Russell, 1998]). In the latter case, behavior arises as the result of an optimization problem solved for each new instance that the learner faces. In addition to method analysis, we discuss the design decisions a practitioner must make when selecting an imitation learning approach. Moreover, application examples, such as robots that play table tennis [Kober and Peters, 2009], programs that play the game of Go [Silver et al., 2016], and systems that understand natural language [Wen et al., 2015], illustrate the properties and motivations behind different forms of imitation learning. We conclude by presenting a set of open questions and point towards possible future research directions for machine learning.
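
As a minimal example of the first family the survey covers, behavioral cloning, the sketch below fits a linear policy from observations to a synthetic teacher's actions by least squares. The data and model class are assumptions for illustration only.

```python
# Behavioral cloning in its simplest form: supervised regression from
# observations to demonstrated actions. Data and model are assumptions.
import numpy as np

rng = np.random.default_rng(4)

# Synthetic demonstrations: teacher steers toward the origin, obs -> action.
observations = rng.uniform(-1.0, 1.0, size=(500, 2))
actions = -0.5 * observations + rng.normal(0.0, 0.01, size=(500, 2))

# Linear policy fit by least squares (a minimal behavioral-cloning model).
policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

print("learned gains:\n", policy)          # approximately -0.5 * identity
print("action at [1, 0]:", np.array([1.0, 0.0]) @ policy)
```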


International Conference on Robotics and Automation | 2015

Decision making under uncertain segmentations

Joni Pajarinen; Ville Kyrki

Making decisions based on visual input is challenging because determining how the scene should be split into individual objects is often very difficult. While previous work mainly considers decision making and visual processing as two separate tasks, we argue that the inherent uncertainty in object segmentation requires an integrated approach that chooses the best decision over all possible segmentations. Our approach over-segments the visual input and combines the segments into possible objects to get a probability distribution over object compositions, represented as particles. We introduce a Markov chain Monte Carlo procedure that aims to produce exact, independent samples. In experiments, where a 6-DOF robot arm moves object hypotheses captured by an RGB-D visual sensor, our probability-distribution-based decision making outperforms an approach that utilizes the traditional most likely object composition.
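
A minimal Metropolis-Hastings sketch in this spirit: a composition assigns each over-segment to an object, a move reassigns one segment, and acceptance follows the score ratio (the proposal is symmetric). The scoring function and segment count are toy assumptions, and this is plain MCMC rather than the paper's exact, independent-sample procedure.

```python
# Toy Metropolis-Hastings over object compositions of over-segments.
# The score function is a made-up assumption; a real model would measure
# how well grouped segments form coherent objects.
import numpy as np

rng = np.random.default_rng(5)
n_segments = 6

def score(labels):
    # Toy prior favoring compositions with few objects.
    return np.exp(-len(set(labels)))

labels = list(range(n_segments))  # start: every segment its own object
samples = []
for _ in range(2000):
    proposal = labels.copy()
    seg = rng.integers(n_segments)
    proposal[seg] = rng.integers(n_segments)  # reassign to a random object id
    # Symmetric proposal, so the acceptance ratio is just the score ratio.
    if rng.random() < score(proposal) / score(labels):
        labels = proposal
    samples.append(len(set(labels)))

print("mean number of objects under the posterior:", np.mean(samples[500:]))
```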


European Conference on Machine Learning | 2013

Expectation Maximization for Average Reward Decentralized POMDPs

Joni Pajarinen; Jaakko Peltonen

Planning for multiple agents under uncertainty is often based on decentralized partially observable Markov decision processes (Dec-POMDPs), but current methods must de-emphasize long-term effects of actions by a discount factor. In tasks like wireless networking, agents are evaluated by average performance over time, both short- and long-term effects of actions are crucial, and discounting-based solutions can perform poorly. We show that under a common set of conditions expectation maximization (EM) for average reward Dec-POMDPs gets stuck in a local optimum. We introduce a new average reward EM method; it outperforms a state-of-the-art discounted-reward Dec-POMDP method in experiments.
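
The average-reward criterion itself is easy to state: the gain of a fixed policy is the expected immediate reward under the stationary distribution of the Markov chain the policy induces. The sketch below computes this for a toy two-state chain; the transition matrix and rewards are assumptions.

```python
# Average reward (gain) of a fixed policy via the stationary distribution
# of its induced Markov chain. The chain and rewards are toy assumptions.
import numpy as np

# Markov chain induced by some fixed joint policy, plus per-state rewards.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
r = np.array([1.0, 0.0])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary /= stationary.sum()

average_reward = stationary @ r
print("gain (average reward per step):", average_reward)  # 0.8
```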


Intelligent Robots and Systems | 2017

Hybrid control trajectory optimization under uncertainty

Joni Pajarinen; Ville Kyrki; Michael C. Koval; Siddhartha S. Srinivasa; Jan Peters; Gerhard Neumann

Trajectory optimization is a fundamental problem in robotics. While optimization of continuous control trajectories is well developed, many applications require both discrete and continuous, i.e., hybrid, controls. Finding an optimal sequence of hybrid controls is challenging due to the exponential explosion of discrete control combinations. Our method, based on Differential Dynamic Programming (DDP), circumvents this problem by incorporating discrete actions inside DDP: we first optimize continuous mixtures of discrete actions, and subsequently force the mixtures into fully discrete actions. Moreover, we show how our approach can be extended to partially observable Markov decision processes (POMDPs) for trajectory planning under uncertainty. We validate the approach in a car driving problem, where the robot has to switch discrete gears, and in a box pushing application, where the robot can switch the side of the box to push. The pose and the friction parameters of the pushed box are initially unknown and only indirectly observable.
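
The core relaxation idea can be sketched independently of DDP: represent the choice among discrete controls as continuous mixture weights (here a softmax over logits), optimize the relaxed objective with a gradient method, and finally force the mixture into a single discrete control. The objective, step size, and finite-difference gradient below are toy assumptions.

```python
# Relax a discrete control choice into continuous mixture weights,
# optimize, then round to a single discrete control. Toy assumptions.
import numpy as np

costs_per_option = np.array([3.0, 1.0, 2.0])  # cost of each discrete control

def mixture_cost(logits):
    weights = np.exp(logits) / np.exp(logits).sum()  # softmax mixture
    return weights @ costs_per_option

# Optimize mixture weights by finite-difference gradient descent.
logits = np.zeros(3)
for _ in range(200):
    grad = np.zeros(3)
    for i in range(3):
        e = np.zeros(3); e[i] = 1e-5
        grad[i] = (mixture_cost(logits + e) - mixture_cost(logits - e)) / 2e-5
    logits -= 1.0 * grad

# Force the relaxed mixture into a single discrete control.
discrete_choice = int(np.argmax(logits))
print("chosen control:", discrete_choice)  # option 1, the cheapest
```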

Collaboration


Dive into Joni Pajarinen's collaborations.

Top Co-Authors

Heni Ben Amor
Arizona State University

J. Andrew Bagnell
Carnegie Mellon University

Michael C. Koval
Carnegie Mellon University