Kazuteru Miyazaki
Tokyo Institute of Technology
Publications
Featured research published by Kazuteru Miyazaki.
Artificial Intelligence | 1997
Kazuteru Miyazaki; Masayuki Yamamura; Shigenobu Kobayashi
Reinforcement learning aims to adapt an agent to an unknown environment according to rewards. There are two issues: how to handle delayed reward and how to handle uncertainty. Q-learning is a representative reinforcement learning method and is used in many works since it can learn an optimum policy. However, Q-learning needs numerous trials to converge to an optimum policy. If the target environment can be described as a Markov decision process, we can identify it from statistics of sensor-action pairs. Once a correct environment model has been built, we can derive an optimum policy with the Policy Iteration Algorithm. Therefore, we can construct an optimum policy by identifying the environment efficiently. We separate the learning process into two phases: identifying the environment and determining an optimum policy. We propose the k-Certainty Exploration Method for identifying the environment; an optimum policy is then determined by the Policy Iteration Algorithm. We call a rule k-certain if and only if it has been selected k times or more. The k-Certainty Exploration Method excludes any loop of rules that have already achieved k-certainty. We show its effectiveness by comparing it with Q-learning in two experiments: one is Sutton's maze-like environment, and the other is an original environment where the optimum policy varies according to a parameter.
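The two-phase idea above lends itself to a compact illustration. Below is a minimal, hypothetical Python sketch of the k-certainty counting step only, assuming a tabular environment with reset()/step(action) methods (the names and interface are assumptions, not from the paper); the collected transition statistics would then be handed to the Policy Iteration Algorithm.

```python
import random
from collections import defaultdict

def k_certainty_exploration(env, states, actions, k=3, max_steps=10_000):
    """Minimal sketch: explore until every rule (state, action) has been
    selected at least k times, preferring rules that are not yet k-certain."""
    count = defaultdict(int)                              # selection count of each rule
    transitions = defaultdict(lambda: defaultdict(int))   # observed model statistics

    s = env.reset()                                       # assumed interface
    for _ in range(max_steps):
        # Rules selected fewer than k times are not yet k-certain.
        uncertain = [a for a in actions if count[(s, a)] < k]
        # Prefer an uncertain rule; otherwise pick any action to keep moving
        # (the original method additionally avoids loops of k-certain rules).
        a = random.choice(uncertain if uncertain else actions)

        count[(s, a)] += 1
        s_next, _reward, done = env.step(a)               # assumed interface
        transitions[(s, a)][s_next] += 1                  # statistics for model identification
        s = env.reset() if done else s_next

        # Stop once every rule is k-certain; the identified model can then
        # be passed to the Policy Iteration Algorithm.
        if all(count[(st, ac)] >= k for st in states for ac in actions):
            break
    return transitions
```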
systems man and cybernetics | 2000
Kazuteru Miyazaki; Shigenobu Kobayashi
Reinforcement learning is a kind of machine learning. It aims to adapt an agent to a given environment using rewards as a clue. In general, the purpose of a reinforcement learning system is to acquire an optimum policy that maximizes the expected reward per action. However, this is not always the most important goal in every environment. In particular, if we apply reinforcement learning to engineering, we expect the agent to avoid all penalties. In Markov decision processes, we call a rule a penalty rule if and only if it yields a penalty or it can transit to a penalty state without contributing to obtaining any reward. After suppressing all penalty rules, we aim to construct a rational policy whose expected reward per action is larger than zero. We propose the penalty avoiding rational policy making algorithm, which suppresses any penalty as stably as possible and obtains a reward constantly. By applying the algorithm to tic-tac-toe, we show its effectiveness.
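A rough sketch of the penalty-rule marking described above might look like the following. The data structures and names are assumptions, and this is a simplification: the paper's exact definition also requires that a marked rule does not contribute to obtaining a reward.

```python
def find_penalty_rules(transitions, penalized_rules, penalty_states):
    """Sketch: starting from rules that directly received a penalty and from
    known penalty states, repeatedly mark (a) any rule that can transit to a
    penalty state and (b) any state whose every rule is marked, until the
    marking stabilises. The remaining (non-penalty) rules are the candidates
    for a rational policy."""
    penalty_rules = set(penalized_rules)
    penalty_states = set(penalty_states)
    changed = True
    while changed:
        changed = False
        # (a) A rule that can lead to a penalty state is treated as a penalty rule.
        for (s, a), next_states in transitions.items():
            if (s, a) not in penalty_rules and any(ns in penalty_states for ns in next_states):
                penalty_rules.add((s, a))
                changed = True
        # (b) A state with no remaining non-penalty rule is itself a penalty state.
        for s in {s2 for (s2, _a) in transitions}:
            rules_in_s = [(s2, a) for (s2, a) in transitions if s2 == s]
            if s not in penalty_states and all(r in penalty_rules for r in rules_in_s):
                penalty_states.add(s)
                changed = True
    return penalty_rules
```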
New Generation Computing | 2001
Kazuteru Miyazaki; Shigenobu Kobayashi
In multi-agent reinforcement learning systems, it is important to share a reward among all agents. We focus on the Rationality Theorem of Profit Sharing [5] and analyze how to share a reward among all profit sharing agents. When an agent gets a direct reward R (R > 0), an indirect reward μR (μ ≥ 0) is given to the other agents. We have derived the necessary and sufficient condition to preserve rationality as follows:
\mu < \frac{M - 1}{M^{W}\left(1 - \left(\tfrac{1}{M}\right)^{W_o}\right)(n - 1)L}
where M and L are the maximum numbers of conflicting rules and of rational rules for the same sensory input, W and W_o are the maximum episode lengths of the direct-reward and indirect-reward agents, and n is the number of agents. This theorem is derived by avoiding the least desirable situation, in which the expected reward per action is zero. Therefore, if we use this theorem, we can exploit several efficient aspects of reward sharing. Through numerical examples, we confirm the effectiveness of this theorem.
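As a quick worked check of the bound, one can plug values into the inequality above; the numbers below are illustrative assumptions, not taken from the paper.

```python
def mu_upper_bound(M, L, W, W_o, n):
    """Upper bound on the indirect-reward ratio mu from the theorem above."""
    return (M - 1) / (M**W * (1 - (1.0 / M)**W_o) * (n - 1) * L)

# Hypothetical example: 4 conflicting rules (2 of them rational) per sensory
# input, episode lengths W = W_o = 3, and n = 3 agents.
print(mu_upper_bound(M=4, L=2, W=3, W_o=3, n=3))   # roughly 0.0119
```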
international symposium on autonomous decentralized systems | 1999
Sachiyo Arai; Kazuteru Miyazaki; Shigenobu Kobayashi
intelligent agents | 1999
Kazuteru Miyazaki; Shigenobu Kobayashi
systems man and cybernetics | 1999
Kazuteru Miyazaki; Shigenobu Kobayashi
computational intelligence | 2001
Kazuteru Miyazaki; Shigenobu Kobayashi
Archive | 2009
Kazuteru Miyazaki; Takuji Namatame; Hiroaki Kobayashi
In recent years, a reinforcement learning approach to building an agent's knowledge in a multi-agent world has prevailed. When reinforcement learning is applied to such a world, concurrent learning among the agents, perceptual aliasing, and the design of rewards are the most important problems to be considered. We have already confirmed through experiments that the profit-sharing algorithm is robust against these three problems. In this paper, we focus on an advantage of profit sharing over Q-learning through simulations of controlling cranes, where conflicts exist among the agents. The conflict resolution problem becomes a bottleneck in the multi-agent world if we approach it with a top-down method. Similarly, Q-learning is also weak on this problem without an exhaustive design of the rewards or detailed information about the other agents. We show that the profit-sharing method can resolve it, through the results of experiments on the crane control problem.
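The profit-sharing credit assignment referred to above can be sketched as follows: when a reward is obtained, every rule fired during the episode is reinforced with geometrically decreasing credit, a common formulation in the profit-sharing literature. The function names, decay value, and episode format here are assumptions for illustration only.

```python
from collections import defaultdict

def profit_sharing_update(weights, episode, reward, decay=0.2):
    """Sketch of a profit-sharing reinforcement: when a reward is obtained,
    credit every rule fired during the episode, decreasing geometrically
    from the rewarded step back toward the start of the episode."""
    credit = reward
    for (state, action) in reversed(episode):    # episode: list of fired rules
        weights[(state, action)] += credit
        credit *= decay                           # geometrically decreasing credit
    return weights

# Hypothetical usage
w = defaultdict(float)
profit_sharing_update(w, episode=[("s0", "right"), ("s1", "up"), ("s2", "right")], reward=1.0)
```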
intelligent data engineering and automated learning | 2008
Kazuteru Miyazaki; Shigenobu Kobayashi
Artificial Life and Robotics | 2004
Kazuteru Miyazaki; Sougo Tsuboi; Shigenobu Kobayashi
Reinforcement learning is a kind of machine learning. The partially observable Markov decision process (POMDP) is a representative class of non-Markovian environments in reinforcement learning. The rational policy making (RPM) algorithm learns a deterministic rational policy in POMDPs. Though RPM can learn a policy very quickly, it needs numerous trials to improve the policy. Furthermore, RPM cannot be applied to the class of environments in which there is no deterministic rational policy. In this paper, we propose the rational policy improvement (RPI) algorithm, which combines RPM and the mark transit algorithm with a χ²-goodness-of-fit test. RPI can learn a deterministic or stochastic rational policy in POMDPs. RPI is applied to maze environments. We show that RPI learns the most stable rational policy in comparison with other methods.
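The χ²-goodness-of-fit test mentioned above can be illustrated in isolation. The sketch below only shows how observed outcome frequencies of a rule might be tested against an expected distribution; it is not the paper's mark transit algorithm, and the function name, threshold, and example counts are assumptions.

```python
from scipy.stats import chisquare

def outcomes_differ(observed_counts, expected_probs, alpha=0.05):
    """Illustrative chi-square goodness-of-fit check: do the observed outcome
    counts for a rule deviate significantly from an expected distribution?"""
    total = sum(observed_counts)
    expected = [p * total for p in expected_probs]     # expected counts under the hypothesis
    _stat, p_value = chisquare(f_obs=observed_counts, f_exp=expected)
    return p_value < alpha

# Hypothetical example: 50 observations over three possible next states,
# tested against a uniform expectation.
print(outcomes_differ([30, 15, 5], [1/3, 1/3, 1/3]))   # True: significant deviation
```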