Introducing a Hierarchical Decision-Making Method into Reinforcement Learning Agents: The Pursuit Problem as an Example (強化学習エージェントへの階層化意志決定法の導入―追跡問題を例に―)

Transactions of the Japanese Society for Artificial Intelligence 19:279-291 (2004)

Abstract

Reinforcement Learning (RL) is a promising technique for creating agents that can be applied to real-world problems. The most important features of RL are trial-and-error search and delayed reward, so agents act randomly in the early stages of learning. However, such random actions are impractical for real-world problems. This paper presents a novel model of RL agents. The key feature of our learning agent model is the integration of the Analytic Hierarchy Process (AHP) into the standard RL agent model, which consists of three modules: state recognition, learning, and action selection. In our model, the AHP module is designed with the "primary knowledge" that humans intrinsically apply in the process of reaching a goal state. This integration aims at increasing the proportion of promising actions in place of the completely random actions of standard RL algorithms. Profit Sharing (PS) is adopted as the RL method for our model, since PS is known to be useful even in multi-agent environments. To evaluate our approach in a multi-agent environment, we test a PS RL method with our agent model on a pursuit problem in a grid world. Computational results show that our approach outperforms standard PS in terms of learning speed in the earlier stages of learning. We also show that the learning performance of our approach is superior, or at least competitive, to that of the standard method in the final stages of learning.
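
The abstract describes an architecture rather than giving equations, so the following is a minimal, hedged sketch in Python of the idea it outlines: a Profit Sharing learner whose action selection is weighted by AHP priorities derived from a pairwise comparison of actions. The class and function names, the environment interface, the decay schedule, and the way AHP priorities are combined with the learned weights are assumptions made for illustration, not the authors' implementation.

    """
    Minimal sketch (not the authors' code) of an RL agent whose action
    selection is biased by Analytic Hierarchy Process (AHP) priorities and
    whose learning rule is Profit Sharing (PS).  All names and parameter
    values are illustrative assumptions.
    """
    import numpy as np
    from collections import defaultdict


    def ahp_priorities(pairwise):
        """Derive an AHP priority vector from a pairwise-comparison matrix
        using the principal-eigenvector method."""
        vals, vecs = np.linalg.eig(np.asarray(pairwise, dtype=float))
        principal = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
        return principal / principal.sum()


    class AHPProfitSharingAgent:
        def __init__(self, n_actions, ahp_prior, decay=0.5, bias=1.0):
            self.n_actions = n_actions
            self.prior = np.asarray(ahp_prior)   # AHP priority per action
            self.decay = decay                   # geometric credit-assignment rate
            self.bias = bias                     # strength of the AHP bias
            self.w = defaultdict(lambda: np.ones(n_actions))  # PS rule weights
            self.episode = []                    # (state, action) history

        def select_action(self, state, rng):
            # Roulette selection over PS weights multiplied by AHP priorities,
            # so promising actions are favoured even before learning.
            scores = self.w[state] * (self.prior ** self.bias)
            probs = scores / scores.sum()
            return int(rng.choice(self.n_actions, p=probs))

        def record(self, state, action):
            self.episode.append((state, action))

        def learn(self, reward):
            # Profit Sharing: distribute the episode reward backwards with a
            # geometrically decaying reinforcement function, then reset.
            credit = reward
            for state, action in reversed(self.episode):
                self.w[state][action] += credit
                credit *= self.decay
            self.episode.clear()

For a four-action pursuit agent (up, down, left, right in a grid world), the pairwise-comparison matrix could, for example, encode the human "primary knowledge" that moving toward the prey is preferred over moving away, so that the AHP priorities bias early exploration toward the target before the PS weights have been learned.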
