Reinforcement Learning by GA Using Importance Sampling (重点サンプリングを用いた GA による強化学習)

Transactions of the Japanese Society for Artificial Intelligence 20:1-10 (2005)

Abstract

Reinforcement learning (RL) addresses policy search problems: finding a mapping from the state space to the action space. However, RL is based on gradient methods and therefore cannot cope with problems whose search landscape is multimodal. Genetic algorithms (GAs) are a promising alternative for such problems, but they appear unsuitable for policy search because of their evaluation cost. Minimal Generation Gap (MGG), a generation-alternation model used in GAs, generates many offspring from two or more parents selected from the population, so evaluating the policies of the generated offspring requires many trials (i.e., interactions between an agent and its environment). In this paper, we incorporate importance sampling into the MGG framework to reduce the evaluation cost of policy search. The proposed techniques are applied to Markov decision processes (MDPs) with multimodal landscapes. The experimental results show that these techniques reduce the number of interactions between the agent and the environment, and suggest that MGG and importance sampling complement each other.
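The abstract does not specify the estimator used, so the following is only a minimal sketch of the general idea: episodes already collected under one policy (for example a parent) are reweighted to estimate the expected return of another policy (for example an offspring) without new interaction. It assumes tabular stochastic policies stored as arrays of action probabilities and uses the ordinary (unnormalized) importance sampling estimator. The names `episode_weight`, `estimate_return`, `behavior`, and `offspring` are hypothetical and do not come from the paper.

```python
import numpy as np


def episode_weight(episode, target_policy, behavior_policy):
    """Likelihood ratio of one episode under the target vs. the behavior policy.

    episode: list of (state, action, reward) tuples.
    Policies: arrays of shape (n_states, n_actions) holding action probabilities.
    """
    weight = 1.0
    for state, action, _reward in episode:
        weight *= target_policy[state, action] / behavior_policy[state, action]
    return weight


def estimate_return(episodes, target_policy, behavior_policy):
    """Ordinary importance sampling estimate of the target policy's expected return."""
    weights = np.array(
        [episode_weight(ep, target_policy, behavior_policy) for ep in episodes]
    )
    returns = np.array([sum(r for _, _, r in ep) for ep in episodes])
    return float(np.mean(weights * returns))


# Toy data (assumed for illustration): one state, two actions, one-step episodes.
behavior = np.array([[0.5, 0.5]])   # policy that generated the episodes
offspring = np.array([[0.9, 0.1]])  # policy to evaluate without new trials
episodes = [
    [(0, 0, 1.0)],  # action 0 earned reward 1
    [(0, 1, 0.0)],  # action 1 earned reward 0
]
estimate = estimate_return(episodes, offspring, behavior)
```

In this toy case the two episodes receive weights 1.8 and 0.2, so the estimate equals the offspring policy's true expected return of 0.9. With longer episodes the product of ratios can have high variance, which is a known limitation of this plain estimator.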


Similar books and articles

Saving MGG: 実数値 GA/MGG における適応度評価回数の削減. Tsuchiya Chikao Tanaka Masaharu - 2006 - Transactions of the Japanese Society for Artificial Intelligence 21 (6):547-555.
強化学習ロボットによるアフォーダンスの利用. 公文 誠 李 銘義 - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16 (1):152.
強化学習エージェントへの階層化意志決定法の導入―追跡問題を例に―. 輿石 尚宏 謙吾 片山 - 2004 - Transactions of the Japanese Society for Artificial Intelligence 19:279-291.
距離に依存せずに多様性を制御する GA による高次元関数最適化. Konagaya Akihiko Kimura Shuhei - 2003 - Transactions of the Japanese Society for Artificial Intelligence 18:193-202.
実数値 GA におけるサンプリングバイアスを考慮した外挿的交叉 EDX. Kobayashi Shigenobu Sakuma Jun - 2002 - Transactions of the Japanese Society for Artificial Intelligence 17:699-707.
強化学習を用いた自律移動型ロボットの行動計画法の提案. 五十嵐 治一 - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16:501-509.
GA により探索空間の動的生成を行う Q 学習. Matsuno Fumitoshi Ito Kazuyuki - 2001 - Transactions of the Japanese Society for Artificial Intelligence 16:510-520.
免疫系を用いた遺伝的プログラミングによる多峰性探索. 伊庭 斉志 長谷川 禎彦 - 2006 - Transactions of the Japanese Society for Artificial Intelligence 21:176-183.
