Abstract:
A real-world environment is often only partially observable to an agent, because of either noisy sensors or incomplete perception. Autonomous strategy planning under uncertainty poses two major challenges. The first is autonomous segmentation of the state space for a given task; the second is the emergence of complex behaviors that deal with each state segment. This thesis proposes three new approaches, namely ARKAQ-Learning, KAFAQ-Learning and KBVI, that handle both challenges by combining several techniques. ARKAQ uses ART2-A networks augmented with Kalman filters and Q-Learning. KAFAQ is a finite state automaton that uses Kalman filters and Q-Learning. KBVI uses Monte Carlo methods and introduces a new technique for calculating Q-values in continuous domains. All three are online algorithms with relatively low space and time complexity. The algorithms were run on several well-known Partially Observable Markov Decision Process (POMDP) problems, where representing the value function is more difficult than in the discrete case because the inputs are continuous distributions. The algorithms were able to reveal the hidden states by mapping non-Markovian observations to internal belief states, and to construct an approximately optimal policy over the internal belief state space.