Abstract:
Uncertainty is a fundamental problem for autonomous agents in a partially observable real world, where sensors cannot provide the complete state of the environment. Although the outcomes of actions are not predictable, the agents must still behave rationally. Furthermore, the continuous nature of the environment makes the problem harder to model. A Markov decision process (MDP) is one way to model such problems, and a partially observable Markov decision process (POMDP) is an extension of the MDP that applies to environments that are not fully observable. To model the real world in this framework, the continuous state space must be discretized. The aim of this work is to model the real-world environment and implement the ARKAQ learning algorithm, which is suitable for POMDPs. The experiments are carried out with Sony AIBO four-legged robotic pets in the Webots simulation environment. Two problems are studied: ``Ball Approaching'' and ``Scoring Goal''. The robots achieve the predefined targets, and the goal-scoring results show that ARKAQ clearly outperforms random action selection. The optimality of the results is discussed, and the parameters that affect optimality are explained.