Abstract:
The Decentralized Partially Observable Markov Decision Process (DEC-POMDP) model addresses the multiagent planning problem in partially observable environments. Due to its NEXP-complete complexity, only small problems can be solved exactly. For this reason, many researchers concentrate on approximate solution algorithms that can handle more complex cases and produce near-optimal solutions. However, even the approximate solution techniques developed so far can handle large problems only for small horizons. One reason for this is the exponential memory required to represent agent policies and to search the policy space. In this thesis, we propose four new approaches for approximately solving finite-horizon DEC-POMDP problems. The first approach, called MAP, models the DEC-POMDP problem as a single POMDP and solves it with an efficient POMDP solver. The other approaches, namely ES-BV, ES-OH, and GA-FSC, are all based on evolutionary algorithms. ES-BV makes use of belief vectors, as MAP does, and searches for policy vectors using evolution strategies (ES). ES-OH feeds the observation history into a neural network to select actions and trains the networks with ES. GA-FSC represents policies as finite state controllers and searches for the optimal policy using genetic algorithms (GA). All algorithms were tested on the major well-known DEC-POMDP benchmark problems. We compared our results with current state-of-the-art methods and also compared our algorithms with each other. We show that all the algorithms developed in this study, except MAP, perform comparably to the existing top algorithms, and that GA-FSC extends the solvable horizon of the problems by at least an order of magnitude.