Show simple item record

dc.contributor Graduate Program in Software Engineering.
dc.contributor.advisor Akın, H. Levent.
dc.contributor.author Ada, Suzan Ece.
dc.date.accessioned 2023-03-16T13:45:29Z
dc.date.available 2023-03-16T13:45:29Z
dc.date.issued 2019.
dc.identifier.other SWE 2019 A43
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/19511
dc.description.abstract Agentstrainedwithdeepreinforcementlearningalgorithmsarecapableofperforming highly complex tasks including locomotion in continuous environments. In order to attain a human-level performance, the next step of research should be to investigate the ability to transfer the learning acquired in one task to unknown tasks. Concerns on generalization and overfitting in deep reinforcement learning are not usually addressed in current transfer learning research. This issue results in simplistic benchmarks and inaccurate algorithm comparisons due to rudimentary assessments. In this thesis, we propose novel regularization techniques exclusive to policy gradient algorithms for continuous control through the application of sample elimination and early stopping. By discarding samples that lead to overfitting via strict clipping we will generate robust policies for a humanoid with high generalization capacity. We also suggest the inclusion of training iteration to the hyperparameters in deep transfer learning problems. We recommend resorting to earlier snapshots of parameters depending on the target task due to the occurrence of overfitting to the source task. We demonstrate that a humanoid is capable of performing forward locomotion in unseen environments with different gravities and tangential frictions using strict clipping and early stopping. Furthermore, we evaluate our propositions on a delivery task where a humanoid is required to carry a heavy box while walking and inter-robot transfer tasks where the humanoid transfers its learning to taller and shorter robots. Because source task performance is not indicative of the generalization capacity of the algorithm we propose three different transfer learning evaluation methods. We increase the generalization capacity of a state-of-art adversarial algorithm by introducing entropy bonus, proposing different critic architectures and using simpler adversaries. Finally, we evaluate the robustness of these adversarial algorithms on morphologically modified hopper environments and environments with unknown gravities according to the criteria we proposed.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh Control theory.
dc.subject.lcsh Transfer of training.
dc.title Transfer learning for continuous control
dc.format.pages xvii, 111 leaves ;


Files in this item

This item appears in the following Collection(s)

Show simple item record

Search Digital Archive


Browse

My Account