Abstract:
Automatically detecting conversational groups from video footage is an intriguing and practical research area with applications in video activity recognition and human-robot interaction. There is therefore a critical need for improved group detection to enhance the interaction between humans and robots. The main novel contribution of this thesis is the application of Graph Convolutional Networks to the group detection problem. We base our approach on Deep Modularity Networks, a method from the community detection domain. Our approach improves group detection quality over state-of-the-art group detection methods. Additionally, we develop a graph construction algorithm that uses individuals' view frustums to indicate the affinities between them. As a post-processing step, we incorporate temporal information into our system, which further improves our detection results.