Abstract:
Automatic video data analysis has attracted growing interest as a means of improving human-computer interaction. One of the most challenging tasks in video analysis is evaluating human emotion robustly. Applications of facial expression recognition range from educational systems to the treatment of Asperger's syndrome and surveillance. In this thesis, we explore facial expression recognition on both laboratory and realistic videos. After reviewing recent work on face detection, facial alignment, video description and classification, we present our novel approach, in which a pipeline of facial alignment combined with improved dense trajectory and geometric features, encoded with Fisher vector encoding, together with LGBP-TOP features, is fed to an extreme learning machine. This is the first time that improved dense trajectory features have been used for facial expression recognition. Furthermore, we extensively study each step of our pipeline in a comparative manner. We evaluate our approach on the CK+ and EmotiW 2015 challenge datasets. Videos in the first dataset are captured in a laboratory setting and progress from a neutral state to a peak expression, while the second dataset is drawn from movies with realistic conditions, spontaneous emotions, complex backgrounds and challenging illumination variations. On the CK+ dataset, we obtained accuracies of 94.80% and 95.79% (without the contempt class), which are among the best results reported on CK+. On the EmotiW 2015 challenge dataset, we obtained 43.39% accuracy, considerably higher than the challenge baseline. On both datasets we achieved state-of-the-art results. Our results show that an appropriate pipeline of face alignment combined with efficient visual descriptors can yield a robust recognition system.
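For illustration only, the sketch below shows the general shape of the classification stage summarized above: local descriptors encoded as Fisher vectors and classified with an extreme learning machine. It is a minimal sketch under assumed interfaces, not the thesis implementation; the feature extraction steps (face alignment, improved dense trajectories, LGBP-TOP) are left out, and all function names, dimensions and parameters here are hypothetical.

```python
# Minimal sketch: Fisher vector encoding of per-video descriptors + an
# extreme learning machine (ELM) classifier. Inputs are assumed to be
# matrices of local descriptors already extracted per video.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Encode local descriptors as a Fisher vector w.r.t. a diagonal-covariance GMM."""
    q = gmm.predict_proba(descriptors)                      # soft assignments (N, K)
    means, covs, priors = gmm.means_, gmm.covariances_, gmm.weights_
    N = descriptors.shape[0]
    diff = (descriptors[:, None, :] - means[None]) / np.sqrt(covs[None])  # (N, K, D)
    # First- and second-order statistics, normalized by component priors.
    g_mu = (q[:, :, None] * diff).sum(0) / (N * np.sqrt(priors)[:, None])
    g_sigma = (q[:, :, None] * (diff ** 2 - 1)).sum(0) / (N * np.sqrt(2 * priors)[:, None])
    fv = np.hstack([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                  # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                # L2 normalization

class ELM:
    """Single-hidden-layer extreme learning machine with a least-squares readout."""
    def __init__(self, n_hidden=1000, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                    # random hidden projection
        T = np.eye(n_classes)[y]                            # one-hot targets
        self.beta = np.linalg.pinv(H) @ T                   # closed-form output weights
        return self

    def predict(self, X):
        return np.argmax(np.tanh(X @ self.W + self.b) @ self.beta, axis=1)

# Usage with synthetic stand-ins for per-video descriptors (32-D, hypothetical).
rng = np.random.default_rng(0)
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(
    rng.standard_normal((2000, 32)))
X = np.stack([fisher_vector(rng.standard_normal((200, 32)), gmm) for _ in range(40)])
y = rng.integers(0, 7, size=40)                             # 7 basic emotion labels
clf = ELM(n_hidden=256).fit(X, y)
print(clf.predict(X[:5]))
```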