Abstract:
In this thesis, we propose a sign language recognition system that adapts and simplifies the Improved Dense Trajectory (IDT) approach, which was originally proposed for large-scale human action recognition. Since sign language recognition mostly focuses on hand gestures, body posture and facial expressions, we extract IDT features and keep only the trajectories around the hand region by matching the trajectory coordinates with hand coordinates obtained by pose extraction. In addition to trajectory filtering, we propose Hand Descriptors, a spatio-temporal feature extraction method for sign language recognition that extracts spatio-temporal descriptors around the left and right hands. After descriptor extraction, we encode each sign video as Fisher Vectors derived from a Gaussian Mixture Model estimated from the training descriptors. We then train Support Vector Machines to perform sign language classification using the Fisher Vectors as inputs. We conduct experiments on two subsets of the BosphorusSign dataset and evaluate the system in terms of feature extraction speed, computational complexity and memory requirements. In our experiments, the combination of all descriptors yields the best recognition performance on both subsets for both feature types. We find that the trajectory filtering approach yields recognition performance similar to that of the baseline approach while drastically reducing the number of trajectories. Moreover, we analyse the effects of different parameters and video resolutions on the performance of the Hand Descriptors. Our experiments show that the hand region produces the most important features in our sign language classification system.
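
To make the trajectory filtering step concrete, the following is a minimal sketch of one way trajectory coordinates could be matched against hand coordinates. It is an illustration under stated assumptions, not the implementation used in this thesis: the function name `filter_trajectories`, the data layout, the mean-distance criterion and the `radius` threshold are all hypothetical.

```python
import math


def filter_trajectories(trajectories, hand_positions, radius=50.0):
    """Keep trajectories whose mean distance to the nearest hand is small.

    trajectories: list of (start_frame, [(x, y), ...]) tuples, one point per frame.
    hand_positions: list indexed by frame; each entry is a list of (x, y) hand
        centres (for example left and right) produced by a pose estimator.
    radius: hypothetical distance threshold in pixels (an assumption).
    """
    kept = []
    for start, points in trajectories:
        dists = []
        for offset, (x, y) in enumerate(points):
            frame = start + offset
            if frame >= len(hand_positions):
                break  # no pose data beyond this frame
            hands = hand_positions[frame]
            if not hands:
                continue  # hands not detected in this frame
            # Distance from the trajectory point to the closest hand centre.
            dists.append(min(math.hypot(x - hx, y - hy) for hx, hy in hands))
        if dists and sum(dists) / len(dists) <= radius:
            kept.append((start, points))
    return kept


# Toy example: two hands at fixed positions over three frames.
hands = [[(100.0, 100.0), (300.0, 100.0)] for _ in range(3)]
trajs = [
    (0, [(105.0, 98.0), (107.0, 99.0), (110.0, 101.0)]),    # near the left hand
    (0, [(200.0, 400.0), (201.0, 402.0), (203.0, 405.0)]),  # far from both hands
]
kept = filter_trajectories(trajs, hands, radius=50.0)
```

In this toy example only the first trajectory is retained, since the second lies far from both hand centres. Other matching criteria, such as requiring every point to fall inside a hand bounding box, would fit the same description equally well.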