Abstract:
This thesis addresses computer-vision-based automatic sign language recognition and its related subtasks. The study focuses on the recognition of fingerspelling gestures, a subset of sign languages that provides a manual representation of the letters of spoken-language alphabets. Fingerspelling gestures make use of hand shape, orientation, location and movement. We perform fingerspelling recognition for the Turkish, Czech and Russian manual alphabets, with the aim of integrating these sign alphabets into deployable multimodal and multilingual applications. We divide the automatic fingerspelling recognition task into sub-challenges and design methodologies to improve overall sign recognition performance. First, we describe an approach to tracking the hands and face in an image sequence showing a signing person in frontal pose. The classical Camshift algorithm is extended with automatic skin color model initialization, hand re-detection and collision handling. The resulting algorithm performs robust, close to real-time hand tracking. Second, we focus on hand gesture representation. We evaluate appearance-based features for describing the manual component of sign languages, in particular Elliptic Fourier Descriptors, Hu Moments, the Radial Distance Function and Local Binary Patterns. We test the recognition performance of individual features and their combinations. Local Binary Patterns show the best recognition performance on isolated gestures, with a recognition rate of up to 92 per cent. We also explore features such as hand motion and motion blur for temporal segmentation, that is, for locating the start and end of each gesture in continuous gesture videos. We investigate the fusion of temporal and appearance features using sequence voting, discrete HMMs and continuous HMMs.
We test the fingerspelling recognition accuracy of our system on a self-collected multilingual fingerspelling dataset consisting of the Turkish, Czech and Russian manual alphabets, recorded from multiple signers with multiple repetitions. Finally, we demonstrate the applicability of our system in a prototype application that functions as a multilingual fingerspelling-to-speech translator.
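
As a rough illustration of the Local Binary Pattern feature named above, the following minimal sketch computes the basic 3x3, 8-neighbour LBP code of a pixel and a normalised histogram of codes over a grayscale image. It is not the implementation used in the thesis: the function names, the clockwise neighbour ordering and the `>=` threshold convention are assumptions chosen for illustration, and the image is represented as a plain list of rows of integer intensities.

```python
from typing import List, Sequence

# Offsets (row, col) of the 8 neighbours, clockwise from the top-left pixel.
# The ordering is an illustrative assumption; any fixed ordering works.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Sequence[Sequence[int]], row: int, col: int) -> int:
    """Return the 8-bit LBP code of an interior pixel.

    Bit k is set when the k-th neighbour is at least as bright as the centre.
    """
    centre = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image: Sequence[Sequence[int]]) -> List[float]:
    """Return the normalised 256-bin histogram of LBP codes.

    Border pixels are skipped because they lack a full 3x3 neighbourhood.
    """
    hist = [0.0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1.0
    total = sum(hist)
    if total > 0:
        hist = [h / total for h in hist]
    return hist
```

A histogram of this kind, computed over a cropped hand region, could serve as a fixed-length appearance feature vector for a classifier; how the thesis actually configures and classifies its LBP features is not stated in this abstract.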