Abstract:
Recognition of non-manual components in sign language has been a neglected topic, partly due to the absence of annotated non-manual sign datasets. We have collected a dataset of videos of non-manual signs, covering facial expressions and head movements, and prepared frame-level annotations. In this thesis, we present the Turkish Sign Language (TSL) non-manual signs dataset and provide a baseline system for non-manual sign recognition. A deep learning based recognition system is proposed, in which a pre-trained ResNet Convolutional Neural Network (CNN) is employed to recognize question, negation side-to-side, negation up-down, affirmation, and pain movements and expressions. A total of 483 TSL videos, performed by five subjects who are native TSL signers, were temporally annotated. We employ a leave-one-subject-out approach for performance evaluation on the test videos. We obtain annotation-level accuracy values of 55.77%, 14.63%, 72.83%, 10% and 11.67% for the question, negation side-to-side, negation up-down, pain, and affirmation classes, respectively, on the BosphorusSign-HospiSign non-manual sign datasets. Question, negation side-to-side, negation up-down, and affirmation movements and expressions in 87 clips from the TSL translation video of a Turkish movie are temporally annotated for cross-database experiments. The models fine-tuned on the BosphorusSign-HospiSign set are tested on the clip frames. The best-performing model classifies 66.67% of question annotations and 42.31% of negation up-down annotations correctly, while the remaining class labels could not be predicted.
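For a concrete picture of the setup summarized above, the sketch below illustrates frame-level fine-tuning of a pre-trained ResNet for the five non-manual sign classes under a leave-one-subject-out protocol. This is a minimal illustration assuming PyTorch and torchvision; the ResNet depth, optimizer settings, and subject identifiers are assumptions for the sketch, not details taken from the thesis.

```python
# Illustrative sketch (not the thesis implementation): fine-tune an
# ImageNet-pretrained ResNet for frame-level non-manual sign classification,
# evaluated leave-one-subject-out. ResNet-18, SGD, and the subject IDs
# below are assumptions.
import torch
import torch.nn as nn
from torchvision import models

# The five annotated classes described in the abstract.
NUM_CLASSES = 5  # question, negation side-to-side, negation up-down, affirmation, pain

def build_model() -> nn.Module:
    # Load an ImageNet-pretrained ResNet and replace the classifier head
    # with a new fully connected layer for the five target classes.
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
    return model

# Leave-one-subject-out: fine-tune on four subjects, test on the held-out
# subject, and repeat so that each subject serves as the test set once.
subjects = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical subject IDs
for held_out in subjects:
    train_subjects = [s for s in subjects if s != held_out]
    model = build_model()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    # ... fine-tune on annotated frames from train_subjects, then report
    # annotation-level accuracy on frames from the held_out subject.
```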