Abstract:
In this thesis, we propose a score-level multi-cue fusion approach that improves the sign language recognition performance of three-dimensional convolutional neural networks. Sign language is the primary means of communication for Deaf and hearing-impaired individuals and is performed using hand movements, facial gestures, and body alignment. Sign Language Recognition (SLR) aims to understand sign language automatically and has gained increasing popularity as the efficiency of neural networks has made the task feasible. Previous work uses 3D CNN variants to study sign language properties in different settings. The vanilla 3D variant uses 3D kernels with high processing cost, the mixed convolution variant applies 3D kernels in early layers and 2D kernels in later layers, and the R(2+1)D variant decomposes each 3D convolution into separate 2D spatial and 1D temporal kernels connected through an intermediate bottleneck dimension. Various studies use these networks to build end-to-end frameworks for tasks such as sign classification and translation. To achieve better performance, 3D CNN methods employ complicated neural network architectures with a separate branch for every cue. We evaluate the performance of these 3D networks and propose a more straightforward approach that adopts a single network architecture to process multiple cues at test time. We exploit the hand, body, and face cues by training an individual network per cue and fusing the results with a weighted score fusion. We test our method on a recently published Turkish Isolated SLR dataset. Despite its simple architecture, our method achieves a 94% classification rate on 744 different sign glosses. We hope that the multi-cue approach can also help with other SLR tasks such as translation, which we leave as future work.
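To make the fusion step concrete, the following is a minimal sketch of weighted score-level fusion, assuming each cue network (hand, body, face) outputs a per-class score vector such as a softmax distribution; the function name, the example weights, and the toy scores are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def fuse_scores(cue_scores, weights):
    """Weighted score-level fusion (illustrative sketch).

    cue_scores: list of per-cue class-score vectors, each of shape
                (num_classes,), e.g. softmax outputs of each network.
    weights:    one scalar weight per cue.
    Returns the predicted class index and the fused score vector.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalize cue weights
    stacked = np.stack(cue_scores)                 # (num_cues, num_classes)
    fused = (weights[:, None] * stacked).sum(axis=0)
    return int(fused.argmax()), fused

# Toy example: 3 cues, 4 sign classes (values are made up)
hand = np.array([0.6, 0.2, 0.1, 0.1])
body = np.array([0.3, 0.4, 0.2, 0.1])
face = np.array([0.2, 0.5, 0.2, 0.1])
pred, fused = fuse_scores([hand, body, face], weights=[0.5, 0.3, 0.2])
```

Because each network is trained independently, only this lightweight combination step runs across cues at test time, which is what keeps the overall architecture simple.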