
Attention modeling with temporal shift in sign language recognition


dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Akarun, Lale.
dc.contributor.author Çelimli, Ahmet Faruk.
dc.date.accessioned 2023-10-15T06:54:29Z
dc.date.available 2023-10-15T06:54:29Z
dc.date.issued 2022
dc.identifier.other CMPE 2022 C46
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/19712
dc.description.abstract Sign languages (SLs) are the main communication languages of deaf people. They are visual languages that establish communication through multiple cues, including hand gestures, upper-body movements, and facial expressions. Sign language recognition (SLR) models have the potential to ease communication between hearing and deaf people. Advancements in deep learning and the increased availability of public datasets have led more researchers to study SLR. These advancements shifted solution methods for SLR from hand-crafted features to 2-Dimensional Convolutional Neural Network (2D CNN) models. The inadequacy of 2D CNNs in temporal modeling, together with the spatio-temporal modeling ability of 3D CNNs, made 3D CNNs a popular choice. Despite their successful results, the high computational costs and memory requirements of 3D CNNs created a need for alternative architectures. In this thesis, we propose an SLR model that uses a 2D CNN as backbone together with attention modeling and temporal shift. Using a 2D CNN decreases the number of parameters and the required memory compared to a 3D CNN counterpart. In order to increase adaptability to other datasets and simplify the training process, our model uses full-frame RGB images instead of cropped images that focus on specific body parts of signers. Since communication in SL is established through multiple visual cues used at the same time or at different moments, the model must learn how these cues collaborate with each other. While temporal shift modules give our 2D CNN backbone the ability of temporal modeling, attention modules learn to focus on what, where, and when in videos. We tested our model on the BosphorusSign22k dataset, a Turkish isolated SLR dataset. The proposed model achieves 92.97% classification accuracy. Our study shows that attention modeling with temporal shift on top of a 2D CNN backbone gives competitive results in isolated SLR.
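The temporal shift idea referenced in the abstract can be illustrated roughly as follows. This is a minimal NumPy sketch of a TSM-style channel shift (a portion of channels moves one frame forward in time, another portion moves one frame backward, the rest stay in place), letting a 2D CNN see neighboring frames at no extra parameter cost. It is not the thesis's actual implementation; the function name and the `shift_div` fraction are assumptions for illustration:

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """Shift a fraction of channels along the time axis.

    x: array of shape (T, C, H, W) -- the frames of one video clip.
    1/shift_div of the channels are shifted to the next frame,
    another 1/shift_div to the previous frame; the remaining
    channels are left unchanged. Vacated positions are zero-padded.
    """
    t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                # shift forward in time
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]  # shift backward in time
    out[:, 2 * fold:] = x[:, 2 * fold:]           # untouched channels
    return out
```

Because the shift is a pure memory movement, interleaving it with the 2D convolutions of the backbone adds temporal mixing without any 3D convolution kernels.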
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022.
dc.subject.lcsh Sign language.
dc.subject.lcsh Neural networks (Computer science)
dc.subject.lcsh Deep learning (Machine learning)
dc.title Attention modeling with temporal shift in sign language recognition
dc.format.pages xv, 59 leaves

