dc.description.abstract |
In this thesis, we address the problem of extracting isolated signs from continuous signed speech videos. Signed speech is a language that uses the signs of a sign language together with the grammar of a spoken language. It is a visual language and makes use of hand gestures, which consist of hand motion and hand shape. In continuous signed speech, signs are expressed in succession, which produces coarticulation effects and makes segmentation a challenging task. In this work, we aim to segment some of the most common signs in Turkish Sign Language using hand gesture information. This process consists of three consecutive steps. First, we segment the hand regions obtained from a hand tracking module, producing images that contain only the left or the right hand. Second, we represent hand gestures with a variety of features, which can be categorized as follows: 1) center-of-mass coordinates of each hand and their first-order derivatives, 2) ellipse parameters for each hand, 3) Discrete Cosine Transform (DCT) coefficients, 4) Histograms of Oriented Gradients (HOG), 5) Local Binary Patterns (LBP), 6) Hu moments, and 7) radial distances. Third, we align the sequences with different methods and find the start and end positions of each sign. We use Dynamic Time Warping (DTW), Hidden Markov Models (HMMs), and coupled HMMs as alignment approaches, and we also apply fusion techniques to improve alignment performance. We experiment on a database of Turkish signed speech videos and report the results. The highest accuracy is obtained by combining the DTW and HMM methods using the center-of-mass coordinates, their first-order derivatives, and the ellipse features. |
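As an illustrative aside, the sketch below shows the kind of DTW alignment the abstract refers to: matching an isolated-sign template of per-frame feature vectors against a window of the continuous stream. The Euclidean frame distance, the feature dimensionality, and the sliding-window search are assumptions made for the example, not the thesis's exact configuration.

    import numpy as np

    def dtw_distance(seq_a, seq_b):
        """Dynamic Time Warping distance between two feature sequences.

        seq_a, seq_b: arrays of shape (T_a, D) and (T_b, D); each row is a
        per-frame feature vector (e.g., hand center-of-mass coordinates).
        """
        t_a, t_b = len(seq_a), len(seq_b)
        # cost[i, j] holds the minimal cumulative cost of aligning the
        # first i frames of seq_a with the first j frames of seq_b.
        cost = np.full((t_a + 1, t_b + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, t_a + 1):
            for j in range(1, t_b + 1):
                d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # Euclidean frame distance (assumed)
                # Standard DTW step pattern: match, insertion, deletion.
                cost[i, j] = d + min(cost[i - 1, j - 1],
                                     cost[i - 1, j],
                                     cost[i, j - 1])
        return cost[t_a, t_b]

    # Hypothetical data: a 30-frame sign template and a 45-frame stream
    # window, each with 4-D features. Sliding the window over the stream
    # and picking cost minima yields candidate start/end positions.
    template = np.random.rand(30, 4)
    window = np.random.rand(45, 4)
    print(dtw_distance(template, window))

In this setup, a lower DTW cost indicates a better temporal match between the template and the window, which is why window positions with locally minimal cost serve as candidate sign boundaries.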
|