Abstract:
Automatically extracting and tracking facial features in image sequences is a required first step in many applications, including expression classification. In sign language recognition, expressions comprise the non-manual gestures (head motion and facial expressions) used in that language. In this work, we aim to classify the most common non-manual gestures in Turkish Sign Language (TSL). This is done in two consecutive steps. First, automatic facial landmarking is performed based on Multi-resolution Active Shape Models (MRASMs). In each frame, landmarks are fitted using MRASMs for multiple views of the face, and the best-fitting shape that is most similar to the shape found in the preceding frame is chosen; in this way, temporal information is used to achieve consistency between consecutive frames. When the fitted shape is not trusted, deformation of the tracked shape is avoided by leaving that frame empty and re-initializing the tracker. Afterwards, the empty frames are filled by interpolation, and alpha-trimmed mean filtering is applied to the landmark trajectories to eliminate erroneous frames. Second, the tracked landmarks are normalized and expression classification is performed with multivariate Continuous Hidden Markov Models (CHMMs). We collected a video database of non-manual signs to evaluate the proposed approach. Single-view versus multi-view and person-specific versus generic MRASM trackers are compared for both the tracking and the classification stages; the multi-view person-specific tracker performs best, and the system is shown to track the landmarks robustly. For the expression classification part, the proposed CHMM classifier is evaluated on different training and test set selections, and the results are reported. The classification performance for the distinct classes is very high.
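As a rough illustration of the trajectory post-processing step summarized above (untrusted frames filled by interpolation, then alpha-trimmed mean filtering of the landmark trajectories), the following Python sketch shows one plausible realization. It is not the authors' implementation; the function names, window size, and alpha value are illustrative assumptions.

```python
# Minimal sketch of the post-processing described in the abstract, assuming
# each landmark coordinate is a 1-D trajectory over frames and untrusted
# frames are marked NaN. All names and parameters are hypothetical.
import numpy as np

def fill_empty_frames(traj):
    """Fill NaN (untrusted/empty) frames of a 1-D trajectory by linear interpolation."""
    t = np.arange(len(traj))
    ok = ~np.isnan(traj)
    return np.interp(t, t[ok], traj[ok])

def alpha_trimmed_mean(traj, window=5, alpha=0.2):
    """Replace each sample by the mean of its sorted window after discarding
    the lowest and highest alpha fraction of the window values."""
    half = window // 2
    trim = int(alpha * window)            # samples trimmed from each end
    padded = np.pad(traj, half, mode="edge")
    out = np.empty_like(traj)
    for i in range(len(traj)):
        w = np.sort(padded[i:i + window])
        out[i] = w[trim:window - trim].mean()
    return out

# Example: one landmark's x-coordinate over 8 frames; frames 2 and 5 were
# left empty by the tracker, and frame 4 is an erroneous outlier.
x = np.array([10.0, 10.5, np.nan, 11.4, 30.0, np.nan, 12.3, 12.8])
smoothed = alpha_trimmed_mean(fill_empty_frames(x))
```

Under these assumptions, the interpolation restores a value for every frame, and the trimmed mean suppresses isolated outliers (such as the 30.0 above) that a plain moving average would smear into neighboring frames.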