Abstract:
The human hand has become an important interaction tool in computer systems. Using the articulated hand skeleton for interaction remained a challenge until the development of suitable input devices and fast computers. In this thesis, we develop model-based, super real-time methods for articulated human hand pose estimation using depth sensors. We use Randomized Decision Forest (RDF) based methods for feature extraction and inference from a single depth image. We start by implementing shape recognition using RDFs. We extend the shape recognition by considering a multitude of shapes in a single image, representing different hand regions centered around different joints of the hand. The regions are used for joint position estimation by running the mean shift mode-finding algorithm (RDF-C). We combine the shape recognition and joint estimation methods in a hybrid structure to boost quality. RDFs, when used for pixel classification, are not robust to self-occlusion. We overcome this by skipping the classification and directly inferring the joint positions using regression forests. These methods assume the joints are independent, which is not realistic. We therefore conclude our single-image framework by incorporating the geometric constraints of the hand model (RDF-R+). Accuracies at a 10 mm acceptance threshold are reported for synthetic and real datasets; comparing the RDF-C and RDF-R+ methods, we report a significant increase in accuracy. Finally, we extend the single-image methods to tracking dynamic gestures: we learn the grasping motion from synthetic data by extracting a manifold, fix the RDF estimates by projecting them onto the manifold, and track the projections using a Kalman filter.
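To make the joint estimation step concrete, the following is a minimal sketch of weighted mean shift mode finding over pixels that an RDF has assigned to one joint region, written in Python with NumPy. It is not the thesis implementation; the function name, the Gaussian kernel, and the bandwidth value are illustrative assumptions.

    import numpy as np

    def mean_shift_mode(points, weights, bandwidth=30.0, n_iter=20, tol=1e-3):
        """Find the densest mode of weighted 3D points (e.g. pixels an RDF
        classified into one joint region) with Gaussian-kernel mean shift.
        bandwidth is in the same units as the points (here, assumed mm)."""
        # Start from the weighted centroid of the region.
        mode = np.average(points, axis=0, weights=weights)
        for _ in range(n_iter):
            d2 = np.sum((points - mode) ** 2, axis=1)
            k = weights * np.exp(-d2 / (2.0 * bandwidth ** 2))  # kernel weights
            new_mode = (k[:, None] * points).sum(axis=0) / k.sum()
            if np.linalg.norm(new_mode - mode) < tol:
                break
            mode = new_mode
        return mode

In a pipeline like RDF-C, the pixels classified to each joint region would first be back-projected to 3D camera coordinates using their depth values, and the returned mode would serve as that joint's position estimate.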
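The manifold step can be illustrated in the same spirit. The sketch below uses PCA as a stand-in for the learned grasp manifold (the actual extraction method may well be nonlinear); the helper names and the target dimensionality are assumptions for illustration only.

    import numpy as np

    def fit_linear_manifold(poses, dim=2):
        """Fit a linear subspace to synthetic grasp poses (rows of `poses`).
        PCA via SVD stands in for whatever manifold learner is actually used."""
        mean = poses.mean(axis=0)
        _, _, vt = np.linalg.svd(poses - mean, full_matrices=False)
        return mean, vt[:dim]  # mean pose and a (dim x D) orthonormal basis

    def project_onto_manifold(pose, mean, basis):
        """Clean up a noisy RDF pose estimate by projecting onto the manifold."""
        coords = basis @ (pose - mean)          # low-dimensional coordinates
        cleaned = mean + basis.T @ coords       # nearest point on the subspace
        return cleaned, coords

Projecting each frame's RDF estimate this way yields both a corrected pose and a low-dimensional coordinate that can be tracked over time.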
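Finally, tracking the projected coordinates with a Kalman filter could look like the following sketch. The constant-velocity dynamics and the noise parameters q and r are assumptions, not values from the thesis.

    import numpy as np

    class KalmanTracker:
        """Constant-velocity Kalman filter over a low-dimensional pose
        coordinate (e.g. position along a learned grasp manifold)."""
        def __init__(self, dim, q=1e-3, r=1e-2, dt=1.0):
            self.x = np.zeros(2 * dim)             # state: [coords, velocities]
            self.P = np.eye(2 * dim)               # state covariance
            self.F = np.eye(2 * dim)
            self.F[:dim, dim:] = dt * np.eye(dim)  # coords += dt * velocity
            self.H = np.hstack([np.eye(dim), np.zeros((dim, dim))])
            self.Q = q * np.eye(2 * dim)           # process noise
            self.R = r * np.eye(dim)               # measurement noise

        def step(self, z):
            """Predict, then update with the manifold-projected estimate z."""
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
            self.x = self.x + K @ (z - self.H @ self.x)
            self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
            return self.x[: len(z)]                        # smoothed coordinates

Feeding each frame's projected coordinates through step() yields a temporally smoothed trajectory along the manifold, which can then be mapped back to a full hand pose.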