Abstract:
Deep learning techniques hold promise for developing dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this thesis, we introduce a comprehensive endoscopic simultaneous localization and mapping (SLAM) dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings, synthetically generated data, as well as a recording of a phantom colon acquired with a conventional endoscope in clinical use, accompanied by computed tomography scan ground truth. To verify the applicability of this data to real clinical systems, we recorded a video sequence of a full-representation silicone colon phantom with a state-of-the-art colonoscope. Additionally, we propose Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with a spatial attention module to guide the network toward distinguishable and highly textured tissue regions. The proposed approach employs a brightness-aware photometric loss to improve robustness under the fast frame-to-frame illumination changes commonly seen in endoscopic videos. To exemplify a use case of the EndoSLAM dataset, the performance of Endo-SfMLearner is extensively compared with the state of the art: SC-SfMLearner, Monodepth2, and SfMLearner.

Keywords: SLAM Dataset, Capsule Endoscopy, Spatial Attention Module, Monocular Depth Estimation, Visual Odometry.