Indoor visual understanding with RGB-D images using deep neural network

Doyran, Metehan.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Akın, H. Levent.
dc.contributor.author	Doyran, Metehan.
dc.date.accessioned	2023-03-16T10:03:17Z
dc.date.available	2023-03-16T10:03:17Z
dc.date.issued	2018.
dc.identifier.other	CMPE 2018 D78
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12355
dc.description.abstract	We created a visual understanding pipeline for house robots with Red-Green-Blue Depth (RGB-D) sensors. Our pipeline consists of three components: (i) an RGB-D object detection network, (ii) a simple tracking algorithm, and (iii) a 3D semantic mapping module. Our constraint is running the whole system in real time. Most RGB-D object detection networks do not have such a constraint and cannot run in real time because they require costly preprocessing methods. Instead of these costly methods, we seek ways to make raw depth data more useful through a cheap preprocessing technique: overlaying RGB on the depth data. After adding depth branches to two state-of-the-art RGB object detection networks, we compared the performance of feeding raw depth input and RGB-overlaid depth input on the SUN RGB-D dataset. The results of using only one type of input show that our overlaying method achieves 0.9% better mean average precision (mAP) than feeding the network with raw depth data. SSD with a decision-level fused depth network increased the mAP by around 2% compared to RGB-only SSD. We collected a small household object detection dataset to test the tracking method combined with the object detector. We used median flow tracking on the object boxes detected by the object detector, which increased the mAP of the object detection network by around 5% on our dataset. Our final module performs 3D semantic mapping, for which we used the Robot Operating System node of the Real-Time Appearance-Based Mapping library for the 3D mapping part. The semantic information about the objects is created by the object detector network combined with the tracker. Although the object detector network proposes 2D bounding boxes around the objects, labeling these pixels and seeing the object from different angles allows us to create 3D maps with object labels.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2018.
dc.subject.lcsh	Depth of field (Photography)
dc.subject.lcsh	Optical data processing -- Mathematical models.
dc.title	Indoor visual understanding with RGB-D images using deep neural network
dc.format.pages	xv, 69 leaves