
Unsupervised routing strategies for conditional deep neural networks


dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Akarun, Lale.
dc.contributor.author Meral, Tuna Han Salih.
dc.date.accessioned 2023-10-15T06:58:16Z
dc.date.available 2023-10-15T06:58:16Z
dc.date.issued 2022
dc.identifier.other CMPE 2022 M47
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/19715
dc.description.abstract Deep convolutional neural networks are considered state-of-the-art solutions due to their high performance in image classification tasks. Their apparent drawback is the amount of computing power required to process a single input. To address this, this thesis proposes a conditional computation method that learns to process an input using only a subset of the network's computation units. Learning to execute only part of a deep neural network by routing individual samples has several advantages. First, it lowers the computational burden. Furthermore, if images with similar semantic features are routed to the same path, that part of the network learns to discriminate finer differences among this subset of classes, yielding improved classification accuracy with fewer parameters and less computation. Investigating the network's activations on a single sample can also help interpret the network's predictions. Several recent works have exploited this idea using tree-shaped networks, or by taking a particular child of a node and skipping parts of the network. In this thesis, we follow a trellis-based approach for generating specific execution paths in a deep neural network. We also design a routing mechanism that uses unsupervised, differentiable, information gain-based cost functions to determine which subset of units in a layer block will be executed for a sample. We call our method Conditional Unsupervised Information Gain Trellis (CUTE). We test the clustering performance of our unsupervised information gain-based objective function under different scenarios. Finally, we test the classification performance of our trellis-shaped CUTE network on the Fashion-MNIST dataset. We show that our conditional execution mechanism achieves comparable or better model performance than unconditional baselines while using only a fraction of the computational resources.
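
The abstract describes a router that assigns each sample to a subset of units using an unsupervised, differentiable, information gain-based objective. The sketch below is a minimal PyTorch illustration of one common surrogate for such an objective: maximize the entropy of the batch-averaged routing distribution (balanced use of routes) while minimizing the per-sample routing entropy (confident, sample-specific routing). The SoftRouter module, its dimensions, and this particular entropy-based loss are assumptions made for illustration, not the thesis's exact CUTE formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRouter(nn.Module):
    """Maps a feature vector to a soft routing distribution over
    `num_routes` computation units in a layer block (illustrative)."""
    def __init__(self, in_features: int, num_routes: int):
        super().__init__()
        self.gate = nn.Linear(in_features, num_routes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, num_routes) routing probabilities per sample.
        return F.softmax(self.gate(x), dim=-1)

def information_gain_loss(p: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative mutual-information-style surrogate for routing
    probabilities `p` of shape (batch, num_routes).

    H(marginal) is high when routes are used evenly across the batch;
    the mean per-sample entropy is low when each sample commits to a
    single route. Minimizing the loss maximizes their difference."""
    marginal = p.mean(dim=0)                                  # E_x[p(r|x)]
    h_marginal = -(marginal * (marginal + eps).log()).sum()   # H(r)
    h_conditional = -(p * (p + eps).log()).sum(dim=1).mean()  # E_x[H(r|x)]
    return -(h_marginal - h_conditional)

# Usage: route a batch of feature vectors and backpropagate the loss.
router = SoftRouter(in_features=64, num_routes=4)
x = torch.randn(32, 64)
p = router(x)
loss = information_gain_loss(p)
loss.backward()

At inference time, such a router would typically be discretized (e.g., argmax or top-k over p) so that only the selected units of the block are executed, which is how the conditional-computation savings described in the abstract would be realized.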
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022.
dc.subject.lcsh Deep learning (Machine learning)
dc.subject.lcsh Neural networks (Computer science)
dc.title Unsupervised routing strategies for conditional deep neural networks
dc.format.pages xiii, 50 leaves

