
Unsupervised routing strategies for conditional deep neural networks


dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Akarun, Lale.
dc.contributor.author Meral, Tuna Han Salih.
dc.date.accessioned 2023-10-15T06:58:16Z
dc.date.available 2023-10-15T06:58:16Z
dc.date.issued 2022
dc.identifier.other CMPE 2022 M47
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/19715
dc.description.abstract Deep convolutional neural networks are considered state-of-the-art solutions due to their high performance in image classification tasks. Their apparent drawback is the amount of computing power required to process a single input. To address this, this thesis proposes a conditional computation method that learns to process an input using only a subset of the network's computation units. Learning to execute only part of a deep neural network by routing individual samples has several advantages. First, it lowers the computational burden. Furthermore, if images with similar semantic features are routed to the same path, that part of the network learns to discriminate finer differences among this subset of classes, yielding improved classification accuracy with fewer parameters and less computation. Investigating the network's activations on a single sample can also help interpret the network's predictions. Several recent works have exploited this idea using tree-shaped networks, or by taking a particular child of a node and skipping parts of the network. In this thesis, we follow a trellis-based approach for generating specific execution paths in a deep neural network. We also design a routing mechanism that uses unsupervised, differentiable, information gain-based cost functions to determine which subset of units in a layer block will be executed for a sample. We call our method Conditional Unsupervised Information Gain Trellis (CUTE). We test the clustering performance of our unsupervised information gain-based objective function under different scenarios. Finally, we test the classification performance of our trellis-shaped CUTE network on the Fashion-MNIST dataset. We show that our conditional execution mechanism achieves comparable or better model performance than unconditional baselines while using only a fraction of the computational resources.
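
The abstract describes a router that assigns each sample to a subset of units using an unsupervised, differentiable, information gain-based objective. The sketch below is a minimal PyTorch illustration of one common surrogate for such an objective: maximize the entropy of the batch-averaged routing distribution (balanced use of routes) while minimizing the per-sample routing entropy (confident, sample-specific routing). The SoftRouter module, its dimensions, and this particular entropy-based loss are assumptions made for illustration, not the thesis's exact CUTE formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRouter(nn.Module):
    """Maps a feature vector to a soft routing distribution over
    `num_routes` computation units in a layer block (illustrative)."""
    def __init__(self, in_features: int, num_routes: int):
        super().__init__()
        self.gate = nn.Linear(in_features, num_routes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, num_routes) routing probabilities per sample.
        return F.softmax(self.gate(x), dim=-1)

def information_gain_loss(p: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative mutual-information-style surrogate for routing
    probabilities `p` of shape (batch, num_routes).

    H(marginal) is high when routes are used evenly across the batch;
    the mean per-sample entropy is low when each sample commits to a
    single route. Minimizing the loss maximizes their difference."""
    marginal = p.mean(dim=0)                                  # E_x[p(r|x)]
    h_marginal = -(marginal * (marginal + eps).log()).sum()   # H(r)
    h_conditional = -(p * (p + eps).log()).sum(dim=1).mean()  # E_x[H(r|x)]
    return -(h_marginal - h_conditional)

# Usage: route a batch of feature vectors and backpropagate the loss.
router = SoftRouter(in_features=64, num_routes=4)
x = torch.randn(32, 64)
p = router(x)
loss = information_gain_loss(p)
loss.backward()

At inference time, such a router would typically be discretized (e.g., argmax or top-k over p) so that only the selected units of the block are executed, which is how the conditional-computation savings described in the abstract would be realized.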
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022.
dc.subject.lcsh Deep learning (Machine learning)
dc.subject.lcsh Neural networks (Computer science)
dc.title Unsupervised routing strategies for conditional deep neural networks
dc.format.pages xiii, 50 leaves

