A data adaptive categorical time series representation for supervised learning

dc.contributor Graduate Program in Industrial Engineering.
dc.contributor.advisor Baydoğan, Mustafa Gökçe.
dc.contributor.author Çakın, Hande.
dc.date.accessioned 2023-03-16T10:29:07Z
dc.date.available 2023-03-16T10:29:07Z
dc.date.issued 2016.
dc.identifier.other IE 2016 C36
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/13363
dc.description.abstract A vast majority of the studies in machine learning focus on time-directed or, in other words, sequential processes. The objectives of these studies vary from classification to prediction and from clustering to segmentation. Since the dimension of these datasets can be very high as a corollary of the sequential process, the sequences must be mapped to a lower-dimensional representation for learning tasks. Probabilistic and data adaptive representation approaches are prominent in the literature. This thesis provides a new data adaptive representation method for categorical time series that allows any supervised learning algorithm to be applied. The proposed method, namely SW-RF (Sliding Window-Random Forest), requires two main steps to learn a representation for categorical time series. The initial representation is constructed with a sliding window algorithm using a predetermined window size. Then, a decision tree classifier is trained on this simple representation, and a numerical vector representation is obtained for each sequence from the frequency of its subsequences on the leaf nodes of the decision trees. Categorical sequences of varying length and missing values are handled efficiently by the tree learners in SW-RF. The method performs efficiently even when the number of symbols in the sequence is high. The classification accuracy of SW-RF is compared with k-mers and Hidden Markov Model representations, since these two are common representation methods in the literature. Experiments show that the proposed approach provides significantly better results in terms of accuracy on both synthetic data and DNA promoter sequence data.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2016.
dc.subject.lcsh Time series analysis.
dc.subject.lcsh Supervised learning (Machine learning)
dc.title A data adaptive categorical time series representation for supervised learning
dc.format.pages xx, 97 leaves ;
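
The two steps named in the abstract (a sliding-window representation, then per-sequence frequencies over tree leaves) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy sequences, and the window size of 3 are all assumptions made for illustration, and the leaf assignment is a stand-in (the first symbol of each window) where SW-RF would use the leaf reached in a trained tree.

```python
from collections import Counter


def sliding_windows(sequence, window_size):
    """Return every contiguous window of `window_size` symbols."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # A sequence shorter than the window yields no windows.
    return [tuple(sequence[i:i + window_size])
            for i in range(len(sequence) - window_size + 1)]


def window_table(sequences, labels, window_size):
    """Build one row per window: (sequence index, class label, window)."""
    rows = []
    for idx, (seq, label) in enumerate(zip(sequences, labels)):
        for window in sliding_windows(seq, window_size):
            rows.append((idx, label, window))
    return rows


def leaf_frequencies(rows, leaf_ids, n_sequences):
    """Count, per sequence, how many of its windows land in each leaf."""
    counts = [Counter() for _ in range(n_sequences)]
    for (idx, _label, _window), leaf in zip(rows, leaf_ids):
        counts[idx][leaf] += 1
    return counts


# Hypothetical toy data: two labelled sequences of different lengths.
sequences = ["ACGTAC", "TTGA"]
labels = [1, 0]

rows = window_table(sequences, labels, window_size=3)

# Stand-in for a trained tree (assumption): treat the first symbol of each
# window as its "leaf". A real tree learner would supply these ids instead.
leaf_ids = [window[0] for (_idx, _label, window) in rows]

counts = leaf_frequencies(rows, leaf_ids, n_sequences=len(sequences))
```

Each entry of `counts` is a sparse frequency vector for one sequence, so sequences of different lengths end up with representations over the same set of leaves.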

