dc.description.abstract |
The aim of dimensionality reduction is to find a simpler, lower-dimensional representation of data while preserving the important information it contains. For high-dimensional data, dimensionality reduction is essential for extracting the relevant features and filtering out the irrelevant ones, which yields simpler models and useful knowledge from the data. In this thesis, we discuss and compare several unsupervised nonlinear dimensionality reduction methods, namely Isomap, Locally Linear Embedding (LLE), Curvilinear Component Analysis (CCA), Curvilinear Distance Analysis (CDA), and Stochastic Neighbor Embedding (SNE), by testing their accuracy on standard benchmark data sets. We propose a modification (SNE-Iso Hybrid) and introduce implicit learning of mapping functions to solve the problem of mapping previously unseen data points. We observe that using metrics inherent in the data distribution, rather than the Euclidean distance, allows better modeling and increases model accuracy on nonlinear data. |
|