Abstract:
In this thesis, we focus on the data fusion problem, in which heterogeneous data are collected from different sources and stored as matrices and higher-order tensors. We propose coupled matrix and tensor factorization models to jointly analyze these relational datasets. This method performs simultaneous factorization of matrices and tensors by extracting the common latent factors from the shared modes. We develop coupled models using various tensor models and cost functions for the missing link prediction problem and report successful empirical results. Most of the time, the data matrices and tensors are distributed among several parties. Sharing information across those parties introduces a privacy protection requirement; therefore, the second problem we address is protecting the privacy of distributed and heterogeneous datasets. We exploit the connection between differential privacy and sampling from a Bayesian posterior to derive an efficient coupled tensor factorization algorithm. We empirically show that our methods provide good prediction accuracy on synthetic and real datasets while providing a provable privacy guarantee. Finally, motivated by the connection between tensor factorization and neural networks, we propose an approach to preserve the privacy of a neural network's training data. We introduce a dropout technique that provides an elegant Bayesian interpretation of dropout, and show that the intrinsic noise it adds can be exploited to obtain a degree of differential privacy.