Abstract:
High-throughput biological data (HTBD) aimed at understanding biochemical interactions in the cell are best analyzed and explained within the context of networks and pathways. Such data generally represent stochastic, nonlinear relations embedded in noise. Bayesian network (BN) theory provides a framework for analyzing gene regulation measurements, as it naturally handles stochasticity, nonlinearity, and noise. In this dissertation, we provide a two-faceted approach to applying BNs to HTBD. In the first facet, we present a novel method that models known biological pathways as BNs and uses given HTBD to find the pathways that best explain the underlying interactions. In this process, biological pathways are converted to directed acyclic graphs, and a score measuring the fitness of the observed HTBD to a given network is calculated. The statistical significance of these scores is assessed by "randomization via bootstrapping", and relevant pathways are identified with a certainty that can be used as a comparative measure. Simulations using synthetic and real data demonstrated the robustness of the proposed approach, called Bayesian Pathway Analysis (BPA). BPA improves on existing similar approaches by not treating the genes in a pathway simply as a list, but incorporating into its model the topology through which those genes interact with each other. Although network learning techniques are very useful for revealing the underlying biological phenomena from HTBD, they do not always perform well. This is due to the small number of samples, poor initial choices of network structure, noise inherent in the data, and the complexity of the networks. To improve their performance, the learning techniques can be supported by prior biological knowledge that has already been verified by experimental assays.
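The scoring and randomization steps of BPA summarized above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes discretized expression data, uses the standard K2 (Dirichlet-multinomial) score as a stand-in for the fitness score, and builds the null distribution by permuting each gene's values across samples. The function names, the toy pathway, and the number of randomizations are all hypothetical.

```python
import math
import random
from collections import Counter


def k2_log_score(data, parents, arity=2):
    """Log K2 score of discrete data under a DAG.

    data:    list of dicts mapping gene -> state in range(arity)
    parents: dict mapping gene -> tuple of parent genes (assumed acyclic)
    """
    total = 0.0
    for gene, pa in parents.items():
        # Count child states within each observed parent configuration.
        joint = Counter((tuple(row[p] for p in pa), row[gene]) for row in data)
        config = Counter(tuple(row[p] for p in pa) for row in data)
        for cfg, n_cfg in config.items():
            total += math.lgamma(arity) - math.lgamma(arity + n_cfg)
            for state in range(arity):
                total += math.lgamma(1 + joint[(cfg, state)])
    return total


def shuffled_copy(data, rng):
    """Permute each gene's values independently across samples.

    This keeps every gene's marginal distribution but breaks dependencies
    between genes (an assumed randomization scheme).
    """
    genes = list(data[0])
    columns = {}
    for g in genes:
        col = [row[g] for row in data]
        rng.shuffle(col)
        columns[g] = col
    return [{g: columns[g][i] for g in genes} for i in range(len(data))]


def empirical_p_value(data, parents, n_rand=200, seed=0):
    """Observed score and the fraction of randomized scores at least as high."""
    rng = random.Random(seed)
    observed = k2_log_score(data, parents)
    null = [k2_log_score(shuffled_copy(data, rng), parents) for _ in range(n_rand)]
    exceed = sum(s >= observed for s in null)
    return observed, (exceed + 1) / (n_rand + 1)


# Toy pathway A -> B -> C: B copies A, and C copies B except in a few samples.
toy_parents = {"A": (), "B": ("A",), "C": ("B",)}
toy_data = []
for i in range(40):
    a = i % 2
    c = a if i % 10 else 1 - a
    toy_data.append({"A": a, "B": a, "C": c})

toy_score, toy_p = empirical_p_value(toy_data, toy_parents)
```

A small `toy_p` indicates that the pathway topology fits the observed data better than it fits randomized data, which is the kind of comparative evidence the abstract describes.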
In the second facet explored in this dissertation, we establish a global approach for integrating known biological information into Bayesian learning in order to reveal gene interactions. The proposed framework uses external biological knowledge to predict whether two given genes interact with each other. To this end, prior knowledge about the interaction of two genes is captured in a Bayesian Network Prior (BNP) model built from existing external biological knowledge. The external knowledge was obtained from interaction databases such as BioGrid and Reactome, and consists of protein-protein, protein-DNA/RNA, and gene interactions. The resulting model is incorporated into a greedy search algorithm for learning networks from HTBD, and the interacting genes are represented in the form of a network. During network generation, the BNP model, which deduces gene interactions from external knowledge, is used to calculate the probability of candidate networks and thereby enhance the structure learning task. Simulations on both synthetic and real data sets showed that the proposed framework can successfully enhance identification of the true network and can be used to predict gene interactions.
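One way to picture how prior interaction probabilities can enter a greedy structure search is the sketch below. It is an illustration under stated assumptions, not the dissertation's algorithm: the structure prior is taken to be a product of independent per-edge probabilities, the search only adds edges (no deletions or reversals), and `data_score`, `edge_prob`, and the example probabilities are hypothetical placeholders for a data-fit score and for the BNP model's output.

```python
import math
from itertools import permutations


def log_structure_prior(edges, edge_prob, genes, default=0.5):
    """log P(G), assuming independent per-edge probabilities in (0, 1)."""
    total = 0.0
    for pair in permutations(genes, 2):
        p = edge_prob.get(pair, default)
        total += math.log(p if pair in edges else 1.0 - p)
    return total


def creates_cycle(edges, new_edge):
    """True if adding new_edge (u, v) would close a directed cycle."""
    u, v = new_edge
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True  # a path v -> ... -> u already exists
        if node not in seen:
            seen.add(node)
            stack.extend(b for a, b in edges if a == node)
    return False


def greedy_search(genes, data_score, edge_prob):
    """Add-only hill climbing on data score plus log structure prior."""
    edges = set()
    best = data_score(edges) + log_structure_prior(edges, edge_prob, genes)
    improved = True
    while improved:
        improved = False
        for cand in permutations(genes, 2):
            if cand in edges or creates_cycle(edges, cand):
                continue
            trial = edges | {cand}
            score = data_score(trial) + log_structure_prior(trial, edge_prob, genes)
            if score > best + 1e-12:
                edges, best, improved = trial, score, True
    return edges, best


# Hypothetical prior probabilities, standing in for BNP predictions.
example_genes = ["A", "B", "C"]
example_prob = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "A"): 0.1}

# With a flat data score, the search is driven by the prior alone.
learned_edges, learned_score = greedy_search(
    example_genes, lambda edges: 0.0, example_prob
)
```

In practice `data_score` would be a likelihood-based network score computed from the expression data, so the prior shifts the search toward well-supported interactions without overriding strong evidence in the data.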