dc.description.abstract |
Biological data production has been increasing at an unprecedented pace with the advancement of microarray and next-generation sequencing technologies. Such High-Throughput Biological Data (HTBD) require detailed analysis methods. From a life science perspective, data analysis results make the most sense when interpreted within the context of biological pathways. Bayesian Networks (BNs) capture both linear and nonlinear interactions and handle stochastic events within a probabilistic framework that accounts for noise. These properties make BNs excellent candidates for HTBD analysis. A recent study by Isci et al. [1] proposes an approach called Bayesian Pathway Analysis (BPA), in which known biological pathways are modeled as BNs and the pathways that best explain the given HTBD are identified. This thesis has two fundamental aims. Our first aim is to improve the BPA system. In the original data processing phase, fold changes between two groups (e.g., cancer and normal) were calculated for each gene and discretized using hard cut-off levels for use in the network scoring module. We evaluated six different discretization methods with various numbers of levels. In the scoring phase, we applied three alternative scoring methods and compared their results with those of the Bayesian Dirichlet equivalent (BDe) scheme currently used in the system. We improved the statistical significance assessment phase by generating randomized data sets at the gene signal level, which handles cases where the current BPA fails to produce randomized data sets. We also provide a web portal from which the optimized software can be downloaded and applied to various organisms, including human. Our second aim is to apply the improved pathway analysis approach to several real cancer microarray data sets in order to identify pathways that are commonly or differentially active. We compared our findings with those of a related approach, SPIA [2]. |
|