Abstract:
Because some parts of gene expression experiments rely on manual methods, the reliability of the resulting data is low. If such data are used directly as input to a data mining algorithm or a model for evaluating gene expression data, adverse effects on the desired results are inevitable. To eliminate these adverse effects and reduce the fuzziness, we represent the data with sample data sets generated by uncertain data management techniques. The sample data approach reduces the fuzziness, but it also increases the output generation time, because the amount of processed data grows in direct proportion to the cardinality of the sample data set. In the first part of the study, we introduce M-FDBSCAN, an uncertain data clustering algorithm that runs concurrently on multi-core systems and clusters uncertain data rapidly. We show that the proposed method also yields a considerable performance improvement on single-core systems. In the second part of the study, the M-FDBSCAN algorithm is converted into an evolutionary clustering algorithm, named E-MFDBSCAN, which processes time series data rapidly and efficiently. This new algorithm makes it possible to generate global clusters. In the last part of the study, a prediction model is constructed from the time-based evolutionary patterns of the global clusters. The proposed prediction model predicts the patterns and similarities of the global cluster that will be generated at the next time point.
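
To make the sample-based representation concrete, the following minimal sketch shows one way an uncertain measurement could be replaced by a sample set and how two such sets could be compared. It is an illustration under stated assumptions, not the M-FDBSCAN implementation: the Gaussian noise model, the function names make_samples and prob_within, and all numeric values are hypothetical. The pairwise comparison also shows why the processing cost grows with the cardinality of the sample sets.

```python
import random


def make_samples(measured, noise_sd, n_samples, rng):
    """Represent one uncertain measurement by n_samples draws around it.

    Gaussian noise is an assumption made only for this illustration.
    """
    return [rng.gauss(measured, noise_sd) for _ in range(n_samples)]


def prob_within(samples_a, samples_b, eps):
    """Estimate the probability that two uncertain objects lie within eps.

    Every sample of one object is compared with every sample of the other,
    so the cost is len(samples_a) * len(samples_b) distance evaluations.
    """
    hits = sum(1 for a in samples_a for b in samples_b if abs(a - b) <= eps)
    return hits / (len(samples_a) * len(samples_b))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical expression levels for three genes, 50 samples each.
gene_a = make_samples(2.0, 0.3, 50, rng)
gene_b = make_samples(2.2, 0.3, 50, rng)
gene_c = make_samples(5.0, 0.3, 50, rng)

p_ab = prob_within(gene_a, gene_b, eps=0.5)  # nearby genes
p_ac = prob_within(gene_a, gene_c, eps=0.5)  # distant genes
print(p_ab, p_ac)
```

In a density-based clustering of uncertain data, a probability of this kind can stand in for the crisp "is within eps" test; a larger sample set gives a smoother estimate at the price of more distance evaluations.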