dc.description.abstract |
Software effort estimation has been one of the major challenges in software engineering, and previous research has mainly addressed the problem of large estimation deviations by improving the prediction accuracy of models. These models are evaluated with measures such as the magnitude of relative error (MRE) and pred(r), both of which assess a model on its overall prediction accuracy. Practitioners and researchers need an effort estimation model that 1) helps them understand the data used to build it and 2) provides accurate estimates. In this study, we adapt the greedy agglomerative clustering (GAC) algorithm to the software effort estimation domain and use it as an analogy-based estimator to build our model, Tree Estimation and Assessment Knowledge (TEAK). Using this GAC-based model, TEAK selects the number of analogies (k) to use for each individual test project and achieves lower MRE values than every other k-based method on all datasets. Case-based reasoning (CBR) methods face several problems, such as feature subset selection, scaling, the choice of similarity measure, and the number of analogies to use (a suitable k value) [1]. Because this research focuses on finding a suitable k, we do not address the other CBR problems and concentrate on dynamically selecting a suitable k for each test instance. TEAK makes it possible to better understand the data on which effort is estimated and to use a different number of analogies (k) for each test instance. For each test project, TEAK prunes irrelevant analogies from the training set and thereby determines the number of analogies to use during estimation. This approach outperformed all other k-based CBR methods in predictive accuracy, with the largest improvements exceeding 100%. |
|
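For readers unfamiliar with the evaluation measures and the fixed-k baseline mentioned in the abstract, the following Python sketch shows how a conventional k-based analogy estimator and its MRE and pred(r) scores might be computed. It is not TEAK and does not implement GAC; it only illustrates the kind of fixed-k CBR method TEAK is compared against. The function names, the Euclidean distance, the assumption that features are already normalized, the use of the median effort of the k analogies, leave-one-out evaluation, and the toy data are all illustrative assumptions, not details taken from the study.

```python
import math
from statistics import median


def mre(actual, predicted):
    """Magnitude of relative error: |actual - predicted| / actual."""
    return abs(actual - predicted) / actual


def pred(mres, r=0.25):
    """pred(r): fraction of estimates whose MRE is at most r."""
    return sum(1 for m in mres if m <= r) / len(mres)


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed pre-normalized)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_estimate(train, test_features, k):
    """Hypothetical fixed-k analogy estimator (not TEAK).

    train is a list of (features, effort) pairs. The estimate is the median
    effort of the k training projects closest to test_features.
    """
    ranked = sorted(train, key=lambda row: euclidean(row[0], test_features))
    return median(effort for _, effort in ranked[:k])


def leave_one_out_mres(data, k):
    """Hold out each project in turn, estimate it from the rest, collect MREs."""
    mres = []
    for i, (features, actual) in enumerate(data):
        train = data[:i] + data[i + 1:]
        mres.append(mre(actual, knn_estimate(train, features, k)))
    return mres


# Made-up toy projects: (normalized features, effort). Not data from the study.
TOY_DATA = [
    ((0.10, 0.20), 10.0),
    ((0.15, 0.25), 12.0),
    ((0.80, 0.90), 50.0),
    ((0.85, 0.80), 55.0),
    ((0.50, 0.50), 30.0),
]

if __name__ == "__main__":
    # The same k is used for every test project; TEAK instead picks k per project.
    for k in (1, 2, 3):
        mres = leave_one_out_mres(TOY_DATA, k)
        print(f"k={k}  mean MRE={sum(mres) / len(mres):.3f}  pred(25)={pred(mres):.2f}")
```

Running the loop for several values of k and comparing the resulting scores would show how strongly a fixed-k method can depend on the chosen k, which is the sensitivity that motivates selecting k separately for each test instance.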