Abstract:
Although most machine learning algorithms minimize cost-insensitive losses, many real-world applications require cost-sensitive approaches in which misclassification costs differ among classes. Beyond class-level costs, individual examples in a data set may carry nonidentical costs, which is the setting of example-dependent cost-sensitive learning. In credit scoring, for example, mistakenly rejecting a good borrower and approving a client in financial distress incur different costs. Moreover, because applicants request varying credit amounts, credit scoring is inherently example-dependent: falsely approving a $100M loan and a $1M loan produce unequal costs. To address this problem, this thesis proposes an example-dependent cost-sensitive loss function that handles cost sensitivity during the learning process itself: replacing the traditional loss function of Gradient Boosting Machines with the proposed one yields Example-Dependent Cost-Sensitive Gradient Boosting Machines. The proposed algorithm is evaluated on two real-world data sets that include credit amounts, as well as on synthetically generated data sets. It is compared with cost-insensitive learners, with previously proposed example-dependent cost-sensitive classifiers that handle cost sensitivity during learning, and with two methods that make cost-insensitive classifiers cost-sensitive: a post-processing method, Thresholding, and a pre-processing method, Oversampling. Results show that our method outperforms these four approaches in terms of financial savings.
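For concreteness, the sketch below illustrates the general idea under stated assumptions: a per-example cost-sensitive logistic loss, where each example i carries its own false-positive cost c_fp[i] and false-negative cost c_fn[i], plugged into gradient boosting through XGBoost's custom-objective interface. The loss form, the cost arrays, and the use of XGBoost are all illustrative assumptions standing in for the thesis's modified Gradient Boosting Machine, not its exact formulation.

```python
# Illustrative sketch (not the thesis's exact loss): an example-dependent
# cost-sensitive logistic loss used as a custom objective in gradient boosting.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)

# Hypothetical per-example costs, e.g. proportional to the loan amount.
c_fp = rng.uniform(1.0, 100.0, size=n)  # cost of approving a bad client
c_fn = rng.uniform(1.0, 10.0, size=n)   # cost of rejecting a good borrower

def cost_sensitive_logloss(preds, dtrain):
    # Loss per example: L_i = y_i * c_fn_i * log(1 + e^{-f_i})
    #                       + (1 - y_i) * c_fp_i * log(1 + e^{f_i})
    labels = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))  # sigmoid of the raw margin f_i
    grad = labels * c_fn * (p - 1.0) + (1.0 - labels) * c_fp * p
    hess = (labels * c_fn + (1.0 - labels) * c_fp) * p * (1.0 - p)
    return grad, hess

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=100, obj=cost_sensitive_logloss)
scores = booster.predict(dtrain)  # raw margins; apply a sigmoid for probabilities
```

Because each example's cost scales its gradient and Hessian contributions, high-stakes examples dominate the fit, which conveys the intuition behind handling cost sensitivity during learning rather than through post-processing such as Thresholding.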