Abstract:
Predicting the malignancy level of pulmonary nodules from CT scans is a complicated task for radiologists during lung cancer detection. CAD gives radiologists a second opinion, helping them identify lesions properly and distinguish malignant nodules at an early stage of lung cancer. A coherent and consistent database, such as the Lung Image Database Consortium (LIDC) database, is the most crucial requirement for developing a CAD scheme. In the LIDC database, CT scans are evaluated by four different radiologists, and their annotations of nodule characteristics are highly useful to researchers. One of these characteristics is malignancy, which is rated on a five-point scale: highly unlikely, moderately unlikely, indeterminate, moderately suspicious, and highly suspicious. In this study, the performances of SVM, RF, and ANN classifiers are compared using 1018 cases, 907 nodules, and 110 extracted features. Experimental results show that, for malignancy prediction, the best-performing classifiers are ANN, SVM, and RF, in that order. The most critical gap in the LIDC database is the lack of ground truth data, caused mainly by the absence of biopsy results. Arithmetic mean voting over the radiologists' ratings may therefore be used to work around this problem and obtain the desired information. The classifiers are examined on malignancy datasets with 5-class, 3-class (benign, indeterminate, malignant), and 2-class (benign, malignant) ratings. The experiments show that grouping the radiologists' malignancy ratings increases classification accuracy. Results across the three dataset groupings indicate that separating the moderate and high malignancy ratings affects classification performance negatively, whereas using only two classes, benign and malignant, raises the accuracy to as much as 98%.

Keywords: CAD, Lung Cancer Classification, ANN, SVM, RF.
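
The consensus labelling step mentioned above (arithmetic mean voting followed by grouping into 5, 3, or 2 classes) can be illustrated with a minimal sketch. The function names, the cut-off at a mean of 3, the half-up rounding for the 5-class case, and the choice to discard indeterminate nodules in the 2-class case are all assumptions made for illustration; the abstract does not specify them.

```python
def consensus_rating(ratings):
    """Arithmetic mean of the radiologists' 1-5 malignancy ratings for one nodule."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


def to_class(avg, n_classes):
    """Map a mean rating to a label under a 5-, 3-, or 2-class scheme.

    All thresholds here are hypothetical; the source does not state them.
    """
    if n_classes == 5:
        # Assumption: round half up to the nearest rating in 1..5.
        return int(avg + 0.5)
    if n_classes == 3:
        if avg < 3:
            return "benign"
        if avg > 3:
            return "malignant"
        return "indeterminate"
    if n_classes == 2:
        # Assumption: nodules whose mean is exactly 3 are excluded (None).
        if avg == 3:
            return None
        return "benign" if avg < 3 else "malignant"
    raise ValueError("n_classes must be 5, 3, or 2")


# Example: four radiologists rate one nodule.
avg = consensus_rating([4, 5, 3, 4])
labels = {k: to_class(avg, k) for k in (5, 3, 2)}
```

Under these assumed thresholds, coarser groupings collapse neighbouring ratings into one label, which is one plausible reason the reported accuracy rises as the number of classes falls.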