Abstract:
Tuberculosis, sometimes referred to as the white plague, is one of the most dangerous bacterial diseases of our era. It is caused by Mycobacterium tuberculosis, a species in the family Mycobacteriaceae. Because the bacterium can acquire resistance to antibiotics, mortality among tuberculosis patients is increasing. This thesis examines and compares different machine learning algorithms for detecting resistance to the four first-line drugs used in tuberculosis treatment. Each model takes variants in 23 target genes as input. Variants in every sample of the data set are detected against the H37Rv reference genome of Mycobacterium tuberculosis, a strain that is susceptible to all first-line antibiotics. We observe that traditional machine learning algorithms outperform multilayer perceptrons. We also examine how data representations borrowed from information retrieval affect antibiotic resistance detection, and find no clear evidence that they improve model performance. Additionally, we rank the contributions of individual mutations using SHAP, a methodology for interpreting machine learning models. For each target drug, we propose the ten mutations with the highest SHAP values as resistance determinants.