Optimizing Tuberculosis Diagnosis: A Comparative Study of Machine Learning Algorithms and Feature Selection Methods
DOI: https://doi.org/10.62019/brac9p17
Abstract
Tuberculosis (TB) is a highly contagious disease that can be fatal if left untreated. It spreads through airborne droplets released when a person with TB coughs, sneezes, or speaks. Over the course of a year, one untreated TB patient can infect ten to fifteen others. This study investigates the risk factors associated with TB and compares the predictive capabilities of several machine learning algorithms. The dataset comprises 452 patient records, each described by 12 characteristics, collected from two hospitals in district Mardan: Mardan International Hospital and Khyber Hospital Mardan. The binary response variable distinguishes TB-positive from TB-negative cases. The algorithms compared are classification trees, random forests, k-nearest neighbors, random k-nearest neighbors, neural networks, and logistic regression. Feature selection was performed to identify the most relevant predictors. Model performance was evaluated on an independent test set using classification accuracy, sensitivity, specificity, and the kappa statistic. Despite its simplicity, the classification tree outperformed the more complex algorithms on most evaluation metrics, regardless of the number of selected features.
Keywords: Tuberculosis (TB), Machine Learning, Feature selection, Classification accuracy
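The four evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study's data or results.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return accuracy, sensitivity, specificity and Cohen's kappa.

    tp, fn, fp, tn are counts from a binary confusion matrix in which
    TB-positive is treated as the positive class.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # share of true TB-positive cases detected
    specificity = tn / (tn + fp)   # share of true TB-negative cases detected

    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    expected = p_pos + p_neg

    # Kappa rescales observed agreement (accuracy) against chance agreement.
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not results from the study).
example = binary_metrics(tp=40, fn=10, fp=5, tn=45)
print(example)
```

Kappa is worth reporting alongside accuracy because it discounts agreement that would occur by chance, which matters when the TB-positive and TB-negative classes are unevenly represented in the test set.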