DATA-DRIVEN MACHINE LEARNING ENSEMBLE APPROACH FOR DIABETES RISK PREDICTION AT EARLY STAGES
Abstract
Diabetes mellitus is a severe illness characterized by disrupted glucose, lipid, and protein metabolism. Hyperglycemia, or high blood sugar, is the most common symptom of all types of diabetes. Diabetes has become much more prevalent as a result of contemporary lifestyles, which makes early detection critical. Machine learning (ML) has gained ground among health care professionals and clinicians because it has considerable potential for building tools for disease management, risk prediction, therapy, and prognosis. This article proposes an ensemble strategy for early-stage diabetes prediction that combines AdaBoost and CatBoost. The proposed technique, called Sel stack AdaCat, aims to produce efficient risk prediction tools for type 2 diabetes incidence. Feature analysis is performed to assess the significance of each feature and to investigate its relationship with diabetes. The features include the most common diabetic symptoms, which usually develop gradually, and they are used to train and assess various ML algorithms. The algorithms are evaluated and compared in terms of Precision, Recall, F-Measure, Reliability, and AUC using 10-fold cross-validation and data splitting.
Keywords- diabetes, machine learning, preprocessing, ensemble classifier, meta-heuristic optimization.
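
The evaluation protocol named in the abstract (10-fold cross-validation scored by Precision, Recall, and F-Measure) can be illustrated with a minimal, standard-library-only sketch. It is not the Sel stack AdaCat implementation. The function names `k_fold_indices`, `precision_recall_f1`, and `cross_validate` are hypothetical, the `fit` and `predict` callables are placeholders for whatever classifier is being assessed, and returning 0.0 for an undefined metric is an assumed convention.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: an undefined ratio (zero denominator) counts as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Average precision, recall, and F-measure over k held-out folds.

    fit(X_train, y_train) returns a model; predict(model, x) returns a label.
    Both are placeholders supplied by the caller.
    """
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in held_out]
        scores.append(precision_recall_f1([y[i] for i in held_out], preds))
    # Mean of each metric across the k folds.
    return tuple(sum(s[j] for s in scores) / k for j in range(3))
```

In practice, `fit` and `predict` would wrap the ensemble under study. With small datasets, a stratified split that preserves the class ratio in each fold is usually preferable to the plain shuffle assumed here.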