Ensemble Based Machine Learning Models for Early Stroke Prediction and Risk Factors Identification
- DOI
- 10.2991/978-94-6463-884-4_68
- Keywords
- Stroke Prediction; Predictive Model; Ensemble Learning; Model Evolution; Risk Factors
- Abstract
Stroke is a leading cause of mortality and permanent disability and imposes an immense burden on healthcare systems worldwide. It occurs when blood flow to the brain is disrupted, resulting in neurological damage. Early and accurate prediction of stroke risk is therefore critical for timely intervention and prevention, which can significantly reduce its impact. The primary objective of this study is to develop an ensemble-based Machine Learning (ML) model that can accurately predict the risk of stroke and identify the key factors contributing to it. To ensure optimal performance, data preprocessing was carefully implemented. Using a dataset sourced from Kaggle, comprising 5110 instances with 12 clinical and demographic features, the study employed multiple ensemble-based classifiers, including Decision Tree (DT), Random Forest (RF) classifier, Gradient Boosting (GB) classifier, AdaBoost classifier, Bagging classifier, and Extra Trees (ET) classifier. Among these models, the ET classifier achieved exceptional results, with 99.36% training accuracy and 99.43% testing accuracy. The minimal gap between training and testing accuracy indicates no significant overfitting, suggesting that the model generalizes well to new data. Comprehensive evaluation metrics further highlighted the model’s performance: precision (98.90%), recall (100%), F1-score (99.45%), Matthews Correlation Coefficient (MCC) (98.87%), AUC (1.0), RMSE (0.08%), and MSE (0.01%). Feature importance analysis conducted using the ET classifier identified age, average glucose level, and BMI as the most significant predictors of stroke. These results highlight the effectiveness of ensemble-based models in early stroke detection and risk stratification using demographic and clinical features.
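As a reading aid for the metrics quoted in the abstract, the sketch below shows how precision, recall, F1-score, accuracy, and MCC are conventionally derived from a binary confusion matrix. It is a minimal illustration in plain Python, not the authors' evaluation code. The function names and the confusion-matrix counts are hypothetical and are not taken from the paper; only the precision (98.90%) and recall (100%) fed into the F1 check come from the abstract.

```python
import math


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Matthews Correlation Coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Consistency check using the precision and recall reported in the abstract.
reported_f1 = f1_from_precision_recall(0.9890, 1.0)
print(f"F1 from reported precision/recall: {reported_f1:.4f}")

# Hypothetical counts, chosen only to illustrate the calculation.
example = metrics_from_confusion(tp=90, fp=1, fn=0, tn=109)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

The first check yields roughly 0.9945, which agrees with the F1-score of 99.45% stated in the abstract. The MCC cannot be recomputed the same way, because it depends on the full confusion matrix, which the abstract does not give.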
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Touhid Alam
AU - Shusmita Anjum Aziz
AU - Sayedur Rahman
AU - Mohammad Sakib Mahmood
PY - 2025
DA - 2025/11/18
TI - Ensemble Based Machine Learning Models for Early Stroke Prediction and Risk Factors Identification
BT - Proceedings of the 8th International Conference on Engineering Research, Innovation, and Education 2025 (ICERIE 2025)
PB - Atlantis Press
SP - 566
EP - 574
SN - 2352-5401
UR - https://doi.org/10.2991/978-94-6463-884-4_68
DO - 10.2991/978-94-6463-884-4_68
ID - Alam2025
ER -