Proceedings of the 8th International Conference on Engineering Research, Innovation, and Education 2025 (ICERIE 2025)

Ensemble Based Machine Learning Models for Early Stroke Prediction and Risk Factors Identification

Authors
Touhid Alam1, *, Shusmita Anjum Aziz1, Sayedur Rahman1, Mohammad Sakib Mahmood2
1Department of Computer Science, American International University-Bangladesh, 408/1, Kuratoli, 1229, Dhaka, Bangladesh
2Dept. of Computer Science, Missouri State University, 901 S. National Ave., Springfield, MO, 65897, USA
*Corresponding author. Email: 22-46330-1@student.aiub.edu
Available Online 18 November 2025.
DOI
10.2991/978-94-6463-884-4_68
Keywords
Stroke Prediction; Predictive Model; Ensemble Learning; Model Evaluation; Risk Factors
Abstract

Stroke imposes an immense burden on healthcare systems across the globe as a leading cause of mortality and permanent disability. It occurs when blood flow to the brain is disrupted, resulting in neurological damage. Early and accurate prediction of stroke risk is therefore critical for timely intervention and prevention, which can significantly reduce its impact. The primary objective of this study is to develop an ensemble-based Machine Learning (ML) model that accurately predicts stroke risk and identifies the key factors contributing to it. To improve model performance, the data were carefully preprocessed before training. Using a dataset sourced from Kaggle, comprising 5110 instances with 12 clinical and demographic features, the study evaluated a Decision Tree (DT) classifier alongside several ensemble classifiers: Random Forest (RF), Gradient Boosting (GB), AdaBoost, Bagging, and Extra Trees (ET). Among these models, the ET classifier performed best, with 99.36% training accuracy and 99.43% testing accuracy. The small gap between training and testing accuracy suggests no significant overfitting and indicates that the model generalizes well to unseen data. Further evaluation metrics confirmed this performance: precision (98.90%), recall (100%), F1-score (99.45%), Matthews Correlation Coefficient (MCC) (98.87%), AUC (1.0), RMSE (0.08), and MSE (0.01). Feature importance analysis using the ET classifier identified age, average glucose level, and BMI as the most significant predictors of stroke. These results highlight the effectiveness of ensemble-based models for early stroke detection and risk stratification from demographic and clinical features.
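The classification metrics listed in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in plain Python, assuming hard 0/1 predictions with stroke encoded as 1; the function name `binary_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data. AUC is omitted because it requires predicted probabilities rather than hard labels. The sketch also shows why MSE and RMSE are plain error values rather than percentages: with 0/1 predictions, each misclassification contributes exactly 1 to the squared error, so MSE equals the misclassification rate.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1, MCC, MSE and RMSE for 0/1 labels.

    Hypothetical helper for illustration; assumes the positive class (stroke) is 1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # strokes correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-strokes correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed strokes
    n = len(pairs)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Matthews Correlation Coefficient: a value in [-1, 1], often reported as a percentage.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # With hard 0/1 predictions every error adds (1 - 0)^2 = 1, so MSE = 1 - accuracy.
    mse = sum((t - p) ** 2 for t, p in pairs) / n
    rmse = math.sqrt(mse)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "mse": mse,
        "rmse": rmse,
    }


if __name__ == "__main__":
    # Hypothetical toy labels: 4 strokes and 6 non-strokes, with one false positive.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

As a consistency check, and assuming MSE and RMSE were computed on hard 0/1 predictions, a testing accuracy of 99.43% gives MSE = 1 − 0.9943 = 0.0057 and RMSE = √0.0057 ≈ 0.0755. These round to the reported values of 0.01 and 0.08. Likewise, precision 0.9890 with recall 1.0 gives F1 = 2(0.9890)/(1.9890) ≈ 0.9945.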

Copyright
© 2025 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Series
Advances in Engineering Research
Publication Date
18 November 2025
ISBN
978-94-6463-884-4
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Touhid Alam
AU  - Shusmita Anjum Aziz
AU  - Sayedur Rahman
AU  - Mohammad Sakib Mahmood
PY  - 2025
DA  - 2025/11/18
TI  - Ensemble Based Machine Learning Models for Early Stroke Prediction and Risk Factors Identification
BT  - Proceedings of the 8th International Conference on Engineering Research, Innovation, and Education 2025 (ICERIE 2025)
PB  - Atlantis Press
SP  - 566
EP  - 574
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-884-4_68
DO  - 10.2991/978-94-6463-884-4_68
ID  - Alam2025
ER  -