BSER-Bengali Speech Emotion Recognition using MFCC Features Based on SUBESCO and BanglaSER: A Comparative Study of Machine Learning Models
- DOI
- 10.2991/978-94-6463-884-4_77
- Keywords
- SER; MFCC; Data augmentation; KNN; Machine learning; sampling rate
- Abstract
Speech emotion recognition (SER) is an important area of human-computer interaction because it enables a machine to comprehend and interpret human emotions. It seeks to determine the categories of emotion expressed through speech signals during social communication. This study examines the impact of feature extraction settings and machine learning models on SER performance, using the SUBESCO and BanglaSER datasets both individually and as a combined dataset. Data augmentation through noise injection enhances model robustness. Mel-Frequency Cepstral Coefficients (MFCCs) are extracted as the primary features, and a sensitivity analysis varies the number of MFCCs from 1 to 50 to determine the optimal value for SER accuracy. The impact of different sampling rates on feature extraction and subsequent model performance is also investigated. Several machine learning models, including K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest, and Logistic Regression, are trained and evaluated. The sensitivity analysis reveals that using 30 MFCCs yields the highest accuracy and that higher-order MFCCs significantly improve model performance in Bengali SER (BSER). The optimal sampling rate for the dataset is found to be 44,100 Hz. Among the evaluated models, KNN demonstrates the best performance, with an accuracy of 92.10%. This study provides important insights into the practical application of machine learning for SER in the Bengali language.
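As a rough illustration of two steps named in the abstract, noise-injection augmentation and KNN classification, the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the function names, the `noise_factor` value, the value of `k`, the toy two-dimensional feature vectors (standing in for per-utterance MFCC vectors), and the emotion labels are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def add_noise(signal, noise_factor=0.005, rng=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    `noise_factor` is a hypothetical amplitude scale, not a value from the paper.
    """
    rng = rng or random.Random()
    return [x + noise_factor * rng.gauss(0.0, 1.0) for x in signal]


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Pair each training vector's Euclidean distance to the query with its label.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Augmentation: a 10 ms, 440 Hz tone sampled at 44,100 Hz, plus injected noise.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(441)]
noisy = add_noise(tone, rng=rng)

# Classification: toy feature vectors standing in for MFCC features (assumed data).
train_X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
train_y = ["neutral", "neutral", "angry", "angry"]
label = knn_predict(train_X, train_y, [4.8, 5.0], k=3)
print(label)
```

In a real pipeline the feature vectors would come from an MFCC extractor applied to each original and augmented recording, and the number of coefficients and the sampling rate would be swept as the abstract describes.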
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Nripendro Biswas
AU - Ahmed Rijvee
AU - Munaem Ahmed Mahdi
AU - Alif Azam
AU - Md. Takoat Hossain
PY - 2025
DA - 2025/11/18
TI - BSER-Bengali Speech Emotion Recognition using MFCC Features Based on SUBESCO and BanglaSER: A Comparative Study of Machine Learning Models
BT - Proceedings of the 8th International Conference on Engineering Research, Innovation, and Education 2025 (ICERIE 2025)
PB - Atlantis Press
SP - 636
EP - 644
SN - 2352-5401
UR - https://doi.org/10.2991/978-94-6463-884-4_77
DO - 10.2991/978-94-6463-884-4_77
ID - Biswas2025
ER -