Distributed Synthetic Minority Oversampling Technique

Sakshi Hooda; Suman Mann

doi:10.2991/ijcis.d.190719.001

Volume 12, Issue 2, 2019, Pages 929 - 936

Authors

Sakshi Hooda¹*, Suman Mann²

¹Research Scholar, IPU, New Delhi, India

²Associate Professor, MSIT, New Delhi, India

*Corresponding author. Email: sakshihoodars@gmail.com

Received 1 May 2019, Accepted 10 July 2019, Available Online 30 July 2019.

DOI: 10.2991/ijcis.d.190719.001
Keywords: SMOTE; Apache Spark; prediction; machine learning; imbalanced classification
Abstract: Real-world prediction problems often involve predicting rare occurrences. Because of this data imbalance, standard classification algorithms are biased against these rare events. Typical approaches to this imbalance involve oversampling the “rare events” or undersampling the majority events. The Synthetic Minority Oversampling Technique (SMOTE) is one technique that addresses this class imbalance effectively. However, existing implementations of SMOTE fail when the data grows too large to be stored on a single machine. In this paper, we present our solution to this “big data challenge.” We provide a distributed version of SMOTE using scalable k-means++ and M-Trees. With this implementation of SMOTE, we were able to oversample the “rare events” and achieve results that are better than those of the existing Python version of SMOTE.
Copyright: © 2019 The Authors. Published by Atlantis Press SARL.
Open Access: This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

Journal: International Journal of Computational Intelligence Systems
Volume-Issue: 12 - 2
Pages: 929 - 936
Publication Date: 2019/07/30
ISSN (Online): 1875-6883
ISSN (Print): 1875-6891

Cite this article

TY  - JOUR
AU  - Sakshi Hooda
AU  - Suman Mann
PY  - 2019
DA  - 2019/07/30
TI  - Distributed Synthetic Minority Oversampling Technique
JO  - International Journal of Computational Intelligence Systems
SP  - 929
EP  - 936
VL  - 12
IS  - 2
SN  - 1875-6883
UR  - https://doi.org/10.2991/ijcis.d.190719.001
DO  - 10.2991/ijcis.d.190719.001
ID  - Hooda2019
ER  -
