Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)

Research on improved K - nearest neighbor algorithm based on spark platform

Authors
Yushui Geng, Xianzhao Yan
Corresponding Author
Yushui Geng
Available Online October 2017.
DOI
10.2991/jimec-17.2017.120
Keywords
big data; Hadoop; Spark; K-nearest neighbor algorithm; weight
Abstract

Big data technology is growing rapidly. The arrival of Hadoop drew attention to the study of MapReduce, while Spark, through its RDD data model and memory-based computing model, adapts well to data mining on big data and outperforms Hadoop in iterative computing. It has quickly become a research focus for many enterprises and scholars. The K-nearest neighbor algorithm (hereafter KNN) is a very important classification algorithm. It has been widely studied, but there is no mature solution for parallelizing it on the Spark platform. In this paper, the authors implement a parallelized version of an improved KNN on the Spark platform. A clustering algorithm is used to find the weight of each training sample in the training set, and the weights of the K samples are then used to differentiate among the K nearest neighbors of the test sample. Experiments show that the improved KNN has better accuracy.
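The abstract does not give the weighting formula or the clustering method, so the following is only a minimal single-machine sketch of the general idea of weighted KNN voting, not the authors' implementation. It assumes, purely for illustration, that each class forms one cluster (its centroid) and that a training sample's weight is `1 / (1 + distance to its own class centroid)`. The function names `class_centroids`, `sample_weights` and `weighted_knn_predict` are hypothetical, and no Spark API is used.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centroids(samples):
    """Mean vector per label (assumed stand-in for the paper's clustering step)."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def sample_weights(samples):
    """Assumed weighting: samples closer to their own class centroid weigh more."""
    centroids = class_centroids(samples)
    return [1.0 / (1.0 + euclidean(f, centroids[label])) for f, label in samples]


def weighted_knn_predict(samples, weights, query, k):
    """Pick the k nearest training samples and sum their weights per label."""
    nearest = sorted(
        range(len(samples)), key=lambda i: euclidean(samples[i][0], query)
    )[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[samples[i][1]] += weights[i]
    return max(votes, key=votes.get)


# Toy data: two well-separated classes.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((0.8, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.8), "b"), ((4.9, 5.2), "b"),
]
weights = sample_weights(train)
prediction = weighted_knn_predict(train, weights, (1.1, 1.0), k=3)
print(prediction)  # prints "a"
```

In a distributed setting the distance computation for each test sample is the natural unit to parallelize, since it is independent per training sample. How the paper partitions this work on Spark is not stated in the abstract.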

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)
Series
Advances in Computer Science Research
Publication Date
October 2017
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Yushui Geng
AU  - Xianzhao Yan
PY  - 2017/10
DA  - 2017/10
TI  - Research on improved K - nearest neighbor algorithm based on spark platform
BT  - Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)
PB  - Atlantis Press
SP  - 553
EP  - 557
SN  - 2352-538X
UR  - https://doi.org/10.2991/jimec-17.2017.120
DO  - 10.2991/jimec-17.2017.120
ID  - Geng2017/10
ER  -