Research on improved K - nearest neighbor algorithm based on spark platform

Yushui Geng; Xianzhao Yan

doi:10.2991/jimec-17.2017.120


Authors

Yushui Geng, Xianzhao Yan

Corresponding Author

Yushui Geng

Available Online October 2017.

DOI: 10.2991/jimec-17.2017.120
Keywords: big data; hadoop; spark; K - Nearest Neighbor Algorithm; weight.
Abstract: Big data technology is growing rapidly. The arrival of Hadoop drew attention to the study of MapReduce. Spark, by introducing the RDD data model and a memory-based computing model, adapts well to data mining on big data and outperforms Hadoop in iterative computing, so it quickly became a research focus for many enterprises and scholars. The K-nearest neighbor algorithm (hereafter KNN) is an important classification algorithm. It has been widely studied, but there is no mature solution for parallelizing it on the Spark platform. In this paper, the authors implement a parallelized, improved KNN on the Spark platform. A clustering algorithm is used to find the weight of each training sample in the training set, and the weights of the K samples are then used to distinguish among the test sample's K nearest neighbors. Experiments show that the improved KNN has better accuracy.
Copyright: © 2017, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
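
The abstract describes weighting training samples through clustering and then using those weights among the K nearest neighbors, but it does not give the formulas. The following is a minimal single-machine sketch of that general idea, not the authors' method: it assumes class centroids as a stand-in for the clustering step, assumes a weight of 1 / (1 + distance to the sample's own class centroid), and omits Spark parallelization entirely. The names `sample_weights` and `weighted_knn_predict` are hypothetical.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sample_weights(X, y):
    """Assumed weighting scheme (not from the paper): a training sample
    close to the centroid of its own class gets a weight near 1, and a
    sample far from it gets a smaller weight."""
    by_label = defaultdict(list)
    for x, label in zip(X, y):
        by_label[label].append(x)
    centers = {label: centroid(pts) for label, pts in by_label.items()}
    return [1.0 / (1.0 + math.dist(x, centers[label])) for x, label in zip(X, y)]


def weighted_knn_predict(X, y, weights, query, k=3):
    """Find the k training samples nearest to `query` (Euclidean distance)
    and let each one vote for its label with its precomputed weight."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y[i]] += weights[i]
    return max(votes, key=votes.get)
```

Under this assumed scheme, a neighbor that lies far from the bulk of its own class contributes less to the vote than a neighbor that is typical of its class, so a few atypical samples near the query do not automatically outvote a typical one.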


Volume Title: Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)
Series: Advances in Computer Science Research
Publication Date: October 2017
ISBN: 978-94-6252-366-1
ISSN: 2352-538X

Cite this article


TY  - CONF
AU  - Yushui Geng
AU  - Xianzhao Yan
PY  - 2017/10
DA  - 2017/10
TI  - Research on improved K - nearest neighbor algorithm based on spark platform
BT  - Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)
PB  - Atlantis Press
SP  - 553
EP  - 557
SN  - 2352-538X
UR  - https://doi.org/10.2991/jimec-17.2017.120
DO  - 10.2991/jimec-17.2017.120
ID  - Geng2017/10
ER  -
