Research on improved K - nearest neighbor algorithm based on spark platform
- DOI
- 10.2991/jimec-17.2017.120
- Keywords
- big data; Hadoop; Spark; K-nearest neighbor algorithm; weight
- Abstract
Big data technology is growing rapidly. The birth of Hadoop drew attention to the study of MapReduce, while Spark, by introducing the RDD data model and a memory-based computing model, adapts well to data mining on big data and outperforms Hadoop in iterative computing. It has quickly become a research focus for many enterprises and scholars. The K-nearest neighbor algorithm (hereafter KNN) is an important classification algorithm. It has been widely studied, but there is no mature solution for parallelizing it on the Spark platform. In this paper, we implement a parallelized, improved KNN on the Spark platform. We use clustering to find the weight of each training sample in the training set, and the weights of the K nearest neighbors of a test sample are then used to classify it. Experiments show that the improved KNN achieves better accuracy.
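The abstract does not specify the clustering method, the weighting formula, or how the work is split across Spark. As a rough illustration only, the following sketch assumes that each training sample's weight is 1 / (1 + distance to the centroid of its own class), which is a simple stand-in for the clustering step, and it uses plain Python lists in place of RDD partitions. All function names (`centroid_weights`, `local_top_k`, `weighted_knn`) are hypothetical and are not taken from the paper.

```python
import heapq
import math
from collections import defaultdict


def centroid_weights(samples):
    """Return one weight per (features, label) sample.

    Assumption (not from the paper): a sample close to the centroid of
    its own class is more representative, so it gets a larger weight,
    computed as 1 / (1 + distance to that centroid).
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }
    return [
        1.0 / (1.0 + math.dist(features, centroids[label]))
        for features, label in samples
    ]


def local_top_k(partition, query, k):
    """Map step: the k nearest candidates inside one partition.

    Each partition is a list of (features, label, weight) tuples.
    """
    scored = [
        (math.dist(features, query), label, weight)
        for features, label, weight in partition
    ]
    return heapq.nsmallest(k, scored, key=lambda item: item[0])


def weighted_knn(partitions, query, k):
    """Reduce step: merge local candidates, keep the global k nearest,
    and let each neighbor vote with its weight instead of a count of 1."""
    candidates = [
        candidate
        for partition in partitions
        for candidate in local_top_k(partition, query, k)
    ]
    nearest = heapq.nsmallest(k, candidates, key=lambda item: item[0])
    votes = defaultdict(float)
    for _, label, weight in nearest:
        votes[label] += weight
    return max(votes, key=votes.get)


# Toy data: two well-separated classes, split into two "partitions".
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
weights = centroid_weights(train)
weighted = [(x, y, w) for (x, y), w in zip(train, weights)]
partitions = [weighted[0::2], weighted[1::2]]
print(weighted_knn(partitions, (0.5, 0.5), k=3))  # a
```

Taking the k nearest candidates per partition before merging keeps the merge step small, because no sample outside a partition's local top k can be in the global top k. In a Spark setting the same split would presumably map onto a per-partition operation followed by a reduce, but that mapping is an assumption here and not a description of the authors' implementation.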
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Yushui Geng
AU - Xianzhao Yan
PY - 2017/10
DA - 2017/10
TI - Research on improved K - nearest neighbor algorithm based on spark platform
BT - Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)
PB - Atlantis Press
SP - 553
EP - 557
SN - 2352-538X
UR - https://doi.org/10.2991/jimec-17.2017.120
DO - 10.2991/jimec-17.2017.120
ID - Geng2017/10
ER -