Proceedings of the 2015 International conference on Applied Science and Engineering Innovation

Mining on the subset of raw data set based on clustering

Authors
Yuling Ma
Corresponding Author
Yuling Ma
Available Online May 2015.
DOI
10.2991/asei-15.2015.50
Keywords
Big data era; Clustering algorithm; Association rule mining; ID3; Subspace; PAC learnable; Sample complexity.
Abstract

With the advance of information processing, the amount of data accumulated across all walks of life is increasing exponentially. The emergence of massive data sets challenges traditional machine learning and data mining algorithms. Much new research addresses this problem, including distributed machine learning, GPU-accelerated processing, and algorithm optimization. Even so, when the amount of data is very large, as with data from the biological field, mining the data directly is still time-consuming and memory-consuming. In the big data era, what should we do first, before mining? In this paper, we propose a subset mining method. It finds a representative subset of the raw data through related algorithms such as clustering, and then applies data mining algorithms to that subset. Theory and experiments both support the correctness of the method, and its advantage is more obvious when the data set is very large.
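As a rough illustration of the idea in the abstract, the sketch below clusters the raw data and keeps only the points nearest each cluster center as the representative subset. The abstract does not say which clustering algorithm or selection rule the paper uses, so plain k-means and the nearest-to-centroid rule are assumptions made here, and the names `kmeans`, `representative_subset`, and `fraction` are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, labels


def representative_subset(points, k, fraction, seed=0):
    """Keep the given fraction of each cluster, closest to its centroid first."""
    centroids, labels = kmeans(points, k, seed=seed)
    subset = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        members.sort(key=lambda p: dist(p, centroids[c]))
        keep = max(1, round(fraction * len(members))) if members else 0
        subset.extend(members[:keep])
    return subset


# Synthetic data: three Gaussian blobs of 100 points each (illustrative only).
rng = random.Random(1)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)]
        for _ in range(100)]
subset = representative_subset(data, k=3, fraction=0.1)
```

Any downstream mining algorithm (for example the association rule mining or ID3 named in the keywords) would then run on `subset` instead of `data`, which is where the time and memory savings would come from.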

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 International conference on Applied Science and Engineering Innovation
Series
Advances in Engineering Research
Publication Date
May 2015
ISBN
978-94-62520-94-3
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Yuling Ma
PY  - 2015/05
DA  - 2015/05
TI  - Mining on the subset of raw data set based on clustering
BT  - Proceedings of the 2015 International conference on Applied Science and Engineering Innovation
PB  - Atlantis Press
SP  - 243
EP  - 246
SN  - 2352-5401
UR  - https://doi.org/10.2991/asei-15.2015.50
DO  - 10.2991/asei-15.2015.50
ID  - Ma2015/05
ER  -