Proceedings of the 2016 4th International Conference on Machinery, Materials and Computing Technology

A Distributed Chinese Naive Bayes Classifier Based on Word Embedding

Authors
Mengke Feng, Guoshi Wu
Corresponding Author
Mengke Feng
Available Online March 2016.
DOI
10.2991/icmmct-16.2016.222
Keywords
Naive Bayes, Word Embedding, Distributed Classifier.
Abstract

The Naive Bayes classifier is built on the assumption that attributes are conditionally independent given the class. The algorithm has been shown to be successful in text classification. However, when calculating conditional probabilities, it treats two different words as two different features, no matter how close their meanings are. In this paper we propose an algorithm that improves the estimate of the probability that a word belongs to a class by using its related words, found through word embedding; we name this model NBCBWE (Naive Bayes classifier based on word embedding). Word embedding provides a way of applying deep learning to natural language processing problems. With it, every word can be represented by a vector, and related words can be found by calculating the similarity between the vectors of two words. Moreover, as the data set grows larger, storing and classifying text on a single computer can become very time consuming. To reduce the running time, we parallelize the Bayes classification algorithm using MapReduce and implement the model on Hadoop. Our experiments show that the model improves precision in text classification and also processes data more efficiently.
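
The abstract does not give the exact formula, so the following is only a minimal sketch of the general idea, under stated assumptions: a word's count in a class is augmented by the counts of its related words, weighted by cosine similarity, before Laplace smoothing. The function names, the similarity threshold of 0.8, and the weighting scheme are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def related_words(word, embeddings, threshold):
    """Map each other word to its similarity, keeping those >= threshold."""
    if word not in embeddings:
        return {}
    related = {}
    for other, vec in embeddings.items():
        if other == word:
            continue
        sim = cosine(embeddings[word], vec)
        if sim >= threshold:
            related[other] = sim
    return related


def train(docs):
    """docs: iterable of (label, tokens). Returns document and word counts."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for label, tokens in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
    return class_docs, word_counts


def augmented_count(word, label, word_counts, embeddings, threshold):
    """Assumed weighting: own count plus similarity-weighted related counts."""
    count = float(word_counts[label][word])
    for other, sim in related_words(word, embeddings, threshold).items():
        count += sim * word_counts[label][other]
    return count


def classify(tokens, class_docs, word_counts, embeddings, threshold=0.8):
    """Return the label with the highest log posterior score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Denominator: augmented counts over the training vocabulary,
        # plus |V| for Laplace smoothing.
        denom = sum(
            augmented_count(w, label, word_counts, embeddings, threshold)
            for w in vocab
        ) + len(vocab)
        score = math.log(class_docs[label] / total_docs)
        for w in tokens:
            num = augmented_count(w, label, word_counts, embeddings, threshold) + 1.0
            score += math.log(num / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: hand-made 2-d vectors stand in for trained word embeddings.
embeddings = {
    "great": [1.0, 0.0],
    "excellent": [0.9, 0.1],
    "awful": [0.0, 1.0],
    "terrible": [0.1, 0.9],
}
docs = [("pos", ["great", "movie"]), ("neg", ["awful", "movie"])]
class_docs, word_counts = train(docs)
label_excellent = classify(["excellent"], class_docs, word_counts, embeddings)
label_terrible = classify(["terrible"], class_docs, word_counts, embeddings)
```

In this toy run, "excellent" and "terrible" never occur in the training documents, yet each borrows evidence from its nearest neighbour ("great" and "awful"), which is the effect the abstract describes. The counting in `train` is a per-key sum, so it is the kind of step that could plausibly be expressed as a map phase emitting (label, word) pairs and a reduce phase summing them; how the paper actually structures its Hadoop jobs is not stated in the abstract.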

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2016 4th International Conference on Machinery, Materials and Computing Technology
Series
Advances in Engineering Research
Publication Date
March 2016
ISBN
10.2991/icmmct-16.2016.222
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Mengke Feng
AU  - Guoshi Wu
PY  - 2016/03
DA  - 2016/03
TI  - A Distributed Chinese Naive Bayes Classifier Based on Word Embedding
BT  - Proceedings of the 2016 4th International Conference on Machinery, Materials and Computing Technology
PB  - Atlantis Press
SP  - 1120
EP  - 1126
SN  - 2352-5401
UR  - https://doi.org/10.2991/icmmct-16.2016.222
DO  - 10.2991/icmmct-16.2016.222
ID  - Feng2016/03
ER  -