Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2023 (iCAST-ES 2023)

Bertopic and NER Stop Words for Topic Modeling on Agricultural Instructional Sentences

Authors
Trisna Gelar 1,*, Aprianti Nanda Sari 1
1 Department of Computer and Informatics Engineering, Politeknik Negeri Bandung (POLBAN), Bandung, Indonesia
*Corresponding author. Email: trisna.gelar@polban.ac.id
Available Online 17 February 2024.
DOI
10.2991/978-94-6463-364-1_14
Keywords
BERTopic; NER; Stop Words; Topic Modeling
Abstract

A drawback of topic modeling is that the number of sentences assigned to each topic is inconsistent, which leads to varying levels of topic coherence and topic diversity. One way to address this is to adapt the stop word list, that is, to remove unnecessary or overused terms. In specialist domains such as health, law, and agriculture, stop words can be identified with Named Entity Recognition (NER), applied as a preprocessing step before topic modeling. In addition, different topic modeling components can be combined within BERTopic to improve the quality of the generated topics. The most effective BERTopic pipeline configuration uses sentence embeddings for text representation, UMAP for dimensionality reduction, HDBSCAN for clustering similar documents, and a combination of NER-based stop word removal and c-TF-IDF for topic representation. This configuration achieved the highest topic diversity for JADI and PUW, 0.982 and 0.990 respectively, and produced the fewest outliers. However, topic coherence decreased.
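
The two ideas in the abstract, NER-derived stop words and topic diversity, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The entity labels (`QUANTITY`, `CHEMICAL`), the example sentence, and the function names are hypothetical, and the NER output is supplied by hand rather than produced by a model. Topic diversity is assumed here to follow the common definition, the fraction of unique words among the top-k words of all topics, which the abstract does not spell out.

```python
def ner_stop_words(entities, drop_labels):
    """Collect lower-cased tokens of entities whose label is in drop_labels.

    `entities` is a list of (text, label) pairs, standing in for the output
    of any NER model (assumption: labels here are illustrative only).
    """
    stop = set()
    for text, label in entities:
        if label in drop_labels:
            stop.update(text.lower().split())
    return stop


def remove_stop_words(sentence, stop_words):
    """Drop whitespace-separated tokens that appear in stop_words."""
    return " ".join(t for t in sentence.split() if t.lower() not in stop_words)


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic."""
    top_words = [w for words in topics for w in words[:top_k]]
    if not top_words:
        return 0.0
    return len(set(top_words)) / len(top_words)


# Hypothetical NER output for one instructional sentence.
entities = [("10 kg", "QUANTITY"), ("urea", "CHEMICAL")]
stop = ner_stop_words(entities, drop_labels={"QUANTITY"})
cleaned = remove_stop_words("Apply 10 kg urea per plot", stop)

# Two toy topics that share one word ("soil") out of six top words.
score = topic_diversity([["soil", "water", "seed"],
                         ["soil", "pest", "leaf"]], top_k=3)
```

A diversity close to 1, such as the 0.982 and 0.990 reported above, means that the topics share almost no top words under this definition.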

Copyright
© 2024 The Author(s)
Open Access
This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2023 (iCAST-ES 2023)
Series
Advances in Engineering Research
Publication Date
17 February 2024
ISBN
978-94-6463-364-1
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Trisna Gelar
AU  - Aprianti Nanda Sari
PY  - 2024
DA  - 2024/02/17
TI  - Bertopic and NER Stop Words for Topic Modeling on Agricultural Instructional Sentences
BT  - Proceedings of the International Conference on Applied Science and Technology on Engineering Science 2023 (iCAST-ES 2023)
PB  - Atlantis Press
SP  - 129
EP  - 140
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-364-1_14
DO  - 10.2991/978-94-6463-364-1_14
ID  - Gelar2024
ER  -