Comparison of Effectiveness of Stemming Algorithms in Indonesian Documents
- DOI
- 10.2991/aer.k.210810.025
- Keywords
- Effectiveness, Stemming, Indonesian, Document
- Abstract
Stemming is the process of reducing a word to its base form by applying a set of rules. In Bahasa Indonesia, this is done by removing prefixes, infixes, suffixes, or combinations of prefixes and suffixes from derived words. Several stemming algorithms for Bahasa Indonesia have been developed, but their effectiveness has not been studied. This study compares three of them: Sastrawi, Nazief & Adriani, and Arifin Setiono. We used 900 affixed words for the comparison. Each word was stemmed with all three algorithms, and the resulting base word was checked against the KBBI (the Indonesian dictionary) to determine whether it was correct. Sastrawi performed best, reducing 95.2% of the tested affixed words to root words. The Nazief & Adriani algorithm achieved 92.4%, and Arifin Setiono's achieved 89%. These results suggest that Arifin Setiono's algorithm needs considerable improvement, because many affixed words were not reduced to their root words.
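The evaluation described in the abstract (stem each affixed word, look the result up in a dictionary, report the percentage found) can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_stem` is a hypothetical, deliberately naive affix stripper, not any of the three algorithms studied, and the four-word dictionary is a stand-in for the KBBI.

```python
from typing import Callable, Iterable, Set


def toy_stem(word: str) -> str:
    """Hypothetical naive stemmer: strips at most one prefix and one suffix.

    This is NOT Sastrawi, Nazief & Adriani, or Arifin Setiono; it only
    gives the scoring function below something to evaluate.
    """
    for prefix in ("mem", "ber", "di", "ke"):
        # Keep at least three characters so short words are not destroyed.
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in ("kan", "an", "i"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def stemming_accuracy(
    stem: Callable[[str], str],
    words: Iterable[str],
    dictionary: Set[str],
) -> float:
    """Return the percentage of words whose stem appears in the dictionary."""
    words = list(words)
    if not words:
        raise ValueError("at least one test word is required")
    hits = sum(1 for w in words if stem(w) in dictionary)
    return 100.0 * hits / len(words)


# Toy stand-in for the KBBI (assumption: a real run would load the full dictionary).
toy_dictionary = {"baca", "jalan", "makan", "tulis"}
test_words = ["membaca", "berjalan", "makanan", "ditulis"]

# "berjalan" is over-stemmed to "jal", so 3 of 4 stems are found.
print(stemming_accuracy(toy_stem, test_words, toy_dictionary))  # 75.0
```

Passing a different `stem` callable to the same scoring function is one way to compare several stemmers on an identical word list.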
- Copyright
- © 2021, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Dyah Mustikasari
AU - Ida Widaningrum
AU - Rizal Arifin
AU - Wahyu Henggal Eka Putri
PY - 2021
DA - 2021/08/11
TI - Comparison of Effectiveness of Stemming Algorithms in Indonesian Documents
BT - Proceedings of the 2nd Borobudur International Symposium on Science and Technology (BIS-STE 2020)
PB - Atlantis Press
SP - 154
EP - 158
SN - 2352-5401
UR - https://doi.org/10.2991/aer.k.210810.025
DO - 10.2991/aer.k.210810.025
ID - Mustikasari2021
ER -