An approach to vocabulary expansion for neural network language model by means of hierarchical clustering
- DOI
- 10.2991/eusflat-19.2019.85
- Keywords
- NLP; Language model; Neural Network; RNN; ULMFiT; Clustering; Fuzzy graph clustering; Word-to-vec
- Abstract
Neural network language models have become the main tool for solving tasks in the NLP field. These models have already shown state-of-the-art results in classification, translation, named entity recognition, and other tasks. Pre-trained models are distributed freely on the internet and can be reused with the help of transfer learning techniques. However, the domain of a real-life problem may differ from the original domain on which the network was trained. In this paper, an approach to vocabulary expansion for a neural network language model by means of hierarchical clustering is proposed. This technique makes it possible to adapt a pre-trained language model to a different domain. First, tokens from the language model are hierarchically clustered. Then new words from the problem's domain are matched to the tokens according to the obtained hierarchy. In the experimental part, the proposed approach is demonstrated on a slightly modified ULMFiT language model.
- Copyright
- © 2019, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
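The two steps named in the abstract (cluster the model's tokens hierarchically, then match new domain words against that hierarchy) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that existing tokens and new words both have vectors in one shared embedding space (for example, an auxiliary word-to-vec model), and it uses plain agglomerative clustering with centroid linkage and cosine distance in place of whatever clustering the paper applies. The names `build_hierarchy`, `match_new_word`, and `toy_embeddings`, as well as the toy vectors, are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_hierarchy(embeddings):
    """Step 1 (assumed variant): agglomerative clustering of tokens.

    Repeatedly merges the two clusters whose centroids are closest and
    returns the root of the resulting binary tree. Each node is a dict
    with the keys 'tokens', 'centroid', and 'children'.
    """
    nodes = [
        {"tokens": [token], "centroid": list(vector), "children": []}
        for token, vector in embeddings.items()
    ]
    while len(nodes) > 1:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = cosine_distance(nodes[i]["centroid"], nodes[j]["centroid"])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        left, right = nodes[i], nodes[j]
        tokens = left["tokens"] + right["tokens"]
        merged = {
            "tokens": tokens,
            "centroid": centroid([embeddings[t] for t in tokens]),
            "children": [left, right],
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]


def match_new_word(root, vector):
    """Step 2 (assumed variant): descend the tree toward the closest centroid.

    Returns the existing token at the leaf that the new word's vector
    reaches when it follows the nearest child at every level.
    """
    node = root
    while node["children"]:
        node = min(
            node["children"],
            key=lambda child: cosine_distance(child["centroid"], vector),
        )
    return node["tokens"][0]


# Toy 2-D vectors, invented purely for illustration.
toy_embeddings = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "truck": [0.2, 0.9],
}

if __name__ == "__main__":
    hierarchy = build_hierarchy(toy_embeddings)
    # A hypothetical out-of-vocabulary word with a vector near "cat".
    print(match_new_word(hierarchy, [0.95, 0.12]))
```

Under these assumptions, the token returned for a new word could serve as the source for initializing that word's entry in the language model's vocabulary. Descending the tree rather than scanning every token keeps the match tied to the cluster structure, which is the role the abstract assigns to the hierarchy.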
Cite this article
TY  - CONF
AU  - Pavel V. Dudarin
AU  - Nadezhda G. Yarushkina
PY  - 2019/08
DA  - 2019/08
TI  - An approach to vocabulary expansion for neural network language model by means of hierarchical clustering
BT  - Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)
PB  - Atlantis Press
SP  - 614
EP  - 618
SN  - 2589-6644
UR  - https://doi.org/10.2991/eusflat-19.2019.85
DO  - 10.2991/eusflat-19.2019.85
ID  - Dudarin2019/08
ER  -