An approach to vocabulary expansion for neural network language model by means of hierarchical clustering

Pavel V. Dudarin; Nadezhda G. Yarushkina

doi:10.2991/eusflat-19.2019.85

Corresponding Author

Pavel V. Dudarin

Available Online August 2019.

Keywords: NLP, language model, neural network, RNN, ULMFiT, clustering, fuzzy graph clustering, Word-to-vec
Abstract: Neural network language models have become the main tool for solving tasks in the NLP field. These models have already shown state-of-the-art results in classification, translation, named entity recognition, and other tasks. Pre-trained models are freely distributed on the internet and can be reused with the help of transfer learning techniques. However, the domain of a real-life problem may differ from the original domain on which the network was trained. In this paper, an approach to vocabulary expansion for a neural network language model by means of hierarchical clustering is proposed. This technique allows a pre-trained language model to be adapted to a different domain. First, the tokens of the language model are hierarchically clustered. Then, new words from the problem's domain are matched to these tokens according to the obtained hierarchy. In the experimental part, the proposed approach is demonstrated on a slightly modified ULMFiT language model.
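The two steps named in the abstract (cluster the language model's tokens hierarchically, then match each new domain word against that hierarchy) can be illustrated with a minimal sketch. It is not the authors' method: it assumes that both the existing tokens and the new words already have vectors in one shared embedding space (for example from a word2vec model, which is an assumption here), and it uses naive centroid-linkage agglomerative clustering with cosine similarity purely for brevity. The paper's keywords mention fuzzy graph clustering, which this sketch does not implement. The names `build_hierarchy` and `match` and the toy vectors are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


@dataclass
class Node:
    tokens: List[str]   # tokens covered by this subtree
    centroid: Vector    # mean vector of those tokens
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_hierarchy(vectors: Dict[str, Vector]) -> Node:
    """Naive agglomerative clustering (hypothetical helper): repeatedly
    merge the two clusters whose centroids are most cosine-similar."""
    clusters = [Node([t], list(v)) for t, v in vectors.items()]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(clusters[i].centroid, clusters[j].centroid)
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        tokens = a.tokens + b.tokens
        merged = Node(tokens, mean([vectors[t] for t in tokens]), a, b)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


def match(root: Node, vec: Vector) -> str:
    """Descend from the root, always following the child whose centroid
    is closer to `vec`, and return the token at the leaf reached."""
    node = root
    while node.left is not None and node.right is not None:
        if cosine(node.left.centroid, vec) >= cosine(node.right.centroid, vec):
            node = node.left
        else:
            node = node.right
    return node.tokens[0]


# Toy vectors standing in for the language model's token embeddings.
lm_vectors = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.2, 0.0],
    "car": [0.0, 1.0, 0.1],
    "truck": [0.1, 0.9, 0.2],
}
root = build_hierarchy(lm_vectors)
print(match(root, [0.2, 0.85, 0.15]))  # → truck
```

Once a new word is matched to an existing token, one plausible (again assumed) way to add it to the language model is to initialize its embedding row from the matched token's row before fine-tuning. The pairwise search above is cubic in the vocabulary size, so a real vocabulary would call for a library implementation of hierarchical clustering.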
Copyright: © 2019, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)
Series: Atlantis Studies in Uncertainty Modelling
Publication Date: August 2019
ISBN: 978-94-6252-770-6
ISSN: 2589-6644

Cite this article

TY  - CONF
AU  - Pavel V. Dudarin
AU  - Nadezhda G. Yarushkina
PY  - 2019/08
DA  - 2019/08
TI  - An approach to vocabulary expansion for neural network language model by means of hierarchical clustering
BT  - Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)
PB  - Atlantis Press
SP  - 614
EP  - 618
SN  - 2589-6644
UR  - https://doi.org/10.2991/eusflat-19.2019.85
DO  - 10.2991/eusflat-19.2019.85
ID  - Dudarin2019/08
ER  -