Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)

Optimization of Hidden Markov Model by a Genetic Algorithm for Web Information Extraction

Authors
Jiyi Xiao¹, Lamei Zou, Chuanqi Li
¹School of Computer Science and Technology, University of South China
Corresponding Author
Jiyi Xiao
Available Online October 2007.
DOI
https://doi.org/10.2991/iske.2007.48
Keywords
hidden Markov model; genetic algorithm; Baum-Welch algorithm; Viterbi algorithm; information extraction
Abstract

This paper presents a new training method that combines a genetic algorithm (GA) with the Baum-Welch algorithm to obtain, for web information extraction, a hidden Markov model (HMM) with an optimized number of states and optimized model parameters. The method not only overcomes the slow convergence of the conventional HMM training approach but also finds a better number of states for the HMM topology, together with its model parameters. In experiments on 2100 web pages extracted from our corpus, the method found the optimal topology in all cases. The GA-HMM approach achieved an average precision of 84.483%, whereas the HMM trained by the Baum-Welch algorithm alone achieved an average precision of 71.049%. This indicates that the GA-HMM method produces a better-optimized model than the HMM trained by the Baum-Welch method.
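The abstract describes a hybrid in which a GA searches over HMM topologies (the number of states) and parameters while Baum-Welch refines each candidate. What follows is a minimal Python sketch of that general idea under stated assumptions, not the authors' implementation. The abstract does not specify the chromosome encoding, genetic operators, or fitness function, so the sketch assumes the following: discrete-output HMMs; a few Baum-Welch steps per individual per generation; a BIC-style fitness (log-likelihood penalized by parameter count) so that extra states are not automatically favored; row-wise crossover between parents with equal state counts; and mutation that either jitters probabilities or re-initializes the model with one more or one fewer state. All function names (`ga_hmm`, `baum_welch_step`, and so on) and the toy sequences are hypothetical.

```python
import math
import random

def normalize(row):
    total = sum(row)
    return [x / total for x in row]

def random_hmm(n_states, n_symbols, rng):
    """Random discrete HMM (pi, A, B) with strictly positive, row-stochastic entries."""
    def rand_row(k):
        return normalize([rng.random() + 1e-3 for _ in range(k)])
    return (rand_row(n_states), [rand_row(n_states) for _ in range(n_states)],
            [rand_row(n_symbols) for _ in range(n_states)])

def forward_backward(hmm, obs):
    """Scaled forward-backward pass; returns alpha, beta, scale factors and log P(obs)."""
    pi, A, B = hmm
    N, T = len(pi), len(obs)
    alpha, scales = [], []
    for t in range(T):
        if t == 0:
            a = [pi[i] * B[i][obs[0]] for i in range(N)]
        else:
            a = [sum(alpha[-1][j] * A[j][i] for j in range(N)) * B[i][obs[t]] for i in range(N)]
        scales.append(sum(a))
        alpha.append([x / scales[-1] for x in a])  # each alpha[t] sums to 1
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N)) / scales[t + 1]
                   for i in range(N)]
    return alpha, beta, scales, sum(math.log(c) for c in scales)

def baum_welch_step(hmm, seqs, n_symbols, eps=1e-6):
    """One Baum-Welch (EM) re-estimation; returns (new_hmm, log-likelihood of the input hmm)."""
    pi, A, B = hmm
    N = len(pi)
    pi_n, A_n, B_n = [0.0] * N, [[0.0] * N for _ in range(N)], [[0.0] * n_symbols for _ in range(N)]
    total_ll = 0.0
    for obs in seqs:
        alpha, beta, scales, ll = forward_backward(hmm, obs)
        total_ll += ll
        for t, o in enumerate(obs):
            for i in range(N):
                gamma = alpha[t][i] * beta[t][i]  # P(state i at time t | obs)
                B_n[i][o] += gamma
                pi_n[i] += gamma if t == 0 else 0.0
                if t + 1 < len(obs):  # expected i -> j transition counts (xi)
                    for j in range(N):
                        A_n[i][j] += alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / scales[t + 1]
    def smooth(row):  # eps smoothing (an assumption) keeps every probability > 0
        return normalize([x + eps for x in row])
    return (smooth(pi_n), [smooth(r) for r in A_n], [smooth(r) for r in B_n]), total_ll

def fitness(hmm, seqs, n_symbols):
    """Assumed BIC-style score: total log-likelihood minus a penalty on free parameters."""
    N = len(hmm[0])
    ll = sum(forward_backward(hmm, obs)[3] for obs in seqs)
    k = (N - 1) + N * (N - 1) + N * (n_symbols - 1)
    return ll - 0.5 * k * math.log(sum(len(s) for s in seqs))

def crossover(p1, p2, rng):
    """Row-wise uniform crossover (rows stay stochastic); parents with different N return p1."""
    if len(p1[0]) != len(p2[0]):
        return p1
    def pick(r1, r2):
        return list(r1 if rng.random() < 0.5 else r2)
    return (pick(p1[0], p2[0]), [pick(a, b) for a, b in zip(p1[1], p2[1])],
            [pick(a, b) for a, b in zip(p1[2], p2[2])])

def mutate(hmm, n_symbols, n_range, rng, sigma=0.1):
    """Either move the state count by +/-1 (fresh random parameters) or jitter probabilities."""
    if rng.random() < 0.3:
        new_n = min(max(len(hmm[0]) + rng.choice([-1, 1]), n_range[0]), n_range[1])
        return random_hmm(new_n, n_symbols, rng)
    def jitter(row):
        return normalize([max(x + rng.gauss(0, sigma * x), 1e-6) for x in row])
    return jitter(hmm[0]), [jitter(r) for r in hmm[1]], [jitter(r) for r in hmm[2]]

def ga_hmm(seqs, n_symbols, n_range=(2, 6), pop_size=10, generations=6, bw_iters=3, seed=0):
    """GA over (number of states, parameters); every individual is refined by Baum-Welch."""
    rng = random.Random(seed)
    pop = [random_hmm(rng.randint(*n_range), n_symbols, rng) for _ in range(pop_size)]
    for _gen in range(generations):
        scored = []
        for hmm in pop:
            for _step in range(bw_iters):  # local EM refinement of this individual
                hmm, _ll = baum_welch_step(hmm, seqs, n_symbols)
            scored.append((fitness(hmm, seqs, n_symbols), hmm))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        elite = [h for _, h in scored[:pop_size // 2]]  # elitism: keep the top half
        pop = elite + [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), n_symbols, n_range, rng)
                       for _ in range(pop_size - len(elite))]
    return scored[0]  # (fitness, (pi, A, B)) of the best model in the final generation

# Toy usage on made-up symbol sequences (not data from the paper).
train = [[0, 0, 1, 0, 2, 2, 1, 2] * 3, [2, 2, 2, 1, 0, 0, 0, 1] * 3, [0, 1, 0, 0, 2, 2, 2, 2] * 3]
best_score, best_model = ga_hmm(train, n_symbols=3)
print("chosen number of states:", len(best_model[0]), "score:", round(best_score, 2))
```

In this design, Baum-Welch supplies fast local improvement, and the GA explores different state counts and moves away from poor random initializations. The BIC penalty is only one common way to keep the search from always preferring more states; the paper's own fitness criterion may differ. The printed state count depends on the seed and the toy data.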

Copyright
© 2007, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)
Series
Advances in Intelligent Systems Research
Publication Date
October 2007
ISBN
978-90-78677-04-8
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Jiyi Xiao
AU  - Lamei Zou
AU  - Chuanqi Li
PY  - 2007/10
DA  - 2007/10
TI  - Optimization of Hidden Markov Model by a Genetic Algorithm for Web Information Extraction
BT  - Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)
PB  - Atlantis Press
SP  - 282
EP  - 287
SN  - 1951-6851
UR  - https://doi.org/10.2991/iske.2007.48
DO  - https://doi.org/10.2991/iske.2007.48
ID  - Xiao2007/10
ER  -