A New Approach to Extract Text from Images based on DWT and K-means Clustering
- DOI
- 10.1080/18756891.2016.1237189
- Keywords
- Text extraction; Texture features; DWT; K-means clustering; sliding window; voting decision
- Abstract
Text present in images provides important information for automatic annotation, indexing and retrieval, so its extraction is a well-known research area in computer vision. However, variations in text orientation, alignment, font and size, together with low image contrast and complex backgrounds, make text extraction extremely challenging. In this paper, we propose a texture-based text extraction method that combines the discrete wavelet transform (DWT) with K-means clustering. First, edges are detected in the image using the DWT. Then, a small overlapping sliding window scans the high-frequency sub-bands, from which texture features of text and non-text regions are extracted. Based on these features, K-means clustering classifies the image into text, simple-background and complex-background clusters. Finally, a voting decision process and area-based filtering are used to locate the text regions precisely. Experiments are carried out on the public ICDAR 2013 dataset and on our own dataset of English, Hindi and Punjabi text images, for different numbers of clusters. The results show that the proposed method performs promisingly across these languages in terms of detection rate (DR), precision rate (PR) and recall rate (RR).
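The pipeline outlined in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the Haar wavelet, the 4x4 window with step 2, the feature (mean absolute coefficient per detail sub-band), the rule that the highest-energy cluster is "text", and the strict-majority vote are all assumptions chosen for illustration, and the area-based filtering step is omitted. All function names are hypothetical.

```python
def haar_dwt2(img):
    """One-level 2-D Haar DWT of an even-sized grayscale image (list of rows).
    Returns [approx, row_diff, col_diff, diagonal], each half the input size."""
    h, w = len(img) // 2, len(img[0]) // 2
    bands = [[[0.0] * w for _ in range(h)] for _ in range(4)]
    for r in range(h):
        for c in range(w):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            bands[0][r][c] = (a + b + d + e) / 2  # low-frequency approximation
            bands[1][r][c] = (a + b - d - e) / 2  # difference between rows
            bands[2][r][c] = (a - b + d - e) / 2  # difference between columns
            bands[3][r][c] = (a - b - d + e) / 2  # diagonal difference
    return bands


def window_features(details, size=4, step=2):
    """Slide an overlapping window over the detail sub-bands.
    Feature per window (assumed): mean absolute coefficient of each band."""
    h, w = len(details[0]), len(details[0][0])
    positions, feats = [], []
    for r in range(0, h - size + 1, step):
        for c in range(0, w - size + 1, step):
            feat = []
            for band in details:
                vals = [abs(band[i][j])
                        for i in range(r, r + size) for j in range(c, c + size)]
                feat.append(sum(vals) / len(vals))
            positions.append((r, c))
            feats.append(feat)
    return positions, feats


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means (k >= 2) with a deterministic start: centroids
    are taken at evenly spaced ranks of the points sorted by feature sum."""
    order = sorted(points, key=sum)
    centroids = [list(order[i * (len(order) - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda j: sum((x - y) ** 2
                                        for x, y in zip(p, centroids[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def vote_text_mask(shape, positions, labels, text_label, size=4):
    """A sub-band pixel is text if more than half of the windows covering it
    were assigned to the text cluster (assumed voting rule)."""
    h, w = shape
    votes = [[0] * w for _ in range(h)]
    cover = [[0] * w for _ in range(h)]
    for (r, c), lab in zip(positions, labels):
        for i in range(r, r + size):
            for j in range(c, c + size):
                cover[i][j] += 1
                votes[i][j] += (lab == text_label)
    return [[cover[i][j] > 0 and 2 * votes[i][j] > cover[i][j]
             for j in range(w)] for i in range(h)]


def extract_text_mask(img, k=3, size=4, step=2):
    """DWT -> window features -> K-means -> voting. Returns a boolean mask
    at sub-band resolution (half the image size in each dimension)."""
    _, *details = haar_dwt2(img)
    positions, feats = window_features(details, size, step)
    labels, centroids = kmeans(feats, k)
    # Assumption: the cluster with the largest centroid energy is "text".
    text_label = max(range(k), key=lambda j: sum(centroids[j]))
    shape = (len(details[0]), len(details[0][0]))
    return vote_text_mask(shape, positions, labels, text_label, size)
```

Calling `extract_text_mask(img)` on a grayscale image given as a list of equal-length rows returns a mask at half resolution. With three clusters, windows that straddle a text boundary tend to fall into the intermediate cluster, so the strict-majority vote leaves a thin unlabelled margin at region edges; a practical system would follow this with a filtering step such as the area-based filtering the abstract mentions.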
- Copyright
- © 2016, the Authors. Co-published by Atlantis Press and Taylor & Francis
- Open Access
- This is an open access article under the CC BY-NC license (http://creativecommons.org/licences/by-nc/4.0/).
Cite this article
TY - JOUR
AU - Deepika Ghai
AU - Divya Gera
AU - Neelu Jain
PY - 2016
DA - 2016/09/01
TI - A New Approach to Extract Text from Images based on DWT and K-means Clustering
JO - International Journal of Computational Intelligence Systems
SP - 900
EP - 916
VL - 9
IS - 5
SN - 1875-6883
UR - https://doi.org/10.1080/18756891.2016.1237189
DO - 10.1080/18756891.2016.1237189
ID - Ghai2016
ER -