Tesseract OCR Recognition Based on Arabic Machine-Printed Document

Rakesh Ramteke; Mohammed Rashed Ali Omar Maamari

doi:10.2991/978-94-6463-196-8_27

Authors

Rakesh Ramteke¹,*, Mohammed Rashed Ali Omar Maamari¹

¹School of Computational Sciences, Kavayitri Bahinabai Chaudhary North Maharashtra University, Jalgaon, (MS), India

*Corresponding author. Email: rakeshj.ramteke@gmail.com

Available Online 10 August 2023.

Keywords: PyTesseract; OCR; Arabic; Recognizing; Detecting
Abstract: This paper presents the technical aspects and context of recognizing and detecting Arabic characters using the Tesseract OCR engine. The engine is freely available, gives good results, and supports many languages, including Arabic. The procedure begins by converting Arabic documents into machine-readable form (scanning) and then recognizing and extracting the text using the PyTesseract library. OCR systems can produce a considerable number of split (segmentation) errors, particularly with cursive scripts such as Arabic, in which letters frequently overlap. Tesseract OCR achieved 99.5% accuracy in converting Arabic document images into editable text.
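The scan, recognize, and extract pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the Tesseract binary, its Arabic language data (`ara`), and the `pytesseract` and Pillow packages are installed, and the helper names (`ocr_arabic`, `normalize`, `levenshtein`, `character_accuracy`) are hypothetical. The abstract does not state how the 99.5% figure was measured, so the sketch assumes character accuracy, i.e. 1 minus the edit distance between the OCR output and a ground-truth transcription, divided by the length of that transcription.

```python
import sys


def ocr_arabic(image_path: str, lang: str = "ara") -> str:
    """Run Tesseract on a scanned page image via PyTesseract (assumed installed)."""
    # Imported lazily so the pure-Python helpers below work without Tesseract.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        # "ara" is Tesseract's language code for its Arabic traineddata.
        return pytesseract.image_to_string(img, lang=lang)


def normalize(text: str) -> str:
    """Collapse runs of whitespace (including a trailing form feed) to single spaces."""
    return " ".join(text.split())


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python arabic_ocr.py page.png [ground_truth.txt]
    recognized = normalize(ocr_arabic(sys.argv[1]))
    print(recognized)
    if len(sys.argv) > 2:
        with open(sys.argv[2], encoding="utf-8") as f:
            reference = normalize(f.read())
        print(f"character accuracy: {character_accuracy(reference, recognized):.2%}")
```

Normalizing both strings before comparison keeps line breaks and spacing differences from being counted as recognition errors; flooring the score at zero stops heavily over-generated output from producing a negative accuracy.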
Copyright: © 2023 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the First International Conference on Advances in Computer Vision and Artificial Intelligence Technologies (ACVAIT 2022)
Series: Advances in Intelligent Systems Research
Publication Date: 10 August 2023
ISBN: 978-94-6463-196-8
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Rakesh Ramteke
AU  - Mohammed Rashed Ali Omar Maamari
PY  - 2023
DA  - 2023/08/10
TI  - Tesseract OCR Recognition Based on Arabic Machine-Printed Document
BT  - Proceedings of the First International Conference on Advances in Computer Vision and Artificial Intelligence Technologies (ACVAIT 2022)
PB  - Atlantis Press
SP  - 347
EP  - 355
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6463-196-8_27
DO  - 10.2991/978-94-6463-196-8_27
ID  - Ramteke2023
ER  -