A Case Study in Computer Understanding of Printed-Forms

Authors

Davood Falahati¹, Hojat Cheraghi² and Kazem Ghalamchi³; ¹Isfahan University of Technology, Iran, ²Tehran Science and Research University, Iran and ³Ghalamchi Foundation, Iran

Abstract

Data entry is by nature a time-consuming and error-prone procedure. Moreover, checking the validity of submitted information is no easier than retyping it. In a mega-corporation like Kanoon Farhangi Amoozesh, there is almost no way to verify the authenticity of students' educational backgrounds. Thanks to fast computer architectures, optical character recognition (OCR) systems have become viable. Unfortunately, general-purpose OCR systems such as Google's Tesseract are not helpful here because they have no a priori information about what they are reading. In this paper the authors take an in-depth look at what has been done in the field of OCR over the last 60 years. A custom-made system adapted to the problem is then presented, which is far more accurate than general-purpose OCR systems. The developed system reads more than 60 digits per second. As shown in the Results section, the accuracy of the devised method is sufficient for public use.

Keywords

Optical character recognition, Tesseract, neural networks, row finding, segmentation.

Full Text  Volume 5, Number 4