Text Extraction from Tamil and Hindi Document Images using Open Source Optical Character Recognition tools
Author(s):
Dr. S. VIJAYARANI
Keywords:
Optical Character Recognition, OCR architecture for Tamil and Hindi document images, Google Docs, Free Online OCR, i2OCR.
Abstract
Optical Character Recognition (OCR) is a technique used to extract text from document images and convert it into an editable text format. This kind of information retrieval is called recognition-based retrieval; once converted, the text can be edited, searched, and stored more efficiently. OCR is used in many applications, such as libraries, organizations, bank cheque processing, number plate recognition, and historical book analysis. Various OCR tools are available for converting document images in different languages. The primary objective of this work is to compare the performance of three OCR tools for extracting text from Tamil and Hindi document images. The OCR tools considered in this analysis are Google Docs, Free Online OCR, and i2OCR. Based on conversion accuracy, Free Online OCR is observed to perform better than the other two tools.
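The abstract does not state how conversion accuracy was computed. As a rough illustration only, the sketch below assumes a character-level accuracy: the edit distance between a hand-typed reference transcription and the OCR output, divided by the reference length. The function names `edit_distance` and `conversion_accuracy` are hypothetical and not taken from the paper. The comparison is made over Unicode code points, so a Tamil or Hindi vowel sign or virama counts as its own character; a grapheme-based count would give somewhat different figures.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # previous[j] holds the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def conversion_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: percentage of reference characters recovered by the OCR tool."""
    if not reference:
        # An empty reference is matched perfectly only by an empty output.
        return 100.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    # Clamp at zero: very noisy output can contain more errors than reference characters.
    return max(0.0, 100.0 * (1 - errors / len(reference)))
```

Under this assumption, each tool's output for the same document image would be scored against the same reference text, and the tool with the highest percentage would be ranked best.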
Other Details
Paper ID: IJSARTV
Published in: Volume 2, Issue 11
Publication Date: 11/1/2016