Devanagari OCR: Issues and Analysis of Newspaper digitization |
Author(s): |
Deepak Kumar Arya |
Keywords: |
Digitization, Newspaper, Preprocessing, OCR |
Abstract |
OCR software attempts to replicate the combined functions of the human eye and brain, which is why it is referred to as artificial intelligence software. A human can quickly and easily recognize text in varying fonts and print qualities on a newspaper page, and applies language and cognitive abilities to translate this text correctly into meaningful words. This paper highlights some of the issues that arose during a newspaper digitization project, how OCR software works on newspapers, factors that affect OCR accuracy, methods of improving accuracy, and testing methods and results for specific solutions that were considered viable for large-scale text digitization projects. |
Other Details |
Paper ID: IJSARTV; Published in: Volume 2, Issue 3; Publication Date: 3/3/2016 |