In this paper, we present a scheme to identify different Indian scripts from a document image. The approach has a tree structure in which Roman script words are separated first using the 'headline' feature. For Optical Character Recognition (OCR) of such a multilingual document, it is necessary to identify the different languages of the input document before feeding it to the OCR engine of each individual language. In this paper we present a detailed review of current script and language identification techniques.
The proposed system is based on the characteristic features of the top profile and bottom profile of individual text lines of the input document image. The ranges of feature values of the top and bottom profiles for all three languages are obtained and stored in a knowledge base for later use during decision making. The system is trained to learn the behavior of the top and bottom profiles with a training data set of 800 text lines. Related work includes "Script Identification from Trilingual Documents using Profile Based Features" and "Automatic Identification of English, Chinese, Arabic, Devnagari and Bangla Script Line".
Recognition of handwritten Marathi characters is a challenging field in image processing and character recognition. One technique for handwritten words was tested on 100 handwritten document pages containing both Devnagari and Roman script words, and 99.54% of the words were identified with their true class. In one representation, a rectangle is interpreted as a two-dimensional 3×3 structure of nine parts, which are defined as bricks. In one evaluation, 97% of the sentences were correctly detected.
Another system implements image capturing techniques and optical character recognition: it converts document images to text with the Tesseract OCR engine, reproducing the text in digital format, and then translates it, with further steps introduced in the OCR process to obtain quality output. Translation is an essential part of such a system because people face language barriers.
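The 'headline' test used for the first split of the tree can be illustrated with a horizontal projection. This is a minimal sketch under stated assumptions, not the authors' method: a word is a binarized image given as a list of rows of 0/1 values (1 = ink), and it is treated as having a headline if some row in its upper half is inked across at least 75% of the word's width. The helper names and the 0.75 threshold are hypothetical.

```python
from typing import List

# Hypothetical representation: a binarized word image as rows of 0/1 (1 = ink).
Image = List[List[int]]


def row_ink_counts(img: Image) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in img]


def ink_width(img: Image) -> int:
    """Width of the word, from the first to the last column containing ink."""
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return cols[-1] - cols[0] + 1 if cols else 0


def has_headline(img: Image, coverage: float = 0.75) -> bool:
    """True if a row in the upper half is inked across most of the word width.

    The 0.75 coverage threshold is an assumed value, not taken from the source.
    """
    width = ink_width(img)
    if width == 0:
        return False
    upper_rows = row_ink_counts(img)[: max(1, len(img) // 2)]
    return max(upper_rows) >= coverage * width


def classify_word(img: Image) -> str:
    """First split of a script tree: words without a headline are taken as Roman."""
    return "headline script" if has_headline(img) else "Roman"
```

Roman letters have no continuous stroke across the top of a word, so their projection rarely shows one row spanning the full width; that is why this cheap test works well as the first node of the tree.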
As a middle path, in this research we propose to work on the prioritized requirements of a particular region. For instance, in Karnataka state in India, any document, including official ones, generally contains text in three languages: English, the language of general importance; Hindi, the language of national importance; and Kannada, the language of state/regional importance. More generally, in every state of India two languages are in use: the local state language and English. India has more than 22 official languages, and every script has its own characteristic features; based on these unique features, one script can be distinguished from another. We analysed 700 different words of Kannada, English and Hindi to extract the discriminating features and to develop the knowledge base.
Automatic identification of the script in a given document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. In this paper, an automatic technique for word-wise identification of English, Devnagari and Urdu scripts from a single document is proposed. In this paper, an intelligent feature-based technique is reported that automatically identifies the scripts of handwritten words on a document page written in Devnagari script mixed with Roman script. Related titles include "Script Line Separation from Indian Multi-Script Documents" and "English, Devnagari and Urdu Text Identification".
One recognition pipeline involves character segmentation and the separation of simple and compound characters, after which a tree classifier is used for simple character recognition. The k-nearest neighbor classifier is used to classify the test sample. An average identification accuracy of 67% was obtained with the K-NN classifier and 80.14% with the SVM classifier.
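Classifying a test sample with the k-nearest neighbor rule can be sketched in a few lines. This assumes each word has already been reduced to a numeric feature vector; the two-dimensional vectors, the labels, and k = 3 in the example are hypothetical, and ties in the vote are broken in favor of the label with the closest neighbor (one reasonable choice among several).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A labeled training sample: (feature vector, script label).
Sample = Tuple[Sequence[float], str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train: List[Sample], sample: Sequence[float], k: int = 3) -> str:
    """Label `sample` by majority vote among its k nearest training samples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by distance to the test sample and keep the k nearest.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], sample))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # Tie-break: among labels with the most votes, return the one whose
    # nearest member is closest to the test sample.
    return next(label for _, label in neighbours if votes[label] == best)


# Usage with hypothetical 2-D feature vectors.
train = [
    ((0.0, 0.0), "Kannada"),
    ((0.1, 0.2), "Kannada"),
    ((5.0, 5.0), "English"),
    ((5.2, 4.9), "English"),
    ((9.0, 0.0), "Hindi"),
]
print(knn_classify(train, (0.2, 0.1), k=3))  # Kannada
```

K-NN needs no training beyond storing the samples, which makes it a convenient baseline; the cost is a distance computation against every stored sample at test time.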
Further features are based on the water reservoir principle, contour tracing, profiles, etc. The feature sets and classification tree, as well as the knowledge base required for error correction (such as a lexicon), differ for Bangla and Devnagari. Before feature extraction and classification, handwritten character images were enhanced using preprocessing techniques. The developed framework and portable hardware system take images of printed documents, and an error detection and correction technique is implemented on the Raspberry Pi.
In this paper, we introduce a simple and efficient technique of script identification for Kannada, English and Hindi text words of a printed document. The script is identified at word level using a fusion of moment-based features and visual discriminating features. The proposed system is tested on 100 different document images containing more than 1000 text words of each script, and classification rates of 98.25%, 99.25% and 98.87% are achieved for Kannada, English and Hindi, respectively. Similarly, in Andhra Pradesh, a state in India, a document may contain text words in English and Telugu script.
An automatic technique for the identification of printed Roman, Chinese, Arabic, Devnagari and Bangla text lines from a single document is proposed. At present, the system has an overall accuracy of about 97.52%. Another reported system has an overall accuracy of 96.09%. The scheme has been tested on 10 Indian scripts and found to be robust to skew generated in the process of scanning and relatively insensitive to changes in font size. The performance has turned out to be 98.5%.
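A simplified version of the water reservoir idea can be sketched on a binarized component: if water is poured from the top, it collects in cavities of the upper surface, and the trapped area is a shape feature. This sketch is an approximation under stated assumptions, not the exact formulation used in the cited work: it computes only top reservoirs, measures them column by column on the upper surface, and treats columns with no ink as drains through which water escapes. The function names are hypothetical.

```python
from typing import List

# Hypothetical representation: a binarized component as rows of 0/1 (1 = ink).
Image = List[List[int]]


def surface_heights(img: Image) -> List[int]:
    """Height of the upper ink surface in each column (0 for an empty column)."""
    rows = len(img)
    heights = []
    for c in range(len(img[0])):
        top = next((r for r in range(rows) if img[r][c]), None)
        heights.append(0 if top is None else rows - top)
    return heights


def trapped_water(heights: List[int]) -> List[int]:
    """Water held above each column, bounded by the highest walls on both sides."""
    n = len(heights)
    left, right = [0] * n, [0] * n
    running = 0
    for i in range(n):
        running = max(running, heights[i])
        left[i] = running
    running = 0
    for i in range(n - 1, -1, -1):
        running = max(running, heights[i])
        right[i] = running
    return [min(left[i], right[i]) - heights[i] for i in range(n)]


def top_reservoir_area(img: Image) -> int:
    """Total top-reservoir area; empty columns split the component into basins."""
    total = 0
    segment: List[int] = []
    # A trailing 0 acts as a final drain so the last segment is also counted.
    for h in surface_heights(img) + [0]:
        if h == 0:
            total += sum(trapped_water(segment))
            segment = []
        else:
            segment.append(h)
    return total


# A 'U'-shaped component holds 9 pixels of water; an inverted 'U' holds none.
u_shape = [[1, 0, 0, 0, 1], [1, 0, 0, 0, 1], [1, 0, 0, 0, 1], [1, 1, 1, 1, 1]]
print(top_reservoir_area(u_shape))  # 9
```

Bottom reservoirs (water poured from below) can be obtained the same way by flipping the image vertically before calling `top_reservoir_area`.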
The experiments involved 1500 text lines for learning and 1500 text lines for testing.
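The profile-based learning described earlier (storing the range of each feature per language, then matching test lines against those ranges) can be sketched as follows. The two features are hypothetical stand-ins, not the published feature set: for the top and the bottom profile, the fraction of inked columns whose profile value falls on the most frequent row. A headline, as in Hindi, concentrates the top profile, while an English baseline concentrates the bottom profile. A test line is assigned to the language whose stored ranges it lies closest to.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: a binarized text line as rows of 0/1 (1 = ink).
Image = List[List[int]]
Features = Tuple[float, float]
KnowledgeBase = Dict[str, List[Tuple[float, float]]]  # label -> (min, max) per feature


def profiles(img: Image) -> Tuple[List[int], List[int]]:
    """Top and bottom profiles: first and last ink row of every inked column."""
    top, bottom = [], []
    for c in range(len(img[0])):
        inked = [r for r in range(len(img)) if img[r][c]]
        if inked:
            top.append(inked[0])
            bottom.append(inked[-1])
    return top, bottom


def concentration(profile: List[int]) -> float:
    """Fraction of profile values that fall on the most frequent row."""
    if not profile:
        return 0.0
    _, count = Counter(profile).most_common(1)[0]
    return count / len(profile)


def line_features(img: Image) -> Features:
    top, bottom = profiles(img)
    return (concentration(top), concentration(bottom))


def learn_ranges(training: List[Tuple[Image, str]]) -> KnowledgeBase:
    """Knowledge base: the observed (min, max) of each feature for each language."""
    kb: KnowledgeBase = {}
    for img, label in training:
        feats = line_features(img)
        if label not in kb:
            kb[label] = [(f, f) for f in feats]
        else:
            kb[label] = [(min(lo, f), max(hi, f)) for (lo, hi), f in zip(kb[label], feats)]
    return kb


def distance_to_ranges(feats: Features, ranges: List[Tuple[float, float]]) -> float:
    """Total amount by which the features fall outside the stored ranges."""
    return sum(max(lo - f, 0.0, f - hi) for f, (lo, hi) in zip(feats, ranges))


def classify_line(img: Image, kb: KnowledgeBase) -> str:
    feats = line_features(img)
    return min(kb, key=lambda label: distance_to_ranges(feats, kb[label]))
```

Using ranges rather than single prototypes keeps the decision rule simple and interpretable; with real training data, each language's range simply widens as more text lines are seen.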

