JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN
COMPUTER ENGINEERING
GUJARATI CHARACTER RECOGNITION: THE STATE OF THE ART COMPREHENSIVE SURVEY
1AVANI R. VASANT, 2SANDEEP R. VASANT, 3DR. G.R. KULKARNI
1Research Scholar, Singhania University, Rajasthan, and Assistant Professor and Head, Department of Information Technology, V.V.P. Engineering College, Rajkot
2Research Scholar, Saurashtra University, Rajkot & Lecturer, AES Institute of Computer Studies, Ahmedabad University
3Principal, C.U.Shah College of Engg. & Tech., Wadhwan
ABSTRACT: Character recognition is an active area of pattern recognition, and this survey is concerned mainly with offline handwriting recognition. Handwriting has persisted for centuries as a means of communicating, collecting, recording and transmitting information in day-to-day life, even with the advent of new technologies. Machine recognition of handwriting has many practical applications, including reading handwritten postal envelopes and amounts written on bank checks, bill processing, government records, commercial forms, signature verification and offline document recognition. This paper presents a state-of-the-art survey of the work done on Gujarati character recognition.
Keywords – Gujarati Character Recognition, online, offline, pre-processing, classifiers
ISSN: 0975 –6760| NOV 11 TO OCT 12 | VOLUME – 02, ISSUE - 01 Page 148
I. INTRODUCTION
India is a linguistically diverse country with more than 20 official languages, including Bengali, Malayalam, Hindi, English, Gujarati, Tamil, Kannada and Urdu [1]. Gujarati is widely spoken and is the official language of the state of Gujarat. More than 50 million people speak Gujarati [2]. Considerable work has been done on the recognition of scripts such as Chinese, Tamil, Telugu and Kannada, but comparatively little has been done on Gujarati.
Pattern recognition has become a very interesting topic for researchers during the last few decades. Typed characters can be recognized by machine relatively easily, but handwritten characters are still not recognized efficiently and accurately. Researchers have worked on handwriting recognition for more than 30 years and have proposed many algorithms. Over the past few years, the number of companies involved in handwriting recognition research has grown steadily.
The challenges in handwritten character recognition lie in the variation and distortion of offline handwritten characters, since different people write in different styles.
II. Gujarati Script
The Gujarati script is used to write the Gujarati language. The Gujarati alphabet uses 75 distinct recognized shapes: 59 characters and 16 diacritics. The 59 characters comprise 36 consonants, or ornamented sounds (34 singular and 2 compound, though not lexically so), 13 vowels (pure sounds) and 10 numerical digits [4][5]. The 16 diacritics comprise 13 vowel signs and 3 other marks. The alphabet is ordered by logically grouping the vowels and the consonants according to their pronunciation [3].
This form of recognition has many applications, such as postal code verification, vehicle number plate recognition, bank cheque processing, assigning ZIP codes to letter mail, reading data entered in forms (e.g., tax forms), automatic accounting procedures used in processing utility bills, verification of account numbers, automatic accounting of airline passenger tickets and automatic validation of passports [6].
In particular, machines that can read symbols are very cost effective. A machine that reads banking checks can process many more checks than a human being in the same time. This kind of application saves time and money, and eliminates the requirement that a human perform such a repetitive task [1].
Gujarati digits have distinctive characteristics. Their shapes vary widely and are difficult to recognize. Because of this variety, some characters are easily confused with one another and the possibility of misclassification is high [5]. Figure 1 shows the Gujarati characters and digits.
Figure 1 Gujarati Characters and Gujarati Digits
III. Application of Character Recognition System
There are a number of applications of character recognition systems [1]:
Task-specific Readers
These are used for high-volume data processing. Each focuses on a specific application, such as:
· Assigning ZIP codes to letter mail.
· Reading data entered in forms, e.g. tax forms
· Verification of account numbers and courtesy amounts on bank checks
· Automatic accounting procedure used in processing utility bills
· Automatic accounting of airline passenger tickets
· Automatic validation of passports
Address Readers
The address reader in a postal service reads the destination address block on the envelope and also reads the PIN code in the address block. Then using the PIN code it can sort the envelopes according to the area.
Form Reader
A form reader automatically reads the data filled in on a form. It locates the printed and handwritten text in the form and recognizes it.
Check Reader
An automatic check reader reads the amount and account information from the check image and recognizes both.
Signature verifier
Like a check reader, a signature verifier reads the signature from the check image and recognizes it.
Bill Processing System
This system is used to read payment slips, bills or any value specified in a bill.
Passport Readers
An automatic passport reader can be used for inspection. It can verify a traveler’s information, such as name, age, passport number and photograph, which saves time at airports.
IV. Types of Character recognition system
A character recognition system recognizes characters from handwritten or printed input. Such systems are typically classified into the following two types [8]:
· Online recognition and
· Offline recognition
Online Character Recognition
In online character recognition, characters are recognized in real time as they are written [6]. Online systems have more information available for recognition, since they capture timing information and avoid the initial search step of locating the character that their offline counterparts require. They obtain the position of the pen as a function of time directly from the interface. Offline recognition, by contrast, is known to be a challenging problem because of the complex character shapes and the great variation of character symbols written in different modes.
Offline Character Recognition
In offline character recognition, the typewritten or handwritten character is typically scanned from a paper document and made available to the recognition algorithm as a binary or gray-scale image. There is no control over the medium and instrument used [7]. Artifacts of the complex interaction between instrument and medium, and of subsequent operations such as scanning and binarization, present additional challenges to the recognition algorithm. Offline character recognition is therefore considered a more challenging task than its online counterpart.
V. Different Phases of the Character Recognition System
Figure 2 Components of the character recognition system
The pre-processing step aims to improve the image data or the image features required for further processing. It is a series of operations performed on the scanned input image that essentially enhances the image: conversion of the input image to a binary image, noise removal, dilation, line segmentation, digit segmentation and normalization [9].
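Three of the operations listed above (binarization, cropping to the character and size normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline of any surveyed paper: the fixed threshold of 128, the nearest-neighbour resizing and all function names are hypothetical choices, and noise removal, dilation and segmentation are omitted.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1; dark ink becomes 1."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop_to_ink(binary):
    """Crop the image to the bounding box of its ink pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        return binary  # blank image: nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def resize_nearest(binary, height, width):
    """Normalize the image size with nearest-neighbour sampling."""
    src_h, src_w = len(binary), len(binary[0])
    return [[binary[r * src_h // height][c * src_w // width]
             for c in range(width)]
            for r in range(height)]


def preprocess(gray, size=(8, 8), threshold=128):
    """Binarize, crop to the character and rescale to a fixed size."""
    return resize_nearest(crop_to_ink(binarize(gray, threshold)), *size)


# A made-up 4x4 scan: white background (255) with a 2x2 dark blob.
gray = [[255, 255, 255, 255],
        [255,   0,   0, 255],
        [255,   0,   0, 255],
        [255, 255, 255, 255]]
print(preprocess(gray, size=(4, 4)))
```

Normalizing every character to the same size lets the later stages compare images pixel by pixel or zone by zone.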
Feature extraction is a very important step in any character recognition system. It captures information such as shape or style that is useful for classifying the pattern. The feature extraction stage analyses a text segment and selects a set of features that can be used to uniquely identify it [10].
The classification stage uses the extracted features to identify the text segment according to the chosen algorithm. The task is to assign test patterns to the correct class while minimizing the error rate.
Post-processing applies approaches such as dictionary lookup, statistical methods or neural network recognition [11] to correct the recognition output.
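Dictionary lookup, one of the post-processing approaches mentioned above, can be sketched as replacing a recognized word with its closest dictionary entry. The sketch below assumes Levenshtein edit distance as the closeness measure and uses a made-up two-word lexicon; neither is taken from [11].

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def correct_word(word, dictionary):
    """Return the dictionary entry closest to the recognizer's output."""
    return min(dictionary, key=lambda entry: edit_distance(word, entry))


# Hypothetical lexicon and a recognizer output with one wrong character.
lexicon = ["recognition", "reception"]
print(correct_word("recogmition", lexicon))
```

For Gujarati text the same code operates on Unicode code points, so a consonant and its diacritic count as separate symbols.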
VI. Various classifiers used for the Gujarati character recognition
K-nearest Neighbor classifier
In [12], a k-nearest neighbor classifier is used for Gujarati character recognition. The k-nearest neighbor classifier has given very good results for English characters. It finds the k training samples nearest to the test sample and assigns the test sample to the class that receives the largest number of votes among them. The nearest neighbors are found using the Euclidean distance measure. For the 1-NN classifier, the best recognition rate achieved was 67% in the binary feature space and 48% in the regular moment space.
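The voting rule can be sketched as follows. This is a minimal illustration rather than the implementation of [12]; the two-dimensional feature vectors and the labels "a" and "b" are made up.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, sample, k=1):
    """Classify `sample` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two classes.
train = [([0, 0], "a"), ([0, 1], "a"), ([5, 5], "b"), ([6, 5], "b")]
print(knn_classify(train, [1, 0], k=3))
```

With k = 1 this reduces to the 1-NN classifier for which the recognition rates above are reported.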
Minimum Hamming Distance Classifier
In [12], this approach is also used for Gujarati character recognition. The minimum Hamming distance classifier uses the Hamming distance between the sample and the class centroids built from the training sets to classify characters. It is assumed that the image pixels follow a Bernoulli distribution. The Hamming distance is then the sum of the absolute pixel differences (in binary space) between a class centroid and the image of the character being classified. Using this approach, the recognition rate was only 39%.
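A minimal sketch of this classifier is given below. It assumes that a class centroid is the per-pixel mean of the binary training images of that class (that is, the estimated Bernoulli parameter of each pixel); the survey text does not spell out how [12] builds the centroids, and the four-pixel images are made up.

```python
def class_centroids(train):
    """Per-class mean of binary pixel vectors.

    train: list of (binary_pixel_vector, label) pairs.
    Returns a dict mapping each label to its centroid vector.
    """
    sums, counts = {}, {}
    for pixels, label in train:
        acc = sums.setdefault(label, [0.0] * len(pixels))
        for i, p in enumerate(pixels):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def hamming_classify(centroids, pixels):
    """Assign the class whose centroid has the smallest summed absolute pixel difference."""
    def distance(centroid):
        return sum(abs(p - c) for p, c in zip(pixels, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical 4-pixel images for two classes.
train = [([1, 1, 0, 0], "x"), ([1, 0, 0, 0], "x"),
         ([0, 0, 1, 1], "y"), ([0, 1, 1, 1], "y")]
centroids = class_centroids(train)
print(hamming_classify(centroids, [1, 1, 0, 1]))
```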
Feed forward back propagation neural network classifier
Neural-network-based character recognition can be found in [5][13][14].
In [5], a feed-forward back-propagation neural network is proposed for classifying Gujarati numerals. Various techniques are applied in the preprocessing phase before the numerals are classified. Gujarati numerals consist of sharp, irregular curves; to handle this, various profiles of the digits are used as templates to identify them. This simple but effective feature extraction technique uses four different profiles: horizontal, vertical and two diagonal. The overall performance of the proposed network is as high as 81.66%.
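One common reading of "profile" is the projection profile, that is, the count of ink pixels along each row, column or diagonal. The sketch below computes the four profiles under that assumption. The exact definition used in [5] may differ, and the function name is hypothetical.

```python
def projection_profiles(binary):
    """Ink counts along rows, columns and both diagonal directions of a 0/1 image."""
    h, w = len(binary), len(binary[0])
    horizontal = [sum(row) for row in binary]                       # one value per row
    vertical = [sum(binary[r][c] for r in range(h)) for c in range(w)]  # one per column
    main_diag = [0] * (h + w - 1)  # cells sharing the same (c - r)
    anti_diag = [0] * (h + w - 1)  # cells sharing the same (r + c)
    for r in range(h):
        for c in range(w):
            main_diag[c - r + h - 1] += binary[r][c]
            anti_diag[r + c] += binary[r][c]
    return horizontal, vertical, main_diag, anti_diag


# A made-up 2x2 image with ink on the main diagonal.
print(projection_profiles([[1, 0],
                           [0, 1]]))
```

Concatenating the four lists gives a fixed-length feature vector once all images are normalized to the same size, which is the form a feed-forward network expects as input.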
A handwritten character recognition system using a multilayer feed-forward neural network is proposed in [13]. Three different orientations, namely the horizontal, vertical and diagonal directions, are used for extracting 54 features from each character. In addition, 9 and 6 features are obtained by averaging the values placed in zones row-wise and column-wise, respectively. As a result, every character is represented by 69 (that is, 54 + 15) features.
The test results show that the diagonal method of feature extraction yields the highest recognition accuracy: 98% for 54 features and 99% for 69 features.
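The feature counts above are consistent with a grid of 9 × 6 = 54 zones. The sketch below works under that assumption: each zone contributes the average of its diagonal sums, and the 9 row-wise and 6 column-wise averages of those zone values are appended to give 69 features. The grid layout, the image size and the function name are assumptions made for illustration and are not stated in this survey.

```python
def diagonal_zone_features(binary, zone_rows=9, zone_cols=6):
    """Return zone features followed by their row-wise and column-wise means.

    binary: 0/1 image whose height and width are multiples of
    zone_rows and zone_cols respectively.
    """
    h, w = len(binary), len(binary[0])
    zh, zw = h // zone_rows, w // zone_cols
    grid = []
    for zr in range(zone_rows):
        row_feats = []
        for zc in range(zone_cols):
            # Sum the ink along each anti-diagonal of the zone.
            diag_sums = [0] * (zh + zw - 1)
            for r in range(zh):
                for c in range(zw):
                    diag_sums[r + c] += binary[zr * zh + r][zc * zw + c]
            # One feature per zone: the mean of its diagonal sums.
            row_feats.append(sum(diag_sums) / len(diag_sums))
        grid.append(row_feats)
    zone_feats = [f for row in grid for f in row]
    row_means = [sum(row) / zone_cols for row in grid]
    col_means = [sum(grid[zr][zc] for zr in range(zone_rows)) / zone_rows
                 for zc in range(zone_cols)]
    return zone_feats + row_means + col_means


# A made-up all-ink 18x12 image, so each zone is 2x2 pixels.
image = [[1] * 12 for _ in range(18)]
features = diagonal_zone_features(image)
print(len(features))
```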
KNN and PCA classifier
In [15], the authors use a KNN classifier and PCA (to reduce the dimensions of the feature space), with the Euclidean similarity measure, to classify the numerals. The KNN classifier yielded a recognition rate of 90%, whereas PCA achieved 84%. In this comparison, the KNN classifier performed better than the PCA classifier.
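The dimension-reduction role of PCA can be illustrated by projecting feature vectors onto their first principal component. The sketch below finds that component by power iteration on the covariance matrix, which is one simple way to compute it and is an assumption here; [15] is not described in enough detail in this survey to say how PCA was computed or how the projected features were classified. The data points are made up.

```python
import math


def mean_center(data):
    """Subtract the per-feature mean; returns (centered_data, mean)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return [[x - m for x, m in zip(row, mean)] for row in data], mean


def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def first_component(cov, iterations=200):
    """Dominant eigenvector of `cov` by power iteration.

    Assumes the starting vector is not orthogonal to that eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def project(row, mean, component):
    """Coordinate of `row` along the principal component."""
    return sum((x - m) * c for x, m, c in zip(row, mean, component))


# Made-up 2-D features lying on the line y = 2x.
data = [[0, 0], [1, 2], [2, 4], [3, 6]]
centered, mean = mean_center(data)
pc1 = first_component(covariance(centered))
print([round(project(row, mean, pc1), 3) for row in data])
```

A nearest-neighbour rule with Euclidean distance can then be applied to the projected values instead of the full feature vectors.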
SVM Classifier
In [16], the authors propose a Support Vector Machine (SVM) based scheme for recognizing Gujarati handwritten numerals. Affine invariant moments are used for feature extraction, and the recognition rate is approximately 91%.
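As an illustration of an affine invariant moment, the sketch below computes the first affine moment invariant, I1 = (mu20 * mu02 - mu11^2) / mu00^4, from the central moments of a binary image. Which invariants [16] actually uses is not stated in this survey, so the choice of I1 is an assumption; the tiny test shapes are made up, and the SVM training step is omitted.

```python
def central_moment(binary, p, q):
    """Central moment mu_pq of a 0/1 image (x = column index, y = row index).

    Assumes the image contains at least one ink pixel.
    """
    m00 = sum(sum(row) for row in binary)
    xc = sum(c * v for row in binary for c, v in enumerate(row)) / m00
    yc = sum(r * v for r, row in enumerate(binary) for v in row) / m00
    return sum(((c - xc) ** p) * ((r - yc) ** q) * v
               for r, row in enumerate(binary) for c, v in enumerate(row))


def affine_invariant_i1(binary):
    """First affine moment invariant: (mu20 * mu02 - mu11^2) / mu00^4."""
    mu00 = central_moment(binary, 0, 0)
    mu20 = central_moment(binary, 2, 0)
    mu02 = central_moment(binary, 0, 2)
    mu11 = central_moment(binary, 1, 1)
    return (mu20 * mu02 - mu11 ** 2) / mu00 ** 4


# A 3x2 rectangle and the same shape sheared by one pixel per row.
rect = [[1, 1, 1],
        [1, 1, 1]]
sheared = [[1, 1, 1, 0],
           [0, 1, 1, 1]]
print(affine_invariant_i1(rect), affine_invariant_i1(sheared))
```

The two shapes give the same value because a shear is an affine transformation, which is the property that makes such moments useful as features for slanted handwriting.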
VII. CONCLUSION
This paper describes the various steps involved in a character recognition system. It reviews the two types of character recognition system, online and offline, and describes the applications of character recognition. The last section reviews the classifiers that have been used for Gujarati character recognition.
REFERENCES
[1] Rejean Plamondon and Sargur Srihari, "Online and Off-line Handwriting Recognition: A Comprehensive Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, January 2000.
[2] U. Pal and B.B. Chaudhuri, "Indian Script Character Recognition: A Survey", Pattern Recognition, 37 (2004), pp. 1887–1899.
[3] Manish Kayasth and Dr. Bankim Patel, "Offline Typed Gujarati Character Recognition", ISSN: 0974-3308, vol. 2, no. 1, June 2009.
[4] Babu Suthar, Gujarati-English Learner’s Dictionary.
[5] Apurva A. Desai, "Gujarati handwritten numeral optical character reorganization through neural network", Pattern Recognition, Elsevier, 43 (2010), pp. 2582–2589.
[6] CIA/DOE Partnership Program Proposal for FY99 (Sandia National Laboratories Proposal), 1998.
[7] S. N. Srihari, "Recognition of Handwritten and Machine-printed Text for Postal Address Interpretation", Pattern Recognition Letters, 14, 1993, pp. 291–302.
[8] Faten T. Hussein, "Genetic Algorithm for Feature Selection and Weighting for Off-line Character Recognition", Thesis, Egypt, 1995.
[9] Rafael C. Gonzalez, Richard E. Woods and Steven L. Eddins, Digital Image Processing Using MATLAB, Pearson Education, Dorling Kindersley, South Asia, 2004.
[10] N. Kharma and R. Ward, "Character Recognition Systems for the Non-expert", IEEE Canadian Review, 33, 1999, pp. 5–8.
[11] K. Y. Rajput and Sangeeta Mishra, "Recognition and Editing of Devnagari Handwriting Using Neural Network", Proceedings of SPIT-IEEE Colloquium and International Conference, Mumbai, India, Vol. 1, 66.
[12] S. Antani and L. Agnihotri, "Gujarati Character Recognition", Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999, pp. 418–421.
[13] J. Pradeep, E. Srinivasan and S. Himavathi, "Diagonal Feature Extraction Based Handwritten Character System Using Neural Network", International Journal of Computer Applications (0975 – 8887), Volume 8, No. 9, October 2010.
[14] M. Hanmandlu, K.R.M. Mohan and H. Kumar, "Neural-based Handwritten Character Recognition", in Proceedings of the Fifth IEEE International Conference on Document Analysis and Recognition, ICDAR’99, pp. 241–244, Bangalore, India, 1999.
[15] M. J. Baheti, K.V. Kale and M.E. Jadhav, "Comparison of Classifiers for Gujarati Numeral Recognition", International Journal of Machine Intelligence, ISSN: 0975–2927 & E-ISSN: 0975–9166, Volume 3, Issue 3, 2011, pp. 160–163.
[16] Mamta Maloo and K.V. Kale, "Support Vector Machine Based Gujarati Numeral Recognition", International Journal on Computer Science and Engineering (IJCSE), ISSN: 0975-3397, Vol. 3, No. 7, July 2011, pp. 2595–2600.