What kinds of features are effective in handwritten optical character recognition systems? I am currently applying template matching using neural networks, but handwriting varies so much that the results I obtain are poor.
Character recognition systems generally use either transform-domain features with classifiers such as ANNs, SVMs, or quadratic discriminants, or structural features with knowledge-driven classifiers (rule bases, decision trees, etc.). The first type gives very high accuracy on printed characters, but such systems are usually biased towards the dataset they learn from (i.e. a specific font and/or script). Handwriting, however, varies greatly in style, stroke thickness, boundary noise, and so on. I can relate this problem to speech recognition, where the same word spoken by different speakers sounds different; similarly, the same character or word written by different writers looks different. How do we design such a system? Again we can borrow from the speech domain: go down to the primitive level, identify phonemes, and then incrementally build complex sounds from them. Likewise, a system that treats a character or word as a single unit may not work; instead we have to divide characters into strokes (perhaps even sub-strokes) and then incrementally build up complex characters. We recently proposed one such method for printed Gujarati (a popular Indian language) characters, but it needs many improvements before it can be applied to handwriting; we are still working on its performance. You can also look at some existing work along the same lines, listed below.
1) V. Nguyen and M. Blumenstein, "Techniques for static handwriting trajectory recovery: a survey," Proceedings of the 9th IAPR International …, 2010.
2) M. Liwicki, Y. Akira, S. Uchida, and M. Iwamura, "Reliable online stroke recovery from offline data with the data-embedding pen," Proc. ICDAR, 2011.
3) A. Hassaïne and S. Al Maadeed, "Competition on Handwriting Stroke Recovery from Offline Data," Proc. ICDAR, 2013.
4) C. Viard-Gaudin, P.-M. Lallican, and S. Knerr, "Recognition-directed recovering of temporal information from handwriting images," Pattern Recognition Letters, 26: 2537-2548, 2005.
5) H. Bunke et al., "Recovery of temporal information of cursively handwritten words for on-line recognition," Proc. 4th ICDAR, Ulm, Germany, 1997, pp. 931-935.
6) E.M. Nel, J.A. du Preez, and B.M. Herbst, "Estimating the pen trajectories of static signatures using hidden Markov models," IEEE Trans. PAMI, 27(11): 1733-1746, 2005.
7) Y. Kato and M. Yasuhara, "Recovery of drawing order from single-stroke handwriting images," IEEE Trans. PAMI, 22(9): 938-949, 2000.
8) K.K. Lau, P.C. Yuen, and Y.Y. Tang, "Stroke extraction and stroke sequence estimation on signatures," Proc. 16th ICPR, Quebec, Canada, 2002, pp. 119-122.
9) R. Niels and L. Vuurpijl, "Automatic Trajectory Extraction and Validation of Scanned Handwritten Characters," Proc. 10th IWFHR, La Baule, France, 2006.
10) Z. Su, Z. Cao, and Y. Wang, "Stroke extraction based on ambiguous zone detection: a preprocessing step to recover dynamic information from handwritten Chinese characters," International Journal on Document Analysis and Recognition, 2009.
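As a toy illustration of the stroke-primitive idea discussed above, an ordered pen-point sequence can be encoded as 8-direction chain-code primitives. This is a hypothetical minimal sketch; for offline data the stroke order would first have to be recovered, as the papers above discuss:

```python
import numpy as np

def chain_code(points):
    """Encode an ordered (x, y) point sequence as 8-direction primitives.

    Directions are numbered 0..7 counter-clockwise starting at east:
    0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        angle = np.arctan2(y1 - y0, x1 - x0)           # radians, -pi..pi
        codes.append(int(np.round(angle / (np.pi / 4))) % 8)
    return codes

# A stroke going right, then up, then left:
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
print(chain_code(stroke))  # -> [0, 0, 2, 2, 4, 4]
```

Such primitive sequences could then feed a higher-level model that assembles strokes into characters.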
You should read some papers on optical character recognition features; researchers have been working on this problem for many years. Some of the most important features are the number of closed loops, the direction and number of lines, and the number and positions of line intersections.
My advice is to start with a literature review. Choosing the right features may depend on the language you want to recognize. I have attached a link to a good survey of the field.
I believe you need to make the machine think like a human. How do you recognize someone's writing? You look at distances, curvatures, joins and breaks, absolute and relative sizes, and straightness. Of course, reviewing the literature on the subject will be of great advantage, which you must already be doing.
If you are working on a cursive script, you can use ligatures (or connected components) as units. You can also use hidden Markov models to solve script recognition problems; HMM-based classification is quite accurate.
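To make the HMM suggestion concrete, here is a minimal sketch of the forward algorithm scoring a discrete observation sequence (e.g. quantized stroke directions) against competing character models. The two-state models and their hand-picked probabilities are purely illustrative, not trained parameters:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi : (S,)   initial state probabilities
    A  : (S, S) transition probabilities, A[i, j] = P(j | i)
    B  : (S, V) emission probabilities,   B[i, v] = P(v | i)
    """
    alpha = pi * B[:, obs[0]]                  # forward variable at t = 0
    s = alpha.sum()
    log_lik = np.log(s)
    alpha = alpha / s                          # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]          # forward recursion step
        s = alpha.sum()
        log_lik += np.log(s)
        alpha = alpha / s
    return log_lik

# Toy 2-state models over a 4-symbol direction alphabet (0=E, 1=N, 2=W, 3=S):
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
B_horizontal = np.array([[0.7, 0.1, 0.1, 0.1],   # state 0 favors east
                         [0.1, 0.1, 0.7, 0.1]])  # state 1 favors west
B_vertical = np.array([[0.1, 0.7, 0.1, 0.1],     # state 0 favors north
                       [0.1, 0.1, 0.1, 0.7]])    # state 1 favors south

obs = [0, 0, 0, 2, 2]                            # mostly horizontal strokes
score_h = forward_log_likelihood(obs, pi, A, B_horizontal)
score_v = forward_log_likelihood(obs, pi, A, B_vertical)
```

Classification then picks the model with the highest likelihood; here the horizontal model wins. In practice the parameters would be estimated with Baum-Welch from training data.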
Have a look at some papers on feature extraction, which can help you with edge detection, size, and shape. You could also look at character recognition work in computer vision and reuse those techniques. You can have a look at my paper.
For recognizing handwritten characters with so much variation, image normalization is more important than the features used. I recommend searching for "image normalization character recognition".
OCR has been used in commercial applications since 1914, when it was used to convert text to Morse (telegraph) code. In the early 1970s my favorite futurist began developing omni-font OCR systems, but his primary interest was in text-to-speech and speech-to-text systems, primarily for the blind, an interest he perhaps developed during his collaboration with musician Stevie Wonder on the Kurzweil synthesizer, a musical instrument still available in electronic-music stores. He spun off his speech-to-text technology and now devotes his time and resources to pure AI; his most recent book is "How to Create a Mind". The question is poorly worded, using phrases such as "What kinds of features are effective...". Perhaps a good answer would be: those kinds of features which lead to an effective solution to the problem.
Is your handwritten data online or offline? If it is online, we have found that a combination of the sequence of x, y coordinates of the resampled points and the Fourier descriptor is the best feature. See our paper: A. G. Ramakrishnan and Bhargava Urala, "Global and local features for recognition of online handwritten numerals and Tamil characters," ACM - Proc. International Workshop on Multilingual OCR (MOCR 2013), 24 Aug. 2013, Washington DC, USA. You can download it from my website at: http://mile.ee.iisc.ernet.in/mile/publications/DocumentAnalysisRecognition.html
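In that spirit, here is a minimal sketch of the two feature types for online data: equal-arc-length resampling of the pen trace plus magnitude-only Fourier descriptors. The point count, the number of coefficients kept, and the particular invariances chosen here are my assumptions, not the exact recipe from the paper:

```python
import numpy as np

def resample(points, n=64):
    """Resample an online stroke to n points, equally spaced along arc length."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    t = np.linspace(0.0, s[-1], n)
    x = np.interp(t, s, pts[:, 0])
    y = np.interp(t, s, pts[:, 1])
    return np.stack([x, y], axis=1)

def fourier_descriptor(points, k=10):
    """Magnitudes of the first k FFT coefficients of the contour z = x + iy.

    Dropping the DC term gives translation invariance; dividing by |Z[1]|
    gives scale invariance; taking magnitudes discards rotation/phase.
    """
    z = points[:, 0] + 1j * points[:, 1]
    Z = np.fft.fft(z)
    mag = np.abs(Z[1:k + 1])
    return mag / mag[0] if mag[0] > 0 else mag

stroke = [(0, 0), (3, 0), (3, 4), (0, 4)]       # a rough rectangle outline
pts = resample(stroke, n=64)
feat = np.concatenate([pts.ravel(), fourier_descriptor(pts)])
```

The combined vector (raw resampled coordinates plus global spectral shape) is what then goes to the classifier.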
Don't use features such as loops, intersections, etc., because they are highly sensitive to noise and writing style.
If it is offline handwritten characters, use the 2-D DCT of the normalized images. Transform features, including the KLT and PCA, are very robust. Use an SVM classifier with an RBF kernel; in our experience it gives better results than neural networks, HMMs, and deep-learning approaches.
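A hedged sketch of the 2-D DCT feature extraction (numpy-only orthonormal DCT-II; keeping the 8x8 low-frequency block is an arbitrary choice of mine, and the SVM training step is omitted):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] *= 1 / np.sqrt(2)
    return C * np.sqrt(2.0 / n)

def dct2_features(img, keep=8):
    """2-D DCT of a size-normalized image; keep the top-left keep x keep
    (low-frequency) block as the feature vector."""
    img = np.asarray(img, dtype=float)
    C = dct_matrix(img.shape[0])
    R = dct_matrix(img.shape[1])
    coeffs = C @ img @ R.T
    return coeffs[:keep, :keep].ravel()

img = np.random.rand(32, 32)        # stand-in for a normalized character image
feat = dct2_features(img, keep=8)   # 64 low-frequency coefficients
```

These vectors would then be fed to an RBF-kernel SVM (e.g. scikit-learn's `SVC`) for classification.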
For character normalization and feature extraction, the paper below surveys the most effective techniques, some of which are still competitive today.
C.-L. Liu, K. Nakashima, H. Sako, H. Fujisawa, Handwritten digit recognition: Investigation of normalization and feature extraction techniques, Pattern Recognition, 37(2): 265-279, 2004.
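One classical normalization studied in that line of work is moment normalization: translate the image so its centroid sits at the center and rescale by its second moments. This is a hedged numpy sketch under simplifying assumptions (nearest-neighbor resampling, fixed output size), not the paper's exact implementation:

```python
import numpy as np

def moment_normalize(img, out_size=32, spread=4.0):
    """Center a grayscale character by its centroid and rescale by its second
    moments, so `spread` standard deviations span the output box."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m = img.sum()
    cy, cx = (ys * img).sum() / m, (xs * img).sum() / m      # centroid
    sy = np.sqrt(((ys - cy) ** 2 * img).sum() / m)           # second moments
    sx = np.sqrt(((xs - cx) ** 2 * img).sum() / m)
    out = np.zeros((out_size, out_size))
    oy, ox = np.mgrid[0:out_size, 0:out_size]
    # Inverse-map each output pixel back into the source image.
    src_y = cy + (oy - out_size / 2) * (spread * sy / out_size)
    src_x = cx + (ox - out_size / 2) * (spread * sx / out_size)
    yi = np.clip(np.round(src_y).astype(int), 0, img.shape[0] - 1)
    xi = np.clip(np.round(src_x).astype(int), 0, img.shape[1] - 1)
    valid = ((src_y >= 0) & (src_y < img.shape[0]) &
             (src_x >= 0) & (src_x < img.shape[1]))
    out[valid] = img[yi[valid], xi[valid]]
    return out
```

Liu et al. compare this family against aspect-ratio-preserving and nonlinear variants; the sketch above only shows the basic idea.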
Handwritten character recognition is universally acknowledged to be a difficult task, so several parameters and features, extracted from the source signal, are needed to reduce recognition errors. The more parameters you use, the fewer errors you make, but at the expense of more complex processing algorithms, greater CPU time, and a larger memory footprint.
Something that has worked for me in the past is to downscale or upscale the images to a predetermined size (e.g. 20x20 pixels) using a Gaussian pyramid, then convert to a binary image using a meaningful threshold. After that, the input feature vector for the NN is simply the unrolled image itself.
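That pipeline might look like the following sketch (the 5-tap binomial filter stands in for a proper Gaussian kernel, and the 0.5 threshold and image sizes are placeholder choices):

```python
import numpy as np

def pyr_down(img):
    """One Gaussian-pyramid reduction: 5-tap binomial blur applied
    separably, then drop every other row and column."""
    k = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred[::2, ::2]

def to_feature_vector(img, levels=2, threshold=0.5):
    """Downscale by `levels` pyramid steps, binarize, unroll into a vector."""
    img = np.asarray(img, dtype=float)
    for _ in range(levels):
        img = pyr_down(img)
    return (img > threshold).astype(float).ravel()

img = np.random.rand(80, 80)             # stand-in for a character image
feat = to_feature_vector(img, levels=2)  # 80 -> 40 -> 20, giving 400 inputs
```

The resulting 0/1 vector is fed directly to the network's input layer.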
Consider direction element features (DEF). These are zone-based directional statistics and a good starting point; a search on DEF will turn up many papers.
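In the spirit of DEF, here is a sketch of zone-based directional statistics: quantize the gradient orientation into a few bins and accumulate gradient magnitude per zone. The 4x4 zoning, 4 orientation bins, and finite-difference gradients are my simplifications; published DEF definitions differ in detail:

```python
import numpy as np

def zone_direction_features(img, zones=4, bins=4):
    """Zone-based directional statistics: quantize gradient orientation into
    `bins` directions and accumulate gradient magnitude in each zone."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                    # simple finite differences
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # orientation in [0, pi)
    b = np.minimum((ang / (np.pi / bins)).astype(int), bins - 1)
    h, w = img.shape
    zy = np.minimum(np.arange(h) * zones // h, zones - 1)  # row -> zone row
    zx = np.minimum(np.arange(w) * zones // w, zones - 1)  # col -> zone col
    feat = np.zeros((zones, zones, bins))
    for i in range(h):
        for j in range(w):
            feat[zy[i], zx[j], b[i, j]] += mag[i, j]
    return feat.ravel()

img = np.random.rand(32, 32)
feat = zone_direction_features(img)   # 4 x 4 zones x 4 directions = 64 values
```

Normalizing each zone's histogram (not shown) helps make the features robust to stroke thickness.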
Neural networks are robust when the data is noisy, but they struggle unless the features are extracted effectively. I recommend looking for papers and tutorials on pre-processing the input data, as well as experimenting with network architecture. Don't assume higher-resolution data is the solution; sometimes data reduction improves the situation. Have you looked at histogram approaches to de-skewing and character segmentation?
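A simple histogram-based segmentation of the kind hinted at above splits a binarized line image at columns with no ink. This minimal sketch assumes clean, non-touching characters:

```python
import numpy as np

def segment_columns(binary):
    """Split a binarized text-line image into character candidates at
    columns whose vertical projection (ink count per column) is zero."""
    profile = binary.sum(axis=0)                 # ink pixels per column
    ink = profile > 0
    segments, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                            # a run of ink begins
        elif not has_ink and start is not None:
            segments.append((start, x))          # a run of ink ends
            start = None
    if start is not None:
        segments.append((start, len(ink)))
    return segments

line = np.zeros((10, 20), dtype=int)
line[2:8, 2:5] = 1      # first "character"
line[2:8, 9:13] = 1     # second "character"
print(segment_columns(line))   # -> [(2, 5), (9, 13)]
```

The same projection idea, applied to row histograms at candidate rotation angles, is a common de-skewing heuristic.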
Normalize the image, divide it into quadrants, then take the DCT or a Haar-based DWT of each quadrant and concatenate the results. Transform-based features are simple to extract and very effective, and you can augment them with DEF and similar features.
Normalize the characters, center them, and just use the raw intensity values; you will get state-of-the-art results. Look for the work of Dan Ciresan (deep convolutional networks) or Yann LeCun (the original convolutional network, the so-called LeNet-5).