First of all, you seem to have adopted a global binarisation algorithm which does not take into account the over-saturated area around the bottom-right corner.
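To make the point about global versus local binarisation concrete, here is a minimal pure-Python sketch of a local-mean threshold. It is only an illustration, not the method used on your image: the function name `local_mean_threshold` and the `radius` and `offset` parameters are my own assumptions, and it assumes dark ink on a lighter background stored as a list of rows of grey levels.

```python
def local_mean_threshold(img, radius=1, offset=0.0):
    """Binarise a grayscale image (list of rows) against each pixel's local mean.

    Assumes dark ink on a lighter background; returns rows of 0/1 (1 = ink).
    """
    h, w = len(img), len(img[0])
    # Summed-area table with a zero border, so any window sum costs O(1).
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum

    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            # A pixel is ink if it is clearly darker than its neighbourhood.
            row.append(1 if img[y][x] < mean - offset else 0)
        out.append(row)
    return out
```

Because each pixel is compared with its own neighbourhood rather than with one cut-off for the whole page, strokes in a brighter region are judged against the brighter background around them. The window radius and offset would need tuning for real scans, and a fully saturated area may still contain no recoverable ink.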
Second, I assume you are interested in extracting features from those Sanskrit symbols. For each ROI (a symbol) you can extract hundreds of spatial and frequency-based features. For instance, in the spatial domain you can look at the following features: Area, Eccentricity, EulerNumber, Perimeter, Solidity, vertical and horizontal projection signatures, Pseudo-Zernike moments, etc.
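As a rough illustration of the simplest of these spatial features, the sketch below computes the area and the horizontal and vertical projection signatures of one binary ROI, plus a crude perimeter proxy. The function name `shape_features` and the dictionary keys are hypothetical, and the boundary-pixel count is only a stand-in for a proper Perimeter measure, not how any particular toolbox defines it.

```python
def shape_features(mask):
    """Basic spatial features of one binary symbol (list of rows of 0/1)."""
    h, w = len(mask), len(mask[0])

    area = sum(sum(row) for row in mask)             # number of ink pixels
    row_profile = [sum(row) for row in mask]         # horizontal projection
    col_profile = [sum(col) for col in zip(*mask)]   # vertical projection

    def is_background(y, x):
        # Pixels outside the ROI are treated as background.
        return not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0

    # Crude perimeter proxy: ink pixels with at least one 4-connected
    # background neighbour.
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    boundary_pixels = sum(
        1
        for y in range(h)
        for x in range(w)
        if mask[y][x]
        and any(is_background(y + dy, x + dx) for dy, dx in neighbours)
    )

    return {
        "area": area,
        "row_profile": row_profile,
        "col_profile": col_profile,
        "boundary_pixels": boundary_pixels,
    }
```

The two profiles can be concatenated into a feature vector once every ROI has been resized to a common height and width, otherwise their lengths differ from symbol to symbol.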
Thanks, Abbas. The features that are mentioned are GLCM features and Zernike moments. Can you please point me to any material on Zernike moments that I can study and apply to my dataset?
Iping Supriana, Arabic Character Recognition System Development, The 4th International Conference on Electrical Engineering and Informatics (ICEEI 2013).
A. Tahmasbi, F. Saki, S. B. Shokouhi, Classification of Benign and Malignant Masses Based on Zernike Moments, Comput. Biol. Med., vol. 41, no. 8, pp. 726-735, 2011
Hey, Zernike moments are good descriptors in some cases. I used them in microscopy (not as feature descriptors per se) and am using them now for SLAM, with satisfactory results. See my publication "3D depth variant PSF analysis and interpolation using Zernike moments", where you can find how to calculate them. In your case, a vector of n moment moduli is enough.
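For anyone who wants to try the "vector of moment moduli" idea, here is a minimal sketch using the standard textbook definition of (ordinary, not pseudo-) Zernike moments. It is not the implementation from either cited paper: the function names are hypothetical, the image is assumed to be square and given as a list of rows, and pixel centres are mapped onto the unit disc, with anything outside the disc ignored.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho), for n - |m| even and |m| <= n."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * rho ** (n - 2 * s)
    return total


def zernike_moment(img, n, m):
    """Complex Zernike moment Z_nm of a square image (assumption: square)."""
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    size = len(img)
    acc = 0j
    for r in range(size):
        y = (2 * r + 1) / size - 1          # pixel centre mapped to [-1, 1]
        for c in range(size):
            x = (2 * c + 1) / size - 1
            rho = math.hypot(x, y)
            if rho > 1.0 or not img[r][c]:
                continue                    # outside the unit disc, or empty
            theta = math.atan2(y, x)
            acc += img[r][c] * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    pixel_area = (2 / size) ** 2
    return (n + 1) / math.pi * acc * pixel_area


def zernike_magnitudes(img, max_order):
    """|Z_nm| for all valid (n, m) with 0 <= m <= n <= max_order."""
    return [abs(zernike_moment(img, n, m))
            for n in range(max_order + 1)
            for m in range(n + 1)
            if (n - m) % 2 == 0]
```

Only the moduli are kept because rotating the symbol changes the phase of each moment but not its magnitude. For the magnitudes to be comparable between symbols, each ROI should first be centred and scaled so that it fits inside the disc.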
Well, that all boils down to the kind of task you are undertaking. You might need to resort to more sophisticated shape descriptors if the task at hand is deemed complex.
If you can achieve your goal with a simple algorithm, just go for it.