I need to extract the regions that contain text from a given image, which is more like a natural scene than a simple document. As a next step, I also intend to recognise the text.
To extract text regions, I recommend reading this paper on text detection in natural images: Epshtein, B.; Ofek, E.; Wexler, Y., "Detecting text in natural scenes with stroke width transform"
ccv is a computer vision library that has some great algorithms, including an implementation of this method: http://libccv.org/doc/doc-swt/
In order to efficiently recognize the characters, I would compute the ratio of set and non-set pixels in each bounding box (see above discussion). By binning these ratios and knowing the language of the text, you could find the most frequently occurring characters. Then maybe determine the font and then do some template matching of the bins.
If the resolution is so low that characters touch each other, you face a bigger problem...
However, the right algorithm strongly depends on the images you have.
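The ratio-and-binning idea described above could be sketched roughly as follows. This is only a minimal illustration, assuming the image has already been binarized into a nested list of 0/1 values and that the bounding boxes come from an earlier detection step. The names `fill_ratio` and `bin_ratios`, the box layout `(top, left, bottom, right)`, and the choice of 10 bins are all hypothetical, not taken from any library.

```python
from collections import Counter


def fill_ratio(image, box):
    """Return the fraction of set (non-zero) pixels inside a bounding box.

    `box` is (top, left, bottom, right) with bottom/right exclusive.
    This layout is an assumption made for the sketch.
    """
    top, left, bottom, right = box
    area = (bottom - top) * (right - left)
    if area <= 0:
        raise ValueError("empty bounding box")
    set_pixels = sum(
        1
        for row in range(top, bottom)
        for col in range(left, right)
        if image[row][col]
    )
    return set_pixels / area


def bin_ratios(ratios, n_bins=10):
    """Histogram the ratios: maps bin index -> number of boxes in that bin."""
    counts = Counter()
    for ratio in ratios:
        # A ratio of exactly 1.0 is clamped into the last bin.
        index = min(int(ratio * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Tiny hand-made binary image with two candidate character boxes.
image = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
]
boxes = [(0, 0, 2, 2), (0, 4, 2, 6)]

ratios = [fill_ratio(image, box) for box in boxes]
histogram = bin_ratios(ratios)
print(ratios)
print(histogram.most_common())
```

The most populated bins would then be compared against the character frequencies of the known language. That matching step, and the later template matching, are not shown here.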
This is a main topic of the "International Workshop on Camera-Based Document Analysis and Recognition" (CBDAR). You can find many related papers in the proceedings.
If you are interested in the technical details, I recommend the papers below.
(I'm sorry they are a little bit old.)
Sameer Antani, Ullas Gargi, David Crandall, Tarak Gandhi, and Rangachar Kasturi, "Extraction of Text in Video," The Pennsylvania State University Technical Report, CSE-99-016, 1999
S. Antani, D. Crandall, A. Narasimhamurthy, V.Y. Mariano, and R. Kasturi, "Evaluation of Methods for Detection and Localization of Text in Video," Proc. the Fourth IAPR International Workshop on Document Analysis Systems, pp. 507-514, Dec. 2000
Datong Chen and Juergen Luettin, "Multiple Hypotheses Video OCR," Proc. the Fourth IAPR International Workshop on Document Analysis Systems, pp. 527-531, Dec. 2000
David Crandall and Rangachar Kasturi, "Robust Detection of Stylized Text Events in Digital Video," Proc. the Sixth International Conference on Document Analysis and Recognition, pp. 865-871, Sep. 2001
Nevenka Dimitrova, Lalitha Agnihotri, Chitra Dorai, Ruud M. Bolle, "MPEG-7 Videotext Description for Superimposed Text in Images and Video," IBM Research Report RC21524 (97104), 15 Jul. 1999
Ullas Gargi, David Crandall, Sameer Antani, Tarak Gandhi, Ryan Keener, and Rangachar Kasturi, "A System for Automatic Text Detection in Video," Proc. the Fifth International Conference on Document Analysis and Recognition, pp. 29-32, Sep. 1999
A.K. Jain, and B. Yu, "Automatic Text Location in Images and Video Frames," Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076, 1998
Hae-Kwang Kim, "Efficient Automatic Text Location Method and Content-Based Indexing and Structuring of Video Database," Journal of Visual Communication and Image Representation, Vol. 7, No. 4, pp. 336-344, Dec. 1996
Huiping Li, David Doermann, Omid Kia, "Automatic Text Detection and Tracking in Digital Video," Technical Report of University of Maryland, LAMP-TR-028, CAR-TR-900, CS-TR-3962, Dec. 1998
Rainer Lienhart and Frank Stuber, "Automatic Text Recognition in Digital Videos," Proc. IS&T/SPIE2000 : Image and Video Processing IV, pp. 180-188, Jan. 1996
Prem Natarajan, Baback Elmieh, Richard Schwartz, and John Makhoul, "Videotext OCR using Hidden Markov Models," Proc. the Sixth International Conference on Document Analysis and Recognition, pp. 947-951, Sep. 2001
Check this thesis: https://ritdml.rit.edu/bitstream/handle/1850/4485/SSharmaThesis-2007.pdf?sequence=1
It covers two types of text extraction from natural images and also includes MATLAB code.