I'm working on tesseract-ocr and want to know how line finding is done in Tesseract. Of course, there are comments in the code, but I'm not able to follow them. Can anyone suggest any documents or good algorithms for line finding?
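The linear Hough transform uses a two-dimensional array, called an accumulator, to detect lines of the form r = x cos b + y sin b, where r is the distance from the origin to the line and b is the angle of the line's normal. For each pixel at (x, y) and its neighbourhood, the algorithm decides whether there is sufficient evidence of a straight line through that pixel. If so, it calculates the parameters (r, b) of that line and increments the corresponding accumulator bin, so bins that collect many votes indicate likely lines.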
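To make the accumulator idea concrete, here is a minimal pure-Python sketch. It is only an illustration of the general Hough voting scheme, not code from Tesseract or any library; the function names `hough_accumulate` and `strongest_line`, the one-degree angle step, and the rounding of r to whole pixels are all assumptions chosen for the example. It also skips the neighbourhood test and simply lets every foreground pixel vote for every angle.

```python
import math


def hough_accumulate(points, width, height, n_angles=180):
    """Vote in an (r, b) accumulator for lines r = x cos b + y sin b.

    points: iterable of (x, y) foreground pixel coordinates.
    Returns (acc, angles, r_max); acc[r + r_max][j] counts votes for
    distance r (in whole pixels) and angle angles[j].
    """
    # |r| can never exceed the image diagonal.
    r_max = int(math.ceil(math.hypot(width, height)))
    # Hypothetical choice: n_angles evenly spaced angles in [0, pi).
    angles = [math.pi * j / n_angles for j in range(n_angles)]
    # Rows are shifted by r_max so that negative r values fit.
    acc = [[0] * n_angles for _ in range(2 * r_max + 1)]
    for x, y in points:
        for j, b in enumerate(angles):
            r = int(round(x * math.cos(b) + y * math.sin(b)))
            acc[r + r_max][j] += 1
    return acc, angles, r_max


def strongest_line(acc, angles, r_max):
    """Return (r, b, votes) for the accumulator bin with the most votes."""
    votes, i, j = max(
        (v, i, j) for i, row in enumerate(acc) for j, v in enumerate(row)
    )
    return i - r_max, angles[j], votes


if __name__ == "__main__":
    # A horizontal row of pixels at y = 5 in a 100 x 20 image.
    pts = [(x, 5) for x in range(100)]
    acc, angles, r_max = hough_accumulate(pts, width=100, height=20)
    r, b, votes = strongest_line(acc, angles, r_max)
    print(r, math.degrees(b), votes)
```

For a horizontal row of pixels, the winning bin should sit near b = 90 degrees with r equal to the row's y coordinate. In practice you would threshold the accumulator and keep several peaks rather than only the single strongest one.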
I have worked on segmenting reflected laser lines from a background modified by arc light. I am not sure whether your situation is similar, but I hope my paper helps:
Z.Z. Wang, "Monitoring of GMAW Weld Pool From the Reflected Laser Lines for Real-Time Control," IEEE Transactions on Industrial Informatics, 10(4), pp. 2073-2083, 2014.
Tesseract doesn't use the Hough transform mentioned in the other answers. Take a look at Section 3 of the original paper. Also take a look at the Leptonica library, which Tesseract OCR uses extensively.
Conference paper: An Overview of the Tesseract OCR Engine