A few years ago I participated in a project aimed at developing a method for determining the position of the eyes in an image. Very good results were obtained using infrared light: it was possible to determine the position of the eye as well as the pupil.
Yes, there was an IBM project called "Blue Eyes" that did something similar. I believe they took images of the face in the visible range and IR range. The difference between the two images showed the eyes.
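For illustration, a minimal sketch of that differencing idea, assuming two registered grayscale frames of the same face (file names and threshold value are made up for illustration):

```python
import cv2

# Hypothetical, registered grayscale frames of the same face:
# one under visible light, one under IR illumination.
visible = cv2.imread("face_visible.png", cv2.IMREAD_GRAYSCALE)
ir = cv2.imread("face_ir.png", cv2.IMREAD_GRAYSCALE)

# The eyes respond very differently to IR than the rest of the face,
# so the absolute difference highlights them; the threshold is
# illustrative only.
diff = cv2.absdiff(ir, visible)
_, eye_mask = cv2.threshold(diff, 60, 255, cv2.THRESH_BINARY)
```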
You can detect the eye using edge information. Below I have mentioned a sample paper on eye detection; they use edge information for the detection.
We tried a Viola-Jones detector trained for eyes. The area in which the eyes are searched is restricted to the region previously detected by the Viola-Jones face detector. It worked fine and we reached real time.
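For reference, a minimal OpenCV sketch of that two-stage search (using the stock Haar cascade files shipped with OpenCV; the detection parameters are illustrative):

```python
import cv2

# Stock OpenCV Haar cascades; exact paths may differ per installation.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
    # Restrict the eye search to the detected face, as described above.
    face_roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face_roi, 1.1, 5)
```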
Dear Alreza, could you specify what accuracy of eye tracking you need? What tasks would you like to perform with real-time eye tracking? For a start I would suggest you use simple image segmentation: you can create a skin color model for simple face detection, then use face geometry to estimate the eye position, set up an ROI or create a mask to isolate the eye region, and use a binary threshold to find the dark pupil region. With a longest-line algorithm you can estimate the pupil position. Of course you will get better results for gaze tracking with IR illumination. Please write specifically what your assumptions are so I can suggest the most suitable solution. The guys from ITU in Copenhagen developed free eye tracking software; I believe you can start with that.
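For illustration, a rough sketch of such a pipeline (the HSV skin bounds, the eye-band geometry and the thresholds are illustrative only and would need calibration to your setup):

```python
import cv2
import numpy as np

frame = cv2.imread("frame.png")  # hypothetical input frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Rough skin-color model in HSV; the bounds are illustrative only
# and would need calibration for your lighting and subjects.
skin = cv2.inRange(hsv, np.array([0, 40, 60], np.uint8),
                   np.array([25, 150, 255], np.uint8))

# Take the largest skin blob as the face, then use simple face
# geometry to place the eye ROI in the upper part of that blob.
contours, _ = cv2.findContours(skin, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
eye_band = cv2.cvtColor(frame[y + h // 5:y + h // 2, x:x + w],
                        cv2.COLOR_BGR2GRAY)

# Binary threshold to isolate the dark pupil region inside the ROI.
_, pupil_mask = cv2.threshold(eye_band, 50, 255, cv2.THRESH_BINARY_INV)
```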
We addressed this problem quite effectively by detecting ellipses (for eye sockets) and circles (for irises) using the Hough transform. We cannot publish the source code, but the algorithm is described in detail here: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6168441&tag=1.
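For the circle part, OpenCV's built-in Hough transform can serve as a starting point (a sketch; the radius bounds are illustrative, and note that OpenCV only ships the circle variant, so the ellipse detection would need a custom or scikit-image implementation):

```python
import cv2

gray = cv2.imread("eye_region.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
gray = cv2.medianBlur(gray, 5)  # the Hough transform is noise-sensitive

# Search for iris-sized circles; the radius bounds are illustrative
# and depend on the expected iris size in pixels.
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=40,
                           param1=100, param2=30,
                           minRadius=10, maxRadius=40)
```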
Unlike the previous suggestions, I personally would not go for color image segmentation or Hough transforms. Both methods may work in laboratory conditions, but in my humble opinion they do not work well in relatively unconstrained environments. On the one hand, it is very difficult to calibrate color information in changing scenarios (e.g. outdoors or at different times of day). On the other hand, the Hough transform is very prone to local minima, very dependent on the success of edge detection techniques (which may fail badly under blur, noise or clutter), and quite slow.
I definitely would apply a boosting algorithm over descriptors that are powerful enough, such as SURF or HOG, in order to obtain a cascade of classifiers. A typical AdaBoost will do the trick. It is the usual approach for face detection, and the variability of negative examples is much lower in the case of eyes, as they are simply "other parts of the face" instead of the more challenging "non-face images". I also recommend implementing a bootstrapping process after the selection of each filter, so that new false positives are harvested and the cascade gets progressively more precise - that is the most awesome principle of boosting. You could also apply simpler descriptors like Haar or LBP, but in my experience this would expand the training time and the cascade size, and the results would not be better.
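For illustration, a minimal sketch of the training side using HOG features with scikit-learn's AdaBoost (the patch size, HOG parameters and patch lists are hypothetical placeholders; a production cascade would add the bootstrapping rounds described above):

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import AdaBoostClassifier

def describe(patch):
    """HOG descriptor of a fixed-size (here 24x24) grayscale patch."""
    return hog(patch, orientations=9, pixels_per_cell=(6, 6),
               cells_per_block=(2, 2))

# Hypothetical training data: 24x24 patches, positives are eyes,
# negatives are other parts of the face, as discussed above.
eye_patches = [...]         # load positive patches here
other_face_patches = [...]  # load negative patches here

X = np.array([describe(p) for p in eye_patches + other_face_patches])
y = np.array([1] * len(eye_patches) + [0] * len(other_face_patches))

# One boosted classifier; a full cascade would chain several of these
# and harvest new false positives between stages (bootstrapping).
clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
```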
You can check recent improvements in this field for object detection, most notably SURF cascade (https://sites.google.com/site/leeplus/publications/learningsurfcascadeforfastandaccurateobjectdetection) and Soft Cascade. With fewer than 10 filters you will obtain very accurate, very fast and computationally inexpensive results for real-time implementations, and you will only need a simple webcam.
Finally, if you don't want to spend time implementing your own training algorithm (I don't recommend the OpenCV cascade training routines), I also recommend some very nice open-source libraries that provide fiducial landmarks, eyes among them:
- Flandmark (fast and quite reliable)
- Stasm (more precise and many landmarks, but much slower)
- And of course, OpenCV itself has some trained LBP (or Haar?) eye cascades. (In my experience this is the least robust option, although very fast.)
Carles Fernández, thanks for the interesting answer, even though it has been a while. Now, with the evolution of OpenCV, would you stick to the same answer or would you reconsider and update it? Especially since the OpenCV CUDA modules are dropping the cascade classifier and no longer supporting it. How would you detect eyes rapidly and precisely with OpenCV in 2020?
What is the size of the FoV in mm and px? What is the image rate? How fast does the algorithm have to be? Do you have a multicore architecture at your disposal? Do you use IR illumination?
The answer to your question needs some constraints and specifications...
Saso Spasovski, I am attempting to work at 60 fps per camera (I am using more than one camera!) on an Nvidia GPU, at 1080p resolution; the vertical FoV is about 50° and the horizontal FoV is 60°. Yes, I use IR illumination. The field of application is pupil detection (I thought you might be interested in this info!).
What I meant was the size of the eye in the image... But let's see if I can contribute something useful.
60 fps is not too challenging and the usage of IR makes the pupil appear almost black and rich in contrast.
First I would utilize information that I can get almost for free. The specular reflex from the IR illumination probably generates a very bright blob close to the pupil center, which is not very complicated to localize. Then I would start from this point to search for a circular structure (the pupillary rim) using the approach by J. Daugman (DOI: 10.1109/tcsvt.2003.818350). Under the given circumstances I found this approach easy to implement, fast and accurate enough. Since the Daugman algorithm in its original version searches for circular objects, every deviation from this precondition results in an inaccuracy with regard to the pupil center and radius, so I used it just as a first guess, since pupils can have an elliptical shape. In a second step I scanned for high gradients in the vicinity of the calculated pupil edge and performed a least-squares ellipse fit; prior to that I removed outliers using a RANSAC-based approach. Reaching 300 fps was no problem.
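For illustration, a minimal sketch of the radial part of that search for one fixed candidate center, e.g. the glint position (the full Daugman operator also searches over candidate centers; sampling counts and smoothing are illustrative):

```python
import numpy as np

def daugman_radius(gray, center, radii, n_angles=64):
    """Coarse radial search of Daugman's integro-differential operator:
    for a fixed candidate center, return the radius that maximizes the
    smoothed derivative of the mean intensity along growing circles."""
    cx, cy = center
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    means = []
    for r in radii:
        xs = np.clip((cx + r * np.cos(theta)).astype(int),
                     0, gray.shape[1] - 1)
        ys = np.clip((cy + r * np.sin(theta)).astype(int),
                     0, gray.shape[0] - 1)
        means.append(float(gray[ys, xs].mean()))
    deriv = np.abs(np.gradient(np.asarray(means)))
    deriv = np.convolve(deriv, np.ones(3) / 3.0, mode="same")  # smooth
    return radii[int(np.argmax(deriv))]
```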
Thanks for sharing your algorithm! 300 fps sounds quite exciting, and I would need to reach something similar since I will use 3 to 4 cameras simultaneously (60 × 4 = 240 fps). The whole face is recorded, then eye detection is performed first. After that, as an equivalent of the Daugman method I use the circular Hough transform for initial pupil detection. More accurate edge detection is then achieved with ellipse fitting, but instead of excluding outliers with RANSAC I determine a minimum number of points (enough to completely and comfortably define my ellipse) at subpixel level; the ellipse is then accurately determined.
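For reference, the ellipse fit itself is essentially a one-liner in OpenCV (a sketch; cv2.fitEllipse needs at least 5 points):

```python
import cv2
import numpy as np

def fit_pupil_ellipse(edge_points):
    """Fit an ellipse to (subpixel) edge points on the pupil boundary;
    returns ((cx, cy), (major, minor), angle) or None if too few points."""
    pts = np.asarray(edge_points, dtype=np.float32)
    if len(pts) < 5:
        return None
    return cv2.fitEllipse(pts)
```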
Well, now we are back to the original question of eye detection. I was using the Haar features with the cascade classifier provided in OpenCV. Unfortunately, the cascade classifier is no longer available in OpenCV CUDA, at least from version 4.2 on; thus my issue is how to detect eyes in faces with that library.
PS: the specular reflex won't always be available, as the frames are not always frontal, since the patient is allowed to perform moderate head rotations! Attached is an example of an IR image without a specular reflex.
Once I used OpenCV's findContours() function for that purpose. After thresholding with a reasonable value, the function returns a vector of contours. Every element of that vector, i.e. every contour that was found, was analyzed with regard to size and compactness (perimeter^2 / (4*pi*area)). All blobs not fulfilling a reasonable size criterion, or not being compact or circular enough, were rejected. Perimeter and area of a contour were calculated using arcLength() and contourArea(). It was surprising how few blobs were circular and had the appropriate size and gray value. It helped a lot to know that there is a specular reflex close to the pupil and that we expect to find two pupils at a typical distance of approx. 50 mm.
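A minimal sketch of that filtering chain (the threshold and size bounds are illustrative and depend on the image scale):

```python
import cv2
import numpy as np

def find_pupil_candidates(gray, thresh_val=40, min_area=200,
                          max_area=5000, max_compactness=1.4):
    """Threshold the IR image and keep only dark blobs that are roughly
    circular; compactness = perimeter^2 / (4*pi*area) is 1.0 for an
    ideal circle and grows for less compact shapes."""
    _, binary = cv2.threshold(gray, thresh_val, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        area = cv2.contourArea(c)
        if not (min_area <= area <= max_area):
            continue
        perimeter = cv2.arcLength(c, closed=True)
        compactness = perimeter ** 2 / (4.0 * np.pi * area)
        if compactness <= max_compactness:
            candidates.append(c)
    return candidates
```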
This approach was not very fast, so I used two cores. The first analyzed the whole image at a rate of perhaps 10 fps, while the second utilized the results from the first by defining a smaller ROI around the pupil center position, in which findContours() had much less to do. If head motion is not too fast, an update of the ROI definition will not be required too often. If I remember correctly, I reached something around 200 fps. Once the pupil was found, things got even faster, since the center position was used to define the ROI in the following image...
Oh, only now do I see the baby's face! The quality is very beneficial and the pupils exhibit a decent blackness. Everything I said in the previous comment will work with this kind of image! I am surprised that there are no specular reflections from the IR illumination. Do you use polarizers to remove them?
The reason why we don't have specular reflections on the baby's face is that the camera sensor didn't detect any reflected rays; the image was built only from scattered ones (at least in the eye region). You can imagine that an incident IR ray coming from the camera will hit the eye at a high incident angle; the reflected ray then won't return to the camera, and no specular reflection will be noticed.
Back now to the find-contours method in OpenCV. I personally had to deal a lot with false positives such as the nostrils or the corners of the mouth (where the two lips meet). These can easily be confused with pupils, as they are dark and of comparable size. The second reason why I could easily miss a pupil is that the edge detection would miss part of the pupil and yield only part of a circle, which would then fail the circularity test. Something very similar to your algorithm worked fine with frontal frames but gave me poor results when the infant started to move his head. I am even facing frontal frames where only one pupil can be detected. That is why I decided to first detect the eye region with AI, to migrate to a less challenging environment.
Can you provide an image which you find too challenging for a "classic" approach without AI or ML? I work in the area of refractive surgery, where eye tracking is used extensively and high rates are required (>500 fps). AI/ML methods usually are not fast enough, so we had to come up with some tricks which might be applicable in your case. The major difference is that our FoV is just approx. 25 mm x 25 mm. Regarding your false positives, I would recommend also taking a look at the histogram of the surrounding area of a pupil candidate. A fast cross-correlation with a reference distribution could help to reject alien objects. I attached the histogram of the pupil region of the image you provided; it seems to be distinct enough to serve as an additional criterion for a pupil detector.
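A sketch of that histogram criterion using OpenCV's built-in comparison (the bin count is illustrative; the reference histogram would be built once from a known-good pupil region):

```python
import cv2

def pupil_histogram(gray_roi, bins=64):
    """Normalized gray-level histogram of a (candidate) pupil ROI."""
    h = cv2.calcHist([gray_roi], [0], None, [bins], [0, 256])
    cv2.normalize(h, h)
    return h

def candidate_score(gray_roi, ref_hist):
    """Correlation with the reference distribution: values near 1.0
    support a true pupil; alien objects (nostrils etc.) score lower."""
    return cv2.compareHist(pupil_histogram(gray_roi), ref_hist,
                           cv2.HISTCMP_CORREL)
```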
But if it is mandatory to use the face and eye detection functionality of a previous OpenCV version, then I would consider installing both OpenCV versions and creating two separate programs running in parallel. One program would use the old OpenCV version just to find the face and eye regions, while the other would perform the contour finding, the ellipse fit and all the other stuff you need. Both programs/processes could communicate and exchange data and results via shared memory, pipes, sockets or simply files (preferably located in a RAM disk).
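As a minimal sketch of the simplest of those options, file-based exchange (the file path and ROI format are made up for illustration; the atomic rename keeps the reader from seeing half-written data):

```python
import json
import os

# Writer side (old-OpenCV process): publish the latest eye ROI.
def publish_roi(path, roi):
    """Write the ROI (x, y, w, h) via an atomic rename so the reader
    never sees a half-written file."""
    x, y, w, h = roi
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"x": x, "y": y, "w": w, "h": h}, f)
    os.replace(tmp, path)

# Reader side (new-OpenCV process): poll for the current ROI.
def read_roi(path):
    try:
        with open(path) as f:
            d = json.load(f)
        return d["x"], d["y"], d["w"], d["h"]
    except (FileNotFoundError, json.JSONDecodeError):
        return None  # no ROI published yet
```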
Attached are two cases: one of an incomplete pupil edge and one of an occluded pupil. On the edge level, I couldn't find anything that distinguishes their edges from those of the nose or other regions. The cross-correlation of histograms is also a good criterion to support your assumption, but unfortunately it didn't work for me, as the eye-region histograms vary (the ROI catches eyelids and eyelashes; in other words, we don't always have an ROI of pure pupil and sclera...). As you said, a FoV of 25 x 25 mm is a major difference, and things on that scale are way cleaner. I admit that the moment I decided to use ML tools, I felt I had given up the challenge of detecting pupils with pure image processing tools.
I have not yet completely digested the idea of building OpenCV twice and distributing my program between the two versions, but it is still a solution. However, I am currently checking PyTorch; hopefully I can find an equivalent of my eye detector that would save my day, even though this may lead me to use DNNs...
Yeah, I am also learning this, but only occasionally. I would recommend Udacity, but the paid courses are a bit expensive. I will take them when I can afford it.
Those three images are some scenarios of occluded or partially closed eyes. The last one is the case where a nostril is easily confused with the pupil. I remember that I kept adding criteria to my algorithm until I lost generality, and I always faced a new case that misled my algorithm...
Ha, there is my reflex... I see - it is a challenge. It reminds me of a project in which we had to detect and measure strabismus in infant eyes. There we also used the face and eye detection functions of OpenCV (Python) before we were able to focus on pupil detection, since speed was not an issue. And I remember that finding the pupil was still challenging due to shadows created by the nose, lashes and lids - just as in your case!!
But we had the enormous advantage of having a number of IR LEDs and thus also their corneal specular reflections. Just one of the LEDs was used for illumination and had a reflector, while the others' task was just to create corneal landmarks which were utilized for finding the pupil. No matter what the gaze direction was, we almost always had at least one distinct specular reflection close to the pupil center. And as a by-product we also got the gaze direction from the distance between the Purkinje reflex centers and the pupil center - not very accurate, but acceptable. And isn't it possible to use a second LED for illumination from above? It would diminish the shadows from the lower lid.
Hi, for me, once the eye region is detected, the work is way more comfortable, since the geometry of the pupil is unambiguous in this new environment. I also apply image rectification and depth estimation to obtain a circular pupil and an accurate size, respectively.
Since it is basically a pupillometer, it has RGB LEDs, and for camera illumination we use IR light. Sure, we can integrate LEDs wherever needed, but they should not be noticeable to the patient, to avoid accommodation.
I didn't really understand how the illumination LED has a reflector. And by corneal landmarks, do you mean the specular reflexes?
I can share the sketch of our device, but I don't think this is the best place to do so.
Yes, corneal landmarks = specular reflexes. Imagine using one high-power IR LED (with a reflector socket and optionally a lens to focus the light towards the face) and, let's say, a group of three low-power IR LEDs (without a reflector) arranged in a row with a distance of 5 cm between each other. The camera will then "see" all four specular reflexes on the cornea and nowhere else. So the algorithm should search for a pattern of four tiny bright spots - three of which have to be positioned nearly in a row. The subsequent processing can then be limited to a very small ROI.
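To make that search concrete, here is a minimal sketch (the brightness threshold, blob size and row tolerance are illustrative values):

```python
import itertools
import cv2

def find_glint_row(gray, thresh_val=240, max_row_dev=3.0):
    """Locate tiny bright specular reflexes and look for three of them
    lying nearly in a horizontal row (the low-power LED triplet)."""
    _, bright = cv2.threshold(gray, thresh_val, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(bright, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    centers = []
    for c in contours:
        m = cv2.moments(c)
        if 0 < cv2.contourArea(c) < 50 and m["m00"] > 0:
            centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    # Test every triple of bright spots for near-collinearity in y.
    for triple in itertools.combinations(centers, 3):
        ys = [p[1] for p in triple]
        if max(ys) - min(ys) < max_row_dev:
            return sorted(triple)  # left to right
    return None
```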