I think there are still several open problems in image-based visual servoing, and in both the eye-in-hand and eye-to-hand architectures, such as:
- Designing visual servoing systems that work without fiducial marks. Classic visual servoing systems generally rely on a known pattern to extract the Cartesian coordinates of features from the image (centroids, moments, corner points, contours, etc.).
- During a task guided by visual servoing, the features are sometimes partially occluded, or lost altogether because they are no longer visible from the camera while the robot is moving. In this case, the features should be regenerated (by looking for new features or by returning to a previous iteration to recover the visual information).
- The trajectory is defined in the image, so the 3D trajectory of the robot is not controlled. It is not known until it occurs. This can lead to undesirable robot motions (collisions, instability, etc.).
- How can visual servoing be done with moving features? Generally, visual servoing is used to guide the robot from an initial position to a final (desired) position. These positions are obtained from features extracted from two reference images. But if the features (the scene) move while the robot is moving, then comparing the initial and final positions becomes difficult.
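To make the point about the uncontrolled 3D trajectory concrete, here is a minimal sketch of the classic image-based control law v = -λ L⁺ (s - s*) for point features. The function names, the gain value, and the assumption that the depth Z of every point is known are mine, for illustration only; in practice the depth has to be estimated or approximated.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """2x6 interaction matrix of one normalized image point (x, y) at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(current, desired, depths, gain=0.5):
    """Camera twist (vx, vy, vz, wx, wy, wz) from v = -gain * pinv(L) @ (s - s*)."""
    current = np.asarray(current, dtype=float)
    desired = np.asarray(desired, dtype=float)
    # Stack one 2x6 block per point; depths are assumed known here.
    L = np.vstack([point_interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(current, depths)])
    # The error lives purely in the image: no Cartesian path is specified.
    error = (current - desired).reshape(-1)
    return -gain * np.linalg.pinv(L) @ error

# Hypothetical example: four points seen twice as large as in the reference image.
desired = np.array([[0.1, 0.1], [-0.1, 0.1], [-0.1, -0.1], [0.1, -0.1]])
current = 2.0 * desired
depths = np.ones(4)
print(ibvs_velocity(current, desired, depths))
```

In this example the commanded velocity is essentially a pure translation backwards along the optical axis. The controller only drives the image error to zero, so whatever path the camera follows in space is a by-product of that, which is why collisions or joint limits are not handled unless something is added on top.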
Matthias Rolf has already pointed to several interesting works.
In the field of visual servoing, I think the major challenges lie in handling uncertainties and in specifying the desired motion in the image space.
1) The first fundamental problem comes from the requirement of building intelligence or adaptability into the robotic agent. As with a human being, visual servoing should enable the robot to work successfully in an unstructured environment.
2) The second problem mainly arises from the information lost when a real object is imaged, e.g., the image of a 3-dimensional object is typically 2-dimensional. In the existing literature, the desired motion is usually specified by "teach and follow", which, however, is quite elementary from the perspective of intelligence.
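As a small illustration of the information loss in the second point, the sketch below uses an ideal pinhole model (my assumption, with a hypothetical `project` helper and a unit focal length) to show that two different 3D points on the same viewing ray give exactly the same image point, so depth cannot be recovered from a single image point alone.

```python
import numpy as np

def project(point_3d, focal_length=1.0):
    """Ideal pinhole projection of a 3D point (X, Y, Z), Z > 0, onto the image plane."""
    X, Y, Z = point_3d
    return np.array([focal_length * X / Z, focal_length * Y / Z])

near = np.array([0.2, -0.1, 1.0])
far = 3.0 * near  # same viewing ray, three times farther away

# Both points land on the same image coordinates: the depth is lost.
print(project(near), project(far))
```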
I hope these two personal viewpoints help. You may refer to the tutorial listed by Matthias Rolf for a more in-depth treatment.