"Perceiving layout and knowing distances: The integration, relative potency, and contextual use of different information about depth" by James E. Cutting and Peter M. Vishton, enumerates and discusses the nine cues in great detail. The use of 3D depth maps has shown improvement in object segmentation and recognition, so perhaps computer vision has just not attempted the same level of sensor fusion that is necessary for scene comprehension similar to human capability. Cameras operate at 100 megapixels today (similar to rod and cone count in the human eye) and compute capability is rapidly advancing, so it would seem advancement could be achieved in more sophisticated intelligent vision using more cues.