I am working on this topic and would like to know whether other researchers have performed the same experiments. Videos of past experiments (though not on object recognition) are available at http://www.youtube.com/guidobologna
We have done several experiments on visual-to-auditory sensory substitution (several types of tasks, including object recognition), but we have not yet incorporated any neural networks into the system. What do you have in mind? Do you intend to do some intermediate-level classification that the user can combine with lower-level information to then infer the objects? Or are you thinking of doing high-level classification?
Hello, thank you for your answers. In our team we use machine learning models to perform object recognition. In practice, a camera captures images, then an object recognizer tells the user whether an object (cell phone, keys, etc.) is present in the central part of the image. Essentially, we are performing high-level classification that is not perfectly accurate, but it seems useful for grasping objects lying on a desk. We performed experiments with blind people and the system is functional, though not fully efficient in terms of the time needed to grasp an object.
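To make the idea concrete, here is a minimal Python sketch of the per-frame logic just described (simplified; "detector" and "speak" are hypothetical placeholders for the actual recognizer and audio output, and the frame is assumed to be a NumPy image array):

    def in_central_region(box, frame_w, frame_h, margin=0.25):
        # A detection counts if its centre lies in the middle of the
        # frame; the margin value is illustrative, not a tuned parameter.
        x0, y0, x1, y1 = box
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        return (margin * frame_w <= cx <= (1 - margin) * frame_w and
                margin * frame_h <= cy <= (1 - margin) * frame_h)

    def announce_central_objects(frame, detector, speak, threshold=0.5):
        # Run the recognizer on one camera frame and report any target
        # object (cell phone, keys, ...) found near the image centre.
        h, w = frame.shape[:2]
        for label, confidence, box in detector.detect(frame):
            if confidence >= threshold and in_central_region(box, w, h):
                speak(label)

In practice this runs in a loop over camera frames, so the recognizer's speed and accuracy directly affect how quickly the user can locate and grasp an object.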
You could also look at: Bujacz M., Skulimowski P., Strumillo P., "Naviton — A Prototype Mobility Aid for Auditory Presentation of Three-Dimensional Scenes to the Visually Impaired", Journal of the Audio Engineering Society, vol. 60, no. 9, September 2012, pp. 696-708. This team works on analyzing the 3D scene and transforming it into sound.
Earlier works:
1) Strumillo P., Skulimowski P., Polańczyk M., "Programming Symbian Smartphones for the Blind and Visually Impaired", in E. Kącki, M. Rudnicki, J. Stempczyńska (Eds.): Computers in Medical Activity, Advances in Intelligent and Soft Computing, Springer, 2009, pp. 129-136.
2) Strumillo P., "Electronic Systems Aiding Spatial Orientation and Mobility of the Visually Impaired", in Z.S. Hippe, J.L. Kulikowski, T. Mroczek (Eds.): Human-Computer Systems Interaction: Backgrounds and Applications 2, Advances in Soft Computing, Springer-Verlag, 2011, pp. 373-386.
Thanks for your clarification, Guido. That sounds very interesting. At the moment we are concentrating mostly on providing low/intermediate-level information: we are trying to maximize the amount of information converted while simultaneously maximizing its interpretability. I had always assumed we would not be able to achieve high-level information with sufficient accuracy, speed, and breadth, so I'm glad to hear that your team is trying to address this gap.