This is a good question with many possible answers.
The question of saliency (what is considered important in an image?) is both a philosophical and a computational question. The philosophical question has no obvious answer. By contrast, there are quite a number of reasonable computational methods for determining saliency in images.
A good overview of saliency metrics is given in
P. Sharma, Towards three-dimensional visual saliency, Ph.D. thesis, Norwegian University of Science and Technology, 2014.
See Section 2.6, starting on page 20, on saliency evaluation metrics:
AUC: area under the receiver operating characteristic (ROC) curve.
Chance-adjusted salience: difference between the mean saliency values of two sets of image regions.
Percentile measure: saliency maps are thresholded to the top 20% of salient image locations.
Kullback-Leibler divergence: logarithmic difference between two probability distributions.
Normalized scan-path saliency: mean of the saliency values at fixated locations (a minimal sketch of this and the next metric follows the list).
Pearson correlation coefficient: measures the linear dependence between two variables (here, how well a saliency model's prediction correlates with the map of fixated regions).
Ratio of medians: the saliency value of a fixated point is calculated as the maximum of the saliency values within a circular region centred on the fixated point.
String editing distance: fixations and saliency values are clustered using clustering methods such as k-means.
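As a concrete illustration, here is a minimal NumPy sketch of the two fixation-based metrics just mentioned (the function names and the (row, col) fixation format are my own assumptions, not notation from the thesis):

```python
import numpy as np

def normalized_scanpath_saliency(saliency_map, fixations):
    # Standardize the map to zero mean and unit variance, then
    # average the standardized values at the fixated pixels.
    s = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(np.mean([s[r, c] for r, c in fixations]))

def pearson_cc(saliency_map, fixation_density_map):
    # Linear correlation between the predicted saliency map and an
    # empirical fixation-density map of the same shape.
    return float(np.corrcoef(saliency_map.ravel(),
                             fixation_density_map.ravel())[0, 1])
```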
Prof. James F Peters, thanks for your valuable suggestion, but after going through the above thesis the answer is still not complete.
In one of his publications, Dr P. Sharma states that a visual saliency operation produces its output as a 2-D gray-scale map, but he does not discuss the input.
So my question is still pending:
Can we detect visual saliency on gray scale image/video?
I appreciate Professor James' reference for my own purposes. Thanks, Professor.
But to your question: when you say "detect" visual saliency, presumably you mean to measure visual saliency, as a metric for the visual detection by a human subject of an object or feature in a still image or moving video.
Gray-scale is of course a subset of the color spectrum and it is a significant part of human vision. So metrics of visual saliency designed for color should work directly for gray-scale. Any human-vision saliency metric intended for colour that did not also work for gray-scale would be of very questionable validity or generality, until proven by empirical tests with human subjects. The values of a saliency metric will of course generally differ between color and gray scale images of the same scene, because saliency generally changes between the two.
For human vision saliency metrics designed expressly for gray-scale (or monochromatic) images, you might search the open literature in military defense, for human visual object or target recognition in Forward-Looking Infrared (FLIR) images (sorry I have not got references immediately on hand), where extensive work on saliency (depending on human vision sensitivity to contrast, resolution, speckle, shape, and clutter) has been carried out. Medical literature on x-ray imagery might also be considered.
If you are thinking of saliency for computer vision and not human vision, then it must be remembered that computers and saliency metrics know nothing about colour perception as humans perceive colour. Colour is simply a three-dimensional intensity vector, such as (red, green, blue), and gray-scale is simply a one-dimensional intensity (a single grey value) assigned to each pixel in an image. Any metric designed for colour can therefore be applied straightforwardly to gray-scale, again with different saliency values generally resulting for the colour and gray-scale versions of the same scene, because the salience is generally different between the two, even for computers.
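As a minimal sketch of that point (the function name is illustrative, not from any particular saliency library): a gray-scale image can be fed to a colour-based metric simply by replicating its single channel three times, in which case the colour-opponency features contribute nothing and the intensity-driven features do all the work.

```python
import numpy as np

def gray_to_color_input(gray):
    # Replicate the single intensity channel into three identical
    # channels; a colour-based saliency metric then runs unchanged,
    # with its colour-opponency features contributing nothing.
    return np.repeat(gray[..., np.newaxis], 3, axis=-1)
```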
Image processing is a research domain that is at once too easy and too hard.
(1) Too easy because (as you will find) any reasonable method of image processing performs reasonably well. This leads to endless, open-ended, iterative modification of algorithms, from innovation to innovation in order to make them “better”, which tends never to stop in a conclusive way because you can always imagine ways to make them still better. I tell new researchers that they have to pick their favorite algorithmic approach, and run with it against a good data set to the point of exhaustion and boredom. It usually takes about 18 months. At the point that they are ready to abandon their pet algorithm in discouragement, they are finally ready to begin working more productively on image processing proper – for the sake of demonstrable results in a given application, rather than out of love and commitment to a pet image-processing algorithm. (The longest hold-outs seem to be the firm believers in Bayesian algorithms of one kind or another, who take a very long time to finally appreciate that the inherent weaknesses of Bayesian methods are pretty much equivalent to those in any other plausible algorithm.)
(2) Too hard because the use and acceptance of image processing in relevant real-world applications proves to be surprisingly challenging, especially for uncertain and critical (high-cost-of-error) applications. For if there is any uncertainty in the “correct” solution of image processing for scene content (as there generally is, owing to imperfect image quality, occlusions, changing perspective, etc.), then it means that any algorithm will be expected to make mistakes in its processing. And the foremost question, almost never addressed by the algorithm developer, is “How are we to use an algorithm that we know will make mistakes?” The more critical the application (security & safety, medicine), the more important and necessary it is to have a good answer to this question. When is it justified to bring admittedly imperfect algorithms into critical real-world operations? To answer these questions, you need to have a very good understanding about the mode of operation, purpose, work-load, and the costs and risks, of the prospective users of the image processing. This drags you into an entirely different domain than image processing as such; into security & safety operations, medical diagnosis, human-machine trust & reliance, etc…. You typically need to work with an applied research team that spans several disciplines, otherwise, no matter how academically dazzling and impressive the image processing algorithms may be, they will remain unused in practice.
When looking for application domains, I think you generally want:
(1) A topic for which large sets of imagery are readily available at low cost. Otherwise you will have to launch a programme of image acquisition in order to develop (train) and prove your algorithms. Little progress can be made without large image data sets. The staging and collection of many image sets can be very costly and time consuming. (Sometimes the collection and publication of image sets is more valuable to a research & development community, and warrants a PhD in its own right.)
(2) A topic for which you can find strong connections with one or more prospective users of the image processing in real-world applications, to make sure that you have a good plan for transition of your work into applications. Ideally this means partnering with one or more users in competitive proposals for project funding.
You might consider applications in security (situation understanding, face tracking, object tracking, anomalous behaviour detection, in video images with multiple cameras), or in medicine, or in robotics, especially driverless cars --- all current, active, fundable topics.
I suggest you check some older papers about visual saliency models, like this pioneering one:
L. Itti, C. Koch, E. Niebur, A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, pp. 1254-1259, Nov 1998.
There are many features of salience that are meaningful in gray scale.
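For instance, the intensity channel of the Itti-Koch-Niebur model is defined purely on gray levels. Below is a rough sketch of its centre-surround intensity contrast, deliberately simplified: plain Gaussian blurs stand in for the paper's dyadic pyramid and across-scale subtraction, so this is an approximation in the spirit of the model, not the published algorithm, and the scale parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def intensity_saliency(gray, center_sigmas=(2, 4), surround_scale=4):
    # Centre-surround contrast: fine (centre) minus coarse (surround)
    # Gaussian-blurred versions of the image, summed over scales.
    gray = gray.astype(np.float64)
    saliency = np.zeros_like(gray)
    for sigma in center_sigmas:
        center = gaussian_filter(gray, sigma)
        surround = gaussian_filter(gray, sigma * surround_scale)
        saliency += np.abs(center - surround)  # on/off contrast combined
    return saliency / saliency.max()           # normalize to [0, 1]
```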
Saliency has been partly used in image quality metrics. I propose to look at https://users.soe.ucsc.edu/~milanfar/publications/journal/JoV_Submitted.pdf and http://mmi.tudelft.nl/pub/hantao/VPQM2010.pdf
We have carried out some initial experiments with eye tracking on grayscale images but have not yet completed and published this work. You can see fixation points in three images (noisy, filtered, and noise-free) where the task was to compare the noisy and filtered images and decide which is closer to the noise-free one; see the attached file. From this experiment you will understand that it is a difficult task to predict saliency.
Mr. Lukin's work is very interesting. It seems to provide objective qualitative ranking of the subjective impression that image filtering has on image quality, which can be very important for many applications of image filtering, and perhaps image compression.
But do the results address saliency? I define saliency as a measure of image quality that correlates with the expected success (probability), or level of effort (expected time) required, in a given target or feature detection or classification task --- tasks such as finding the distant army tank in the forest scene (FLIR), or the tumor in medical imagery (MRI), or oil-spill at sea from texture and contrast in satellite imagery (SAR).
One could imagine a form of image processing (like filtering) that produces images that score poorly in the Lukin tests, but which would nevertheless enhance the saliency of given targets or features for human vision, improving the likelihood of success and speed at finding targets or features.
I have no doubt bent the definition of saliency for my own purposes. Is there an accepted definition of saliency?
Hae Jong Seo and Peyman Milanfar, in the paper "Static and Space-time Visual Saliency Detection by Self-Resemblance", give the following definition: "In general, saliency is defined as what drives human perceptual attention". And I agree with this definition. We analyzed a particular task and were interested in how a person analyzes three images (what points or image fragments draw attention) while trying to decide whether the filtering is good enough or not. It is a specific task, and my goal in sending the file was to show that it is difficult to predict saliency (in our application people usually pay attention to how well noise is suppressed in homogeneous image regions and how well edges and details are preserved by the filter under consideration).
Concerning our work on filtering, some results were presented in the paper V. Lukin, N. Ponomarenko, S. Krivenko, K. Egiazarian, J. Astola, Image Filter Effectiveness Characterization Based on HVS, Proceedings of the SPIE Conference Computational Imaging VI, Volume 6814, 12 p., 2008. It is available on my ResearchGate page. Besides, there are also some results in the paper N. Ponomarenko, S. Krivenko, K. Egiazarian, V. Lukin, J. Astola, Weighted mean square error for estimation of visual quality of image denoising methods, CD-ROM Proceedings of VPQM, Scottsdale, USA, 2010, 5 p. It is also available there.
However, in these works we used only very general aspects of saliency. And I fully agree with you that the definition of saliency can vary depending upon the application. I have some knowledge of the applications you mentioned and agree with the definition that you have provided.
A very simple technique for detecting saliency in grayscale images is the Spectral Residual method. It is computationally very efficient. To be more specific, try the Phase Fourier Transform (PFT), which is also a very simple and fast technique. All you need to do is adjust the downsampling parameters according to the size of the objects that you want detected; a sketch follows.
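For reference, here is a compact NumPy/SciPy sketch of the spectral residual computation as described by Hou and Zhang; the `small` parameter is the downsampling knob referred to above (smaller working sizes favour larger objects). The parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter, zoom

def spectral_residual_saliency(gray, small=64):
    g = gray.astype(np.float64)
    # Downsample: the working resolution sets the object scale.
    img = zoom(g, (small / g.shape[0], small / g.shape[1]))
    F = np.fft.fft2(img)
    log_amp = np.log(np.abs(F) + 1e-12)
    phase = np.angle(F)
    # Spectral residual: log amplitude minus its local average.
    residual = log_amp - uniform_filter(log_amp, size=3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = gaussian_filter(sal, sigma=2.5)  # smooth the saliency map
    # Upsample back to the original resolution and normalize.
    sal = zoom(sal, (g.shape[0] / sal.shape[0], g.shape[1] / sal.shape[1]))
    return sal / sal.max()
```

The phase-only variant (PFT) is the same computation with the residual dropped entirely, i.e. taking the inverse transform of np.exp(1j * phase).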
Interesting topic that is applicable across various aspects of digital image processing and analysis. I come from the remote sensing community: satellite, aerial, and shipborne imaging of the earth's surface (the bottom of the ocean, plus other planets, included). Typically multispectral imaging is better than panchromatic (grey / B&W), while hyperspectral is kind of state of the art; often spectral bands that are close to each other are very highly correlated and may or may not add new information, but they do add to the image analysis and processing that is required. There are applications where panchromatic is all that is needed and some cases where you need more spectral bands. In some cases all you have is panchromatic.
Depending on the application, panchromatic bands in the visible portion of the spectrum may not do the job; however, panchromatic images collected in the microwave (radar) or acoustic (sonar) portions may work better. This is because they respond to characteristics that are more applicable to the features of interest. For example, a panchromatic image in the visible portion of the spectrum will measure the brightness level of a given pixel, while a radar or sonar image will be affected most not by visible brightness but by the surface roughness / texture and density of the feature.
So grey images/video may be okay for some applications but not others, plus perhaps a 'grey image' in a different portion of the spectrum might be better.
I enjoyed reading all the responses and sorry if I am out in 'left field' because of my different background.
Dear colleagues, we have an interesting situation in which a simple and particular question has raised a considerably wider discussion among people who have different backgrounds and deal with different applications of image processing. Maybe the proper time for such a discussion has come.
From messages and comments of discussion participants, it seems possible to make some preliminary conclusions:
1) The topic of visual saliency attracts attention from specialists of different fields;
2) All of them deal with image processing, but with images of different natures (optical, radar, acoustic, hyperspectral represented in some way, and medical, which have not been mentioned yet but are obviously of interest as well) and for different applications;
3) A common thing is that all these images are subject to visual inspection (analysis); due to this, visual saliency is of value;
4) The differences consist in the tasks solved using these images (looking at them out of curiosity, inspection for scientific purposes, looking for special objects (their detection and recognition), diagnostics, quality assessment, etc.);
5) These differences lead to different definitions of saliency and, correspondingly, to different approaches to its characterization and analysis, and to peculiarities of saliency depending upon the application.
Thus, opinions of people from other fields and inter-disciplinary use of existing approaches can be very helpful.
Definitely, saliency can be detected in gray-scale images, and even in black-and-white images too. Before the advent of color video and color photography, people watched B/W movies and photographs, and humans automatically perceived the salient locations in those scenes/photographs. The only difference is that the saliency model would be designed based on intensity values, texture, orientation, etc.