I work with image compression. If you had a compression algorithm that worked well on images of interest and performed badly on noise images, what percentage of the whole 256^(512*512) image space would those images of interest make up?
It seems like a hard-to-quantify ratio, but it stands to reason that it would be quite small. Since "interesting" tends to mean "low-pass," "piecewise smooth," or "regularly patterned," interesting images/textures would tend to be band-limited, exhibiting strong pixel-to-pixel correlations. The noise images, I suppose, would just have a more-or-less even distribution of energy across all frequencies (high-variance, patternless, high entropy).
If you start randomly choosing images, uniformly, from the entire, incredibly vast image space, what kind of image would you get, on average? Imagine a computer program drawing these images, without replacement, from the image space and displaying them on a screen. How long do you think it would take before we find an "interesting" image, one exhibiting some kind of band-limited nature? Quite a while. A very long while, in fact. That should be indicative of how few images exhibit low entropy. If each pixel is a 256-sided die, so that we have 512^2 of these dice, then for a smooth or low-pass image to result, large numbers of these dice would have to land on the same (or very similar) values. A rare occurrence. Any craps player can tell you that it is hard enough to get just two six-sided dice to land in a desired pattern.
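If you want to see this concretely, here is a minimal numpy sketch (the heavily averaged noise is only a crude stand-in for a band-limited image, not a real photograph): a uniformly drawn 512x512 8-bit image shows essentially no pixel-to-pixel correlation and close to 8 bits/pixel of histogram entropy, while the averaged version shows the opposite.

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_correlation(img):
    """Correlation between each pixel and its right-hand neighbour."""
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]

def histogram_entropy(img):
    """Zero-order Shannon entropy of the grey-level histogram, in bits/pixel."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# The "roll 512^2 dice" case: every pixel drawn uniformly from 0..255.
noise = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)

# Crude stand-in for an "interesting" image: the same noise heavily averaged,
# so that neighbouring dice are forced to agree.
smooth = noise.astype(float)
for _ in range(20):
    smooth = (smooth + np.roll(smooth, 1, axis=0) + np.roll(smooth, 1, axis=1)) / 3.0
smooth = smooth.astype(np.uint8)

for name, img in [("uniform draw", noise), ("smoothed draw", smooth)]:
    print(f"{name:>13}: neighbour corr = {horizontal_correlation(img):.3f}, "
          f"histogram entropy = {histogram_entropy(img):.2f} bits/pixel")
```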
Your question is akin to the old image of "an infinite number of monkeys hitting an infinite number of typewriters will, eventually, write the entire works of Shakespeare." Of course, we would all say that this would take a very long time (assuming a large but actually finite number of monkeys), and the reason is simply that the ratio of "the entire works of Shakespeare" to "all possible things monkeys could type" is so vastly, incredibly small that a monkey-random-process is unlikely to produce it. The same is true for the image question. While the number of distinct "interesting" images is quite large (billions and billions of photographs have been taken to date, for instance, all of which could be mapped into the 256^(512*512) image space), the rest of that space is just so unfathomably huge and filled with junk. I think you might have better odds of shining an ideal laser pointer at a random angle from the center of the universe and having it hit an ant in Canada.
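For a rough sense of scale, here is a back-of-the-envelope sketch; the figure of 10^12 photographs is an assumption for illustration only, not a real count.

```python
import math

PIXELS = 512 * 512            # 262,144 pixels
LEVELS = 256                  # 8-bit grey levels

# log10 of the number of distinct 512x512 8-bit images
log10_total = PIXELS * math.log10(LEVELS)     # ~631,306

# Assumed, for illustration only: a trillion "interesting" photographs
log10_interesting = 12.0

print(f"image space size     ~ 10^{log10_total:,.0f}")
print(f"fraction interesting ~ 10^{log10_interesting - log10_total:,.0f}")
# image space size     ~ 10^631,306
# fraction interesting ~ 10^-631,294
```

Even if that photograph count were off by a factor of a million in either direction, the exponent would barely move; the ratio is, for all practical purposes, zero.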
Carlos and Eric, thanks for the answers and link. To clarify, I am interested in what percentage of the total possible image space is made up of actual photographs of real-world objects, landscapes, space, or microscope images.
I work on improving wavelet image compression under uniform quantization, using genetic algorithms to evolve wavelet-like transforms whose mean squared error is lower than that of the initial wavelet. I know that you can't have a perfect compressor that always makes things smaller; otherwise you could keep feeding the compressor's output back into it and eventually end up with 1 bit, and that single 1 or 0 would have to decompress to your original image and to every other image as well. So for some images a given compressor will make the file size larger. Ideally you would want a compressor that did well on "interesting" images and not care if it made files larger for random, high-entropy images. I agree that most photographs will be low entropy, but are there many high-entropy photographic images? I know the classic test image Barbara has higher-frequency content in the scarf and tablecloth. http://www.hlevkin.com/TestImages/barbara.bmp
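For what it's worth, here is a minimal sketch of the kind of baseline measurement I mean, not the evolved transforms themselves: it uses a standard bior4.4 wavelet from PyWavelets and an arbitrary step size of 16, uniformly quantizes every coefficient, inverts, and reports the MSE along with the fraction of coefficients that quantize to zero, a rough proxy for how well an entropy coder would do afterwards.

```python
import numpy as np
import pywt   # PyWavelets

def quantize_and_reconstruct(img, wavelet="bior4.4", level=3, step=16.0):
    """Wavelet transform, uniform quantization of all coefficients, inverse."""
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=level)
    arr, slices = pywt.coeffs_to_array(coeffs)
    arr_q = np.round(arr / step) * step                  # uniform quantizer
    rec = pywt.waverec2(
        pywt.array_to_coeffs(arr_q, slices, output_format="wavedec2"), wavelet)
    rec = rec[:img.shape[0], :img.shape[1]]
    mse = float(np.mean((img.astype(float) - rec) ** 2))
    zero_frac = float(np.mean(arr_q == 0))               # proxy for compressibility
    return mse, zero_frac

rng = np.random.default_rng(0)
noise = rng.integers(0, 256, size=(512, 512)).astype(float)
yy, xx = np.mgrid[0:512, 0:512]
smooth = 128 + 100 * np.sin(xx / 40.0) * np.cos(yy / 60.0)   # low-pass test image

for name, img in [("smooth sinusoid", smooth), ("uniform noise", noise)]:
    mse, zf = quantize_and_reconstruct(img)
    print(f"{name:>15}: MSE = {mse:6.2f}, coefficients quantized to zero = {zf:.2%}")
```

The point is only that the same quantizer spends its zeros very differently on the two images, which is exactly the "does well on interesting, badly on noise" behaviour.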
I know a lot of compression techniques take advantage of low entropy; I was just wondering whether there are any studies or estimates of what percentage of all possible images the interesting ones make up, and whether that could be used to improve compression even further.
Brendan, your assessment of redundant compressor filtration of a source image really depends on the type of compression algorithm you are implementing. You seem to want a lossy algorithm (wavelet or the like), where you filter the noise pixels out and retain only the interesting pixels. In my experience, this varies greatly with the type of image you are working on. Photographs (casual, not technical) tend to be high-SNR images with a good deal of variation that is not noise, so it's not necessary to separate this out from detector (Johnson) or convolved image (Rician) noise. A number of compression schemes actually make use of entropic information, not the lack of it. The Maximum Entropy method has been used in a number of schemes to enhance SNR or, in your case, compress data. This tends to be a recurring area of investigation. A simple application by David and Aboulnasr describes the technique: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=952896&contentType=Conference+Publications&sortType%3Dasc_p_Sequence%26filter%3DAND(p_IS_Number%3A20602)%26rowsPerPage%3D100
H. Morris, thanks for the answer and link. My question comes out of thinking about alternative ways to compress images. If you want a basis of vectors that spans a whole space, then you can figure out which vectors you need. But if you consider the whole space of 512-by-512 images with 256 grey levels, could you construct a small set of basis vectors that didn't span the whole space, but did span the space of probable photographs, i.e. the images you would usually get from a camera (possibly zoomed in) or from a microscope? That led me to the question of what percentage of all images the photographs we see make up. Say I took a huge set of photographs from Google: what percentage of all possible images would they be? Maybe theoretically it isn't a good question, and "interest" is in the eye of the beholder, but if we could access all images ever taken by digital cameras, excluding the ones that were deleted because they were blurry or the lens cap was on, and resize them all to 512 by 512 with 256 grey levels (this choice is arbitrary), what percentage of all 256^(512*512) possible images would they be?
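One way I've tried to probe that "small basis" idea, as a sketch only (the sinusoid image below is just a synthetic stand-in for a photograph, and PCA on 8x8 patches is only one possible construction): collect patches, run an SVD, and see how much of the energy the leading directions capture.

```python
import numpy as np

def patches_8x8(img):
    """Non-overlapping 8x8 patches collected as rows of a matrix."""
    h, w = img.shape
    blocks = img[:h - h % 8, :w - w % 8].reshape(h // 8, 8, w // 8, 8)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, 64)

def energy_in_top_k(img, k=8):
    """Fraction of patch variance captured by the k leading principal directions."""
    X = patches_8x8(img.astype(float))
    X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

rng = np.random.default_rng(0)
noise = rng.integers(0, 256, size=(512, 512)).astype(float)
yy, xx = np.mgrid[0:512, 0:512]
smooth = 128 + 100 * np.sin(xx / 40.0) * np.cos(yy / 60.0)

print("top 8 of 64 directions, uniform noise :", round(energy_in_top_k(noise), 3))
print("top 8 of 64 directions, smooth image  :", round(energy_in_top_k(smooth), 3))
```

For the noise image the top 8 directions capture roughly 8/64 of the variance; for the smooth image they capture nearly all of it, which is the sense in which a much smaller basis can "almost span" the images we actually care about without spanning the whole space.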
That depends on the image, of course! If your image is "interesting," say with textures (google "barbara image processing" for an interesting example), it will have more information than Lena, for instance.