I'm doing a study in image processing of capsule endoscopy images. Is there a way to know how many images I should have in the database to make the study representative?
This is a very difficult question, and I guess the ugly but practical answer is to get as many images as is feasible, such that they represent the situations that will be encountered during deployment as closely as possible.
Since you ask about the optimal number of samples, the theoretically sound way is through statistical hypothesis testing. You have to decide what you would like to test, how you will evaluate it, and what significance level and statistical power you would like to achieve; these can then be used to compute the necessary number of samples. A good start is to read the Wikipedia article on "Statistical power", where your question corresponds to the a priori analysis: https://en.wikipedia.org/wiki/Statistical_power#A_priori_vs._post_hoc_analysis
As you can see in the examples detailed in or linked from that article, this assumes that you can describe the whole setup mathematically, for example comparing the means of two groups when the measurement values are normally distributed. This might be considerably harder for your problem. I assume you would like to test the quality of a method, for which you can define a measure, but it is difficult to tell how that measure will be distributed and what priors to use.
You can simply assume that it is normally distributed and aim to test whether it is better than a minimal value at a significance level of 0.05 (that is, requiring a p-value of at most 0.05). The required sample size can then be computed by hand or with one of the many software tools (see the "Software for Power and Sample Size Calculations" section of the wiki article).
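To make the "computed by hand" part concrete, here is a minimal sketch in Python. It is only an illustration under assumptions that are mine, not something given in the question: a one-sided, one-sample z-test with a known standard deviation `sigma`, a smallest improvement over the minimal value that you care about (`effect`), and a target power of 0.8, which is a common convention rather than a requirement. The function name and all parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(effect, sigma, alpha=0.05, power=0.8):
    """Approximate number of samples for a one-sided, one-sample z-test.

    effect: smallest difference from the minimal value worth detecting (assumed)
    sigma:  standard deviation of the quality measure (assumed to be known)
    alpha:  significance level
    power:  desired probability of detecting the effect if it is real
    """
    if effect <= 0 or sigma <= 0:
        raise ValueError("effect and sigma must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha)  # critical value of the test
    z_power = standard_normal.inv_cdf(power)      # quantile for the target power

    # Standard normal-approximation formula: n = ((z_alpha + z_power) * sigma / effect)^2
    return math.ceil(((z_alpha + z_power) * sigma / effect) ** 2)


# Hypothetical numbers: the measure has a standard deviation of 0.1 and we want
# to detect an improvement of 0.05 over the minimal value.
print(required_sample_size(effect=0.05, sigma=0.1))
```

Note how strongly the result depends on the assumed effect size: halving `effect` roughly quadruples the number of images needed, which is one reason these numbers should be treated as rough guidance.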
On the other hand, you have to be careful with thinking of this as some ultimate tool for judging the worth of a work.
What I wanted to say is that these analyses typically rest on many assumptions, and there should not be a blind hunt for p-values in science. The real impact is measured by the usefulness of your method, as the above linked paper argues.
In the meantime, I also found this interesting article on Edge.org. Hope it helps :)