When evaluating a new method, it is often necessary to test it against others. However, it is often unclear what to test against. If you only modify an existing method, algorithm, or program, this is easy: measure against the unmodified version to estimate how much better you are (by whichever metric matters). But if the method is entirely new, I am not sure how to evaluate it or what to pick as a baseline.

I see two possibilities:

1) Pick the current industry standard. This is a simple way to estimate the impact of the new method against a proven baseline. However, the industry standard is often several years behind and might not be as good as methods other researchers have already found. In that case, the new method may not actually hold up in the long run.

2) Test against methods from other researchers. This might be hard if implementations are not readily available. But even if they are, there are often many competing improvements, so it is hard to decide which one to pick. Also, one is then testing against a method that has not proven itself, so it is unclear whether the baseline is realistic in the long run.

How do you go about picking a baseline?
