Our assumption:
We assume that data-driven models are superior to physically based and conceptual models.
Our question:
Can a data-driven approach serve as a good (or even the best available) benchmark for both our model and the input data it uses?
Here is an example to illustrate the problem:
We have a physically based hydrological model that simulates the components of the water balance. Its parameters are physically meaningful and are therefore based on observations.
Our model simulates the discharge of a catchment with discharge observations and reaches a Nash-Sutcliffe efficiency (NSE) of 0.1 in validation. Normally, such a performance would be judged as very poor. Running a data-driven approach with the same meteorological input instead yields only a slightly better NSE of 0.13.
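For readers less familiar with the metric, here is a minimal sketch of how the NSE can be computed. The function name `nse` and the discharge values are hypothetical and chosen only for illustration; they are not the data from our catchment.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (sum of squared
    deviations of the observations from their mean)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - sse / ss_tot


# Hypothetical discharge series (m^3/s), for illustration only
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = list(observed)   # simulation identical to the observations
mean_only = [3.0] * 5      # simulation that always predicts the observed mean

print(nse(observed, perfect))    # 1.0
print(nse(observed, mean_only))  # 0.0
```

An NSE of 1 means a perfect fit, and an NSE of 0 means the simulation is no better than always predicting the observed mean. Both values in our example (0.1 and 0.13) are therefore only marginally better than that mean.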
Under our assumption that data-driven models are superior, we can then conclude:
1. Our physically based model represents the processes quite well
2. The model parameters are carefully chosen/observed
3. The meteorological input data are not sufficient to simulate the discharge properly, and vice versa
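One possible way to put a number on "only slightly better" is to score the physically based model relative to the data-driven benchmark instead of relative to the observed mean. The sketch below is an assumption on our side, not an established part of our workflow, and the function name `skill_vs_benchmark` is hypothetical. It requires that both NSE values were computed against the same observations.

```python
def skill_vs_benchmark(nse_model: float, nse_benchmark: float) -> float:
    """Relative skill of a model against a benchmark model.

    Equals 1 - SSE_model / SSE_benchmark. Because SSE = (1 - NSE) * SS_tot and
    both models share the same observations, SS_tot cancels out.
    """
    if nse_benchmark >= 1.0:
        raise ValueError("benchmark is perfect; relative skill is undefined")
    return (nse_model - nse_benchmark) / (1.0 - nse_benchmark)


# NSE values from the example: physically based model vs. data-driven approach
skill = skill_vs_benchmark(0.10, 0.13)
print(round(skill, 3))  # -0.034
```

A value close to 0 means that both models have almost the same squared error. Under our assumption, that is the situation in which the input data, not the model structure, would be suspected as the limiting factor.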