I'll answer by first asking whether the premise of the question is itself reasonable.
I believe that expressivity and generalization ability may not be sufficient on their own to determine model performance. A typical example is highly imbalanced binary classification, say with a positive-to-negative ratio of 99:1. Suppose a model predicts every input as positive. Its overall accuracy is 99%, and because it behaves identically on training and test data drawn from the same distribution, its generalization gap (the difference between training and test error) is essentially 0. Yet the model is clearly ineffective: it never identifies a single negative sample.
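To make the arithmetic concrete, here is a minimal sketch in plain Python. The split sizes (9,900/100 for training, 990/10 for test) and the helper names `make_split`, `always_positive`, and `evaluate` are my own assumptions for illustration; only the 99:1 ratio and the "predict everything as positive" model come from the example above. I also added balanced accuracy as one possible metric that exposes the problem; it is not the only choice.

```python
def make_split(n_pos, n_neg):
    """Build a label list with n_pos positives (1) and n_neg negatives (0)."""
    return [1] * n_pos + [0] * n_neg


def always_positive(labels):
    """The degenerate model: predict positive for every input."""
    return [1] * len(labels)


def evaluate(y_true, y_pred):
    """Return accuracy, per-class recall, and balanced accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    pos_recall = tp / (tp + fn) if (tp + fn) else 0.0
    neg_recall = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "pos_recall": pos_recall,
        "neg_recall": neg_recall,
        "balanced_accuracy": (pos_recall + neg_recall) / 2,
    }


# Hypothetical train/test splits, both with a 99:1 positive-to-negative ratio.
train_labels = make_split(9900, 100)
test_labels = make_split(990, 10)

train_metrics = evaluate(train_labels, always_positive(train_labels))
test_metrics = evaluate(test_labels, always_positive(test_labels))

# Generalization gap measured here as |train accuracy - test accuracy|.
gap = abs(train_metrics["accuracy"] - test_metrics["accuracy"])

print("test accuracy:         ", test_metrics["accuracy"])
print("generalization gap:    ", gap)
print("recall on negatives:   ", test_metrics["neg_recall"])
print("balanced accuracy:     ", test_metrics["balanced_accuracy"])
```

Under these assumptions, accuracy and the gap both look excellent, while recall on the negative class is zero and balanced accuracy is at chance level. That is the sense in which the two headline quantities can look fine and the model can still be useless.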