I am using the T-learner approach for uplift modeling: two separate models, one trained on the treatment group and one on the control group. They predict the two counterfactual outcomes, and the uplift is computed as the difference between the two predictions. I want to know which metrics capture the quality of the uplift estimates (i.e., how to evaluate the uplift predictions), especially with continuous outcomes. The problem is that the true uplift is never observed for an individual (you can't have both the treated and the control outcome for the same person), so traditional metrics like MSE on the uplift aren't feasible, and metrics like AUUC and Qini are meant for binary outcomes (e.g., conversion/no conversion).
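To make the setup concrete, here is a minimal sketch of the T-learner described above. It is illustrative only, not my actual pipeline: the data are synthetic, there is a single feature, and `fit_line`, `predict`, and `predict_uplift` are hypothetical helper names (plain least squares stands in for whatever regressors are really used).

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope

def predict(model, x):
    intercept, slope = model
    return intercept + slope * x

# Synthetic data (an assumption for illustration): continuous outcome,
# randomized treatment, and a true uplift of 2 * x that is known only
# because the data are simulated.
rng = random.Random(0)
rows = []
for _ in range(2000):
    x = rng.uniform(0, 1)
    t = rng.random() < 0.5
    y = 1.0 + 3.0 * x + (2.0 * x if t else 0.0) + rng.gauss(0, 0.1)
    rows.append((x, t, y))  # only ONE outcome is observed per individual

treated = [(x, y) for x, t, y in rows if t]
control = [(x, y) for x, t, y in rows if not t]

# T-learner: one model per group.
model_t = fit_line([x for x, _ in treated], [y for _, y in treated])
model_c = fit_line([x for x, _ in control], [y for _, y in control])

def predict_uplift(x):
    """Predicted uplift = treated-model prediction minus control-model prediction."""
    return predict(model_t, x) - predict(model_c, x)

uplift_at_half = predict_uplift(0.5)
print(uplift_at_half)
```

In this simulation I can compare `predict_uplift(x)` with the known `2 * x`, but in real data each row carries only its own observed `y`, so there is no per-row target to compute an error against.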