Generally, we use the Yahoo ads dataset to evaluate bandit algorithms. In that dataset, the reward is a scalar value.
Are there similar real-world datasets for evaluating a multi-objective multi-armed bandit algorithm, which receives a reward vector instead of a scalar value?
Alternatively, could you advise me on how to adapt the Yahoo ads dataset for multi-objective optimisation?
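One possible way to adapt a scalar-reward log is sketched below, under several assumptions. It keeps the logged click as the first objective and attaches a second per-item score that is not in the dataset and would have to come from somewhere else (the `secondary` dict and its values are hypothetical). A policy is then evaluated with replay-style offline evaluation, which only counts events where the policy picks the item that was actually shown. This is only unbiased if the log was collected by a uniformly random policy. `ParetoUCB` is a simplified Pareto-UCB-style policy written for illustration, not a reference implementation, and the events in the usage block are synthetic, not real Yahoo data.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# One logged event: (candidate arm ids, arm actually shown, observed click 0/1).
Event = Tuple[Sequence[str], str, int]


def reward_vector(arm: str, click: int, secondary: Dict[str, float]) -> Tuple[float, float]:
    # Objective 1: the logged click. Objective 2: a HYPOTHETICAL per-arm score
    # (e.g. diversity or freshness) that is not part of the original log.
    return (float(click), secondary.get(arm, 0.0))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a Pareto-dominates b: at least as good on every objective, better on one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(vectors: Dict[str, Tuple[float, ...]]) -> List[str]:
    # Arms whose vector is not dominated by any other arm's vector.
    return [a for a, v in vectors.items()
            if not any(dominates(w, v) for b, w in vectors.items() if b != a)]


class ParetoUCB:
    """Simplified Pareto-UCB-style policy: an optimistic estimate per objective,
    then a uniform random choice among the non-dominated arms."""

    def __init__(self, n_objectives: int, seed: int = 0):
        self.d = n_objectives
        self.counts: Dict[str, int] = {}
        self.sums: Dict[str, List[float]] = {}
        self.t = 0
        self.rng = random.Random(seed)

    def select(self, candidates: Sequence[str]) -> str:
        untried = [a for a in candidates if self.counts.get(a, 0) == 0]
        if untried:
            return self.rng.choice(untried)  # try every candidate once first
        ucb = {}
        for a in candidates:
            n = self.counts[a]
            bonus = math.sqrt(2 * math.log(self.t + 1) / n)  # UCB1-style bonus
            ucb[a] = tuple(s / n + bonus for s in self.sums[a])
        return self.rng.choice(pareto_front(ucb))

    def update(self, arm: str, reward: Sequence[float]) -> None:
        self.t += 1
        self.counts[arm] = self.counts.get(arm, 0) + 1
        sums = self.sums.setdefault(arm, [0.0] * self.d)
        for i, r in enumerate(reward):
            sums[i] += r


def replay_evaluate(policy: ParetoUCB, events: List[Event],
                    secondary: Dict[str, float]) -> Tuple[List[float], int]:
    # Replay evaluation: keep only events where the policy picks the logged arm.
    total = [0.0] * policy.d
    matched = 0
    for candidates, shown, click in events:
        if policy.select(candidates) != shown:
            continue  # the reward of an arm that was not shown is unobserved
        r = reward_vector(shown, click, secondary)
        policy.update(shown, r)
        total = [t + x for t, x in zip(total, r)]
        matched += 1
    avg = [t / matched for t in total] if matched else total
    return avg, matched


if __name__ == "__main__":
    rng = random.Random(42)
    arms = ["a1", "a2", "a3"]
    ctr = {"a1": 0.05, "a2": 0.03, "a3": 0.04}    # synthetic click rates
    secondary = {"a1": 0.2, "a2": 0.9, "a3": 0.5}  # hypothetical 2nd objective
    events = []
    for _ in range(20000):
        shown = rng.choice(arms)  # uniformly random logging policy
        events.append((arms, shown, int(rng.random() < ctr[shown])))
    avg, matched = replay_evaluate(ParetoUCB(n_objectives=2), events, secondary)
    print(f"matched events: {matched}, average reward vector: {avg}")
```

Because objectives can conflict, the sketch reports an average reward vector rather than a single number. Comparing two policies then means checking Pareto dominance between their vectors, or scalarising with weights chosen for the application.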
Thanks.