Generally, we use the Yahoo ads dataset to evaluate bandit algorithms. In that dataset the reward is a scalar value.

Are there similar real-world datasets for evaluating a multi-objective multi-armed bandit algorithm, which assumes a reward vector instead of a scalar value?

Or can you advise me on how to adapt the Yahoo ads dataset for multi-objective optimisation?

Thanks.
