I am working on a problem where I have to select the best server (level 1) to send a given request to. Each level 1 server in turn calls other servers (level 2) to complete the request. All level 1 servers are integrated with the same set of level 2 servers. For each request I get either success or failure as the response.

For this I am using Thompson Sampling with Bernoulli rewards (Beta prior): a success gets reward 1 and a failure gets reward 0. However, a failure also comes with an error. For some errors it is evident that the cause is at the level 1 server, so a reward of 0 makes sense. Other errors result from bad request data or from issues at the level 2 servers. For these kinds of errors we can't penalize the level 1 server with a reward of 0, nor can we reward it with 1.

Currently I use a reward of 0.5 for such cases.
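
To make the setup concrete, here is a minimal sketch of it. It assumes a Beta(1, 1) prior per level 1 server and a fractional posterior update (add the reward to alpha and one minus the reward to beta), which is only one possible way of feeding a 0.5 reward into a Beta-Bernoulli model. The class name, the outcome labels, and the error categories are hypothetical placeholders, not the real ones.

```python
import random

# Hypothetical outcome labels; the real error codes are not part of this post.
SUCCESS = "success"
LEVEL1_ERROR = "level1_error"        # failure clearly caused by the level 1 server
AMBIGUOUS_ERROR = "ambiguous_error"  # bad request data or a level 2 failure


def reward_for(outcome):
    """Map an outcome to the rewards described above: 1, 0, or 0.5."""
    if outcome == SUCCESS:
        return 1.0
    if outcome == LEVEL1_ERROR:
        return 0.0
    return 0.5


class BetaBernoulliSampler:
    """Thompson Sampling over level 1 servers with one Beta posterior each."""

    def __init__(self, servers, seed=None):
        self.rng = random.Random(seed)
        # [alpha, beta] per server, starting from an assumed uniform Beta(1, 1) prior.
        self.params = {s: [1.0, 1.0] for s in servers}

    def select(self):
        # Draw one sample per server and pick the server with the largest draw.
        draws = {s: self.rng.betavariate(a, b) for s, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, server, reward):
        # Assumed fractional update: reward goes to alpha, the remainder to beta.
        self.params[server][0] += reward
        self.params[server][1] += 1.0 - reward


sampler = BetaBernoulliSampler(["server_a", "server_b"], seed=0)
sampler.update("server_a", reward_for(SUCCESS))
sampler.update("server_a", reward_for(AMBIGUOUS_ERROR))
sampler.update("server_b", reward_for(LEVEL1_ERROR))
chosen = sampler.select()
```

Under this assumed update, a 0.5 reward adds half a pseudo-count to both alpha and beta, so it still moves the server's estimate toward 0.5 instead of leaving it unchanged.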

Searching the Internet, I couldn't find any method or algorithm for calculating the reward in such cases in a principled (informed) way.

What are possible ways to calculate the reward in such cases?
