I have a coin with an unknown probability of heads. We toss it repeatedly and, after every toss (i.e., at every state), update our estimate of that probability. What would be an optimal stopping policy? We can impose some reward or payoff structure. Could you help me define a suitable, well-defined reward or payoff structure?
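To make "well-defined reward structure" concrete, here is one possible formulation. It is my own assumption, not taken from the attached paper. Put a Beta$(a_0, b_0)$ prior on the head probability $p$. The state is $(h, t)$, the counts of heads and tails so far. Each toss costs a fixed $c$. Stopping pays $-\operatorname{Var}[p \mid h, t]$, which is minus the expected squared error of the posterior-mean estimate. Allow at most $N$ tosses. The value function then satisfies $V(h,t) = \max\{-\operatorname{Var}[p \mid h,t],\ -c + \hat p\, V(h+1,t) + (1-\hat p)\, V(h,t+1)\}$, where $\hat p$ is the posterior mean, and it can be solved by backward induction.

The sketch below implements that recursion. The names `make_policy`, `cost`, and `horizon` are hypothetical.

```python
import random
from functools import lru_cache
from typing import Callable, Tuple


def posterior_var(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return alpha * beta / (s * s * (s + 1.0))


def make_policy(a0: float = 1.0, b0: float = 1.0, cost: float = 1e-3,
                horizon: int = 50) -> Tuple[Callable[[int, int], float],
                                            Callable[[int, int], bool]]:
    """Backward induction for an ASSUMED reward structure:
    stopping after h heads and t tails pays -(posterior variance of p),
    every toss costs `cost`, and at most `horizon` tosses are allowed."""

    def stop_reward(h: int, t: int) -> float:
        # Minus the expected squared error of the posterior-mean estimate.
        return -posterior_var(a0 + h, b0 + t)

    def continue_reward(h: int, t: int) -> float:
        # Predictive probability that the next toss is heads = posterior mean.
        p_head = (a0 + h) / (a0 + b0 + h + t)
        return -cost + p_head * value(h + 1, t) + (1.0 - p_head) * value(h, t + 1)

    @lru_cache(maxsize=None)
    def value(h: int, t: int) -> float:
        if h + t >= horizon:
            return stop_reward(h, t)  # forced to stop at the horizon
        return max(stop_reward(h, t), continue_reward(h, t))

    def should_stop(h: int, t: int) -> bool:
        if h + t >= horizon:
            return True
        return stop_reward(h, t) >= continue_reward(h, t)

    return value, should_stop


if __name__ == "__main__":
    rng = random.Random(0)
    true_p = 0.7  # unknown to the decision maker; used only to simulate tosses
    value, should_stop = make_policy(a0=1.0, b0=1.0, cost=1e-3, horizon=50)
    h = t = 0
    while not should_stop(h, t):
        if rng.random() < true_p:
            h += 1
        else:
            t += 1
    print(f"stopped after {h + t} tosses, estimate of p = {(1 + h) / (2 + h + t):.3f}")
```

Under this assumed structure, a smaller `cost` or a larger `horizon` makes the policy keep tossing longer. A different terminal payoff could replace the body of `stop_reward` without changing the recursion, for example absolute error or the winnings from betting on the next toss.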
For reference, and to clarify my question, I am attaching a paper that contains examples where the transition probabilities are known.