Hello everybody.

The reward is needed to tell the machine (the agent) which state-action pairs are good and which are bad.

Please help me understand the role of the discount factor and the reward in reinforcement learning.

What I don't understand is why the discounted reward is necessary. Why should it matter whether a good state is reached sooner rather than later?
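
To make the question concrete, here is a minimal sketch of how I understand the discounted return, i.e. the sum of gamma^t * r_t over the time steps of an episode. The function name `discounted_return`, the two reward sequences, and the value gamma = 0.9 are only assumptions chosen for illustration, not taken from any particular library or textbook example.

```python
def discounted_return(rewards, gamma):
    """Return the sum of gamma**t * r_t over a finite list of rewards."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Two hypothetical episodes with the same single reward of 1.0,
# reached at step 1 in one case and at step 4 in the other.
soon = [0.0, 1.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 0.0, 1.0]

# With gamma < 1 (0.9 is an assumed value), the earlier reward counts for more.
g_soon = discounted_return(soon, gamma=0.9)
g_late = discounted_return(late, gamma=0.9)

# With gamma = 1 (no discounting), both episodes get the same return.
g_soon_undiscounted = discounted_return(soon, gamma=1.0)
g_late_undiscounted = discounted_return(late, gamma=1.0)

print(g_soon, g_late)
print(g_soon_undiscounted, g_late_undiscounted)
```

With discounting the agent prefers the first episode; without it, the two look equally good. That preference for the earlier reward is the part I would like to understand.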
