Hello everybody.
The reward is necessary to tell the machine (the agent) which state-action pairs are good and which are bad.
Please help me understand the role of the discount factor and the discounted reward in reinforcement learning.
What I don't understand is why the discounted reward is necessary. Why should it matter whether a good state is reached sooner rather than later?
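To make the question concrete, here is a minimal sketch of how I understand the discounted return, assuming the usual definition where the reward at time step t is weighted by gamma**t. The function name `discounted_return` and the two reward sequences are made up for illustration only.

```python
def discounted_return(rewards, gamma):
    """Return sum(gamma**t * rewards[t]) over all time steps t."""
    total = 0.0
    # Work backwards so each earlier step adds its reward plus
    # the discounted value of everything that follows it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episodes: the same single reward, reached early or late.
early = [1.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 1.0]

for gamma in (1.0, 0.5):
    print(gamma, discounted_return(early, gamma), discounted_return(late, gamma))
```

With gamma = 1.0 both episodes get the same return, so the agent has no reason to prefer reaching the good state earlier. With gamma = 0.5 the late reward is scaled by 0.5**3, so the early episode scores higher.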