03 November 2022

Hello,

For most learning algorithms, we make an i.i.d. (independent and identically distributed) assumption about the dataset. This assumption is both reasonable and useful (https://ai.stackexchange.com/questions/10839/why-exactly-do-neural-networks-require-i-i-d-data). In deep RL, we learn from experience tuples (s_t, a_t, r_t, s_{t+1}). During training, these tuples are sampled in batches. Are we making an i.i.d. assumption at this step? If yes, how do we defend it, given that the system dynamics clearly govern the transitions, so consecutive tuples are correlated? Any discussion/pointers on the topic are much appreciated.
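For concreteness, here is a minimal sketch of the kind of batch sampling I mean: a uniform replay buffer in the style of DQN. The ReplayBuffer name and interface are my own illustration, not taken from any particular library.

import random
from collections import deque

class ReplayBuffer:
    # Illustrative fixed-size store of (s_t, a_t, r_t, s_{t+1}) tuples.
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest tuples are evicted

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between
        # consecutive tuples, so a batch looks closer to i.i.d. than a raw
        # trajectory slice -- though the tuples are still drawn from the
        # state distribution induced by the behaviour policy.
        return random.sample(self.buffer, batch_size)

My question is whether drawing batches this way is enough to justify the i.i.d. assumption, or whether it only mitigates the correlation.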

Thanks!
