Questions from Gavin Rens
Recently, some work has been done on planning and learning in Non-Markovian Decision Processes, that is, decision-making with temporally extended rewards. In these settings, a particular reward is...
07 April 2019 4,955 2 View
When the reward function is defined as R(a,s), the value function is defined as max_{a in A} (rho(a,b) + ...), where A is the set of actions, b is the current belief state and rho(a,b) is the...
06 May 2016 9,631 3 View
I'm working with discrete distributions, and I have only one constraint (over the atomic events) on the posterior distribution. In particular, I have a prior distribution P over four atomic events...
26 January 2016 6,807 2 View