I am working on a real-world problem that requires making self-adaptive decisions based on context. I am using reinforcement learning to address it, but formulating a reward function is a big challenge for me. For instance, I need to consider the human environment (the context) and make decisions based on user activities in a building. Does this setting correspond to a continuous or a discrete state/action space?