I am exploring areas to research in reinforcement learning and apply in the insurance industry domain. I am looking to enhance actor-critic algorithms. Am I going in the right direction? What are the current research areas in reinforcement learning?
Yes, you are definitely going in the right direction: actor-critic algorithms combine value-based and policy-based methods into a single framework, aiming for the best of both worlds.
Picking the right algorithm, however, depends on the specific RL problem you are trying to solve and on the kind of actions you want to take (e.g. discrete vs. continuous). You can refer to this library https://stable-baselines3.readthedocs.io/en/master/guide/algos.html to get more insight. It is, to my knowledge, the easiest way to deploy and train an RL algorithm.
You might also look at multi-agent RL, where multiple (intelligent) agents take actions in the same world and can cooperate or compete to achieve global or individual goals. Here, each agent could be deployed with a single-agent algorithm. However, these are usually more complex scenarios, in which game theory can also be used to study the behaviour of the agents.
I suggest you first follow the Gym standard to implement your environment. Then you can directly deploy a stable-baselines3 algorithm (I would try PPO first) in that environment (if it has a single agent), or convert your Gym environment into a PettingZoo environment (https://www.pettingzoo.ml/), which makes it easy to let multiple agents execute actions and receive their respective rewards without having to change the original environment code too much.
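To make the Gym-standard idea concrete, here is a minimal sketch of a custom environment following the Gymnasium reset/step API shape. The "claims" dynamics and rewards are purely hypothetical placeholders, not a real insurance model; a real implementation would subclass gymnasium.Env and declare observation_space/action_space so that a stable-baselines3 algorithm such as PPO could be trained on it directly.

```python
import random


class ClaimsEnv:
    """Toy episodic environment following the Gymnasium reset/step shape.

    State: remaining reserve and time step. Action: 0 = reject claim,
    1 = pay claim. Dynamics and rewards are purely illustrative.
    """

    def __init__(self, initial_reserve=100.0, horizon=10, seed=0):
        self.initial_reserve = initial_reserve
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        self.reserve = self.initial_reserve
        self.t = 0
        return self._obs(), {}  # (observation, info), as in Gymnasium

    def step(self, action):
        claim = self.rng.uniform(0.0, 20.0)  # random claim size (toy)
        if action == 1:       # pay the claim: spend reserve, earn goodwill
            self.reserve -= claim
            reward = 1.0
        else:                 # reject: keep reserve, small penalty
            reward = -0.5
        self.t += 1
        terminated = self.reserve <= 0.0     # out of money
        truncated = self.t >= self.horizon   # episode time limit
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return [self.reserve, float(self.t)]


# Usage: a random-policy rollout. With stable-baselines3 you would instead
# subclass gymnasium.Env, define the spaces, and call
# PPO("MlpPolicy", env).learn(total_timesteps=...).
env = ClaimsEnv()
obs, info = env.reset()
total = 0.0
done = False
while not done:
    obs, r, terminated, truncated, _ = env.step(random.choice([0, 1]))
    total += r
    done = terminated or truncated
```

The key point is the interface, not the dynamics: anything exposing this reset/step contract (plus the space declarations) can be plugged into the stable-baselines3 training loop unchanged.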
These are two-time-scale algorithms in which the critic uses TD learning with a linear function-approximation architecture, and the actor is updated in an approximate gradient direction based on information provided by the critic. In the REINFORCE algorithm with a state-value baseline, we use the return (total reward) as the target, whereas in the actor-critic algorithm we use a bootstrapped estimate as the target.
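The difference between the two targets can be sketched numerically. Assuming a toy 3-step episode and arbitrary, hypothetical critic estimates V (the numbers below are made up for illustration), the REINFORCE-with-baseline target at time t is the full Monte Carlo return, while the one-step actor-critic target bootstraps from the critic:

```python
# Toy comparison of the two targets (all values are arbitrary examples).
gamma = 0.9
rewards = [1.0, 0.0, 2.0]   # r_1, r_2, r_3 of a 3-step episode
V = [0.5, 0.8, 1.5, 0.0]    # hypothetical critic values V(s_0..s_3); V(terminal) = 0

# REINFORCE target at t=0: the full return G_0 = r_1 + gamma*r_2 + gamma^2*r_3
G0 = rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2]

# One-step actor-critic target at t=0: bootstrap from the critic,
# r_1 + gamma * V(s_1)
td_target = rewards[0] + gamma * V[1]

# Both are compared against the baseline/critic V(s_0) to form the advantage:
advantage_mc = G0 - V[0]         # REINFORCE with baseline
advantage_td = td_target - V[0]  # actor-critic (the TD error)
```

The bootstrapped target has lower variance (it depends on only one sampled reward) but is biased whenever the critic's estimate V(s_1) is inaccurate, which is the usual bias-variance trade-off between the two families.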