I am exploring areas to research in reinforcement learning and apply in the insurance industry domain. I am looking to enhance actor-critic algorithms. Am I going in the right direction? What are the current research areas in reinforcement learning?
Yes, you are definitely going in the right direction: actor-critic algorithms combine value-based and policy-based methods into a single framework, aiming for the best of both worlds.
Picking the right algorithm, however, depends on the specific RL problem you are trying to solve and on the kind of actions you want to take (e.g. discrete vs. continuous). You can refer to this library https://stable-baselines3.readthedocs.io/en/master/guide/algos.html to get more insight. It is, to my knowledge, the easiest way to deploy and train an RL algorithm.
You might also look at multi-agent RL, where multiple (intelligent) agents take actions in the same world and can cooperate or compete to achieve global or individual goals. Here, each agent could be deployed with a single-agent algorithm. However, these are usually more complex scenarios, in which game theory can also be used to study the behaviour of the agents.
I suggest you first follow the Gym standard to implement your environment. Then you can directly deploy a stable-baselines3 algorithm (I would try PPO first) in that environment (if it has a single agent), or convert your Gym environment into a PettingZoo environment (https://www.pettingzoo.ml/), which makes it easy to let multiple agents execute actions and receive their respective rewards without having to change the original environment code too much.
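To make the Gym-standard idea concrete, here is a minimal sketch of a custom environment following the Gymnasium reset/step API shape. The "claims" dynamics and rewards are purely hypothetical placeholders, not a real insurance model; a real implementation would subclass gymnasium.Env and declare observation_space/action_space so that a stable-baselines3 algorithm such as PPO could be trained on it directly.

```python
import random


class ClaimsEnv:
    """Toy episodic environment following the Gymnasium reset/step shape.

    State: remaining reserve and time step. Action: 0 = reject claim,
    1 = pay claim. Dynamics and rewards are purely illustrative.
    """

    def __init__(self, initial_reserve=100.0, horizon=10, seed=0):
        self.initial_reserve = initial_reserve
        self.horizon = horizon
        self.rng = random.Random(seed)

    def reset(self):
        self.reserve = self.initial_reserve
        self.t = 0
        return self._obs(), {}  # (observation, info), as in Gymnasium

    def step(self, action):
        claim = self.rng.uniform(0.0, 20.0)  # random claim size (toy)
        if action == 1:       # pay the claim: spend reserve, earn goodwill
            self.reserve -= claim
            reward = 1.0
        else:                 # reject: keep reserve, small penalty
            reward = -0.5
        self.t += 1
        terminated = self.reserve <= 0.0     # out of money
        truncated = self.t >= self.horizon   # episode time limit
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return [self.reserve, float(self.t)]


# Usage: a random-policy rollout. With stable-baselines3 you would instead
# subclass gymnasium.Env, define the spaces, and call
# PPO("MlpPolicy", env).learn(total_timesteps=...).
env = ClaimsEnv()
obs, info = env.reset()
total = 0.0
done = False
while not done:
    obs, r, terminated, truncated, _ = env.step(random.choice([0, 1]))
    total += r
    done = terminated or truncated
```

The key point is the interface, not the dynamics: anything exposing this reset/step contract (plus the space declarations) can be plugged into the stable-baselines3 training loop unchanged.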
These are two-time-scale algorithms in which the critic uses TD learning with a linear function-approximation architecture, and the actor is updated in an approximate gradient direction based on information provided by the critic. In the REINFORCE algorithm with a state-value baseline, we use the return (total reward) as the target, whereas in the actor-critic algorithm we use a bootstrapped estimate as the target.
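The difference between the two targets can be sketched numerically. Assuming a toy 3-step episode and arbitrary, hypothetical critic estimates V (the numbers below are made up for illustration), the REINFORCE-with-baseline target at time t is the full Monte Carlo return, while the one-step actor-critic target bootstraps from the critic:

```python
# Toy comparison of the two targets (all values are arbitrary examples).
gamma = 0.9
rewards = [1.0, 0.0, 2.0]   # r_1, r_2, r_3 of a 3-step episode
V = [0.5, 0.8, 1.5, 0.0]    # hypothetical critic values V(s_0..s_3); V(terminal) = 0

# REINFORCE target at t=0: the full return G_0 = r_1 + gamma*r_2 + gamma^2*r_3
G0 = rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2]

# One-step actor-critic target at t=0: bootstrap from the critic,
# r_1 + gamma * V(s_1)
td_target = rewards[0] + gamma * V[1]

# Both are compared against the baseline/critic V(s_0) to form the advantage:
advantage_mc = G0 - V[0]         # REINFORCE with baseline
advantage_td = td_target - V[0]  # actor-critic (the TD error)
```

The bootstrapped target has lower variance (it depends on only one sampled reward) but is biased whenever the critic's estimate V(s_1) is inaccurate, which is the usual bias-variance trade-off between the two families.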