How can I put constraints on my actions in reinforcement learning?

Reza Rezaiezadeh-Roukerd

I am not sure whether I understand your question properly or not. You may want to try implementing your constraints in the reward function. The goal is to avoid unpleasant scenarios in training by having less reward for those cases.
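A minimal sketch of this idea in Python, assuming the constraint can be checked by a simple function. The names `penalized_reward` and `sum_within_limit`, the penalty size, and the example constraint u1 + u2 <= 1 are all hypothetical and only serve as illustration; the actual constraint in the question may differ.

```python
def penalized_reward(base_reward, action, is_feasible, penalty=10.0):
    """Return the task reward, reduced by `penalty` if the action is infeasible."""
    if is_feasible(action):
        return base_reward
    return base_reward - penalty


# Hypothetical constraint, for illustration only: u1 + u2 must not exceed 1.
def sum_within_limit(action):
    u1, u2 = action
    return u1 + u2 <= 1.0


ok = penalized_reward(2.0, (0.3, 0.4), sum_within_limit)   # feasible action
bad = penalized_reward(2.0, (0.8, 0.9), sum_within_limit)  # violates the constraint
```

The penalty has to be scaled against the normal reward range: if it is too small the agent may accept violations, and if it is too large it can dominate learning.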

Jesús Pérez

In some cases, action constraints are modelled by a precondition function. Maybe this helps you, Ali Molavi.

Gerben Beintema

Control constraints are always quite tricky to implement in RL. In this case, you can apply a change of variables to make the constraints easier to work with:

u1' = u1+u2

u2' = u1-u2

so that the constraints read

-inf
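The transformed constraints are cut off above, so the following is only a sketch under an assumed constraint of the form u1 + u2 <= c. With the change of variables, that coupled constraint becomes a simple upper bound on u1' alone, which can be enforced by clipping before mapping back to the original actions. The function names and the value of `c` are hypothetical.

```python
def to_original(u1p, u2p):
    """Invert u1' = u1 + u2, u2' = u1 - u2 to recover (u1, u2)."""
    u1 = 0.5 * (u1p + u2p)
    u2 = 0.5 * (u1p - u2p)
    return u1, u2


def constrained_action(u1p_raw, u2p_raw, c=1.0):
    """Map an unconstrained policy output to a feasible (u1, u2).

    Assumed constraint (illustration only): u1 + u2 <= c, i.e. u1' <= c.
    u2' is left unbounded in this sketch.
    """
    u1p = min(u1p_raw, c)  # a simple bound in the new variables
    return to_original(u1p, u2p_raw)


u1, u2 = constrained_action(3.0, 1.0, c=1.0)  # raw output would violate u1 + u2 <= 1
```

The agent then acts in (u1', u2') space, and the environment only ever sees the back-transformed (u1, u2).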

Md Ferdous Pervej

While there should be several ways to deal with the constraint, in general I think the following steps are helpful:

First, I would try to restrict the action space (only consider the feasible action set).

If option 1 requires so much calculation or coding that it becomes impractical, then as a complementary step I would model the reward function as follows:

Whenever the RL agent chooses an action that violates the constraint (or leads to some other parameter setting that violates any constraint), the reward function should return a huge negative reward.
This should be sufficient to teach the agent to avoid the actions that lead to negative rewards.
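The first option (restricting the action space) can be sketched as follows for a discrete action set with tabular Q-values. This is only an illustration: the helper names, the epsilon-greedy selection, and the example constraint u1 + u2 <= 2 are assumptions, not something given in the question.

```python
import random


def feasible_actions(actions, is_feasible):
    """Keep only the actions that satisfy the constraint."""
    return [a for a in actions if is_feasible(a)]


def select_action(q_values, actions, is_feasible, epsilon=0.1, rng=random):
    """Epsilon-greedy selection restricted to the feasible action set."""
    allowed = feasible_actions(actions, is_feasible)
    if not allowed:
        raise ValueError("no feasible action available")
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore, but only among feasible actions
    return max(allowed, key=lambda a: q_values[a])  # exploit


# Hypothetical discrete action grid and constraint, for illustration only.
actions = [(u1, u2) for u1 in (0, 1, 2) for u2 in (0, 1, 2)]


def within_budget(action):
    return action[0] + action[1] <= 2


q_values = {a: float(a[0] + a[1]) for a in actions}  # dummy Q-table
chosen = select_action(q_values, actions, within_budget, epsilon=0.0)
```

Because infeasible actions are never sampled, no penalty term is needed for them in this variant.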

Ali Molavi

Thank you all guys. Your proposed methods are very interesting. I am trying to use your ideas to solve the problem.

Dear Ali Molavi, how did you solve the problem?

Ahmed Rabee Sayed

Hi Ali Molavi, I think there are three methods to address your question:

1. Adjust your reward function to penalize constraint violations by giving a large negative penalty and/or stopping the current episode. However, that might lead to suboptimal actions if your reward terms are not well scaled.
2. Reformulate your actions to handle the constraints, as Gerben Beintema answered above, i.e. use u1' and u2' instead of u1 and u2. This method is the best choice; however, it is insufficient if the original actions u1 and u2 are bounded.
3. Use constraint enforcement as discussed in https://www.mathworks.com/help/slcontrol/ug/constraintenforcement.html, where some examples can be found. In brief, the new action u should satisfy Bu
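The constraint in the third method is cut off above, so the following is only a sketch of the general idea of correcting an action after the agent proposes it. It assumes a single linear constraint a·u <= b and returns the closest feasible action in the Euclidean sense. This is not the MathWorks implementation, and the function name and example numbers are hypothetical.

```python
def project_onto_halfspace(u, a, b):
    """Return the point closest to u that satisfies sum(a_i * u_i) <= b.

    Assumes `a` is not the zero vector.
    """
    violation = sum(ai * ui for ai, ui in zip(a, u)) - b
    if violation <= 0:
        return list(u)  # already feasible, leave the action unchanged
    norm_sq = sum(ai * ai for ai in a)
    # Move u back along the constraint normal just far enough to reach the boundary.
    return [ui - violation * ai / norm_sq for ai, ui in zip(a, u)]


# Illustration: the agent proposes (2, 2), but u1 + u2 <= 2 must hold.
safe_u = project_onto_halfspace([2.0, 2.0], [1.0, 1.0], 2.0)
```

With several constraints or action bounds, the correction generally becomes a small quadratic program rather than a closed-form projection.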
