As I've already studied, Q-learning doesn't have to know anything about transition probabilities! So how can an agent determine its new state after choosing an action without knowing anything about the probability of transitioning to other states?
Given the current state, s, the agent does not have to guess its new state s'. It receives the new state s' from the environment. What is missing is that you don't know the probability of moving from s to s' (given a particular action, of course), even if you know that this transition s->s' is what has happened. Q-learning can learn without estimating transition probabilities. An alternative, model-based approach would be to learn those transition probabilities first and then solve the MDP (at that point you won't need Q-learning, although you could still use it).
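To make that concrete, here is a minimal sketch of the tabular Q-learning update in Python (the function name and the dict-based Q-table are my own choices, just for illustration). Notice that the transition probability P(s'|s,a) never appears; the update only uses the sampled next state s' and reward r that the environment returned.

```python
# A minimal sketch of the tabular Q-learning update (names are illustrative).
# Q is a dict mapping (state, action) -> estimated value.
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Greedy estimate of the next state's value, built from the sampled s' only.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    # Move Q(s, a) towards the sampled target; P(s'|s,a) is never used.
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
```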
you are in a state, you choose an action, and you must have access to an "environment" which tells you which state you are now in and what your reward is
this "environment" may be a real system (duly equipped with sensors so as to determine the new state etc) or a computer simulation of a real system
.
and you do not need to know or evaluate the transition probabilities when using Q-learning
(only the environment knows them; it does not tell you, but Q-learning allows you not to care!)
I became convinced that transition probabilities are not needed as an input to the Q-learning algorithm. But I think a reward function is needed as an input to Q-learning. Isn't it? If not, how is the immediate reward computed in Q-learning?
the reward (and the new state) is given to you by the "environment" ...
you are not doing Q-learning "in abstracto"; you are trying to optimize your actions relative to a given "system" (or "environment")
.
more concretely, say you want to solve a labyrinth problem ("gridworld"): of course you do not know the map of the labyrinth!
you are in a position, you have 4 actions (North, East, South, West); you choose an action, say North; the system tells you your new state (depending on the transition probabilities, you might end up south of your current position ... commands might be noisy!) and gives you your reward, -1 if you are still in the labyrinth, +1000 if you have reached the way out
from that, you can update your Q-table and learn how to reach the way out as fast as possible
.
to sum up, there is an "environment" (a real system or a computer simulation) which implements the transition probabilities and rewards, and you play with this environment so as to maximize your long-term reward: when it is your move, you observe your state and choose an action; when it is the environment's move, it tells you the next state and your reward (and at this point you update your Q-table and go on to the next move)
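As a rough sketch of that interaction loop for the labyrinth example (the env object with its reset()/step() methods is a hypothetical placeholder, and the epsilon-greedy action choice is just one common way to pick actions):

```python
import random

def run_episode(env, Q, actions, alpha=0.1, gamma=0.99, epsilon=0.1):
    """One episode against an environment exposing reset() and step(action).
    step() is assumed to return (next_state, reward, done); e.g. reward is -1
    per move and +1000 on reaching the exit, as in the labyrinth example."""
    s = env.reset()
    done = False
    while not done:
        # Your move: pick an action (epsilon-greedy over the current Q-table).
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a2: Q.get((s, a2), 0.0))
        # Environment's move: it applies its hidden (possibly noisy) dynamics
        # and tells you only the next state and the reward.
        s_next, r, done = env.step(a)
        # Update the Q-table from the observed transition, then continue.
        best_next = 0.0 if done else max(Q.get((s_next, a2), 0.0) for a2 in actions)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
        s = s_next
    return Q
```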
In reinforcement learning, the lack of a model is compensated for by making observations of the environment.
When executing the Q-learning algorithm on a computer, you usually have a simulated environment that replaces the transition function, or you can even call a transition function directly to obtain the state s' reached by the agent after executing action a from state s.
In mobile robotics, when working with a real robot, the resulting state s' is obtained by making observations with its sensors, which have previously been mapped into states. No transition function is needed; the agent must learn from its environment, and this is the key of reinforcement learning.
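For illustration, such a mapping of observations into states can be as simple as discretising the raw readings; the sensor names and thresholds below are invented for the example:

```python
def observation_to_state(front_distance_m, bumper_pressed):
    """Map raw sensor readings to one of a few discrete states.
    The sensor set and thresholds are only an example."""
    if bumper_pressed:
        return "collision"
    if front_distance_m < 0.2:
        return "obstacle_near"
    if front_distance_m < 1.0:
        return "obstacle_far"
    return "clear"
```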
As to the reward function, you do need to map (s,a,s') into rewards, since the task the agent has to learn is mostly defined by the rewards it obtains. However, if you deal with many states, defining every reward R(s,a,s') individually could be impractical.
When working with real robots, even with very few states, I find it very useful to map the sensor observations directly into rewards with simple if-else structures. For example, if the mobile robot is learning a wandering task, bumper collisions get negative rewards, wheel encoder readings above a certain threshold without colliding get the highest positive reward, etc. This is a simpler way of defining the reward function, with a few lines of code, and it stays closer to the definition of the task the agent has to learn.
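Here is a hedged sketch of that kind of if-else reward mapping for a wandering task; the sensor names, thresholds and reward values are made up, only the structure matters:

```python
def reward_from_sensors(bumper_pressed, left_encoder_speed, right_encoder_speed,
                        speed_threshold=0.3):
    """Reward defined directly from sensor readings with if-else rules:
    collisions are punished, moving fast without colliding gets the
    highest reward, everything else is neutral."""
    if bumper_pressed:
        return -10.0   # collision -> negative reward
    if min(left_encoder_speed, right_encoder_speed) > speed_threshold:
        return 1.0     # moving above the threshold without colliding
    return 0.0         # otherwise, neutral reward
```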