What are some real-life examples of dynamics in RL?

Raoul Raftopoulos

Hi Soumia Mehimeh!

First of all, the transition probabilities from one state to another usually depend on two things:

the environment dynamics
the agent's actions

The environment dynamics are simply the internal logic of the environment. For example, say you are playing poker (Texas hold'em) and three cards are showing on the table. After the betting round, one new card will be dealt face up, right? This means that the MDP will transition from a state where three cards are shown to another where a fourth card is also visible to all the players. This fourth card is independent of the players' actions and depends only on the environment dynamics.
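A minimal Python sketch of the chance part of this transition, assuming a standard 52-card deck, ignoring burn cards, and looking at it from one player's point of view (the card strings and function names are illustrative, not from any poker library): the turn card is drawn uniformly from the cards that player has not seen, and no player action is an input to the function.

```python
import random
from fractions import Fraction

RANKS = "23456789TJQKA"
SUITS = "cdhs"
# Full 52-card deck; cards are written as rank + suit, e.g. "Ah" = ace of hearts
DECK = [r + s for r in RANKS for s in SUITS]

def turn_distribution(known_cards):
    """P(turn card | cards this player can see): uniform over the unseen cards."""
    unseen = [c for c in DECK if c not in known_cards]
    return {c: Fraction(1, len(unseen)) for c in unseen}

def deal_turn(known_cards, rng):
    """Sample the turn card. Note that no player action is an argument."""
    unseen = [c for c in DECK if c not in known_cards]
    return rng.choice(unseen)

# Hypothetical hand: two hole cards plus the three-card flop
known = ["Ah", "Kd", "7c", "7d", "2s"]
dist = turn_distribution(known)
print(len(dist), dist["Qs"])  # 47 unseen cards, each with probability 1/47
print(deal_turn(known, random.Random(0)))
```

Opponents' hole cards are physically out of the deck, but since this player cannot see them, each unseen card is equally likely from their perspective, which is why the sketch spreads the probability over all 47.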

Also, between each stage of the hand, players take turns placing bets! This means that the state of the environment changes whenever a new bet is placed. Here the environment state changes because of a player's action: more money will be on the table, and in turn some players may be more (or less) tempted to keep playing (place the bet or raise) or to just fold. In this case the transition probabilities are somewhat more explicit than in other examples, such as trading agents, in which the state of the environment might be a stock's current price, which is influenced by a ton of things, or autonomous driving, walking humanoid robots, etc.
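The action-driven part can be sketched as a deterministic update of the betting state. This is a simplified, hypothetical model (the `BettingState` fields and action names are assumptions, and real bet-sizing and turn-order rules are ignored): given the same state and the same action, the next state is always the same.

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class BettingState:
    pot: int                   # chips currently in the pot
    stacks: Tuple[int, ...]    # chips each player still holds
    folded: Tuple[bool, ...]   # whether each player has folded

def apply_action(state, player, action, amount=0):
    """Hypothetical action-driven transition: the next state depends only
    on the current state and the chosen action (no randomness)."""
    if action == "fold":
        folded = tuple(f or i == player for i, f in enumerate(state.folded))
        return replace(state, folded=folded)
    if action == "check":
        return state
    if action in ("call", "bet", "raise"):
        if amount > state.stacks[player]:
            raise ValueError("cannot bet more than the player's stack")
        stacks = tuple(s - amount if i == player else s
                       for i, s in enumerate(state.stacks))
        return replace(state, pot=state.pot + amount, stacks=stacks)
    raise ValueError(f"unknown action: {action}")

s0 = BettingState(pot=30, stacks=(100, 100), folded=(False, False))
s1 = apply_action(s0, player=0, action="bet", amount=20)
s2 = apply_action(s1, player=1, action="fold")
print(s1.pot, s1.stacks, s2.folded)
```

Combining this with a chance step like dealing the turn card gives the full picture: some parts of the transition are driven by the agents, others only by the environment.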

Now, in real-life scenarios you usually don't have access to these transition probabilities (the real world simply doesn't hand them to you). However, you can use deep neural networks, among other approaches, to estimate the transition probabilities from one state to another, and then use those estimates to make decisions that try to maximize the reward.
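One of the simplest "other approaches", swapped in here instead of a neural network, is a tabular maximum-likelihood estimate for discrete states: count how often each `(state, action)` pair led to each next state and normalize. The class and method names below are hypothetical.

```python
from collections import Counter, defaultdict

class TabularTransitionModel:
    """Count-based estimate of P(next_state | state, action) from experience."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, state, action, next_state):
        # Record one observed transition
        self.counts[(state, action)][next_state] += 1

    def distribution(self, state, action):
        # Empirical distribution; empty dict if the pair was never tried
        c = self.counts.get((state, action), Counter())
        total = sum(c.values())
        return {s: n / total for s, n in c.items()}

    def prob(self, state, action, next_state):
        return self.distribution(state, action).get(next_state, 0.0)

model = TabularTransitionModel()
for observed in ["s1", "s1", "s1", "s2"]:
    model.update("s0", "a", observed)
print(model.distribution("s0", "a"))
```

Counting only works when states are few and discrete; for continuous or high-dimensional states (prices, images, robot joint angles) a function approximator such as a neural network plays the same role.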

This is why, in most recent reinforcement learning papers, the section describing the Markov Decision Process usually treats the state transition probabilities as unknown; they have to be "derived" from direct experience with the environment.

I hope I've been helpful!

Oliver Wallscheid

From a control engineering perspective, any standard ordinary differential equation system or state-space model can be considered a special type of MDP whose internal dynamics are deterministic and can be fully defined by a proper mathematical model. Although classical MDPs are often assumed to be stochastic and unknown, this assumption is not mandatory in the RL context: assuming a deterministic model is also fine, and this is often done in model-based RL (which is closely related to model predictive control).
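As a small illustration, here is a sketch assuming an Euler-discretized double integrator (position and velocity, with acceleration as the input) as the deterministic state-space model. The sampling time, action set, and cost weights are hypothetical choices. The planner uses the known model the way a receding-horizon controller does, but brute-forces every action sequence instead of calling a real MPC solver.

```python
from itertools import product

DT = 0.1                     # assumed sampling time of the discretized model
ACTIONS = (-1.0, 0.0, 1.0)   # assumed finite set of control inputs

def step(state, u):
    """Deterministic transition: Euler-discretized double integrator p'' = u."""
    p, v = state
    return (p + DT * v, v + DT * u)

def stage_cost(state, u):
    """Negative reward: penalize distance from the origin, speed, and effort."""
    p, v = state
    return p ** 2 + 0.1 * v ** 2 + 0.01 * u ** 2

def plan(state, horizon=3):
    """Simulate every action sequence with the known model and return the
    first action of the cheapest one (receding-horizon, brute force)."""
    best_cost, best_first = float("inf"), None
    for seq in product(ACTIONS, repeat=horizon):
        s, cost = state, 0.0
        for u in seq:
            s = step(s, u)
            cost += stage_cost(s, u)
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first

print(plan((1.0, 0.0)))  # push back toward the origin
```

Because `step` is deterministic, the same state and action always produce the same next state, so the transition "probability" is 1 for that state and 0 for every other one. At each time step only the first planned action would be applied before re-planning from the new state.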

Rana Ghazali

Since most environments are unknown and modelling them is difficult (particularly determining the transition function), we can use model-free RL such as Q-learning and design a reward function that captures our problem's goal.
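A minimal tabular Q-learning sketch on a hypothetical five-state chain, where the reward is designed to be 1 only when the goal state is reached. The environment, hyperparameters, and seed are illustrative assumptions. The agent never uses `env_step`'s internals; it only learns from the sampled `(next_state, reward, done)` tuples, which is what makes the method model-free.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the goal
LEFT, RIGHT = 0, 1

def env_step(state, action):
    """Hypothetical environment. The agent only sees its outputs, so the
    transition function is unknown to the learner."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward designed around the goal
    return next_state, reward, done

def greedy(q_row, rng):
    # Break ties at random so early exploration is not biased to one side
    best = max(q_row)
    return rng.choice([a for a, value in enumerate(q_row) if value == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1,
               max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy action selection
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy(q[state], rng)
            next_state, reward, done = env_step(state, action)
            # No future value is bootstrapped once the terminal goal is reached
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
policy = [row.index(max(row)) for row in q[:-1]]
print(policy)  # greedy action for each non-goal state (1 = RIGHT)
```

The learned greedy policy should move right in every non-goal state, even though the agent was never told how the chain's transitions work.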