What research has been done on learning non-Markovian reward functions?

Nico Potyka

I think people often do not mention that they are using k-th order MDPs because such a model can be reduced to a standard MDP. You do so by letting a state depend not only on the current observation but also on previous observations. DeepMind's DQN, for example, is effectively a 4th-order MDP, because a state consists not only of the current frame but of the last 4 frames. In general, if you want to represent a transition probability function of the form p(s | a, s1, ..., sk), you can reduce it to a Markovian transition probability function by considering k-tuples of states. Your transition probability function then looks like p((s, s1, ..., sk-1) | a, (s1, ..., sk)), where (s, s1, ..., sk-1) and (s1, ..., sk) are just states (k-tuples) of another MDP.
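To make the k-tuple reduction concrete, here is a minimal Python sketch. It assumes a small finite state set and a k-th order transition function given as a plain Python callable; the names `lift_to_markov`, `p_k`, and the toy dynamics are hypothetical and chosen only for illustration. Histories are ordered most recent state first, matching (s1, ..., sk) above.

```python
from itertools import product


def lift_to_markov(states, actions, k, p_k):
    """Build first-order transition probabilities over k-tuples of states.

    p_k(s_next, a, history) returns p(s_next | a, s1, ..., sk), where
    history = (s1, ..., sk) with s1 the most recent state.
    The result maps (history, a, new_history) -> probability, where
    new_history = (s_next, s1, ..., sk-1).
    """
    lifted = {}
    for history in product(states, repeat=k):
        for a in actions:
            for s_next in states:
                prob = p_k(s_next, a, history)
                if prob > 0:
                    # Shift the window: prepend the new state, drop the oldest.
                    new_history = (s_next,) + history[:-1]
                    lifted[(history, a, new_history)] = prob
    return lifted


# Toy 2nd-order dynamics (an assumption for this example): the next state
# copies the state from two steps ago with probability 0.9.
def p_k(s_next, a, history):
    oldest = history[-1]
    return 0.9 if s_next == oldest else 0.1


states = [0, 1]
actions = ["a"]
lifted = lift_to_markov(states, actions, k=2, p_k=p_k)

# From history (s1=0, s2=1), the next state is 1 with probability 0.9,
# which moves the lifted MDP to the tuple state (1, 0).
print(lifted[((0, 1), "a", (1, 0))])
```

The lifted table only ever conditions on the current tuple state, so it is an ordinary (first-order) MDP transition function. The cost is that the state space grows to |S|^k tuples.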

Gavin Rens

Thanks, Nico. I suspected something like that, and your formal explanation confirms it.

Your answer relates to k-th order MDPs, but there is still the case of learning LTL-like specifications for rewards. There are methods for learning finite and Büchi (infinite-word) automata, but I haven't yet come across such work in the area of MDPs/RL.