
Hi all:

I ran into a problem in an RL project. We use a search method to produce the data and then train a deep neural network as the value function; it is an off-policy method. We keep the data in a large replay buffer, and at each training step we sample a batch from the buffer and feed it into the neural network (experience replay). Although the average reward increases, the loss never decreases and sometimes even increases! Attached is the plot of the loss function. Have you met this problem? How did you solve it? I suspect the reason is a lack of important samples...
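For reference, here is a minimal sketch of the setup described, under my own assumptions: the names `ReplayBuffer` and `train_step` are hypothetical, and a linear value function trained by SGD stands in for the deep network. On a fixed buffer with fixed targets the loss does fall; in the poster's setting the search keeps producing new data, so the target distribution drifts and the loss need not fall even while play improves.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Hypothetical FIFO replay buffer holding (state, target_value) pairs."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest data is evicted first

    def add(self, state, target_value):
        self.buffer.append((state, target_value))

    def sample(self, batch_size):
        # Uniform sampling; prioritized sampling would weight "important"
        # (high-error) transitions more heavily.
        batch = random.sample(self.buffer, batch_size)
        states, targets = zip(*batch)
        return np.array(states), np.array(targets)

    def __len__(self):
        return len(self.buffer)


def train_step(w, states, targets, lr=0.05):
    """One SGD step on mean squared error for a linear value function.

    Stand-in for one 'replay' update of the deep network. Returns the
    updated weights and the batch loss before the update.
    """
    preds = states @ w
    errors = preds - targets
    grad = states.T @ errors / len(targets)
    return w - lr * grad, float(np.mean(errors ** 2))
```

Usage sketch: fill the buffer from self-play or search, then repeatedly call `buffer.sample(...)` and `train_step(...)`; logging the returned loss per step reproduces the kind of curve attached to the question.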

Thank you
