Many papers (Conference Paper: Output Feedback H∞ Control of Unknown Discrete-time Linear ...; Article: H∞ control of linear discrete-time systems: Off-policy reinf...) have been published in the area of reinforcement-learning-based optimal control.

They present simulation results and state that, as a first step, the data required for learning is collected using a behaviour policy, and that this collected data is then used to find the optimal policy (the target policy).

I would like to know what initial values of Uk and Wk are applied to the system to collect the data, and how the existing control is switched to a different control input during the simulation. In the simulation section they give values for K1 and K2 as initial values: are these initial values used for the behaviour policy or for the target policy? How can this be implemented in MATLAB?
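To make the question concrete, here is a minimal sketch of the data-collection step as I currently picture it. Everything in it is an assumption for illustration, not taken from the papers: the system matrices A, B, D, the zero initial gains K1 and K2, the probing-noise level, and the function name `collect_data` are all hypothetical. It is written in plain Python only to keep it self-contained; the loop should map directly onto a MATLAB `for` loop.

```python
import random

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def collect_data(A, B, D, K1, K2, x0, steps, noise_std=0.1, seed=0):
    """Run x_{k+1} = A x_k + B u_k + D w_k under a behaviour policy
    and log the tuples (x_k, u_k, w_k, x_{k+1}).

    Assumption: the behaviour policy is the initial gain plus probing
    noise, u_k = -K1 x_k + e_k and w_k = -K2 x_k + d_k (scalar inputs).
    """
    rng = random.Random(seed)
    x = list(x0)
    data = []
    for _ in range(steps):
        u = -sum(k * xi for k, xi in zip(K1, x)) + rng.gauss(0.0, noise_std)
        w = -sum(k * xi for k, xi in zip(K2, x)) + rng.gauss(0.0, noise_std)
        Ax = mat_vec(A, x)
        x_next = [Ax[i] + B[i] * u + D[i] * w for i in range(len(x))]
        data.append((x, u, w, x_next))
        x = x_next
    return data

# Hypothetical open-loop-stable example system (not from the cited papers).
A = [[0.9, 0.1], [0.0, 0.8]]
B = [0.0, 1.0]   # control input enters the second state
D = [1.0, 0.0]   # disturbance enters the first state
K1 = [0.0, 0.0]  # assumed initial control gain
K2 = [0.0, 0.0]  # assumed initial disturbance gain
data = collect_data(A, B, D, K1, K2, x0=[1.0, -1.0], steps=50)
```

In this picture the inputs applied to the plant never switch during collection: the logged data stays fixed, and the updated gains would appear only in the learning equations solved afterwards on that data. Whether this matches what the papers do with K1 and K2 is exactly what I am unsure about.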
