I am currently implementing the following two RL algorithms for 5G:

(i) Power control to maximize throughput in a multi-gNB multi-UE scenario, and

(ii) Maximizing throughput subject to delay constraints in a single-gNB multi-UE scenario.

I have made significant progress using an external Python program connected to NetSim's gNB scheduler. This setup facilitates the exchange of states and rewards between the scheduler and the Python program.

I am interested in exploring scenarios where the state-reward exchange is itself delayed, for example when the agent receives a state several decision steps after the scheduler generated it. How do RL algorithms cope with such delays? I would appreciate pointers to any relevant research papers on this topic.
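
To make the delayed-state scenario concrete, here is a minimal sketch of how it could be emulated on the Python side. It assumes a constant delay of `delay` decision steps. The class `DelayedObservationBuffer` and its methods are hypothetical names for illustration and are not part of NetSim's interface. The buffer returns the stale state together with the actions taken since that state was generated, which is one common way to formulate the problem (an augmented state), not necessarily the best one for this setup.

```python
from collections import deque


class DelayedObservationBuffer:
    """Emulates a fixed observation delay of `delay` decision steps.

    Hypothetical helper: the agent sees the state from `delay` steps ago
    plus the actions it has issued since that state was generated.
    """

    def __init__(self, delay, initial_state):
        self.delay = delay
        # Pre-fill so the agent sees the initial state until real data arrives.
        self._states = deque([initial_state] * (delay + 1), maxlen=delay + 1)
        # Actions issued after the oldest buffered state was generated.
        self._actions = deque(maxlen=delay)

    def push(self, true_state, action):
        """Record the newest scheduler state and the action that preceded it."""
        self._states.append(true_state)
        self._actions.append(action)

    def observe(self):
        """Return (delayed_state, actions_taken_since_that_state)."""
        return self._states[0], tuple(self._actions)


# Usage with placeholder integer states and string actions.
buf = DelayedObservationBuffer(delay=2, initial_state=0)
for state, action in [(1, "a"), (2, "b"), (3, "c")]:
    buf.push(state, action)
delayed_state, pending_actions = buf.observe()
```

With `delay=0` the buffer reduces to the undelayed case: `observe()` returns the most recent state and an empty action tuple.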
