I am currently implementing the following two RL algorithms for 5G:
(i) power control to maximize throughput in a multi-gNB, multi-UE scenario, and
(ii) throughput maximization subject to delay constraints in a single-gNB, multi-UE scenario.
I have made significant progress using an external Python program connected to NetSim's gNB scheduler. This setup handles the exchange of states and rewards between the scheduler and the Python program.
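For context, my exchange loop is structured roughly as below. This is only an illustrative sketch: the newline-delimited JSON message format, the port, and the field names ("state", "reward", "action") are my own conventions, not part of NetSim's interface.

```python
import json
import socket


def select_action(state, reward):
    # Placeholder policy; the actual RL agent's action selection goes here.
    return 0


def run_agent(host="127.0.0.1", port=5050):
    """Receive state/reward messages from the scheduler process and reply
    with an action, one JSON object per line (illustrative protocol)."""
    with socket.create_connection((host, port)) as sock:
        buf = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                msg = json.loads(line)  # e.g. {"state": [...], "reward": 0.0}
                action = select_action(msg["state"], msg["reward"])
                sock.sendall((json.dumps({"action": action}) + "\n").encode())
```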
I am now interested in exploring scenarios with time delays in the state-reward exchange, such as delayed state reception. How does RL adapt to such situations? I would also appreciate pointers to relevant research papers on this topic.
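One approach I have seen in the delayed-MDP literature is state augmentation: for a constant observation delay of d steps, the agent conditions on the last observed state together with the d actions taken since that state was observed, which restores the Markov property. A minimal sketch of that idea (the class name, null-action padding value, and flat list encoding are my own assumptions):

```python
from collections import deque


class DelayAugmentedState:
    """Sketch of the state-augmentation trick for a constant observation
    delay: the augmented state is the delayed observation concatenated
    with the actions taken since that observation was generated."""

    def __init__(self, delay):
        self.delay = delay
        self.pending_actions = deque(maxlen=delay)  # actions not yet "seen" in a state

    def record(self, action):
        # Call after each action the agent sends to the scheduler.
        self.pending_actions.append(action)

    def augment(self, delayed_state):
        # Pad with a null action (0 here) until d actions have accumulated.
        actions = list(self.pending_actions)
        actions += [0] * (self.delay - len(actions))
        return list(delayed_state) + actions
```

The augmented vector is what the policy/value network consumes in place of the raw (stale) state; the cost is a larger input dimension growing with the delay.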