Hello,
I am working on a same-day delivery problem in which customers place orders at random times throughout the day. The objective is to maximize the expected number of customers served in a day, and the problem is modeled as an MDP.
One of the decisions that needs to be made is which customers to accept and which ones to reject for same-day delivery. I want to use a reinforcement learning method (perhaps Q-learning) to make the accept/reject decisions, and this is the part where I need help.
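Below is a minimal sketch of tabular Q-learning for the accept/reject decision. It rests on heavily simplified, hypothetical assumptions. The day is split into T decision epochs, and at most one request arrives per epoch. Serving a customer consumes a fixed cost from a daily time budget (CAPACITY) instead of being checked against actual vehicle routes. The state is (epoch, remaining capacity, cost of the current request), and the reward is 1 per accepted customer, so the return of an episode is the number of customers served that day. All names and values (T, CAPACITY, P_ARRIVAL, COSTS, train, greedy_policy) are illustrative only and are not part of the original problem.

```python
import random
from collections import defaultdict

# All parameters below are hypothetical toy values, not part of the original problem.
T = 20            # decision epochs in one day
CAPACITY = 10     # daily delivery time budget, in abstract units
P_ARRIVAL = 0.8   # probability that a request arrives in an epoch
COSTS = (1, 3)    # time cost of serving a request: cheap or expensive
ACCEPT, REJECT = 1, 0


def sample_request(rng):
    """Cost of the request arriving this epoch, or 0 if no customer orders."""
    return rng.choice(COSTS) if rng.random() < P_ARRIVAL else 0


def actions(cap, cost):
    """Feasible actions; ACCEPT is listed first so ties in max() favor accepting."""
    if cost == 0 or cost > cap:
        return [REJECT]
    return [ACCEPT, REJECT]


def train(episodes=20_000, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    """Tabular Q-learning; one episode simulates one day."""
    rng = random.Random(seed)
    q = defaultdict(float)  # keys: (t, remaining capacity, request cost, action)
    for _ in range(episodes):
        t, cap, cost = 0, CAPACITY, sample_request(rng)
        while t < T:
            acts = actions(cap, cost)
            if rng.random() < eps:  # explore
                a = rng.choice(acts)
            else:  # exploit the current estimates
                a = max(acts, key=lambda x: q[(t, cap, cost, x)])
            reward = 1.0 if a == ACCEPT else 0.0  # one more customer served
            next_cap = cap - cost if a == ACCEPT else cap
            next_t, next_cost = t + 1, sample_request(rng)
            if next_t < T:
                future = max(q[(next_t, next_cap, next_cost, x)]
                             for x in actions(next_cap, next_cost))
            else:
                future = 0.0  # the day is over: no future reward
            key = (t, cap, cost, a)
            q[key] += alpha * (reward + gamma * future - q[key])
            t, cap, cost = next_t, next_cap, next_cost
    return q


def greedy_policy(q, t, cap, cost):
    """Learned accept/reject decision for a given state."""
    return max(actions(cap, cost), key=lambda x: q[(t, cap, cost, x)])
```

After `q = train()`, calling `greedy_policy(q, t, cap, cost)` returns the learned decision for a state. The point of the toy cost structure is that rejecting an expensive request early can leave room to serve more cheap ones later. A lookup table like this only scales to small, discrete state spaces, so a realistic formulation with routing information would likely need function approximation in place of the table.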
Please contact me if you think you could help with this and would like to collaborate on the work.
Thank you,
Bhawesh