After a lot of effort I could not understand the PAC guarantee and fitted q-learning algorithm. I need a paper/book/article which explain these topics thoroughly.
There are many sources to learn about PAC results (or more generally, Statistical Learning Theory), especially if the focus is on the supervised learning setting.
I have compiled an incomplete list of resources that might be helpful at the end of these slides:
The analysis of an RL algorithms is usually more difficult than the analysis of supervised learning algorithms.
For the theoretical analysis of Fitted Q-Iteration (an Approximate Value Iteration algorithm), take a look at the following paper:
Remi Munos and Csaba Szepesvari, "Finite Time Bounds for Fitted Value Iteration," JMLR, 2008.
For LSPI-like algorithms, take a look at the following papers:
* Andras Antos, Csaba Szepesvari, and Remi Munos,"Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Machine Learning, 2008.
* Alessandro Lazaric, Mohammad Ghavamzadeh, and Remi Munos. "Finite-Sample Analysis of Least-Squares Policy Iteration," JMLR, 2012.
For the regularized variants of these algorithms (i.e., Regularized Fitted Q-Iteration and Regularized LSPI), take a look at my PhD thesis:
Amir-massoud Farahmand, Regularization in Reinforcement Learning, 2011.