Adaptive Importance Sampling with Automatic Model Selection in Value Function Approximation (2008)
Hachiya, H., Akiyama, T., Sugiyama, M., Peters, J., Fox, D., Gomes, C. P.
Off-policy reinforcement learning aims to efficiently reuse data samples gathered in the past, an essential problem for physically grounded AI because experiments are usually prohibitively...