Reinforcement Learning by Sutton and Barto: tic-tac-toe self-play

Discussion in 'Education' started by dayum, Oct 8, 2018.

    dayum Guest

    I just started Sutton and Barto's book, Reinforcement Learning: An Introduction, and I'm not sure how to approach Exercise 1.1 (Self-Play): "Suppose, instead of playing against a random opponent, the reinforcement learning algorithm described above played against itself. What do you think would happen in this case? Would it learn a different way of playing?"

    I also came up with the following related sub-questions, but they haven't made things any clearer for me.

    1. Would removing the random part of the learning change the situation, i.e., always making the greedy move and never exploring?
    2. How would the outcome depend on which player moves first?
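
To make the setup concrete, here is a rough Python sketch of how I picture the self-play experiment. It is my own simplification, not the book's code: the names (`self_play_episode`, `train`, `tables`), the step size `alpha = 0.1`, and the exploration rate `epsilon = 0.1` are all assumptions. Each side keeps its own table of state values, initialised to 0.5, with a win worth 1 and a loss or a draw worth 0, which is how I read the chapter's description. Values are nudged toward the value of the next state, and the update is skipped after an exploratory move.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def value(table, state):
    """Look up a state's value; unseen states start at 0.5 (assumption)."""
    return table.setdefault(state, 0.5)

def place(board, move, player):
    """Return the board, as a string, after `player` moves at `move`."""
    return "".join(board[:move] + [player] + board[move + 1:])

def self_play_episode(tables, epsilon=0.1, alpha=0.1, rng=random):
    """Play one game in which both sides learn; return the winner or None."""
    board = [" "] * 9
    prev = {"X": None, "O": None}  # each player's previous after-move state
    player, other = "X", "O"
    while True:
        table = tables[player]
        moves = [i for i, cell in enumerate(board) if cell == " "]
        rng.shuffle(moves)  # so ties between equal values break randomly
        explore = rng.random() < epsilon
        if explore:
            move = rng.choice(moves)
        else:
            move = max(moves, key=lambda m: value(table, place(board, m, player)))
        board[move] = player
        state = "".join(board)
        win = winner(board) == player
        full = " " not in board
        if win:
            table[state] = 1.0   # terminal win for the mover
        elif full:
            table[state] = 0.0   # a draw counts the same as a loss here
        # Temporal-difference update, skipped after exploratory moves.
        if prev[player] is not None and not explore:
            old = value(table, prev[player])
            table[prev[player]] = old + alpha * (value(table, state) - old)
        prev[player] = state
        if win or full:
            # The opponent's last state led to a loss or draw: target 0.
            if prev[other] is not None:
                old = value(tables[other], prev[other])
                tables[other][prev[other]] = old + alpha * (0.0 - old)
            return player if win else None
        player, other = other, player

def train(episodes=5000, epsilon=0.1, alpha=0.1, seed=0):
    """Run self-play and count outcomes (keys: 'X', 'O', None for a draw)."""
    rng = random.Random(seed)
    tables = {"X": {}, "O": {}}
    counts = {"X": 0, "O": 0, None: 0}
    for _ in range(episodes):
        counts[self_play_episode(tables, epsilon, alpha, rng)] += 1
    return tables, counts
```

Calling `train(epsilon=0.0)` versus `train(epsilon=0.1)` would be one way to poke at sub-question 1, and comparing the `"X"` and `"O"` counts (X always moves first in this sketch) would be one way to poke at sub-question 2. I have not drawn any conclusions from it yet.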

