
Reinforcement Learning by Sutton: tic-tac-toe self-play

Discussion in 'Education' started by dayum, Oct 8, 2018.

  1. dayum


    I just started Sutton and Barto's book, Reinforcement Learning: An Introduction, and I'm curious how to think about the answer to Exercise 1.1 (Self-Play):

    "Suppose, instead of playing against a random opponent, the reinforcement learning algorithm described above played against itself. What do you think would happen in this case? Would it learn a different way of playing?"

    I also thought about the following related sub-questions, but they haven't made my thoughts any clearer.

    1. Would removing the random part of the learning change the situation, i.e. always picking the move that currently looks best (the greedy move) and never exploring?
    2. How would the outcome depend on which player moves first?
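One way to poke at these questions empirically is to run a small experiment. Below is a minimal sketch, not the book's code, assuming a tabular learner in the spirit of the chapter 1 tic-tac-toe example: each side keeps its own value table over the positions it leaves after moving, unseen positions start at 0.5, a win is worth 1, and a loss or a full board is worth 0 (my reading of the book's setup; `draw_value` makes that adjustable). The names `self_play`, `winner`, `alpha`, `epsilon` and `draw_value` are my own, and values are backed up only after greedy moves, which is how I understand the description.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that mark fills a line, otherwise None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play(episodes=5000, alpha=0.1, epsilon=0.1, draw_value=0.0, seed=0):
    """Two tabular learners play each other; 'X' always moves first."""
    rng = random.Random(seed)
    values = {"X": {}, "O": {}}          # per-player value tables
    results = {"X": 0, "O": 0, "draw": 0}

    def v(p, s):
        return values[p].get(s, 0.5)     # unseen positions start at 0.5

    def update(p, s, target):
        # V(s) <- V(s) + alpha * (target - V(s)); no-op before p's first move
        if s is not None:
            values[p][s] = v(p, s) + alpha * (target - v(p, s))

    for _ in range(episodes):
        board = [" "] * 9
        prev = {"X": None, "O": None}    # each player's previous position
        player, other = "X", "O"
        while True:
            empty = [i for i, c in enumerate(board) if c == " "]
            rng.shuffle(empty)           # break ties between equal values randomly
            options = {}
            for m in empty:
                nxt = board.copy()
                nxt[m] = player
                options[m] = "".join(nxt)
            greedy = rng.random() >= epsilon
            if greedy:
                move = max(empty, key=lambda m: v(player, options[m]))
            else:
                move = rng.choice(empty)  # exploratory move
            state = options[move]
            board[move] = player
            won = winner(board) == player
            full = len(empty) == 1       # the move just made filled the board
            if won:
                values[player][state] = 1.0
            elif full:
                values[player][state] = draw_value
            if greedy:                   # exploratory moves cause no backup
                update(player, prev[player], v(player, state))
            if won or full:
                update(other, prev[other], 0.0 if won else draw_value)
                results[player if won else "draw"] += 1
                break
            prev[player] = state
            player, other = other, player
    return values, results

if __name__ == "__main__":
    for eps in (0.1, 0.0):
        _, res = self_play(epsilon=eps)
        print("epsilon =", eps, res)
```

Setting `epsilon=0.0` corresponds to sub-question 1 (no exploration at all), and since `X` always moves first in this sketch, comparing `results["X"]` with `results["O"]` gives a rough handle on sub-question 2. The counts depend on the seed, the step size and the number of episodes, so treat any single run as a hint rather than an answer.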
