Thursday, 14 December 2017

Don't panic

The publication of a new paper by the team behind AlphaGo has really got the chess world talking. Applying the AlphaGo learning method to chess, they developed a program that was able to beat Stockfish after 4 hours of self-learning. To read the headlines (and comments) about this, it would almost seem that humans are about to be replaced by computers in all facets of life.
For me, while it was an impressive result, it isn't the end of the world, or even of chess. Self-learning programs have been around for a while, and were quite strong even 15 years ago. KnightCap was one such program; its authors described the self-learning aspects in a paper published in 1999 (which is cited by the authors of AlphaZero).
On the other hand, what did impress me was the successful implementation of Monte Carlo Tree Search. This is an alternative to the tried and true alpha-beta search method (and its variants), and relies on a probabilistic approach to evaluating positions. Instead of assessing the various factors in a position (material, space, pawn structure), the program self-plays thousands of games from a given position, preferring the move that produces the most wins. The obvious flaw in this method (apart from computing constraints) is that while a move may lead to wins 99 times out of 100, the opponent may find the 1-in-100 reply that is a forced loss for the engine. But based on the result against Stockfish, this did not seem to occur in practice.
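To make the contrast with alpha-beta concrete, here is a minimal sketch of the rollout idea described above: pick the move whose random playouts score best. It is written in Python against the python-chess package (my choice of library, not anything used in the paper), and it is deliberately the naive version; as the comment below notes, AlphaZero's actual search replaces random rollouts with neural-network evaluations.

    import random
    import chess  # the python-chess package (an assumed dependency; any move-generation library would do)

    def rollout(board, max_plies=200):
        """Play random moves to the end of the game; score it for the side to move on entry."""
        side, plies = board.turn, 0
        while not board.is_game_over() and plies < max_plies:
            board.push(random.choice(list(board.legal_moves)))
            plies += 1
        result = board.result(claim_draw=True)
        if result == "1-0":
            score = 1.0 if side == chess.WHITE else 0.0
        elif result == "0-1":
            score = 1.0 if side == chess.BLACK else 0.0
        else:
            score = 0.5  # drawn, or cut off at max_plies
        for _ in range(plies):  # unwind the random moves so the caller's board is unchanged
            board.pop()
        return score

    def monte_carlo_move(board, playouts_per_move=100):
        """Pick the legal move whose random playouts win most often."""
        best_move, best_score = None, -1.0
        for move in list(board.legal_moves):
            board.push(move)
            # rollout() scores for the opponent (it is their move now), so invert
            score = sum(1.0 - rollout(board) for _ in range(playouts_per_move))
            board.pop()
            if score > best_score:
                best_move, best_score = move, score
        return best_move

Calling monte_carlo_move(chess.Board()) plays out 100 random games per candidate move and returns the best scorer. A real MCTS instead grows a search tree and balances exploration against exploitation (typically with a UCB-style formula) rather than sampling every root move equally, which helps it probe the dangerous 1-in-100 replies mentioned above.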
The other thing to point out is that this wasn't a match between AlphaZero and Stockfish, at least not in a competitive sense. Stockfish had a number of restrictions placed on it (no opening book, less powerful hardware), and I suspect the point of the exercise was to measure how successful the learning algorithm was. If the authors intend to develop the world's strongest chess program, then entering the World Computer Chess Championship would be the best way to test it.


AlphaZero (Computer) - Stockfish (Computer) [E17]
AlphaZero - Stockfish London ENG, 04.12.2017

1. Nf3 Nf6 2. d4 e6 3. c4 b6 4. g3 Bb7 5. Bg2 Be7 6. O-O O-O 7. d5 exd5 8. Nh4 c6 9. cxd5 Nxd5 10. Nf5 Nc7 11. e4 d5 12. exd5 Nxd5 13. Nc3 Nxc3 14. Qg4 g6 15. Nh6+ Kg7 16. bxc3 Bc8 17. Qf4 Qd6 18. Qa4 g5 19. Re1 Kxh6 20. h4 f6 21. Be3 Bf5 22. Rad1 Qa3 23. Qc4 b5 24. hxg5+ fxg5 25. Qh4+ Kg6 26. Qh1 Kg7 27. Be4 Bg6 28. Bxg6 hxg6 29. Qh3 Bf6 30. Kg2 Qxa2 31. Rh1 Qg8 32. c4 Re8 33. Bd4 Bxd4 34. Rxd4 Rd8 35. Rxd8 Qxd8 36. Qe6 Nd7 37. Rd1 Nc5 38. Rxd8 Nxe6 39. Rxa8 Kf6 40. cxb5 cxb5 41. Kf3 Nd4+ 42. Ke4 Nc6 43. Rc8 Ne7 44. Rb8 Nf5 45. g4 Nh6 46. f3 Nf7 47. Ra8 Nd6+ 48. Kd5 Nc4 49. Rxa7 Ne3+ 50. Ke4 Nc4 51. Ra6+ Kg7 52. Rc6 Kf7 53. Rc5 Ke6 54. Rxg5 Kf6 55. Rc5 g5 56. Kd4 1-0


1 comment:

Anonymous said...

Actually, if you read the AlphaGoZero paper (the Nature one) carefully, and the current preprint, it seems there is no randomization (or even roll-outs!) in the MCTS they use (except in the learning stage, with "Dirichlet noise"). When playing a game, it's more like the "book-building" that computer experts describe, as various GUIs allow. But I'm sure that the NN-based Google implementation is honed to their domain (80K new "MCTS" nodes per second).