Deep learning, 2017 Lyle Ungar University of Pennsylvania

What's hot in Deep Learning

- Solving problems
  - Breast cancer diagnosis; chatbots; optimizing data centers…
  - Replacing more and more hand-crafted features
- GANs (Generative Adversarial Networks)
  - Learn generative models
- Reinforcement Learning
- DeepMind/Google

Power Utilization

- 15% reduction

https://www.theverge.com/2016/7/21/12246258/google-deepmind-ai-data-center-cooling

A simple neural network module for relational reasoning

https://deepmind.com/blog/deepmind-papers-nips-2017/

There is a tiny rubber thing that is the same colour as the large cylinder; what shape is it?
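
The module in the linked paper (the Relation Network) answers questions like this by applying one shared function g to every pair of objects, summing the results, and passing the sum through a second function f. A minimal NumPy sketch, assuming random untrained weights, toy object vectors, and hypothetical names (`relation_network`, `W_g`, `W_f`); the published model uses trained multi-layer networks for g and f and also feeds the question embedding into g, which is omitted here:

```python
import numpy as np

def relation_network(objects, W_g, W_f):
    """RN(O) = f(sum over all pairs (i, j) of g(o_i, o_j)), with one-layer g and f."""
    n = objects.shape[0]
    pair_sum = np.zeros(W_g.shape[1])
    for i in range(n):
        for j in range(n):
            pair = np.concatenate([objects[i], objects[j]])  # shape (2 * d,)
            pair_sum += np.maximum(pair @ W_g, 0.0)          # g: linear layer + ReLU
    return pair_sum @ W_f                                    # f: linear readout

# Toy data (assumption): 4 objects with 3 features each, 5 candidate answers.
rng = np.random.default_rng(0)
objects = rng.normal(size=(4, 3))
W_g = rng.normal(size=(6, 8))
W_f = rng.normal(size=(8, 5))
scores = relation_network(objects, W_g, W_f)
```

Because the sum runs over all ordered pairs, the output does not depend on the order in which the objects are listed.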

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.

https://arxiv.org/pdf/1712.01815.pdf
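
During self-play, the linked paper selects each move with a tree search guided by the network, and the search returns a probability distribution over moves that is either proportional to or greedy with respect to the visit counts at the root. A sketch of that final step only, assuming made-up visit counts and a hypothetical temperature parameter `tau` (the temperature form follows the earlier AlphaGo Zero paper; the function name is illustrative):

```python
import numpy as np

def visit_counts_to_policy(counts, tau=1.0):
    """Turn root visit counts into a move distribution.

    tau = 1 gives probabilities proportional to the counts;
    tau = 0 gives the greedy choice (all mass on the most-visited move).
    """
    counts = np.asarray(counts, dtype=float)
    if tau == 0:
        policy = np.zeros_like(counts)
        policy[np.argmax(counts)] = 1.0
        return policy
    scaled = counts ** (1.0 / tau)
    return scaled / scaled.sum()

# Hypothetical counts for three legal moves after a search.
policy = visit_counts_to_policy([10, 30, 60])          # proportional
greedy = visit_counts_to_policy([10, 30, 60], tau=0)   # greedy
```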

Lots of fancy network structures

GoogLeNet

[Figure: GoogLeNet architecture, from input to outputs. Legend: convolutional (different sizes) or fully connected; max pool; concatenation; softmax.]

Some layers use dropout
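
The building blocks named in the legend can be sketched in a few lines of NumPy: parallel branches over the same input, concatenated along the channel axis, followed by dropout and a softmax output. This is a simplified illustration under stated assumptions, not GoogLeNet itself: the weights are random, the sizes and dropout rate are arbitrary, and only 1x1 convolutions are shown (the real network also uses larger filter sizes).

```python
import numpy as np

def conv1x1(x, w):
    # x: (H, W, C_in), w: (C_in, C_out). A 1x1 convolution is a per-pixel matmul.
    return np.maximum(x @ w, 0.0)  # ReLU activation

def maxpool3x3_same(x):
    # 3x3 max pooling, stride 1, padded so the spatial size is unchanged.
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    windows = [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    return np.max(windows, axis=0)

def dropout(x, rate, rng):
    # Inverted dropout: zero a random fraction of units, rescale the rest.
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))                      # toy feature map
branch_a = conv1x1(x, rng.normal(size=(16, 4)))
branch_b = conv1x1(x, rng.normal(size=(16, 6)))
branch_c = maxpool3x3_same(x)
merged = np.concatenate([branch_a, branch_b, branch_c], axis=-1)  # channels: 4 + 6 + 16
pooled = merged.mean(axis=(0, 1))                    # global average pool
features = dropout(pooled, rate=0.4, rng=rng)
probs = softmax(features @ rng.normal(size=(26, 10)))  # 10 hypothetical classes
```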

Design a really smart computer

https://research.googleblog.com/2017/05/using-machine-learning-to-explore.html

Using Machine Learning to Explore Neural Network Architecture

Use RL to search for the ‘best’ neural net architecture
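
A toy sketch of the idea, assuming a much simpler setup than the linked post describes: the controller here is a single softmax over four hypothetical layer widths rather than a recurrent network, and `toy_reward` is a made-up stand-in for training a child network and measuring its validation accuracy. The update rule is the standard REINFORCE policy gradient with a running-average baseline.

```python
import numpy as np

CHOICES = [16, 32, 64, 128]  # hypothetical layer widths the controller picks from

def toy_reward(width):
    # Stand-in for "train the child network, return validation accuracy".
    return {16: 0.2, 32: 0.5, 64: 0.9, 128: 0.6}[width]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(logits, action, reward, baseline, lr=0.1):
    # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
    grad_log_prob = -softmax(logits)
    grad_log_prob[action] += 1.0
    return logits + lr * (reward - baseline) * grad_log_prob

rng = np.random.default_rng(0)
logits = np.zeros(len(CHOICES))
baseline = 0.0
for _ in range(2000):
    action = rng.choice(len(CHOICES), p=softmax(logits))  # sample an architecture
    reward = toy_reward(CHOICES[action])                  # evaluate it
    logits = reinforce_step(logits, action, reward, baseline)
    baseline = 0.9 * baseline + 0.1 * reward              # running average of rewards
probs = softmax(logits)
```

Each step raises the probability of a sampled choice whose reward beats the baseline and lowers it otherwise, which is how the search is steered toward better-scoring architectures.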

AutoML learns network structure

[Figure panels: human built; learned by RL]

https://research.googleblog.com/2017/05/using-machine-learning-to-explore.html

Big open directions (an opinion)

- Multitask learning / domain transfer
- One-shot learning
- Integrating deep learning with external data (e.g. results of search)
