
Page 1:

Attention Is All You Need

Presented by: Aqeel Labash

2017 - By: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

From: Google Brain, Google Research

Page 2:

Deep learning: algorithms inspired by the structure and function of the brain

http://neuralnetworksanddeeplearning.com

Page 3:

Machine translation is a black box

http://blog.webcertain.com/the-dangers-of-machine-translation/09/07/2014/

Page 4:

Taking a look under the hood

http://matthewbrannelly.com.au/

Page 5:

But first: how does machine translation work?

Page 6:

Word embedding represents words as meaningful numerical vectors

https://www.doc.ic.ac.uk/~js4416/163/website/nlp/
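
A minimal sketch of the idea, with a hypothetical three-word vocabulary and a random (rather than learned) embedding table:

```python
import numpy as np

# Hypothetical toy vocabulary; real systems learn embeddings over ~30K+ tokens.
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 8                      # embedding dimension (512 in the paper)

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

def embed(tokens):
    """Map each token to its row in the embedding table."""
    return embedding_table[[vocab[t] for t in tokens]]

print(embed(["the", "cat", "sat"]).shape)  # (3, 8): one vector per word
```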

Page 7:

Problem: RNNs are sequential; computation at each timestep must wait for the previous one

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
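
A toy recurrence makes the constraint concrete: the loop below cannot be parallelized across timesteps, because step t needs the hidden state from step t-1:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # hidden size (toy)
W_x, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def rnn(inputs):
    h = np.zeros(d)
    states = []
    for x in inputs:                    # step t needs h from step t-1,
        h = np.tanh(x @ W_x + h @ W_h)  # so the loop is sequential by construction
        states.append(h)
    return np.stack(states)

print(rnn(rng.normal(size=(5, d))).shape)  # (5, 4)
```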

Page 8:

The goal: improve performance and get rid of sequential computation

Page 9:

Transformer


https://www.universalstudioshollywood.com/things-to-do/rides-and-attractions/transformers-the-ride-3d/

Page 10:

Self-Attention: focus on the important parts.

https://deepage.net/deep_learning/2017/03/03/attention-augmented-recurrent-neural-networks.html#attentionインターフェース

Pages 11-18: Model: Encoder

https://arxiv.org/abs/1706.03762

● N = 6 identical layers
● All layers output size 512
● Embedding
● Positional encoding
● Notice the residual connections
● Multi-head attention
● LayerNorm(x + Sublayer(x))
● Position-wise feed-forward
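
A condensed sketch of this stack, assuming a single unprojected attention head in place of multi-head attention and one set of feed-forward weights shared across layers purely for brevity (toy dimensions stand in for d_model = 512):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len = 16, 5   # toy stand-ins; the paper uses d_model = 512

def positional_encoding(length, d):
    """Sinusoidal encoding: sin at even dimensions, cos at odd ones."""
    pos = np.arange(length)[:, None]
    i = np.arange(0, d, 2)[None, :]
    angles = pos / (10000 ** (i / d))
    pe = np.zeros((length, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def self_attention(x):
    """Single head, no learned projections: softmax(x x^T / sqrt(d)) x."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    return (w / w.sum(-1, keepdims=True)) @ x

def encoder_layer(x, W1, W2):
    x = layer_norm(x + self_attention(x))               # LayerNorm(x + Sublayer(x))
    return layer_norm(x + np.maximum(0, x @ W1) @ W2)   # position-wise feed-forward

W1 = rng.normal(size=(d_model, 4 * d_model))
W2 = rng.normal(size=(4 * d_model, d_model))
x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
for _ in range(6):   # N = 6 stacked layers (weights shared here only for brevity)
    x = encoder_layer(x, W1, W2)
print(x.shape)       # (5, 16)
```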

Pages 19-20: Model: Decoder

https://arxiv.org/abs/1706.03762

● N = 6 identical layers
● All layers output size 512
● Embedding
● Positional encoding
● Notice the residual connections
● Multi-head attention (masked over subsequent positions; see the sketch below)
● LayerNorm(x + Sublayer(x))
● Position-wise feed-forward
● Softmax
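
In the decoder, self-attention is masked so position i cannot attend to positions after i; a minimal sketch of that mask:

```python
import numpy as np

def causal_mask(n):
    """Upper-triangular mask: -inf above the diagonal blocks attention
    from position i to any position j > i (the 'future')."""
    return np.triu(np.full((n, n), -np.inf), k=1)

scores = np.zeros((4, 4)) + causal_mask(4)   # stand-in for Q K^T / sqrt(d_k)
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
print(np.round(weights, 2))
# Row i attends uniformly over positions 0..i and not beyond.
```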

Page 21:

Model: Complete

https://arxiv.org/abs/1706.03762

Pages 22-24: [Figure slides from the paper]

https://arxiv.org/abs/1706.03762

Pages 25-26: Q, K, V

https://www.reddit.com/r/MachineLearning/comments/6kc7py/d_where_does_the_query_keys_and_values_come_from/
https://arxiv.org/abs/1706.03762

● In "encoder-decoder attention" layers, the queries (Q) come from the previous decoder layer, and the memory keys (K) and values (V) come from the output of the encoder.

● Otherwise, all three come from the previous layer (the hidden state).
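
Behind these Q, K, V roles is the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; a minimal numpy transcription:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in the paper."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # similarity of each query to each key
    scores -= scores.max(-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V                        # weighted average of the values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```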

Page 27:

https://arxiv.org/abs/1706.03762

Complexity (Table 1 of the paper): per layer, self-attention costs O(n² · d) with O(1) sequential operations and an O(1) maximum path length between any two positions, while a recurrent layer costs O(n · d²) with O(n) sequential operations and O(n) path length (n = sequence length, d = representation dimension).

Page 28:

https://arxiv.org/abs/1706.03762

Position-wise feed-forward network: FFN(x) = max(0, xW1 + b1)W2 + b2, applied to each position separately and identically (d_model = 512, d_ff = 2048 in the paper).
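
A direct transcription of that formula into numpy, with toy dimensions standing in for the paper's d_model = 512 and d_ff = 2048:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 16, 64                      # paper uses 512 and 2048

W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

def ffn(x):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, per position, same weights."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

print(ffn(rng.normal(size=(5, d_model))).shape)  # (5, 16)
```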

Page 29:

https://stats.stackexchange.com/questions/201569/difference-between-dropout-and-dropconnect/201891

Dropout

● Applied to the output of each sub-layer before it is added to the residual connection and normalized, and to the sums of embeddings and positional encodings (P_drop = 0.1)
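
A sketch of where that dropout sits in the residual path, using inverted dropout at the paper's rate of 0.1 (LayerNorm omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.1):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

# Residual sub-layer pattern: LayerNorm(x + Dropout(Sublayer(x)))
x = rng.normal(size=(5, 8))
sublayer_out = x * 2.0                 # stand-in for attention or FFN output
y = x + dropout(sublayer_out)          # LayerNorm would wrap this sum
print(y.shape)                         # (5, 8)
```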

Pages 30-31: Training

http://www.prioritiesusa.org/recommendations-for-sports-training/

● Data sets:
  ○ WMT 2014 English-German: 4.5 million sentence pairs, 37K-token shared vocabulary.
  ○ WMT 2014 English-French: 36M sentence pairs, 32K-token vocabulary.
● Hardware:
  ○ 8 NVIDIA P100 GPUs (base model: 12 hours; big model: 3.5 days).

Page 32:

Optimizing the parameters

https://arxiv.org/abs/1706.03762
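
The paper pairs Adam (β1 = 0.9, β2 = 0.98, ε = 10⁻⁹) with a learning rate that grows linearly for warmup_steps = 4000 steps and then decays as the inverse square root of the step number; a direct transcription:

```python
def lrate(step, d_model=512, warmup_steps=4000):
    """lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

for step in (1, 1000, 4000, 40000):
    print(step, f"{lrate(step):.2e}")  # rises until step 4000, then decays
```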

Pages 33-35: Results

https://arxiv.org/abs/1706.03762

[Result tables from the paper: the big Transformer reaches 28.4 BLEU on WMT 2014 English-to-German translation, a new state of the art at the time, at a fraction of the training cost of earlier models.]

Page 36:

How is this paper linked to neuroscience?

https://www.astrazeneca.com/our-focus-areas/neuroscience.html
http://blog.webcertain.com/machine-translation-technology-the-search-engine-takeover/18/02/2015/

?

Page 37:

Thanks for listening!