Attention Is All You Need · Arvutiteaduse instituut · 2017-09-28

Attention Is All You Need

Presented by: Aqeel Labash

2017 · By: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin

From: Google Brain, Google Research

Deep learning: algorithms inspired by the structure and function of the brain

http://neuralnetworksanddeeplearning.com

Machine translation is often treated as a black box

http://blog.webcertain.com/the-dangers-of-machine-translation/09/07/2014/

Taking a look under the hood

http://matthewbrannelly.com.au/

But first: how does machine translation work?

Word embeddings represent words as meaningful numerical vectors (see the sketch below)

https://www.doc.ic.ac.uk/~js4416/163/website/nlp/
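A minimal NumPy sketch of what an embedding lookup does; the toy vocabulary, dimensions, and random matrix below are illustrative only (real systems learn the matrix and use vocabularies of tens of thousands of tokens).

import numpy as np

# Hypothetical toy vocabulary and a small embedding dimension (512 in the paper).
vocab = {"the": 0, "cat": 1, "sat": 2}
d_model = 4

# The embedding matrix is learned during training; random here for illustration.
embedding = np.random.randn(len(vocab), d_model)

sentence = ["the", "cat", "sat"]
vectors = embedding[[vocab[w] for w in sentence]]  # one d_model-vector per word
print(vectors.shape)  # (3, 4)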

Problem: an RNN cannot compute timestep t until timestep t-1 is done, so its computation is inherently sequential

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
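The constraint in one line: an RNN's hidden state at step t is a function of the state at step t-1, so the n timesteps must be computed one after another.

h_t = f(h_{t-1}, x_t),  t = 1, ..., n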

Goal: improve performance and get rid of sequential computation

Transformer


https://www.universalstudioshollywood.com/things-to-do/rides-and-attractions/transformers-the-ride-3d/

Self-Attention: focus on the important parts.

https://deepage.net/deep_learning/2017/03/03/attention-augmented-recurrent-neural-networks.html#attentionインターフェース
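A minimal NumPy sketch of the scaled dot-product attention the paper builds on, Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V; the toy shapes are illustrative only.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k); V: (seq_len, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (seq_len, d_v)

x = np.random.randn(5, 64)                    # 5 toy token representations
out = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V = x
print(out.shape)                              # (5, 64)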

Model: Encoder

https://arxiv.org/abs/1706.03762

● N = 6 identical layers
● All layers output size 512
● Embedding
● Positional Encoding (see the formulas after this list)
● Notice the residual connections
● Multi-head Attention
● LayerNorm(x + Sublayer(x))
● Position-wise feed-forward
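For reference, the sinusoidal positional encodings defined in the paper (pos is the position, i the dimension index):

PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))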


Model: Decoder

https://arxiv.org/abs/1706.03762

● N = 6 identical layers
● All layers output size 512
● Embedding
● Positional Encoding
● Notice the residual connections
● Multi-head Attention (see the masking sketch after this list)
● LayerNorm(x + Sublayer(x))
● Position-wise feed-forward
● Softmax
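A minimal NumPy sketch of the causal mask used in the decoder's self-attention, which keeps position i from attending to later positions; sizes and scores are toy values for illustration.

import numpy as np

seq_len = 5
# True above the diagonal marks the "future" positions to be blocked.
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

scores = np.random.randn(seq_len, seq_len)       # stand-in for Q·K^T / sqrt(d_k)
scores[future] = -1e9                            # mask out attention to the future
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax; masked weights are ~0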


Model: Complete

https://arxiv.org/abs/1706.03762


https://www.reddit.com/r/MachineLearning/comments/6kc7py/d_where_does_the_query_keys_and_values_come_from/
https://arxiv.org/abs/1706.03762

Q,K,V

● "encoder-decoder attention" layers, the queries (Q) come from the previous decoder layer, and the memory keys (K) and values (V) come from the output of the encoder.”

● Otherwise (self-attention): all three come from the previous layer's hidden states, as in the sketch below
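A sketch of where Q, K, V come from, using hypothetical projection matrices W_q, W_k, W_v (learned in practice, random here); toy shapes only.

import numpy as np

d_model = 512
W_q = np.random.randn(d_model, 64)
W_k = np.random.randn(d_model, 64)
W_v = np.random.randn(d_model, 64)

enc_out = np.random.randn(7, d_model)      # encoder output (7 source tokens)
dec_hidden = np.random.randn(5, d_model)   # previous decoder layer (5 target tokens)

# Encoder-decoder attention: queries from the decoder, keys/values from the encoder.
Q, K, V = dec_hidden @ W_q, enc_out @ W_k, enc_out @ W_v

# Self-attention: all three projected from the same previous-layer hidden states.
Qs, Ks, Vs = dec_hidden @ W_q, dec_hidden @ W_k, dec_hidden @ W_v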


https://arxiv.org/abs/1706.03762

Complexity
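This slide summarizes Table 1 of the paper; with n the sequence length, d the representation dimension and k the convolution kernel width, the per-layer comparison is:

● Self-Attention: O(n^2 · d) complexity, O(1) sequential operations, O(1) maximum path length
● Recurrent: O(n · d^2) complexity, O(n) sequential operations, O(n) maximum path length
● Convolutional: O(k · n · d^2) complexity, O(1) sequential operations, O(log_k(n)) maximum path length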

https://arxiv.org/abs/1706.03762

Position-wise Feed-Forward network
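The position-wise feed-forward sub-layer from the paper, applied identically at every position (d_model = 512, inner dimension d_ff = 2048):

FFN(x) = max(0, x·W_1 + b_1)·W_2 + b_2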

https://stats.stackexchange.com/questions/201569/difference-between-dropout-and-dropconnect/201891

Dropout

● Residual dropout: applied to the output of each sub-layer before it is added to the residual connection and normalized, and to the sums of the embeddings and positional encodings (P_drop = 0.1 for the base model)

http://www.prioritiesusa.org/recommendations-for-sports-training/

Training

● Data sets:
○ WMT 2014 English-German: 4.5 million sentence pairs, ~37K token shared vocabulary
○ WMT 2014 English-French: 36M sentence pairs, 32K token vocabulary
● Hardware:
○ 8 NVIDIA P100 GPUs (base model ~12 hours, big model ~3.5 days)


Optimizing the parameters

https://arxiv.org/abs/1706.03762
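From the paper: training uses Adam (β1 = 0.9, β2 = 0.98, ε = 10^-9) with a learning rate that warms up and then decays, using warmup_steps = 4000:

lrate = d_model^(-0.5) · min(step_num^(-0.5), step_num · warmup_steps^(-1.5))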

Results

https://arxiv.org/abs/1706.03762


How is this paper linked to neuroscience?

https://www.astrazeneca.com/our-focus-areas/neuroscience.html
http://blog.webcertain.com/machine-translation-technology-the-search-engine-takeover/18/02/2015/

?

Thanks for Listening
