braincreators
@tommasogritti
Outline
● Deep Neural Network intuition
● Embeddings
● Transfer Learning
● Tips
Deep Neural Network omnipresence
https://trends.google.com/trends/explore?date=2008-03-09%202017-04-09&q=artificial%20intelligence,machine%20learning,deep%20learning
… or almost
Applications
http://www.yaronhadad.com/deep-learning-most-amazing-applications/
Human
● 10^11 neurons
● 10^4 synapses per neuron
● 10^16 “operations” / sec
● 250 M neurons per mm^3
● 180,000 km of “wires”
● 25 Watts

Deep Neural Networks sound cool
GPU
● 8×10^12 operations / sec
● 500 Watts
● 5760 (small) cores
● $2000
Toy example

Num website visits | Num pages visited | Average time on page | Converted?
1 | 13 | 55s | 1
2 | 1 | 141s | 1
1 | 8 | 10s | 0
3 | 5 | 127s | 0
2 | 3 | 18s | 0
Toy example
“Num website visits” does not seem to influence the output.
Toy example
“Num pages visited” above 9 seems to be a good threshold, but even a visitor with only 1 page visited can convert ⇒ no simple threshold.
Toy example
“Average time on page” above 128s seems to be a good threshold, but even a visitor at 55s can convert ⇒ no simple threshold.
Toy example

Num website visits = 1, Num pages visited = 13, Average time on page = 55s
Weights: w1 = ??, w2 = ??, w3 = ??
Multiply, then sum: 1*w1 + 13*w2 + 55*w3
> 0 → converted, < 0 → not converted
User converted? ??
Toy example

Num website visits = 1, Num pages visited = 13, Average time on page = 55s
Weights: w1 = -7.04, w2 = 0.28, w3 = 0.12
Multiply, then sum → 3.58
> 0 ? YES → user converted
Toy example

Num website visits = 3, Num pages visited = 5, Average time on page = 127s
Weights: w1 = -7.04, w2 = 0.28, w3 = 0.12
Multiply, then sum → -3.76
> 0 ? NO → user did not convert
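The two worked examples above fit in a few lines of code. A minimal sketch of the decision rule, using the weights from the slides; note the slide's sums (3.58 and -3.76) differ slightly from what these rounded weights give, so extra decimals or a bias term are presumably hidden, but the YES/NO decisions match:

```python
# Single-neuron decision rule from the toy example.
# Inputs: (num website visits, num pages visited, avg time on page in s).
# Weights are the ones shown on the slides.

def converts(visits, pages, avg_time, weights=(-7.04, 0.28, 0.12)):
    w1, w2, w3 = weights
    score = visits * w1 + pages * w2 + avg_time * w3
    return score > 0  # > 0 means "converted"

print(converts(1, 13, 55))   # → True  (first visitor converts)
print(converts(3, 5, 127))   # → False (fourth visitor does not)
```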
Toy example

Method #1: inputs (Num website visits = 3, Num pages visited = 5, Average time on page = 127s), weights (-7.04, 0.28, 0.12)
Method #2: inputs (3, 5, 127), weights (-2.4, 0.91, 0.013)
Method #3: inputs (3, 5, 127), weights (-3.9, 0.21, 0.03)
Method #4: inputs (3, 5, 127), weights (-1.1, 0.83, 0.18)
Toy example

Inputs: Num website visits = 1, Num pages visited = 13, Average time on page = 55s
Each input feeds Method #1, Method #2, Method #3 and Method #4; their outputs are combined into a final estimate.
Toy example

The inputs form the input layer, Methods #1–#4 form the hidden layer, and the final estimate is the output layer.
Deep = lots of hidden layers
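The three-layer picture can be sketched directly. The hidden-layer weights below are the four methods' weights from the slides; the output-layer weights and the ReLU-style thresholding are illustrative assumptions, since the slides do not show how the final estimate combines the four methods:

```python
# 3 inputs -> hidden layer of 4 units ("Method #1..#4") -> 1 output.

def dense(inputs, weights):
    # weighted sum: multiply each input by its weight, then sum
    return sum(i * w for i, w in zip(inputs, weights))

def relu(x):
    # simple thresholding nonlinearity
    return max(0.0, x)

HIDDEN = [
    (-7.04, 0.28, 0.12),   # Method #1 (weights from the slides)
    (-2.4,  0.91, 0.013),  # Method #2
    (-3.9,  0.21, 0.03),   # Method #3
    (-1.1,  0.83, 0.18),   # Method #4
]
OUTPUT = (0.25, 0.25, 0.25, 0.25)  # placeholder: average the four methods

def forward(inputs):
    hidden = [relu(dense(inputs, w)) for w in HIDDEN]
    return dense(hidden, OUTPUT)  # final estimate

print(forward((1, 13, 55)) > 0)  # → True: this visitor converts
```

"Deep" then just means stacking more such hidden layers between input and output.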
http://www.asimovinstitute.org/neural-network-zoo/
Lots of configurations
Open source toolkits
Neural Networks - Take Home Message
● Applicable to endless domains: object recognition, medical imaging, automotive, finance, robotics, natural language processing, translation systems, speech recognition
● At the simplest level, only a series of nodes doing sums & thresholding
● Lots of variety
● Lots of open source tools
Embeddings
Context: object recognition
Automatically classify product images into 1000s of categories
Dress Boot
Image Classifier (old school)
Image dataset → Image Features → Classifier
Features: ● Edges ● Contrast ● Local patterns ● Colors
Classifier: ● AdaBoost ● SVM ● Random Forests ● Neural Network
Image Features (old school)
Input Image → Feature extraction → Image features
V = [0.2, -0.3, 0.15, 0.75, 0.11, …, 0.93]
Classifier
f(V) > 0
Effort (old school)
Image dataset: 10% | Image Features: 45% | Classifier: 45%
...still in use today
Dataset
Data gathered from 100s of scraped webshops: 5 million products, uncategorised
● Keywords filtering
● Visual clustering
● Human inspection
⇒ ~500 labelled classes, ~1000 images / class
Image classifier (the new way)
Deep Convolutional Neural Network (DCNN)
~500 labelled classes, ~1000 images / class
Backpropagation + Gradient descent
Image classifier (the new way)
Forward pass: training image with label “pans” → predicted label = “shoe”
Backpropagation + Gradient descent = update weights “towards target”
Image classifier (the new way)
Forward pass
Backpropagation + Gradient descent
● Repeat for all training images
● Repeat until a stopping criterion is met
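The loop above (forward pass, compare prediction to label, update weights towards the target, repeat) can be sketched end to end. To keep it self-contained, the "network" here is a single sigmoid neuron trained on the toy conversion data rather than a DCNN on images; the hand-written gradient is the cross-entropy gradient for that one neuron, and the learning rate and epoch count are arbitrary choices:

```python
import math

# Sketch of the training loop: forward pass, error, weight update,
# repeated over all examples until a stopping criterion.

DATA = [  # (visits, pages, avg_time_s), converted?
    ((1, 13, 55), 1),
    ((2, 1, 141), 1),
    ((1, 8, 10), 0),
    ((3, 5, 127), 0),
    ((2, 3, 18), 0),
]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = [0.0, 0.0, 0.0], 0.0, 0.01

for epoch in range(2000):              # repeat till stopping criterion
    for x, y in DATA:                  # repeat for all training examples
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)  # forward pass
        err = p - y                    # cross-entropy gradient wrt the score
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]       # update weights
        b -= lr * err                  # "towards target"

correct = sum(
    (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5) == bool(y)
    for x, y in DATA
)
print(f"{correct}/5 training examples classified correctly")
```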
Effort (the new way)
Data collection (~500 labelled classes, ~1000 images / class): 50%
Deep Convolutional Neural Network (DCNN): 50%
What is going on in the network?
Dress
http://vision03.csail.mit.edu/cnn_art/data/single_layer.png
predicted label = pans
Low-level image “concepts” → abstract image “concepts”
Embedding = self-learnt descriptors
Abstract-level concept / descriptor for “Dress”:
V = [0.2, -0.3, 0.15, 0.75, 0.11, …, 0.93]
Distance in embedding space
Images a, b, c are mapped to embeddings E(a), E(b), E(c).
d(E(a), E(b)) >> d(E(c), E(b))
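Embedding distance is just vector distance. A minimal sketch, where the 3-d vectors are made-up placeholders standing in for real network outputs:

```python
import math

# "Distance in embedding space": similar images get nearby vectors.

def d(u, v):
    # Euclidean distance between two embeddings
    return math.sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

E_a = [0.2, -0.3, 0.15]   # e.g. a dress
E_b = [0.9, 0.8, -0.5]    # e.g. a boot
E_c = [0.85, 0.75, -0.4]  # e.g. another, visually similar boot

print(d(E_a, E_b) > d(E_c, E_b))  # → True: d(E(a), E(b)) >> d(E(c), E(b))
```

Sorting a product list by its distance to a query embedding is exactly what produces the "sorted on embedding" grids on the following slides.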
Bracelets (unsorted)
Bracelets (sorted on embedding)
Shoes (unsorted)
Shoes (sorted on embedding)
Iterative refinement
Newly discovered classes → re-train classifier → results:
95% of the 5M products classified with confidence > 96%
More than 250 new labeled categories
Context: identity recognition
Automatically recognize celebrities from red carpet events
Jennifer Aniston
LL Cool J
Embedding training
Triplet Loss
Train network to discriminate between triplets of images
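The triplet loss pushes the anchor-positive distance below the anchor-negative distance by at least a margin. A minimal sketch; the 2-d vectors and the margin value are illustrative placeholders:

```python
# Triplet loss on an (anchor, positive, negative) triplet of embeddings.

def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # zero loss once the negative is at least `margin` further away
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

a = [0.1, 0.9]   # photo of celebrity A
p = [0.2, 0.8]   # another photo of celebrity A
n = [0.9, 0.1]   # photo of celebrity B

print(triplet_loss(a, p, n))  # → 0.0: this triplet is already well separated
```

Minimizing this loss over many triplets is what moves the randomly initialized embedding towards the trained one on the next slides.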
Triplets
Training: random embedding initialization → trained embedding
Celebrity identifier
NLP - Word embeddings
https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa
With a different network setup we can learn an embedding for words:
Each word is represented by a vector. These vectors let us explore very interesting relationships learnt automatically from the data:
● King - man + woman → queen
● Paris - France + Italy → Rome
● Obama - USA + Russia → Putin
● President - power → prime minister
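The analogies above are literal vector arithmetic. A toy sketch: the tiny 3-d vectors below are hand-made placeholders (real word embeddings have hundreds of dimensions learnt from text), chosen so the arithmetic works out:

```python
import math

# Word-vector arithmetic: "king - man + woman ≈ queen".

VECTORS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.2, 0.1],
    "man":    [0.1, 0.9, 0.2],
    "woman":  [0.1, 0.3, 0.2],
    "prince": [0.8, 0.85, 0.15],  # distractor
}

def nearest(query, exclude):
    # word whose vector is closest to the query point
    def dist(word):
        return math.sqrt(sum((q - v) ** 2 for q, v in zip(query, VECTORS[word])))
    return min((w for w in VECTORS if w not in exclude), key=dist)

query = [k - m + w for k, m, w in
         zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])]
print(nearest(query, exclude={"king", "man", "woman"}))  # → queen
```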
Embeddings - Take Home Message
● From feature engineering to data collection
● Neural Networks automatically learn relevant high-level abstractions
● Embedding spaces are very useful for exploring data
● Application areas: retrieval or ranking tasks (e.g. product recommendation, customer segmentation), classification
Transfer learning
ImageNet
1.5 million training examples
1000 categories
Training time ~ days on best GPUs
Transfer Learning
Randomly initialized weights + ImageNet → network trained to classify 1000 classes
Classifies correctly (>90%) images in those 1000 classes

New data, new classes: ?

Fine-tune the model (update weights) on the new data and classes:
● Faster training time
● Better performance
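The fine-tuning idea in miniature: start from pretrained weights instead of random ones, and update only the layers you choose. The layer names, weight values and gradients below are placeholders; in practice you would load a pretrained model from a model zoo and let a framework compute the gradients:

```python
# Fine-tune = keep pretrained weights, update (some of) them on new data.

weights = {                      # "pretrained" values, not random ones
    "conv1": [0.7, -0.3],
    "conv2": [0.2, 0.5],
    "head":  [0.0, 0.0],         # new layer for the new classes
}
TRAINABLE = {"head"}             # freeze everything except the new head

def sgd_step(grads, lr=0.1):
    for layer, g in grads.items():
        if layer in TRAINABLE:   # frozen layers keep their pretrained values
            weights[layer] = [w - lr * gi for w, gi in zip(weights[layer], g)]

sgd_step({"conv1": [1.0, 1.0], "conv2": [1.0, 1.0], "head": [1.0, 1.0]})
print(weights["conv1"], weights["head"])  # → [0.7, -0.3] [-0.1, -0.1]
```

Growing `TRAINABLE` to include more layers corresponds to fine-tuning more of the network.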
Sharing pre-trained models
● Model-Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo
● Common format to share pre-trained models
● Active discussion and contributions
Transfer Learning - recommendations

Size of database × similarity of the data:
● small; similar → use existing embedding
● large; similar → fine-tune complete network
● small; different → use activations from earlier in the network
● large; different → fine-tune complete network (or start from scratch)
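The recommendation grid reads naturally as a lookup table; a direct transcription (the key strings are my own labels for the two axes):

```python
# The 2x2 transfer-learning recommendation grid as a lookup.

RECOMMENDATION = {
    ("small", "similar"):   "use existing embedding",
    ("large", "similar"):   "fine-tune complete network",
    ("small", "different"): "use activations from earlier in the network",
    ("large", "different"): "fine-tune complete network (or start from scratch)",
}

def recommend(dataset_size, similarity):
    return RECOMMENDATION[(dataset_size, similarity)]

print(recommend("small", "similar"))  # → use existing embedding
```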
Transfer Learning - Take Home Message
● Faster progress
● Training works even with much smaller amounts of data
● Check for the closest available pre-trained model before starting from scratch
Should we all go deep?
Some questions you should ask
● What is the performance of the baseline?
  ○ What can be achieved with a simpler system?
  ○ Can we start testing the value proposition with a simpler system?
● How much training data is required?
● Do we have the data, can we acquire it, or how long will it take to collect?
● Do we need labeled data, or can we use unlabeled data?
● How well does it work on data it has never seen? (Generalization / Overfitting)
● What are the failure cases?
● How reliable is the confidence of the prediction?
● Can we explain why a prediction was made?