What is Machine Learning?
Machine Learning
A machine learns when
• it improves its performance
• on a specific task
• with experience
Central to Artificial Intelligence
There can be no intelligence without learning
Machine = AlphaGo Player Program
Task = playing GO
Performance = % of won games
Experience = huge database of games + self-play
Lee Sedol
AlphaGo
Machine = e-mail program, spamfilter
Task = classify e-mails
Performance = accuracy
Experience = your past input
Spam Filter
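The spam-filter setting above (task = classify e-mails, performance = accuracy, experience = your past input) can be made concrete with a tiny naive Bayes classifier. This is a minimal sketch, not a production filter; the toy e-mails and the add-one smoothing are illustrative choices, not taken from the slides.

```python
from collections import Counter
import math

def train(emails):
    """emails: list of (text, label) pairs with label in {"spam", "ham"}.
    Returns per-class word counts and class counts (the 'experience')."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in emails:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the class with the highest log-posterior, add-one smoothed."""
    vocab = set(counts["spam"]) | set(counts["ham"])
    best, best_score = None, float("-inf")
    for label in ("spam", "ham"):
        n = sum(counts[label].values())
        score = math.log(totals[label] / sum(totals.values()))
        for w in text.lower().split():
            score += math.log((counts[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

data = [
    ("win a free prize now", "spam"),
    ("cheap prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
counts, totals = train(data)
print(classify("free prize click", counts, totals))    # "spam"
print(classify("team meeting monday", counts, totals))  # "ham"
```

Accuracy on held-out mail would be the performance measure; more past input (experience) moves the word statistics closer to the user's actual mail.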
Automating Science
Eve, an artificially-intelligent ‘robot scientist’, can make drug discovery faster and much cheaper. [King et al. Nature 04, Science 09]
Robot Scientist
Automating Data Science
Why is it useful?
Why Machine Learning?
It applies to any application where there is (a lot of) data …
It is very practical
• some programs too complex to program by hand
• easier to generate data than to build programs by hand
• adaptation and personalisation
The enabling technology in
• natural language processing, web search / information retrieval
• computer vision & speech understanding
• robotics (& self-driving cars)
• bioinformatics
• analysing medical EHR & images
• …
How does it work?
Machine learning is all about learning functions
• different types of functions
• different types of data (supervised, unsupervised, reinforcement …)
• different criteria (loss or value function)
• Different schools in machine learning make different choices
f(input) => output.
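As a minimal illustration of learning a function f(input) => output, the sketch below fits f(x) = a*x + b to four (input, output) pairs by minimising a squared loss with plain gradient descent. The data, learning rate, and iteration count are made up for the example; they are not from the slides.

```python
# Learn f(x) = a*x + b from (input, output) pairs by minimising
# the mean squared loss with gradient descent.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated by f(x) = 2x + 1

a, b = 0.0, 0.0   # initial guess for the function's parameters
lr = 0.05         # learning rate (an illustrative choice)
for _ in range(2000):
    grad_a = sum(2 * (a * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (a * x + b - y) for x, y in data) / len(data)
    a -= lr * grad_a
    b -= lr * grad_b

print(round(a, 2), round(b, 2))  # close to 2.0 and 1.0
```

Changing the function class (e.g. a neural network instead of a line), the data regime (supervised, unsupervised, reinforcement), or the loss gives the different flavours of machine learning listed above.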
Where does the data come from?
learning from examples (supervised / unsupervised)
• good/bad moves? just moves?
learning by imitation (Behavioral cloning)
• imitate the world champion
learning from rewards (Reinforcement learning)
• just play, reward = board config. / wins / losses
• the whole AI problem in a nutshell
Donald Michie’s Menace
Donald Michie (2007) Menace (1961)
Machine Educable Noughts And Crosses Engine
slides on Menace: thanks to Johannes Fürnkranz
[Slide sequence omitted: a series of noughts-and-crosses board diagrams showing Menace in action. On X's move, the machine chooses a box on the basis of the current position, then executes the move drawn from that box.]
Menace Machine = 287 “boxes” + pearls
Encodes probabilistic function
• P(box, color) = probability of move
Learning a function
• upon a loss: all used pearls are confiscated (kept out)
• upon a win: the used pearls are put back, plus an extra one of the same colour
Q^*(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q^*(s', a')

(Richard Bellman)
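The pearl-update rule above can be sketched in a few lines. The bead counts, position keys, and move names below are hypothetical stand-ins; the real Menace had one matchbox per reachable position and was operated by hand.

```python
import random

# Toy Menace-style learner: one "box" (bag of pearls) per board position,
# one pearl colour per legal move. Initial counts (3 per move) are hypothetical.
boxes = {}  # position key -> {move: pearl_count}

def choose_move(position, legal_moves):
    """Draw a move with probability proportional to its pearl count."""
    box = boxes.setdefault(position, {m: 3 for m in legal_moves})
    moves, weights = zip(*box.items())
    return random.choices(moves, weights=weights)[0]

def update(history, won):
    """history: list of (position, move) pairs used in this game."""
    for position, move in history:
        if won:
            boxes[position][move] += 1  # put pearl back + one extra
        else:
            boxes[position][move] = max(boxes[position][move] - 1, 0)  # pearl confiscated

# One imaginary game: the machine played one move from the empty board and won.
move = choose_move("empty", ["centre", "corner", "edge"])
update([("empty", move)], won=True)
print(boxes["empty"][move])  # one pearl more than the initial 3
```

Winning moves thus become more probable over time, which is exactly the probabilistic function P(box, colour) being learned.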
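The Bellman optimality equation above can be solved by iterating it as a fixed-point update (value iteration). The two-state MDP below (its states, rewards, transitions, and the discount factor gamma = 0.9) is invented purely to show the mechanics.

```python
# Value iteration on a toy 2-state, 2-action MDP, repeatedly applying
# Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a').
states, actions = [0, 1], [0, 1]
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}
P = {  # P[(s, a)] = {next_state: probability} (deterministic here)
    (0, 0): {0: 1.0}, (0, 1): {1: 1.0},
    (1, 0): {0: 1.0}, (1, 1): {1: 1.0},
}
gamma = 0.9

Q = {(s, a): 0.0 for s in states for a in actions}
for _ in range(200):  # the update is a contraction, so Q converges to Q*
    Q = {(s, a): R[(s, a)] + gamma * sum(p * max(Q[(s2, a2)] for a2 in actions)
                                         for s2, p in P[(s, a)].items())
         for s in states for a in actions}

print({sa: round(q, 2) for sa, q in Q.items()})
```

The greedy policy reads off the best action per state (here: action 1 in state 0, action 0 in state 1), which is the reasoning-free analogue of what Menace's pearl counts converge to.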
Three important points
Learning AND Reasoning needed
• System 1 — thinking fast — can do things like solve “2+2=?” and recognise a car
• System 2 — thinking slow — can reason about complex logic problems (IQ tests) and reason about priority in traffic
• Alternative terms: learning vs reasoning, data-driven vs knowledge-driven, symbolic vs sub-symbolic
• AlphaGo incorporates learning and reasoning
• A machine that has merely learned to play a video game cannot cope with a change in the rules of the game
There are five schools in ML
Pedro Domingos found it both exciting and scary to see that President Xi Jinping of China reads his book.
Tribe          | Origins              | Master Algorithm
Symbolists     | Logic, philosophy    | Inverse deduction
Connectionists | Neuroscience         | Backpropagation
Evolutionaries | Evolutionary biology | Genetic programming
Bayesians      | Statistics           | Probabilistic inference
Analogizers    | Psychology           | Kernel machines
There are many remaining challenges
• Getting the right data
• bias, fairness, privacy, etc. (ethical concerns)
• Combining learning and reasoning
• Providing explanations and interpretable models
• beyond the deep neural network black-boxes
• Providing guarantees for software
• verification and validation
N. Akhtar, A. Mian: Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey
perturbations to make them imperceptible for humans. However, Papernot et al. [60] also created an adversarial attack by restricting the ℓ0-norm of the perturbations. Physically, it means that the goal is to modify only a few pixels in the image instead of perturbing the whole image to fool the classifier. The crux of their algorithm to generate the desired adversarial image can be understood as follows. The algorithm modifies pixels of the clean image one at a time and monitors the effects of the change on the resulting classification. The monitoring is performed by computing a saliency map using the gradients of the outputs of the network layers. In this map, a larger value indicates a higher likelihood of fooling the network to predict ℓ_target as the label of the modified image instead of the original label ℓ. Thus, the algorithm performs targeted fooling. Once the map has been computed, the algorithm chooses the pixel that is most effective to fool the network and alters it. This process is repeated until either the maximum number of allowable pixels are altered in the adversarial image or the fooling succeeds.
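The greedy loop just described (score pixels, alter the most effective one, repeat until the fooling succeeds or a pixel budget is exhausted) can be caricatured as follows. Everything here is hypothetical: the "network" is a made-up linear scorer over a 4-pixel binary "image", and the saliency map is a brute-force what-if score rather than the gradient-based map of Papernot et al.

```python
def target_score(image, weights):
    """Made-up 'network': a linear score for the target label."""
    return sum(w * p for w, p in zip(weights, image))

def greedy_pixel_attack(image, weights, max_pixels=3):
    image = list(image)
    for _ in range(max_pixels):
        base = target_score(image, weights)
        # crude saliency map: how much does setting each still-unset
        # pixel to 1 raise the target-label score?
        effects = [(target_score(image[:i] + [1] + image[i + 1:], weights) - base, i)
                   for i, p in enumerate(image) if p == 0]
        if not effects:
            break
        gain, i = max(effects)
        image[i] = 1                          # alter the most effective pixel
        if target_score(image, weights) > 0:  # "fooling succeeded"
            break
    return image

weights = [0.5, -1.0, 2.0, 0.1]  # hypothetical per-pixel influence on the target label
adv = greedy_pixel_attack([0, 0, 0, 0], weights)
print(adv)  # pixel 2 has the largest effect, and flipping it already succeeds
```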
5) ONE PIXEL ATTACK
An extreme case for the adversarial attack is when only one pixel in the image is changed to fool the classifier. Interestingly, Su et al. [68] claimed successful fooling of three different network models on 70.97% of the tested images by changing just one pixel per image. They also reported that the average confidence of the networks on the wrong labels was found to be 97.47%. We show representative examples of the adversarial images from [68] in Fig. 3. Su et al. computed the adversarial examples by using the concept of Differential Evolution [146]. For a clean image I_c, they first created a set of 400 vectors in R^5 such that each vector contained xy-coordinates and RGB values for an arbitrary candidate pixel. Then, they randomly modified the elements of the vectors to create children such that a child competes with its parent for fitness in the next iteration, while the probabilistic predicted label of the network is used as the fitness criterion. The last surviving child is used to alter the pixel in the image.
FIGURE 3. Illustration of one-pixel adversarial attacks [68]: The correct label is mentioned with each image. The corresponding predicted label is given in parentheses.
Even with such a simple evolutionary strategy, Su et al. [68] were able to show successful fooling of deep networks. Notice that differential evolution enables their approach to generate adversarial examples without having access to any information about the network parameter values or their gradients. The only input their technique requires is the probabilistic labels predicted by the targeted model.
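The differential-evolution search described above can be sketched against a toy stand-in classifier. Everything below is hypothetical: the 3x3 "image", the classifier, and the population and iteration sizes (the real attack used 400 candidate vectors in R^5 against deep networks).

```python
import random

random.seed(0)
SIZE = 3  # toy 3x3 "image"; real attacks target full-size images

def true_label_prob(image):
    """Hypothetical stand-in classifier: its confidence in the true label
    drops as the brightest pixel in the image gets brighter."""
    return 1.0 - max(sum(px) for row in image for px in row) / 765.0

def perturb(image, cand):
    """Apply a candidate (x, y, r, g, b): overwrite one pixel with one colour."""
    x, y, r, g, b = cand
    img = [row[:] for row in image]
    img[y][x] = (r, g, b)
    return img

def one_pixel_attack(image, pop_size=20, iters=30):
    def clip(v, hi):
        return int(min(max(v, 0), hi))

    def fitness(cand):  # lower probability on the true label = better attack
        return true_label_prob(perturb(image, cand))

    population = [(random.randrange(SIZE), random.randrange(SIZE),
                   random.randrange(256), random.randrange(256),
                   random.randrange(256)) for _ in range(pop_size)]
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = random.sample(population, 3)
            # differential-evolution mutation, then child-vs-parent selection
            child = tuple(clip(a[k] + 0.5 * (b[k] - c[k]),
                               SIZE - 1 if k < 2 else 255) for k in range(5))
            if fitness(child) < fitness(population[i]):
                population[i] = child
    return min(population, key=fitness)

clean = [[(0, 0, 0)] * SIZE for _ in range(SIZE)]
best = one_pixel_attack(clean)
print(round(true_label_prob(clean), 2), "->",
      round(true_label_prob(perturb(clean, best)), 2))
```

As in the paper's setting, the search only queries the model's predicted probabilities; no gradients or parameters are needed.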
6) CARLINI AND WAGNER ATTACKS (C&W)
A set of three adversarial attacks were introduced by Carlini and Wagner [36] in the wake of defensive distillation against the adversarial perturbations [38]. These attacks make the perturbations quasi-imperceptible by restricting their ℓ2, ℓ∞ and ℓ0 norms, and it is shown that defensive distillation for the targeted networks almost completely fails against these attacks. Moreover, it is also shown that the adversarial examples generated using the unsecured (un-distilled) networks transfer well to the secured (distilled) networks, which makes the computed perturbations suitable for black-box attacks.

Whereas it is more common to exploit the transferability property of adversarial examples to generate black-box attacks, Chen et al. [41] also proposed 'Zeroth Order Optimization (ZOO)' based attacks that directly estimate the gradients of the targeted model for generating the adversarial examples. These attacks were inspired by C&W attacks. We refer to the original papers for further details on C&W and ZOO attacks.
7) DEEPFOOL
Moosavi-Dezfooli et al. [72] proposed to compute a minimal norm adversarial perturbation for a given image in an iterative manner. Their algorithm, i.e. DeepFool, initializes with the clean image that is assumed to reside in a region confined by the decision boundaries of the classifier. This region decides the class label of the image. At each iteration, the algorithm perturbs the image by a small vector that is computed to take the resulting image to the boundary of the polyhedron that is obtained by linearizing the boundaries of the region within which the image resides. The perturbations added to the image in each iteration are accumulated to compute the final perturbation once the perturbed image changes its label according to the original decision boundaries of the network. The authors show that the DeepFool algorithm is able to compute perturbations that are smaller than the perturbations computed by FGSM [23] in terms of their norm, while having similar fooling ratios.
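For the special case of an affine binary classifier f(x) = w·x + b, the minimal perturbation DeepFool aims for has a closed form, r = -(f(x)/||w||^2) w, which moves x exactly onto the decision boundary (plus a tiny overshoot to cross it). The weights and input below are made up for illustration; the full algorithm applies this step iteratively to a linearized multi-class deep network.

```python
import math

# DeepFool's core step for an affine binary classifier f(x) = w.x + b:
# the minimal (L2) perturbation reaching the decision boundary is
# r = -(f(x) / ||w||^2) * w.  Toy 2-D example with made-up numbers.
w = [3.0, 4.0]
b = -2.0
x = [2.0, 1.0]

f = sum(wi * xi for wi, xi in zip(w, x)) + b        # f(x) = 8.0, class +1
norm_sq = sum(wi * wi for wi in w)                   # ||w||^2 = 25
r = [-(f / norm_sq) * wi for wi in w]                # minimal perturbation
x_adv = [xi + (1 + 1e-4) * ri for xi, ri in zip(x, r)]  # small overshoot to cross

f_adv = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
print(round(f, 2), round(f_adv, 6), round(math.hypot(*r), 3))  # sign of f flips; ||r|| = 1.6
```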
8) UNIVERSAL ADVERSARIAL PERTURBATIONS
Whereas the methods like FGSM [23], ILCM [35], DeepFool [72] etc. compute perturbations to fool a network on a single image, the 'universal' adversarial perturbations computed by Moosavi-Dezfooli et al. [16] are able to fool a network on 'any' image with high probability. These image-agnostic perturbations also remain quasi-imperceptible for the human vision system, as can be observed in Fig. 1. To formally define these perturbations, let us assume that
Akhtar & Mian, IEEE Access, vol. 6, 2018
What to expect?
What does this imply?
“AI is the new electricity” (Andrew Ng)
Much like the rise of electricity, which started about 100 years ago, AI will revolutionize every major industry. (Industry 4.0)
We will see many intelligent assistants for specific (routine) tasks.
The potential is high: AI can bring a lot of good to society.
But there are also some caveats.
What does this imply?
AI as the magic wand
• There is a lot of hype; the expectations are often unrealistic
• The press (and the GAFA companies doing AI) create sensational stories — on purpose (?)
• Abuse of the term AI:
• everything is AI and everybody is jumping on the bandwagon
AI summers and winters
cf. Gartner hype cycle for emerging technologies
Take away
• Insight into the nature of AI and ML
• AI & ML have a lot of potential, they are here to stay
• Go for a broad view on AI, we need all schools of ML, we need learning and reasoning, there are remaining challenges
• Beware of the hype & learn from the past!