Gunnar Rätsch
Friedrich Miescher Laboratory, Max Planck Society
Tübingen, Germany
http://www.tuebingen.mpg.de/~raetsch

Machine Learning in Science and Engineering
CCC Berlin, December 27, 2004


Roadmap

• Motivating Examples

• Some Background

• Boosting & SVMs

• Applications

Rationale: Let computers learn to automate

processes and to understand highly

complex data


Example 1: Spam Classification

From: [email protected]
Subject: Congratulations
Date: 16 December 2004 02:12:54 MEZ

LOTTERY COORDINATOR, INTERNATIONAL PROMOTIONS/PRIZE AWARD DEPARTMENT. SMARTBALL LOTTERY, UK.

DEAR WINNER,

WINNER OF HIGH STAKES DRAWS

Congratulations to you as we bring to your notice the results of the end-of-year HIGH STAKES DRAWS of SMARTBALL LOTTERY UNITED KINGDOM. We are happy to inform you that you have emerged a winner under the HIGH STAKES DRAWS SECOND CATEGORY, which is part of our promotional draws. The draws were held on 15th DECEMBER 2004 and results are being officially announced today. Participants were selected through a computer ballot system drawn from 30,000 names/email addresses of individuals and companies from Africa, America, Asia, Australia, Europe, Middle East, and Oceania as part of our International Promotions Program.

From: [email protected]
Subject: ML Positions in Santa Cruz
Date: 4 December 2004 06:00:37 MEZ

We have a Machine Learning position at the Computer Science Department of the University of California at Santa Cruz (at the assistant, associate or full professor level).

Current faculty members in related areas:
Machine Learning: DAVID HELMBOLD and MANFRED WARMUTH
Artificial Intelligence: BOB LEVINSON
DAVID HAUSSLER was one of the main ML researchers in our department. He has now launched the new Biomolecular Engineering department at Santa Cruz.

There is considerable synergy for Machine Learning at Santa Cruz:
- New department of Applied Math and Statistics with an emphasis on Bayesian Methods http://www.ams.ucsc.edu/
- New department of Biomolecular Engineering http://www.cbse.ucsc.edu/

Goal: Classify emails into spam / no spam

How? Learn from previously classified emails!

Training: analyze previous emails

Application: classify new emails
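The train-then-classify loop can be sketched with a tiny naive Bayes word model. The slides do not say which classifier was used; the corpus, words, and priors below are made up for illustration.

```python
import math
from collections import Counter

# hypothetical 3-mail training corpora; a real filter trains on thousands
SPAM = ["win money now", "lottery winner prize", "claim your prize money"]
HAM = ["meeting at noon", "project schedule update", "lunch meeting tomorrow"]

def word_counts(docs):
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

spam_counts, ham_counts = word_counts(SPAM), word_counts(HAM)
vocab = set(spam_counts) | set(ham_counts)

def log_score(msg, counts):
    # log P(class) + sum of log P(word | class), Laplace-smoothed
    total = sum(counts.values())
    s = math.log(0.5)  # equal class priors in this toy setup
    for word in msg.split():
        s += math.log((counts[word] + 1) / (total + len(vocab)))
    return s

def classify(msg):
    return "spam" if log_score(msg, spam_counts) > log_score(msg, ham_counts) else "ham"
```

Training here is just counting words per class; application is comparing the two class scores for a new mail.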


Example 2: Drug Design

[Figure: a chemist sorts candidate compounds into actives and inactives; chemical structure drawings omitted]


The Drug Design Cycle

former CombiChem technology

[Figure: the drug design cycle: a chemist proposes new compounds, which assays label as actives or inactives; chemical structure drawings omitted]


The Drug Design Cycle

former CombiChem technology

[Figure: the same cycle with the chemist replaced by a learning machine that proposes the compounds]


Example 3: Face Detection


Premises for Machine Learning

• Supervised Machine Learning

• Observe N training examples x_i with labels y_i

• Learn a function f: x → y

• Predict the label f(x) of an unseen example x

• Examples are generated by a statistical process

• Relationship between features and label

• Assumption: unseen examples are generated from the same or a similar process


Problem Formulation

[Figure: labelled training images: Natural (+1), Natural (+1), Plastic (-1), Plastic (-1), and an unseen example (?)]

The “World”:

• Data (x_i, y_i), i = 1, …, N

• Unknown target function f*

• Unknown distribution P(x, y)

• Objective: find f that predicts well on unseen examples

Problem: P(x, y) is unknown
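Spelled out, the standard formalization behind these bullets (the slide's own formulas did not survive, so this is the usual textbook statement):

Data: $(x_1, y_1), \ldots, (x_N, y_N)$ drawn i.i.d. from an unknown distribution $P(x, y)$

Objective: find $f$ with small expected risk $R[f] = \int \ell(f(x), y)\, dP(x, y)$

Since $P$ is unknown, one can only compute the empirical risk $R_{\mathrm{emp}}[f] = \frac{1}{N} \sum_{i=1}^{N} \ell(f(x_i), y_i)$ on the training sample.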



Example: Natural vs. Plastic Apples




AdaBoost (Freund & Schapire, 1996)

• Idea:

• Use many simple “rules of thumb”

• Simple hypotheses are not perfect!

• Combining hypotheses => increased accuracy

• Problems

• How to generate different hypotheses?

• How to combine them?

• Method

• Compute a distribution (weighting) on the examples

• Find a hypothesis on the weighted sample

• Combine hypotheses linearly: f(x) = sign(Σ_t α_t h_t(x))
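The method above can be sketched with threshold stumps on made-up 1-D data. The data, stump pool, and round limit are illustrative; the pattern +/-/+ is chosen so that no single stump can fit it.

```python
import math

# 1-D toy data whose +/-/+ band pattern no single threshold can fit
X = list(range(10))
y = [+1, +1, +1, -1, -1, -1, -1, +1, +1, +1]
n = len(X)

# weak hypothesis pool: stumps of both polarities; the threshold below
# all data gives a constant classifier, which this pattern needs
def stump(theta, pol):
    return lambda v: pol if v > theta else -pol
pool = [(t + 0.5, p) for t in range(-1, 10) for p in (+1, -1)]

def werr(h, w):  # weighted training error of hypothesis h
    return sum(w[i] for i in range(n) if h(X[i]) != y[i])

def predict(ensemble, v):
    return 1 if sum(a * h(v) for a, h in ensemble) >= 0 else -1

w = [1.0 / n] * n
ensemble = []
for _ in range(30):
    t, p = min(pool, key=lambda tp: werr(stump(*tp), w))
    h = stump(t, p)
    eps = werr(h, w)
    if eps >= 0.5:                 # no weak learner has an edge: stop
        break
    alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-12))
    ensemble.append((alpha, h))
    # AdaBoost re-weighting: misclassified examples gain weight
    w = [wi * math.exp(-alpha * y[i] * h(X[i])) for i, wi in enumerate(w)]
    Z = sum(w)
    w = [wi / Z for wi in w]
    if all(predict(ensemble, X[i]) == y[i] for i in range(n)):
        break                      # training error reached zero

train_acc = sum(predict(ensemble, X[i]) == y[i] for i in range(n)) / n
```

On this data the combination reaches zero training error after a few rounds even though the best single stump misclassifies three points.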


Boosting: 1st iteration (simple hypothesis)


Boosting: recompute weighting


Boosting: 2nd iteration


Boosting: 2nd hypothesis


Boosting: recompute weighting


Boosting: 3rd hypothesis


Boosting: 4th hypothesis


Boosting: combination of hypotheses


Boosting: decision


AdaBoost Algorithm


AdaBoost algorithm

• Combination of

• Decision stumps/trees

• Neural networks

• Heuristic rules

• Further reading

• http://www.boosting.org

• http://www.mlss.cc


Linear Separation

[Figure: two classes of points plotted against property 1 and property 2, separated by a straight line]


Linear Separation

[Figure: the same plot with a new, unlabelled point (?) to be classified]


Linear Separation with Margins

[Figure: two linear separations of the same data, one with a small and one with a large margin band around the separating line]

large margin => good generalization


Large Margin Separation

Idea:

• Find the hyperplane w·x + b = 0 that maximizes the margin (with ||w|| = 1)

• Use sign(w·x + b) for prediction

Solution:

• w is a linear combination of the examples: w = Σ_i α_i y_i x_i

• many α_i's are zero; the examples with α_i ≠ 0 are the support vectors

• => Support Vector Machines

Demo
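A hedged sketch of large-margin training: this uses the Pegasos sub-gradient method on the SVM objective rather than the exact quadratic-programming solver the slide implies, on made-up 2-D data (centers, noise level, lambda, and iteration count are all illustrative).

```python
import random

# toy separable data: class +1 around (2, 2), class -1 around (-2, -2)
random.seed(0)
data = []
for _ in range(50):
    data.append(((2 + random.gauss(0, 0.5), 2 + random.gauss(0, 0.5)), +1))
    data.append(((-2 + random.gauss(0, 0.5), -2 + random.gauss(0, 0.5)), -1))

lam = 0.01            # regularization strength (margin vs. training error)
w = [0.0, 0.0]        # no bias term: the toy data is symmetric about 0
for t in range(1, 5001):
    x, y = random.choice(data)
    eta = 1.0 / (lam * t)                      # decreasing step size
    margin = y * (w[0] * x[0] + w[1] * x[1])
    w = [(1 - eta * lam) * wi for wi in w]     # shrink: the ||w||^2 penalty
    if margin < 1:                             # margin violation: pull w towards x
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]

train_acc = sum(1 for (x, y) in data
                if y * (w[0] * x[0] + w[1] * x[1]) > 0) / len(data)
```

The points that repeatedly violate the margin during training are the ones that would carry non-zero α_i in the dual view.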


Kernel Trick

[Figure: a decision boundary that is linear in feature space appears non-linear when mapped back to input space]


Example: Polynomial Kernel
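A sketch of the standard degree-2 example this slide illustrates: the kernel value equals an inner product in the space of degree-2 monomials, without ever building that space explicitly.

```python
import math

# degree-2 polynomial kernel on 2-D inputs: k(x, y) = (x . y)^2
def k_poly2(x, y):
    return (x[0] * y[0] + x[1] * y[1]) ** 2

# the corresponding explicit feature map: all degree-2 monomials,
# phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)
def phi(x):
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])

x, y = (1.0, 2.0), (3.0, 4.0)
k = k_poly2(x, y)                                # kernel evaluation in input space
d = sum(a * b for a, b in zip(phi(x), phi(y)))   # same value via the feature map
```

For d-dimensional inputs and degree p, the feature space has roughly d^p dimensions, while the kernel still costs only one d-dimensional dot product.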


Support Vector Machines

• Demo: Gaussian Kernel

• Many other algorithms can use kernels

• Many other application specific kernels


Capabilities of Current Techniques

• Theoretically & algorithmically well understood:

• Classification with few classes

• Regression (real valued)

• Novelty Detection

• Current Research:

• Complex objects

• Many classes

• Complex learning setup (active learning)

• Prediction of complex properties

Bottom Line: Machine Learning works well for relatively simple objects with simple properties



Many Applications

• Handwritten Letter/Digit recognition

• Face/Object detection in natural scenes

• Brain-Computer Interfacing

• Gene Finding

• Drug Discovery

• Intrusion Detection Systems (unsupervised)

• Document Classification (by topic, spam mails)

• Non-Intrusive Load Monitoring of electric appliances

• Company Fraud Detection (Questionnaires)

• Fake Interviewer identification in social studies

• Optimized Disk caching strategies

• Optimal Disk-Spin-Down prediction

• …


MNIST Benchmark

SVM with polynomial kernel (considers d-th order correlations of pixels)


MNIST Error Rates


Face Detection

1. Train a classifier: face / non-face

2. Search: scan the image over all positions and scales => 525,820 patches in a 600x450 image


Fast Face Detection

Note: for “easy” patches, a quick and inaccurate classification is sufficient.

Method: sequential approximation of the classifier in a Hilbert space

Result: a set of face detection filters

(Romdhani, Blake, Schölkopf & Torr, 2001)
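The early-exit idea can be sketched as a rejection cascade: cheap tests run first, so "easy" non-face patches leave after little work. The stage functions and thresholds below are invented for illustration; the real filters are derived from the trained SVM.

```python
# each stage is a cheap score plus a threshold; a patch is declared "face"
# only if it passes every stage in order
def stage_variance(patch):          # very cheap: flat patches are never faces
    m = sum(patch) / len(patch)
    return sum((p - m) ** 2 for p in patch) / len(patch)

def stage_contrast(patch):          # slightly dearer second test
    return max(patch) - min(patch)

CASCADE = [(stage_variance, 100.0), (stage_contrast, 80.0)]

def classify_patch(patch):
    for k, (stage, threshold) in enumerate(CASCADE):
        if stage(patch) < threshold:
            return ("reject", k)    # early exit at stage k
    return ("face", len(CASCADE))   # survived every filter

flat = [128] * 64                   # uniform 8x8 patch: rejected immediately
busy = [0, 255] * 32                # high-contrast patch: passes both stages
```

Most of the half-million patches in an image stop at the first stages, which is what makes the overall scan fast.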

Example: 1280x1024 Image

1 filter: 19.8% of patches left

10 filters: 0.74% of patches left

20 filters: 0.06% of patches left

30 filters: 0.01% of patches left

70 filters: 0.007% of patches left


Single Trial Analysis of EEG: towards BCI

Gabriel Curio, Benjamin Blankertz, Klaus-Robert Müller

Intelligent Data Analysis Group, Fraunhofer FIRST, Berlin, Germany

Neurophysics Group, Dept. of Neurology, Klinikum Benjamin Franklin, Freie Universität Berlin, Germany


Cerebral Cocktail Party Problem


The Cocktail Party Problem

How to decompose superimposed signals?

Decomposing EEG poses the same signal processing problem as the cocktail party problem.


The Cocktail Party Problem

• input: 3 mixed signals

• algorithm: enforce independence (“independent component analysis”) via temporal de-correlation

• output: 3 separated signals

(Demo: Andreas Ziehe, Fraunhofer FIRST, Berlin)
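Separation by temporal de-correlation can be sketched in the AMUSE style for two channels: whiten the mixtures, then diagonalize a time-lagged covariance, whose eigenvectors separate sources with different autocorrelation. The sources, mixing matrix, and lag below are made up, and this is a sketch of the method family, not necessarily the exact algorithm of the demo.

```python
import math

T = 5000
s1 = [math.sin(0.30 * t) for t in range(T)]                        # smooth source
s2 = [1.0 if math.sin(0.11 * t) >= 0 else -1.0 for t in range(T)]  # square-wave source

# instantaneous linear mixture x = A s with a made-up mixing matrix A
x1 = [1.0 * s1[t] + 0.6 * s2[t] for t in range(T)]
x2 = [0.4 * s1[t] + 1.0 * s2[t] for t in range(T)]

def cov(a, b, lag=0):
    # time-averaged E[a(t) b(t-lag)]; signals are (nearly) zero-mean here
    return sum(a[t] * b[t - lag] for t in range(lag, len(a))) / (len(a) - lag)

def sym_eig2(a, b, c):
    # closed-form eigen-decomposition of the symmetric 2x2 matrix [[a, b], [b, c]]
    th = 0.5 * math.atan2(2 * b, a - c)
    co, si = math.cos(th), math.sin(th)
    l1 = a * co * co + 2 * b * co * si + c * si * si
    l2 = a * si * si - 2 * b * co * si + c * co * co
    return (l1, l2), ((co, si), (-si, co))

# 1) whiten the mixtures (unit variance, zero correlation)
(l1, l2), (v1, v2) = sym_eig2(cov(x1, x1), cov(x1, x2), cov(x2, x2))
z1 = [(v1[0] * x1[t] + v1[1] * x2[t]) / math.sqrt(l1) for t in range(T)]
z2 = [(v2[0] * x1[t] + v2[1] * x2[t]) / math.sqrt(l2) for t in range(T)]

# 2) diagonalize a lagged covariance: sources with different autocorrelation
#    at this lag fall into distinct eigenvectors
lag = 5
m11, m22 = cov(z1, z1, lag), cov(z2, z2, lag)
m12 = 0.5 * (cov(z1, z2, lag) + cov(z2, z1, lag))   # symmetrize
_, (w1, w2) = sym_eig2(m11, m12, m22)
y1 = [w1[0] * z1[t] + w1[1] * z2[t] for t in range(T)]
y2 = [w2[0] * z1[t] + w2[1] * z2[t] for t in range(T)]

def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

c11, c12 = abs(corr(y1, s1)), abs(corr(y1, s2))
c21, c22 = abs(corr(y2, s1)), abs(corr(y2, s2))
```

Up to sign and ordering, each recovered component correlates strongly with one of the true sources.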

"Imagine that you are on the edge of a lake and a friend challenges you to play a game. The game is this: Your friend digs two narrow channels up from the side of the lake […]. Halfway up each one, your friend stretches a handkerchief and fastens it to the sides of the channel. As waves reach the side of the lake they travel up the channels and cause the two handkerchiefs to go into motion. You are allowed to look only at the handkerchiefs and from their motions to answer a series of questions: How many boats are there on the lake and where are they? Which is the most powerful one? Which one is closer? Is the wind blowing?" (Auditory Scene Analysis, A. Bregman)


Minimal Electrode Configuration

• coverage: bilateral primary sensorimotor cortices

• 27 scalp electrodes

• reference: nose

• bandpass: 0.05 Hz - 200 Hz

• ADC 1 kHz

• downsampling to 100 Hz

• EMG (forearms bilaterally): m. flexor digitorum

• EOG

• event channel: keystroke timing (ms precision)


Single Trial vs. Averaging

[Figure: EEG traces in µV over the 600 ms before the keystroke at 0 ms; LEFT hand (ch. C4) vs. RIGHT hand (ch. C3), single trials vs. averages]


BCI Setup

ACQUISITION modes:
- few single electrodes
- 32-128 channel electrode caps
- subdural macroelectrodes
- intracortical multi-single-units

EEG parameters:
- slow cortical potentials
- µ/β amplitude modulations
- Bereitschafts-/motor-potential

TASK alternatives:
- feedback control
- imagined movements
- movement (preparation)
- mental state diversity


Finding Genes on Genomic DNA
Splice Sites: on the boundary between

• Exons (may code for protein)

• Introns (noncoding)

The coding region starts with the Translation Initiation Site (TIS: “ATG”)


Application: TIS Finding

GMD.SCAIInstitute for Algorithmsand Scientific Computing

Alexander Zien

Thomas LengauerGMD.FIRSTInstitute forComputer Architectureand Software Technology

Gunnar Rätsch

Sebastian Mika

Bernhard Schölkopf

Klaus-Robert Müller

Engineering Support Vector Machine (SVM) Kernels

That Recognize Translation Initiation Sites (TIS)


TIS Finding: Classification Problem

• Select candidate positions for TIS by looking for ATG

• Build a fixed-length sequence representation of the candidates

• Transform the sequence into a representation in real space:

A → (1,0,0,0,0)

C → (0,1,0,0,0)

G → (0,0,1,0,0)

T → (0,0,0,1,0)

N → (0,0,0,0,1)

(...,0,1,0,0,0,0,...) in a 1000-dimensional real space
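The encoding on this slide, as a sketch: each nucleotide becomes a 5-dimensional indicator vector (N = unknown base), and the vectors are concatenated.

```python
CODE = {'A': 0, 'C': 1, 'G': 2, 'T': 3, 'N': 4}

def encode(seq):
    vec = []
    for base in seq:
        bits = [0] * 5
        bits[CODE[base]] = 1    # one-hot indicator for this base
        vec.extend(bits)
    return vec

# a 200nt candidate window gives 200 * 5 = 1000 dimensions, matching
# the 1000-dimensional real space on the slide
v = encode("GATTACA")           # 7 bases -> 35 dimensions
```

Most entries are zero, so the representation is very sparse, which kernel evaluations can exploit.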


2-class Splice Site Detection

Window of 150nt around known splice sites

Positive examples: fixed window around a true splice site

Negative examples: generated by shifting the window

Design of new Support Vector Kernel
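The example-construction step above can be sketched as follows; the genome string, site positions, and shift offsets are synthetic stand-ins for real annotation.

```python
import random

# synthetic stand-ins: a random "genome" and made-up annotated site positions
random.seed(1)
genome = ''.join(random.choice('ACGT') for _ in range(10000))
true_sites = [500, 2500, 7200]

WIN = 150  # window length from the slide

def window(seq, centre, shift=0):
    start = centre + shift - WIN // 2
    if start < 0 or start + WIN > len(seq):
        return None                # window would fall off the sequence
    return seq[start:start + WIN]

# positives: the fixed window around each true site
positives = [window(genome, c) for c in true_sites]
# negatives: the same window shifted away from the site
negatives = [seg for c in true_sites
                 for s in (-60, -30, 30, 60)
                 if (seg := window(genome, c, s)) is not None]
```

Both classes have identical length and alphabet, so the same kernel can score either; only the alignment to the site differs.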


The Drug Design Cycle

former CombiChem technology

[Figure: the drug design cycle with a learning machine proposing the compounds, as shown earlier]


Three types of Compounds/Points

• actives: few

• inactives: more

• untested: plenty


Shape/Feature Descriptor

[Figure: shapes i and j are mapped to shape/feature signatures of ~10^5 bits; each bit encodes a shape, a feature type, and a feature location]

S. Putta, A Novel Shape/Feature Descriptor, 2001


Maximizing the Number of Hits

[Figure: total number of active examples selected after each batch, for the largest-selection strategy on the Thrombin dataset]


Concluding Remarks

• Computational Challenges

• Algorithms can work with 100,000s of examples (and need correspondingly many operations)

• Usually several model parameters have to be tuned (cross-validation is computationally expensive)

• Need computer clusters and job scheduling systems (pbs, gridengine)

• Often use MATLAB (to be replaced by python: help!)
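The cross-validation cost mentioned above is visible in a plain k-fold loop: every candidate parameter setting requires k full trainings, one per held-out fold.

```python
# k-fold cross-validation index splits: each example lands in exactly
# one test fold and in k-1 training folds
def kfold(n, k):
    folds = [list(range(n))[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# sanity check: with 10 examples and 5 folds, every example is tested once
counts = [0] * 10
for train, test in kfold(10, 5):
    for j in test:
        counts[j] += 1
```

Tuning m parameter settings therefore costs m * k trainings, which is why clusters and job schedulers matter.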

• Machine learning is an exciting research area …

• … involving Computer Science, Statistics & Mathematics

• … with …

• a large number of present and future applications (in all situations where data is available, but explicit knowledge is scarce) …

• an elegant underlying theory …

• and an abundance of questions to study.

New computational biology group in Tübingen: looking for people to hire


Thanks for Your Attention!

Colleagues & Contributors: K. Bennett, G. Dornhege, A. Jagota, M. Kawanabe, J. Kohlmorgen, S. Lemm, C. Lemmen, P. Laskov, J. Liao, T. Lengauer, R. Meir, S. Mika, K.-R. Müller, T. Onoda, A. Smola, C. Schäfer, B. Schölkopf, R. Sommer, S. Sonnenburg, J. Srinivasan, K. Tsuda, M. Warmuth, J. Weston, A. Zien

Gunnar Rätsch

http://www.tuebingen.mpg.de/~raetsch

[email protected]

Special Thanks: Nora Toussaint, Julia Lüning, Matthias Noll