54
Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & Alexandre Gramfort

Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Optimization for Data ScienceMaster 2 Data Science, Univ. Paris Saclay

Robert M. Gower&

Alexandre Gramfort

Page 2: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Core Info● Where : Telecom ParisTech● Location : Amphi Estaunié or B312 ● ECTS : 5 ECTS● Volume : 40h● When : 12 weeks (including one week break for holidays +

one week for exam)● Online: All teaching materials on moodle: http://datascience-

x-master-paris-saclay.fr/education/● Students upload their projects / reports via moodle too.● All students **must** be registered on moodle.

Page 3: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Who am I? Robert M. Gower

● Assistant Prof at Telecom ● [email protected] ● www.ens.fr/~rgower● Research topics: Stochastic algorithms for optimization,

numerical linear algebra, quasi-Newton methods and automatic differentiation (backpropagation).

Page 4: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Introduction to Optimization in Machine Learning

Robert M. Gower

Master 2 Data Science, Univ. Paris SaclayOptimisation for Data Science

Page 5: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

An Introduction to Supervised Learning

Page 6: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

References for this class

Convex Optimization

Pages 67 to 79

Understanding Machine Learning: From Theory to Algorithms

Chapter 1

Page 7: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Is There a Cat in the Photo?

Yes

No

Page 8: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Is There a Cat in the Photo?

Yes

Page 9: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Is There a Cat in the Photo?

Yes

Page 10: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Is There a Cat in the Photo?

No

Page 11: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Is There a Cat in the Photo?

Yes

Page 12: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Find mapping h that assigns the “correct” target to each input

Is There a Cat in the Photo?

Yes

No

x: Input/Feature y: Output/Target

Page 13: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Labeled Data: The training set

Page 14: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Labeled Data: The training set

y= -1 means no/false

Page 15: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Labeled Data: The training set

Learning Algorithm

y= -1 means no/false

Page 16: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Labeled Data: The training set

Learning Algorithm

y= -1 means no/false

Page 17: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Labeled Data: The training set

Learning Algorithm

-1

y= -1 means no/false

Page 18: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Example: Linear Regression for Height

Sex Male

Age 30

Height 1,72 cm

Sex Female

Age 70

Height 1,52 cm

Labeled data

Page 19: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Example Hypothesis: Linear Model

Example: Linear Regression for Height

Sex Male

Age 30

Height 1,72 cm

Sex Female

Age 70

Height 1,52 cm

Labeled data

Page 20: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Example Training Problem:

Example Hypothesis: Linear Model

Example: Linear Regression for Height

Sex Male

Age 30

Height 1,72 cm

Sex Female

Age 70

Height 1,52 cm

Labeled data

Page 21: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Linear Regression for Height

Age

Height

Page 22: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Linear Regression for Height

The Training Algorithm

Age

Height

Page 23: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Linear Regression for Height

The Training Algorithm

Age

Height

Other options aside from linear?

Page 24: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Parametrizing the HypothesisHeight

Age

Linear:

Polinomial:

Age

Height

Neural Net:

Page 25: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Loss Functions

Why a SquaredLoss?

Page 26: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Loss Functions

Why a SquaredLoss?

Loss Functions

The Training Problem

Page 27: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Loss Functions

Why a SquaredLoss?

Loss Functions

The Training Problem

Typically a convex function

Page 28: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Choosing the Loss Function

Quadratic Loss

Binary Loss

Hinge Loss

Page 29: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Choosing the Loss Function

Quadratic Loss

Binary Loss

Hinge Loss

y=1 in all figures

Page 30: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Choosing the Loss Function

Quadratic Loss

Binary Loss

Hinge Loss

EXE: Plot the binary and hinge loss function in when

y=1 in all figures

Page 31: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Loss Functions

Is a notion of Loss enough?

What happens when we do not have enough data?

Page 32: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Loss FunctionsThe Training Problem

Is a notion of Loss enough?

What happens when we do not have enough data?

Page 33: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Overfitting and Model Complexity

Fitting 1st order polynomial

Page 34: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Overfitting and Model Complexity

Fitting 1st order polynomial

Page 35: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Overfitting and Model Complexity

Fitting 3rd order polynomial

Page 36: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Overfitting and Model Complexity

Fitting 9th order polynomial

Page 37: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Regularizor Functions

General Training Problem

Regularization

Page 38: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Regularizor Functions

General Training Problem

Regularization

Goodness of fit, fidelity term ...etc

Page 39: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Regularizor Functions

General Training Problem

Regularization

Goodness of fit, fidelity term ...etc

Penlizes complexity

Page 40: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Regularizor Functions

General Training Problem

Regularization

Goodness of fit, fidelity term ...etc

Penlizes complexity

Controls tradeoff between fit and complexity

Page 41: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Regularizor Functions

General Training Problem

Regularization

Exe:

Goodness of fit, fidelity term ...etc

Penlizes complexity

Controls tradeoff between fit and complexity

Page 42: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Overfitting and Model Complexity

Fitting kth order polynomial

Page 43: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Overfitting and Model Complexity

Fitting kth order polynomial

For big enough, λthe solution is a 2nd order polynomial

Page 44: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Linear hypothesis

Exe: Ridge Regression

Ridge Regression

L2 loss

L2 regularizor

Page 45: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Linear hypothesis

Exe: Support Vector Machines

SVM with soft margin

Hinge loss

L2 regularizor

Page 46: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Linear hypothesis

Exe: Logistic Regression

Logistic Regression

Logistic loss

L2 regularizor

Page 47: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

The Machine Learners Job

Page 48: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

The Machine Learners Job

Page 49: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

The Machine Learners Job

Page 50: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

The Machine Learners Job

Page 51: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

The Machine Learners Job

Page 52: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

The Machine Learners Job

Page 53: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

The Statistical Learning Problem:The hard truth

Do we really care if the loss is small on the known labelled data paris (xi,yi) ? Nope

We really want to have a small loss on new unlabelled Observations! Assume data sampled where is an unknown distribution

Page 54: Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

The statistical learning problem:Minimize the expected loss over an unknown expectation

The Statistical Learning Problem:The hard truth

Variance of sample mean: