Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science

Optimization for Data ScienceMaster 2 Data Science, Univ. Paris Saclay

Robert M. Gower&

Alexandre Gramfort

Core Info● Where : Telecom ParisTech● Location : Amphi Estaunié or B312 ● ECTS : 5 ECTS● Volume : 40h● When : 12 weeks (including one week break for holidays +

one week for exam)● Online: All teaching materials on moodle: http://datascience-

x-master-paris-saclay.fr/education/● Students upload their projects / reports via moodle too.● All students **must** be registered on moodle.

Who am I? Robert M. Gower

● Assistant Prof at Telecom ● [email protected] ● www.ens.fr/~rgower● Research topics: Stochastic algorithms for optimization,

numerical linear algebra, quasi-Newton methods and automatic differentiation (backpropagation).

http://www.ens.fr/~rgower

Introduction to Optimization in Machine Learning

Robert M. Gower

Master 2 Data Science, Univ. Paris SaclayOptimisation for Data Science

An Introduction to Supervised Learning

References for this class

Convex Optimization

Pages 67 to 79

Understanding Machine Learning: From Theory to Algorithms

Chapter 1

Is There a Cat in the Photo?

Yes

No


Yes


Yes


No


Yes

Find mapping h that assigns the “correct” target to each input


Yes

No

x: Input/Feature y: Output/Target

Labeled Data: The training set


y= -1 means no/false


Learning Algorithm



Learning Algorithm



Learning Algorithm

-1


Example: Linear Regression for Height

Sex Male

Age 30

Height 1,72 cm

Sex Female

Age 70

Height 1,52 cm

Labeled data

Example Hypothesis: Linear Model


Sex Male

Age 30

Height 1,72 cm

Sex Female

Age 70

Height 1,52 cm

Labeled data

Example Training Problem:

Example Hypothesis: Linear Model


Sex Male

Age 30

Height 1,72 cm

Sex Female

Age 70

Height 1,52 cm

Labeled data

Linear Regression for Height

Age

Height


The Training Algorithm

Age

Height


The Training Algorithm

Age

Height

Other options aside from linear?

Parametrizing the HypothesisHeight

Age

Linear:

Polinomial:

Age

Height

Neural Net:

Loss Functions

Why a SquaredLoss?

Loss Functions

Why a SquaredLoss?

Loss Functions

The Training Problem

Loss Functions

Why a SquaredLoss?

Loss Functions

The Training Problem

Typically a convex function

Choosing the Loss Function

Quadratic Loss

Binary Loss

Hinge Loss


Quadratic Loss

Binary Loss

Hinge Loss

y=1 in all figures


Quadratic Loss

Binary Loss

Hinge Loss

EXE: Plot the binary and hinge loss function in when

y=1 in all figures

Loss Functions

Is a notion of Loss enough?

What happens when we do not have enough data?

Loss FunctionsThe Training Problem

Is a notion of Loss enough?

What happens when we do not have enough data?

Overfitting and Model Complexity

Fitting 1st order polynomial


Fitting 1st order polynomial


Fitting 3rd order polynomial


Fitting 9th order polynomial

Regularizor Functions

General Training Problem

Regularization



Regularization

Goodness of fit, fidelity term ...etc



Regularization


Penlizes complexity



Regularization


Penlizes complexity

Controls tradeoff between fit and complexity



Regularization

Exe:


Penlizes complexity

Controls tradeoff between fit and complexity


Fitting kth order polynomial


Fitting kth order polynomial

For big enough, λthe solution is a 2nd order polynomial

Linear hypothesis

Exe: Ridge Regression

Ridge Regression

L2 loss

L2 regularizor

Linear hypothesis

Exe: Support Vector Machines

SVM with soft margin

Hinge loss

L2 regularizor

Linear hypothesis

Exe: Logistic Regression

Logistic Regression

Logistic loss

L2 regularizor

The Machine Learners Job






The Statistical Learning Problem:The hard truth

Do we really care if the loss is small on the known labelled data paris (xi,yi) ? Nope

We really want to have a small loss on new unlabelled Observations! Assume data sampled where is an unknown distribution

The statistical learning problem:Minimize the expected loss over an unknown expectation

The Statistical Learning Problem:The hard truth

Variance of sample mean:

Documents

Optimization for Data Science - Télécom ParisTech · Optimization for Data Science Master 2 Data Science, Univ. Paris Saclay Robert M. Gower & ... Optimisation for Data Science