10/12/2019 @
Fabien VAUCHELLES
zelros.com / [email protected] / @fabienv
http://bit.ly/ml-aismarttech
INTRODUCTION
I WILL TELL YOU THE TRUTH
JACK KNEW HE WOULD DIE
JACK KNEW HIS FUTURE
JACK
LIFE DECISION PATH
Men
Women
Child
Adult
1st class
3rd class
GAME OVER
JACK
LIFE DECISION PATH
70%
BACK TO MACHINE LEARNING
If you know the passengers list:
- Gender
- Age
- Ticket class
- Did he survive?
You can create a Decision Tree for this Supervised Problem!
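The first split of such a tree can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the passenger list is made up (it is not real Titanic data), and `fit_stump` is a hypothetical helper that learns a single split, often called a decision stump. A full decision tree would repeat the same step inside each branch.

```python
from collections import Counter, defaultdict

# Hypothetical passenger list, invented for illustration only.
passengers = [
    {"gender": "male", "age_group": "adult", "ticket_class": 3, "survived": False},
    {"gender": "female", "age_group": "adult", "ticket_class": 1, "survived": True},
    {"gender": "male", "age_group": "adult", "ticket_class": 2, "survived": False},
    {"gender": "female", "age_group": "child", "ticket_class": 3, "survived": True},
    {"gender": "male", "age_group": "child", "ticket_class": 1, "survived": False},
    {"gender": "female", "age_group": "adult", "ticket_class": 3, "survived": True},
]


def fit_stump(rows, features, target):
    """Pick the single feature whose split makes the fewest mistakes."""
    best = None
    for feature in features:
        # Group the known outcomes by the value of this feature.
        groups = defaultdict(list)
        for row in rows:
            groups[row[feature]].append(row[target])
        # Each branch predicts the most common outcome among its passengers.
        rules = {
            value: Counter(outcomes).most_common(1)[0][0]
            for value, outcomes in groups.items()
        }
        errors = sum(row[target] != rules[row[feature]] for row in rows)
        if best is None or errors < best[2]:
            best = (feature, rules, errors)
    return best


feature, rules, errors = fit_stump(
    passengers, ["gender", "age_group", "ticket_class"], "survived"
)
print(feature, rules, errors)
```

On this invented list the best first question is the gender, which mirrors the first branch of the life decision path.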
REAL LIFE IS DIFFERENT...
KEEP COOL AND USE MACHINE LEARNING
ZELROS // FABIEN VAUCHELLES
Speech Recognition
NLP
Computer Vision
Knowledge
AI coach
Insurer Data
Insurance ML / Predictive models
Insurance Advisor
What is Machine Learning
ML SOLVES PROBLEMS THAT ARE UNSOLVABLE BY CONVENTIONAL METHODS
What is the type of problem
TYPE OF PROBLEM
SUPERVISED LEARNING
UNSUPERVISED LEARNING
SUPERVISED LEARNING
Men
Women
Child
Adult
1st class
3rd class
GAME OVER
JACK
70%
UNSUPERVISED LEARNING
age
satisfaction
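Grouping points on an age / satisfaction plot without any labels can be sketched with k-means, one of the clustering algorithms listed later in the deck. The points, the satisfaction scale and the starting centroids below are assumptions made up for illustration; in practice the two axes would usually be rescaled first so that age does not dominate the distance.

```python
# Hypothetical (age, satisfaction) points, invented for illustration only.
points = [(20, 2), (22, 3), (25, 2), (60, 8), (62, 9), (65, 8)]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # New centroid = mean of its cluster (kept unchanged if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Assumption: start from the first and last point instead of a random draw.
centroids, clusters = kmeans(points, [points[0], points[-1]])
print(centroids)
print(clusters)
```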
What is the goal of ML algorithms
MINIMIZE THE ERROR
How to start an ML problem
AS DEVELOPERS, WE USE TEST-DRIVEN DEVELOPMENT
THE PROCESS OF DATA SCIENCE
1. ANALYZE DATA
2. TRAIN THE MODEL
3. VALIDATE THE PREDICTION
ANALYZE DATA
Do you speak Data
DATA LANGUAGE
NAME     AGE  CLASS  DIED ?
John     23   3      Yes
Marry    31   1      No
Henry    23   2      Yes
Nicolas  41   1      No
Anna     18   3      Yes
Dataset: the whole table
Feature: an input column (AGE, CLASS)
Target: the column to predict (DIED ?)
Sample: one row (one passenger)
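The same vocabulary can be written down in code. A minimal sketch in plain Python, assuming that the name is not used as an input feature; `X` and `y` follow the naming used later in the deck.

```python
# The passenger table from the slide: the dataset, one dict per sample (row).
dataset = [
    {"name": "John", "age": 23, "class": 3, "died": True},
    {"name": "Marry", "age": 31, "class": 1, "died": False},
    {"name": "Henry", "age": 23, "class": 2, "died": True},
    {"name": "Nicolas", "age": 41, "class": 1, "died": False},
    {"name": "Anna", "age": 18, "class": 3, "died": True},
]

feature_names = ["age", "class"]  # assumption: the name carries no signal
target_name = "died"

# X holds the features of every sample, y holds the target of every sample.
X = [[row[name] for name in feature_names] for row in dataset]
y = [row[target_name] for row in dataset]

sample = dataset[0]  # one sample = one passenger
print(X, y, sample)
```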
DEMO TIME !
How to visualize a feature
EXCEL WON’T HELP YOU ON BIG DATA
YOU LOVE STATISTICAL METHODS
BOX PLOT
Median: 50%
BOX PLOT
5% 25% 50% 75% 95%
Quantile 1 / Quantile 2 / Quantile 3 / Quantile 4 / Outlier
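The numbers behind a box plot can be computed with Python's standard `statistics` module. A minimal sketch, assuming the ages from the passenger table and the "inclusive" interpolation method (other methods give slightly different cut points); treating everything outside the 5% to 95% range as an outlier follows the percentages on the slide.

```python
import statistics

# Ages taken from the passenger table on the slides.
ages = [18, 23, 23, 31, 41]

# Quartiles: the three cut points that bound the box (25%, 50%, 75%).
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")

# 5% and 95% cut points: first and last of the 19 cuts into 20 equal groups.
cuts = statistics.quantiles(ages, n=20, method="inclusive")
p5, p95 = cuts[0], cuts[-1]

# Values outside the 5%-95% range are drawn as outliers.
outliers = [age for age in ages if age < p5 or age > p95]

print(p5, q1, median, q3, p95, outliers)
```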
DEMO TIME !
Why do we care about missing data
ALGORITHMS DON’T LIKE MISSING DATA
MISSING VALUES
Drop the missing values:
10, NaN, 20, NaN, 30 → 10, 20, 30
BAD !!!
MISSING VALUES
Fill the missing values with 0:
10, NaN, 20, NaN, 30 → 10, 0, 20, 0, 30
BAD !!!
MISSING VALUES
Fill empty value with median: 20
10, NaN, 20, NaN, 30 → 10, 20, 20, 20, 30
GOOD !!!
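Filling with the median takes only a few lines. A minimal sketch in plain Python, using the values from the slide and `math.nan` to stand for a missing value:

```python
import math
import statistics

# Column from the slide, with NaN marking the missing values.
values = [10, math.nan, 20, math.nan, 30]

# The median is computed on the known values only.
known = [v for v in values if not math.isnan(v)]
median = statistics.median(known)

# Replace every missing value with that median.
filled = [median if math.isnan(v) else v for v in values]
print(median, filled)
```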
DEMO TIME !
CREATE FEATURES
Why do we create Artificial Features
ARTIFICIAL FEATURES HELP ALGORITHMS MAKE BETTER PREDICTIONS
I DON’T UNDERSTAND TEXT !
UNDERSTAND CATEGORIES
NAME     GENDER
John     male
Marry    female
Henry    male
Nicolas  male
Anna     female
UNDERSTAND CATEGORIES
LabelEncoder
NAME     GENDER
John     1
Marry    2
Henry    1
Nicolas  1
Anna     2
2 > 1 !!!
UNDERSTAND CATEGORIES
OneHotEncoder
NAME     GENDER_MALE  GENDER_FEMALE
John     1            0
Marry    0            1
Henry    1            0
Nicolas  1            0
Anna     0            1
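Both encodings can be reproduced by hand to see the difference. This is a plain-Python sketch that mirrors the numbering on the slides (male = 1, female = 2); it is not the output of scikit-learn's `LabelEncoder`, which numbers the sorted classes starting from 0.

```python
genders = ["male", "female", "male", "male", "female"]

# Categories in order of first appearance: ["male", "female"].
categories = []
for gender in genders:
    if gender not in categories:
        categories.append(gender)

# Label encoding: one integer per category. It suggests an order (2 > 1) that does not exist.
label_codes = {category: index + 1 for index, category in enumerate(categories)}
label_encoded = [label_codes[gender] for gender in genders]

# One-hot encoding: one 0/1 column per category, so no artificial order.
one_hot = [
    {f"GENDER_{category.upper()}": int(gender == category) for category in categories}
    for gender in genders
]

print(label_encoded)
print(one_hot)
```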
DEMO TIME !
TRAIN THE MODEL
FIT & PREDICT
FIT (1) → MODEL → PREDICT (2)
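The fit / predict pattern can be shown with a very small model. The sketch below uses a 1-nearest-neighbour classifier, chosen here only because it fits in a few lines; it is not an algorithm taken from the slides, and the class name is hypothetical. The training rows are the first four passengers of the table, and Anna is the new passenger to predict.

```python
class NearestNeighbour:
    """Tiny classifier: predict the target of the closest known sample."""

    def fit(self, X, y):
        # Step 1: "training" here only memorises the samples.
        self.X_ = list(X)
        self.y_ = list(y)
        return self

    def predict(self, X):
        # Step 2: use the fitted model on new samples.
        return [self._predict_one(x) for x in X]

    def _predict_one(self, x):
        distances = [
            sum((a - b) ** 2 for a, b in zip(x, known)) for known in self.X_
        ]
        return self.y_[distances.index(min(distances))]


# Features are (age, class); the target is "died ?".
X_train = [[23, 3], [31, 1], [23, 2], [41, 1]]
y_train = ["Yes", "No", "Yes", "No"]

model = NearestNeighbour().fit(X_train, y_train)
prediction = model.predict([[18, 3]])  # Anna: 18 years old, 3rd class
print(prediction)
```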
What can we predict
CLASSIFICATION
Would we have survived on the Titanic?
NAME     AGE  CLASS  DIED ?
John     23   3      Yes
Marry    31   1      No
Henry    23   2      Yes
Nicolas  41   1      No
Anna     18   3      ?
REGRESSION
What is the price of the ticket?
NAME     AGE  CLASS  FARE
John     23   3      71
Marry    31   1      8
Henry    23   2      53
Nicolas  41   1      7
Anna     18   3      ?
What algorithm can we choose
ALGORITHMS
DECISION TREE
COMPUTER VISION
LINEAR
CLUSTERING
BAYESIAN
NLP
Linear Regression
Logistic Regression
Convolutional Neural Network
GAN
Recurrent Neural Network
Gaussian Naive Bayes
Multinomial Naive Bayes
Bayesian Network
k-Means
k-Medians
Hierarchical Clustering
Perceptron
Random Forest
CatBoost
XGBoost
TF-IDF
Word2Vec
BERT
GPT2
Siamese Network
LightGBM
What is Linear Regression
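REGRESSION / LINEAR
[Plot: y against X]
$h(X) = \theta_0 + \theta_1 X$
REGRESSION / POLYNOMIAL
[Plot: y against X]
$h(X) = \theta_0 + \theta_1 X + \theta_2 X^2$
REGRESSION / POLYNOMIAL
[Plot: y against X]
$h(X) = \theta_0 + \theta_1 X + \theta_2 X^2 + \theta_3 X^3$
OVERFITTING !!!
REGRESSION / POLYNOMIAL
[Plot: y against X]
$h(X) = \theta_0 + \theta_1 X + \theta_2 X^2$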
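For the linear case, the two parameters θ0 and θ1 can be found directly with ordinary least squares. A minimal sketch in plain Python; the data points are invented for illustration (they lie exactly on y = 1 + 2X) and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of h(X) = theta0 + theta1 * X."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope: covariance of X and y divided by the variance of X.
    theta1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the fitted line passes through the mean point.
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


# Hypothetical points, invented for illustration.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

theta0, theta1 = fit_line(xs, ys)
print(theta0, theta1)
```

Adding the X² and X³ terms gives the model more freedom to follow the training points, which is what leads to the overfitting shown above.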
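REGRESSION / COST FUNCTION
Sum of errors:
$J = \sum_{i=1}^{m} \left( h(X_i) - y_i \right)^2$
h(X): prediction, y: real value, m: count
REGRESSION / COST FUNCTION
$h(X) = \theta_0 + \theta_1 X + \theta_2 X^2$
REGRESSION / COST FUNCTION
$h(X) = \theta_0 + \theta_1 X + \theta_2 X^2 + \theta_3 X^3$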
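The cost J can be evaluated for any candidate set of parameters, and training amounts to searching for the parameters with the lowest J. A minimal sketch in plain Python, assuming the plain sum of squared errors and invented data points; `cost` is a hypothetical helper that accepts a polynomial of any degree.

```python
def cost(theta, xs, ys):
    """Sum of squared errors J for h(X) = theta[0] + theta[1]*X + theta[2]*X**2 + ..."""

    def h(x):
        return sum(t * x**k for k, t in enumerate(theta))

    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical points, invented for illustration (they lie on y = 1 + 2X).
xs = [0, 1, 2]
ys = [1, 3, 5]

good = cost([1, 2], xs, ys)  # h(X) = 1 + 2X matches every point
bad = cost([0, 2], xs, ys)   # h(X) = 2X is off by 1 on every point
print(good, bad)
```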
DEMO TIME !
VALIDATE THE PREDICTION
How to validate a trained model
TRAIN & TEST
TRAIN (70%): X, y
TEST (30%): X, y, y_predict, Δ
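A 70% / 30% split can be sketched with the standard library only. `split_train_test` is a hypothetical helper written for this example (it is not a library function), and the data is invented; the fixed seed is an assumption that makes the shuffle repeatable.

```python
import random


def split_train_test(X, y, test_ratio=0.3, seed=0):
    """Shuffle the sample indices, then keep test_ratio of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)


# Hypothetical data: 10 samples with one feature each.
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

The model is fitted on the train part only; Δ is then measured between `y_test` and the predictions made for `X_test`.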
CROSS VALIDATION
TRAIN (70%): X, y
TEST (30%): X, y, y_predict, Δ
x5 random
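Repeating the random 70% / 30% split five times can be sketched as follows. This is one reading of "x5 random" (repeated random hold-out); k-fold cross validation is a common alternative. The helper names, the mean-predicting baseline model and the data are all assumptions made for illustration.

```python
import random
import statistics


def random_split_scores(X, y, fit, predict, rounds=5, test_ratio=0.3, seed=0):
    """Return one error score (mean absolute Δ) per random train/test split."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        indices = list(range(len(X)))
        rng.shuffle(indices)
        n_test = round(len(X) * test_ratio)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_predict = predict(model, [X[i] for i in test_idx])
        # Δ: absolute difference between each prediction and the real value.
        deltas = [abs(p - y[i]) for p, i in zip(y_predict, test_idx)]
        scores.append(statistics.mean(deltas))
    return scores


def fit_mean(X_train, y_train):
    # Hypothetical baseline model: remember the mean of the training targets.
    return statistics.mean(y_train)


def predict_mean(model, X_test):
    # Always predict that mean, whatever the features are.
    return [model for _ in X_test]


# Hypothetical data: 10 samples with one feature each.
X = [[i] for i in range(10)]
y = [float(i) for i in range(10)]

scores = random_split_scores(X, y, fit_mean, predict_mean)
print(scores, statistics.mean(scores))
```

Averaging the five scores gives a more stable estimate than a single split.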
PREDICT
X
y_predict
NEW DATA
DOES JACK SURVIVE ?
DEMO TIME !
RESOURCES
● Coursera Machine Learning: https://www.coursera.org/learn/machine-learning
● Coursera Deep Learning: https://www.coursera.org/specializations/deep-learning
● Kaggle: http://www.kaggle.com
ANY QUESTIONS ?