THE PORTUGUESE BANK’S DIRECT MARKETING CAMPAIGN
GOAL
To predict whether a client will subscribe to the bank’s term deposit, based on the phone calls made during the campaign.
ROADMAP
• Understanding the data
• Exploratory Data Analysis
• Feature Selection and Engineering
• Choosing the model
• Evaluation metrics
THE FIRST LOOK
20 features
• Number of rows = 41188
• Categorical features = 10
• Numerical features = 10
• Few unknown values
• Class imbalance in the target
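The class imbalance can be quantified by counting the target labels. The deck later addresses it with random oversampling and SMOTE. The sketch below, which uses only the standard library, counts the classes and applies plain random oversampling: it duplicates randomly chosen rows of the smaller classes until every class is as large as the biggest one. The toy labels and the helper names `class_counts` and `random_oversample` are hypothetical. In practice, imbalanced-learn's `RandomOverSampler` is the usual tool. SMOTE is not shown; it creates synthetic minority rows by interpolating between neighbours instead of duplicating existing rows.

```python
import random
from collections import Counter


def class_counts(labels):
    """Return how many rows belong to each target class."""
    return Counter(labels)


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest one (plain random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        for _ in range(target - n):
            j = rng.choice(idx)
            out_rows.append(rows[j])
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made up for illustration): 8 "no" rows and 2 "yes" rows.
labels = ["no"] * 8 + ["yes"] * 2
rows = [[i] for i in range(10)]
print(dict(class_counts(labels)))  # {'no': 8, 'yes': 2}
new_rows, new_labels = random_oversample(rows, labels)
print(dict(class_counts(new_labels)))  # {'no': 8, 'yes': 8}
```

Oversampling is normally applied only to the training split. This keeps duplicated rows from leaking into the test set.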
INSIGHTS FROM DATA
Preferred contact type
The frequency of subscription depends on job title: admin and technical roles, being stable roles, subscribe more often.
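To check whether subscription frequency varies by job title (or by contact type), you can compute the share of "yes" answers within each category. The sketch below uses only the standard library. The helper name `subscription_rate_by` and the toy records are hypothetical. In pandas, the same figures usually come from a `groupby` or from `pd.crosstab(..., normalize="index")`.

```python
from collections import defaultdict


def subscription_rate_by(records, column, target="y", positive="yes"):
    """Return {category: fraction of rows whose target equals `positive`}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for rec in records:
        cat = rec[column]
        totals[cat] += 1
        if rec[target] == positive:
            positives[cat] += 1
    return {cat: positives[cat] / totals[cat] for cat in totals}


# Toy records (made up for illustration) using the dataset's column names.
records = [
    {"job": "admin.", "y": "yes"},
    {"job": "admin.", "y": "no"},
    {"job": "technician", "y": "yes"},
    {"job": "blue-collar", "y": "no"},
]
print(subscription_rate_by(records, "job"))
# {'admin.': 0.5, 'technician': 1.0, 'blue-collar': 0.0}
```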
Month seems to be an important feature, as the data is evenly distributed across it. It is a strong candidate for one-hot encoding.
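One-hot encoding replaces the single `month` column with one 0/1 indicator column per month. This keeps a model from reading an artificial order into the labels, which label encoding can introduce for linear models such as logistic regression. The sketch below uses a hypothetical `one_hot` helper. `pandas.get_dummies` and scikit-learn's `OneHotEncoder` are the usual library routes.

```python
def one_hot(values):
    """Map each value to a 0/1 indicator vector over the sorted distinct values.

    Returns (categories, encoded_rows)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one indicator is set per row
        encoded.append(row)
    return categories, encoded


cats, rows = one_hot(["may", "jun", "may", "aug"])
print(cats)  # ['aug', 'jun', 'may']
print(rows)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```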
CORRELATION
Numerical versus Numerical
Categorical (housing) versus Categorical
Categorical column    P-value
job                   0.0900
y                     0.0583
marital               0.0442
education             0.0118
default               0.0103
day_of_week           0.0012
poutcome              0.0000
month                 0.0000
contact               0.0000
loan                  0.0000
Numerical versus Categorical
A heat map, crosstabs and the chi-squared test were used to identify correlations (associations) between variables.
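The chi-squared test of independence behind the p-value table compares the observed crosstab counts with the counts expected if the two variables were independent: E_ij = (row_i total × column_j total) / N. The statistic sums (O_ij - E_ij)² / E_ij over all cells and has (rows - 1)(columns - 1) degrees of freedom. The sketch below, which uses only the standard library, computes both values. The function name is hypothetical. `scipy.stats.chi2_contingency` returns the same statistic together with the p-value, but by default it applies Yates' continuity correction when there is only one degree of freedom.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a crosstab.

    `table` is a list of rows of observed counts (e.g. housing x loan)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Toy 2x2 crosstab (made-up counts).
stat, dof = chi_squared([[10, 20], [30, 40]])
print(round(stat, 4), dof)  # 0.7937 1
```

A small p-value (for example, below 0.05) is evidence that the two variables are not independent.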
FEATURE SELECTION & ENGINEERING
• Feature importances computed with a Random Forest estimator and a Decision Tree
• Top 7 features common to both methods selected
• Bins created on the columns age and campaign
• Outliers handled
• Column euribor3m rescaled with min-max scaling (using its minimum and maximum)
• Label encoding applied to all categorical variables
• Missing values handled
• Minority class oversampled through random oversampling and SMOTE
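The engineering steps above can be sketched as three small helpers:

* fixed-edge binning, as applied to age and campaign;
* min-max scaling, x' = (x - min) / (max - min), as applied to euribor3m;
* label encoding of a categorical column.

The bin edges, toy values and helper names are hypothetical; the deck does not state which edges were used. In pandas and scikit-learn, these steps correspond roughly to `pd.cut`, `MinMaxScaler` and `LabelEncoder`.

```python
from bisect import bisect_right


def bin_values(values, edges):
    """Bin index per value: 0 if value < edges[0], k if
    edges[k-1] <= value < edges[k], len(edges) if value >= edges[-1]."""
    return [bisect_right(edges, v) for v in values]


def min_max_scale(values):
    """Rescale values linearly to [0, 1] using the column's min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def label_encode(values):
    """Map each distinct category (in sorted order) to an integer code."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


ages = [22, 35, 47, 61]
print(bin_values(ages, [30, 45, 60]))  # [0, 1, 2, 3]
print(min_max_scale([1.0, 3.0, 5.0]))  # [0.0, 0.5, 1.0]
print(label_encode(["married", "single", "married"]))
# ([0, 1, 0], {'married': 0, 'single': 1})
```

As with oversampling, the scaling minimum and maximum are normally taken from the training split only and then reused on the test split.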
Selected features:
• age
• euribor3m
• job
• campaign
• education
• day_of_week
• marital
• housing
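The deck does not say exactly how the "top 7 common features" were picked from the two importance rankings. One plausible reading, sketched below as an assumption, works like this:

1. Rank the features by Random Forest importance and, separately, by Decision Tree importance.
2. Walk down both rankings in parallel.
3. Keep the features that appear in the top n of both lists, growing n until k features are shared.

The helper name and the toy importance scores are hypothetical. With scikit-learn, the scores would come from each fitted model's `feature_importances_` attribute.

```python
def top_common_features(importances_a, importances_b, k):
    """Find the smallest n where the top-n features of both rankings share at
    least k names; return k of them, ordered by summed rank (then by name)."""
    rank_a = sorted(importances_a, key=importances_a.get, reverse=True)
    rank_b = sorted(importances_b, key=importances_b.get, reverse=True)

    def order(f):
        # Combined position in both rankings; the name breaks ties deterministically.
        return (rank_a.index(f) + rank_b.index(f), f)

    for n in range(1, min(len(rank_a), len(rank_b)) + 1):
        common = set(rank_a[:n]) & set(rank_b[:n])
        if len(common) >= k:
            return sorted(common, key=order)[:k]
    # Fewer than k shared features overall: return every shared feature.
    return sorted(set(rank_a) & set(rank_b), key=order)


# Toy importance scores (made up for illustration).
rf = {"age": 0.30, "euribor3m": 0.25, "job": 0.15, "campaign": 0.12, "pdays": 0.10, "loan": 0.08}
dt = {"euribor3m": 0.35, "age": 0.20, "campaign": 0.18, "loan": 0.12, "job": 0.09, "pdays": 0.06}
print(top_common_features(rf, dt, 3))  # ['age', 'euribor3m', 'campaign']
```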
RESULTS
The data was split randomly (80/20) into training and test sets to evaluate the algorithms.
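A random 80/20 split like the one described can be sketched by shuffling the row indices with a fixed seed and cutting at 80%. The helper name and the seed are hypothetical. scikit-learn's `train_test_split(..., test_size=0.2)` does the same job, and its `stratify` argument keeps the yes/no ratio equal in both parts. That matters with an imbalanced target like this one.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(41188)
print(len(train_idx), len(test_idx))  # 32950 8238
```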
Adding the column ‘duration’ to the model improves its performance by 12%. However, the column is not used to predict subscribers, because the duration of a call is not known before the call is made.
FEATURES: 07

ALGORITHM              RECALL    PRECISION    AUC-ROC
LOGISTIC REGRESSION    70%       23.8%        70.9%
RANDOM FORESTS         35.2%     28.8%        62.1%
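For reference, here is what the three metrics in the table measure:

* Recall = TP / (TP + FN): the share of actual subscribers that the model finds.
* Precision = TP / (TP + FP): the share of predicted subscribers who actually subscribe.
* AUC-ROC: the probability that a randomly chosen subscriber gets a higher score than a randomly chosen non-subscriber, with ties counting half.

The standard-library sketch below computes all three on made-up predictions. The function names are hypothetical. scikit-learn's `recall_score`, `precision_score` and `roc_auc_score` are the library equivalents.

```python
def recall_precision(y_true, y_pred):
    """Recall and precision for binary labels where 1 means 'subscribed'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Pairwise O(P*N) version, suitable for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


# Made-up labels, hard predictions and scores for six calls.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.8, 0.1]
r, p = recall_precision(y_true, y_pred)
print(round(r, 3), round(p, 3))  # 0.667 0.667
print(round(auc_roc(y_true, scores), 3))  # 0.889
```

With an imbalanced target, recall and precision say more than plain accuracy. A model that always predicts "no" would score high accuracy but zero recall.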
ANY QUESTIONS?