1 Natalino Busa - @natbusa
Global Artificial Intelligence Conference
AI in Finance: from Hype to marketing and cyber security use cases
www.globalbigdataconference.com Twitter : @bigdataconf
Natalino Busa, Twitter: @natbusa
Cognitive Finance Group: Advisory Board Member
ING Group: Enterprise Architect (Cybersecurity, Fintech)
Teradata: Head of Applied Data Science
Teradata: Global Evangelist on Open Source Technologies
O’Reilly: Author and Speaker
Philips: Senior Researcher, Data Architect
LinkedIn and Twitter: @natbusa
The Medici Bank (Italian: Banco Medici), 1397–1494
Data as a Relationship
● Trust
● Transparency of Use
● Customer First
● Regulations and Laws
● Respect and Protect
● Providing a Service
An ethical approach for Actionable Financial Data
1. Help the customer: Propose, Advise, Select, Filter, Connect, Simplify
2. Protect the customer: Detect, Prevent, Alert, Block, Defend, Identify, Authorize
http://www.slideshare.net/ING/4q15-media
● Innovation helps to empower people to make better financial decisions. ING has launched several new omni-channel banking platforms.
● The platform gives customers insights into their personal finances in an easy and intuitive way.
Financial personalized recommenders
Financial personalized recommenders
● It Knows Finance
● Conversational
● Personal
● Actionable
● Predictive
● Reuse Existing Content
● Fintech innovation to help strengthen our lending capabilities and better serve our consumer and SME clients.
● Kabbage: one of the leading US-based technology platforms providing automated lending to SMEs.
● In January 2016, ING invested in fintech WeLab, which provides consumer loans in China and Hong Kong through a fully automated process that takes just minutes from application to approval.
http://www.slideshare.net/ING/4q15-media
Strategic data-driven initiatives
Approaching (Almost) Any Machine Learning Problem (Abhishek Thakur, Kaggle Grandmaster)
Raw data (tables, files) → Data munging → Useful data → Feature Engineering → Tabular Data ready for ML (data, labels)
● Rule-based System: Input → Hand Designed Program → Output
● Classic Machine Learning: Input → Hand Designed Features → Mapping from features → Output
● Representational Machine Learning: Input → Learned Features → Mapping from features → Output
● Deep Learning (end-to-end learning): Input → Learned Features → Learned Complex features → Mapping from features → Output
Prof. Yoshua Bengio - Deep Learning
https://youtu.be/15h6MeikZNg
Predictive APIs: How to get there?
Demo: Credit Payment Defaulting with TensorFlow and Keras
Methodology
This research examines customers' default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods. From a risk management perspective, the predictive accuracy of the estimated probability of default is more valuable than the binary classification result (credible or not credible clients).
https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
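The point about probabilities versus binary labels can be illustrated with a small sketch. The labels and the two sets of predicted probabilities below are made up for illustration and are not taken from the Taiwan dataset. The Brier score is one common way to score probabilities; the slide does not say which measure the study used.

```python
def accuracy(labels, probs, threshold=0.5):
    """Share of correct yes/no calls after thresholding the probabilities."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(labels, probs))
    return hits / len(labels)


def brier_score(labels, probs):
    """Mean squared error of the predicted probabilities (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Hypothetical data: 1 = default next month, 0 = no default.
labels = [1, 0, 0, 1]
confident_model = [0.9, 0.1, 0.2, 0.8]
hesitant_model = [0.6, 0.4, 0.45, 0.55]

for name, probs in [("confident", confident_model), ("hesitant", hesitant_model)]:
    print(name, accuracy(labels, probs), round(brier_score(labels, probs), 4))
```

Both hypothetical models make the same binary calls, so accuracy cannot tell them apart, while the probability score can.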
Step 0: data exploration
Target variable: default payment next month
Color scheme: defaulting vs. not defaulting
Step 1: feature engineering
pay_1          -1
pay_2          0
pay_3          -1
pay_4          0
pay_5          0
pay_6          0
pay_avgamt1    0.203221
pay_avgamt2    3.72718
pay_avgamt3    1.01611
pay_avgamt4    0.914495
pay_avgamt5    0.0700097
pay_avgamt6    0.0689935
pay_stdavgamt  1.40083
pay_avg        -0.333333
pay_std        0.516398
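As a minimal sketch of how two of these engineered features could be derived, the snippet below assumes that pay_avg and pay_std are the mean and the sample standard deviation of pay_1 to pay_6. That definition is an assumption; it is consistent with the values in the row above, but the slide does not state it. The pay_avgamt features are left out because their definition is not given.

```python
import statistics

# Repayment status for the six months, copied from the row shown above.
pay_status = [-1, 0, -1, 0, 0, 0]  # pay_1 .. pay_6

pay_avg = statistics.mean(pay_status)
pay_std = statistics.stdev(pay_status)  # sample standard deviation (divides by n - 1)

print(round(pay_avg, 6), round(pay_std, 6))
```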
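Step 1: baseline (e.g. regression)
from keras.models import Sequential
from keras.layers import Dense, Activation

input_dim = 87  # number of input features, per the slide's diagram

model = Sequential()
model.add(Dense(1, input_shape=(input_dim,)))  # a single unit: 87 inputs, 1 output
model.add(Activation('relu'))  # as on the slide; a logit baseline would typically use 'sigmoid'
Network: 87 inputs → 1 output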
it’s a neural network … with no network :)
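To make the "no network" remark concrete, here is a plain-Python sketch of what a single dense unit with a sigmoid computes, which is ordinary logistic regression. It is not the author's Keras code: the training loop is a bare-bones batch gradient descent, and the one-feature data is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability from a single unit: sigmoid(w . x + b)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(rows, targets, learning_rate=0.1, epochs=200):
    """Plain batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, x) - y for x, y in zip(rows, targets)]
        for k in range(len(weights)):
            grad = sum(e * x[k] for e, x in zip(errors, rows)) / len(rows)
            weights[k] -= learning_rate * grad
        bias -= learning_rate * sum(errors) / len(rows)
    return weights, bias


# Hypothetical one-feature toy data, not the credit card dataset.
rows = [[-2.0], [-1.0], [1.0], [2.0]]
targets = [0, 0, 1, 1]
weights, bias = train(rows, targets)
print(predict(weights, bias, [2.0]), predict(weights, bias, [-2.0]))
```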
Step 2: deep learning
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

input_dim = 87  # number of input features, per the slide's diagram

model = Sequential()
model.add(Dense(256, input_shape=(input_dim,), activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))  # output: probability of default
Network layers: 87 → 256 → 256 → 64 → 64 → 64 → 64 → 10 → 1
Step 3: compare: is deep learning better?
Shallow Logit Model: 87 → 1
Deep Learning: 87 → 256 → 256 → 64 → 64 → 64 → 64 → 10 → 1
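One simple way to put the two architectures side by side is to count their trainable parameters. The sketch below assumes fully connected layers with a bias term and uses the layer sizes from the diagrams (87 inputs); dropout layers add no parameters. It compares model size only and says nothing about which model predicts better.

```python
def dense_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


shallow = [87, 1]
deep = [87, 256, 256, 64, 64, 64, 64, 10, 1]

print(dense_parameters(shallow))  # 88
print(dense_parameters(deep))     # 117909
```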
Step 4: picking the brain of our DL model
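The slides for this step survive only as network diagrams. One plausible reading, assumed here, is that "picking the brain" means reading out the activations of an intermediate layer, such as the 10-unit layer just before the output. The sketch below does that in plain Python on a smaller hypothetical network (87 → 64 → 10 → 1) with untrained random weights; a real run would use the trained Keras model instead.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, activation, rng):
    """Untrained placeholder weights; a real model would load trained ones."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out, activation


def forward_with_activations(features, layers):
    """Run the network and keep the output of every layer."""
    outputs, current = [], features
    for weights, biases, activation in layers:
        current = dense(current, weights, biases, activation)
        outputs.append(current)
    return outputs


rng = random.Random(0)
layers = [
    random_layer(87, 64, relu, rng),
    random_layer(64, 10, sigmoid, rng),
    random_layer(10, 1, sigmoid, rng),
]
customer = [rng.uniform(-1.0, 1.0) for _ in range(87)]  # hypothetical input row

outputs = forward_with_activations(customer, layers)
embedding = outputs[-2]  # the 10 activations just before the output unit
score = outputs[-1][0]   # the network's output, between 0 and 1
print(len(embedding), score)
```

The 10-value embedding is the kind of compact representation that could then be clustered or plotted.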
Step 5: semantic clustering
Default
Very Safe
Mixed Group
Safe
Safe Mixed Group
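One way to attach names like these to clusters, assuming cluster assignments already exist, is to look at each cluster's observed default rate. The sketch below does that; the thresholds, the function names, and the sample data are hypothetical, and the slide does not say how its labels were chosen.

```python
from collections import defaultdict


def name_cluster(default_rate, low=0.05, mid=0.20, high=0.60):
    """Map a default rate to a label; the thresholds are illustrative only."""
    if default_rate >= high:
        return "Default"
    if default_rate >= mid:
        return "Mixed Group"
    if default_rate >= low:
        return "Safe"
    return "Very Safe"


def label_clusters(cluster_ids, defaulted):
    """Return {cluster id: label} from per-customer cluster ids and outcomes."""
    counts = defaultdict(lambda: [0, 0])  # cluster id -> [customers, defaults]
    for cid, outcome in zip(cluster_ids, defaulted):
        counts[cid][0] += 1
        counts[cid][1] += int(outcome)
    return {cid: name_cluster(n_def / n) for cid, (n, n_def) in counts.items()}


# Hypothetical assignments: four clusters with default rates 0%, 75%, 50%, 10%.
cluster_ids = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 10
defaulted = [0, 0, 0, 0] + [1, 1, 1, 0] + [1, 0, 1, 0] + [1] + [0] * 9

cluster_labels = label_clusters(cluster_ids, defaulted)
print(cluster_labels)
```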
Hyper-Parameters tuning
- based on scikit-learn
- 15 classifiers
- 14 feature preprocessing methods
- 4 data preprocessing methods
- 110 hyperparameters
- Supervised classification challenge: 100 different datasets
https://arxiv.org/abs/1611.03824v1
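For a feel of what automated tuning does at its simplest, here is a sketch of plain random search over a small space. It is not the method of the tool or paper referenced above. The search space, the function names, and the toy objective (a stand-in for a validation score) are all hypothetical.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(options) for name, options in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a small neural network.
space = {
    "hidden_units": [16, 64, 256],
    "dropout": [0.0, 0.25, 0.5],
    "learning_rate": [0.1, 0.01, 0.001],
}


def toy_objective(params):
    """Stand-in for a validation score; highest at 64 units, 0.25 dropout, 0.01 lr."""
    return -(abs(params["hidden_units"] - 64) / 256
             + abs(params["dropout"] - 0.25)
             + abs(params["learning_rate"] - 0.01))


best_params, best_score = random_search(toy_objective, space, n_trials=30)
print(best_params, best_score)
```

In practice the objective would train a model and return a cross-validated score, which is where most of the cost lies.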
The API for banking data.
Two levels:
- Transactions
- Risk Scoring
Inspiration from the Web
Clustering geolocated data using Spark and DBSCAN
How to group users’ events using machine learning and distributed computing
By Natalino Busa
Predictive APIs: Clustering Geolocated Data
@natbusa | linkedin.com: Natalino Busa
Card Theft/Cloning: DBSCAN and Convex Hulls
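The idea of grouping a user's events with DBSCAN and outlining each group with a convex hull can be sketched in plain Python. This is a single-machine illustration, not the Spark implementation from the talk. The points are hypothetical planar (x, y) coordinates in kilometres, and eps and min_pts are illustrative values; real latitude and longitude data would need a geographic distance.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Hypothetical card events: two usual areas and one far-away event.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (50.0, 50.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
hulls = {c: convex_hull([p for p, l in zip(points, labels) if l == c])
         for c in sorted(set(labels)) if c != -1}
print(labels, hulls)
```

In this toy data the far-away event is labelled as noise, which is the kind of signal a card theft or cloning check could look at.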
Kafka + Cassandra + Spark: SMACK stack
Streaming Machine Learning
● Cassandra: Fast writes, 2D Data Structure, Replicated, Tunable consistency, Multi-Data centers
● Kafka: Streaming Events; Distributed, Scalable Transport; Events are persisted; Decoupled Consumer-Producers; Topics and Partitions
● Spark: Ad-Hoc Queries; Joins, Aggregate; User Defined Functions; Machine Learning, Advanced Stats and Analytics
Spark: Unified Distributed Computing: SQL + Machine Learning + Graph Analytics
● Spark - RDDs: Streaming, SQL, MLlib, GraphX
● Analytics, Statistics, Data Science, Model Training
● Data Sources: HDFS, NoSQL, SQL, Kafka, Hive, Map-Reduce
Spark and Cassandra: distributed goodness
Cassandra: Store all the data
Spark: Analyze all the data
DC1: replication factor 3
DC2: replication factor 3
DC3: replication factor 3 + Spark Executors
Storage! Analytics!
Cassandra - Spark Connector
Cassandra: Store all the data
Spark: Distributed Data Processing, Executors and Workers
Cassandra-Spark Connector: Data locality, Reduce Shuffling, RDDs to Cassandra Partitions
DC3: replication factor 3 + Spark Executors
Network Intrusion Detection
It contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release).
Each record consists of: time (to nearest second), duration, source and destination computer ids, source and destination ports, protocol, number of packets and number of bytes
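The record layout described above maps naturally onto a small data structure. The sketch below models one flow record and rolls records up into per-computer features of the sort that a dimensionality reduction step could take as input. The field names, field types, sample values, and the choice of aggregates are assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    time: int            # seconds, to the nearest second
    duration: int
    src_computer: str
    dst_computer: str
    src_port: str
    dst_port: str
    protocol: int
    n_packets: int
    n_bytes: int


def per_source_features(flows):
    """Summarize each source computer: flow count, distinct peers, total bytes."""
    stats = defaultdict(lambda: {"flows": 0, "peers": set(), "bytes": 0})
    for flow in flows:
        entry = stats[flow.src_computer]
        entry["flows"] += 1
        entry["peers"].add(flow.dst_computer)
        entry["bytes"] += flow.n_bytes
    return {src: (s["flows"], len(s["peers"]), s["bytes"])
            for src, s in stats.items()}


# Hypothetical records, for illustration only.
flows = [
    FlowRecord(1, 0, "C1", "C2", "N1", "N2", 6, 10, 500),
    FlowRecord(2, 1, "C1", "C3", "N3", "80", 6, 4, 200),
    FlowRecord(3, 0, "C4", "C1", "N5", "N6", 17, 1, 60),
]
features = per_source_features(flows)
print(features)
```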
Techniques: TDA, Dimensionality Reduction
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
Tools for AI and Machine (deep) Learning
... these are just a few examples ...
AI: an ensemble of analytical methods
SQL + Graph + Text + Machine Learning + Voice/Image/Video
Takeaways
● AI can be applied in Finance: YES
● Train your AI: Domain Experts + ML
● Use All Tools, All Data
Distributed Computing, Artificial Intelligence, Machine Learning, Statistics, Big/Fast Data, Streaming Computing
LinkedIn and Twitter: @natbusa