AI in finance - from hype to marketing and cybersecurity use cases

1 Natalino Busa - @natbusa

Global Artificial Intelligence Conference

AI in Finance: from Hype to marketing and cyber security use cases

www.globalbigdataconference.com

Twitter: @bigdataconf

Cognitive Finance Group Advisory Board Member

ING Group Enterprise Architect: Cybersecurity, Fintech

Teradata Head of Applied Data Science

Teradata Global Evangelist on Open Sourced Technologies

O’Reilly Author and Speaker

Philips Senior Researcher, Data Architect

LinkedIn and Twitter:

@natbusa

What about AI in Finance?

The Medici Bank (Italian: Banco Medici), 1397–1494

Data as a Relationship

● Trust

● Transparency of Use

● Customer First

● Regulations and Laws

● Respect and Protect

● Providing a Service

An ethical approach for Actionable Financial Data

1. Help the customer: Propose, Advise, Select, Filter, Connect, Simplify

2. Protect the customer: Detect, Prevent, Alert, Block, Defend, Identify, Authorize

Personalized Financial

Financial personalized recommenders

● Innovation helps to empower people to make better financial decisions. ING has launched several new omni-channel banking platforms.

● The platform gives customers insights into their personal finances in an easy and intuitive way.

Source: http://www.slideshare.net/ING/4q15-media

● It Knows Finance

● Conversational

● Personal

● Actionable

● Predictive

● Reuse Existing Content

Inspiration from the Web

Credit Pre-Authorization

Strategic data-driven initiatives

● Fintech innovation to help strengthen our lending capabilities and better serve our consumer and SME clients.

● Kabbage is one of the leading US-based technology platforms providing automated lending to SMEs.

● In January 2016, ING invested in fintech WeLab, which provides consumer loans in China and Hong Kong through a fully automated process that takes just minutes from application to approval.

Source: http://www.slideshare.net/ING/4q15-media

Approaching (Almost) Any Machine Learning Problem (Abhishek Thakur, Kaggle Grandmaster)

Diagram: raw data (tables, files) → data munging → useful data → feature engineering → tabular data ready for ML (data, labels)

Predictive APIs: How to get there?

● Rule-based System: Input → Hand Designed Program → Output

● Classic Machine Learning: Input → Hand Designed Features → Mapping from features → Output

● Representational Machine Learning: Input → Learned Features → Mapping from features → Output

● Deep Learning (end-to-end learning): Input → Learned Features → Learned Complex features → Mapping from features → Output

Source: Prof. Yoshua Bengio - Deep Learning, https://youtu.be/15h6MeikZNg

From Feature to Architecture Engineering:

Demo: Credit Payment Defaulting with TensorFlow and Keras

Methodology

This research examines customers' default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods. From a risk management perspective, the predictive accuracy of the estimated probability of default is more valuable than the binary classification result (credible or not credible clients).

https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

Step 0: data exploration

Target variable: default payment next month

Color scheme: defaulting vs. not defaulting

Step 1: feature engineering

pay_1            -1
pay_2             0
pay_3            -1
pay_4             0
pay_5             0
pay_6             0
pay_avgamt1       0.203221
pay_avgamt2       3.72718
pay_avgamt3       1.01611
pay_avgamt4       0.914495
pay_avgamt5       0.0700097
pay_avgamt6       0.0689935
pay_stdavgamt     1.40083
pay_avg          -0.333333
pay_std           0.516398
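The derived features above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not define the features, so the formulas here are inferred from the example values (pay_avg and pay_std match the mean and the sample standard deviation of pay_1..pay_6, and the six pay_avgamt values sum to about 6, which suggests each amount is divided by the customer's own average). The function name `engineer_payment_features` and the raw payment amounts are hypothetical.

```python
from statistics import mean, stdev


def engineer_payment_features(pay_status, pay_amounts):
    """Summarise six months of repayment status codes and payment amounts.

    pay_status:  monthly repayment-status codes (pay_1 .. pay_6)
    pay_amounts: monthly payment amounts (hypothetical raw input)
    """
    features = {}

    # Keep the raw status codes as individual features.
    for i, status in enumerate(pay_status, start=1):
        features[f"pay_{i}"] = status

    # Assumed definition: each amount relative to the customer's average amount.
    avg_amount = mean(pay_amounts)
    ratios = [amt / avg_amount if avg_amount else 0.0 for amt in pay_amounts]
    for i, ratio in enumerate(ratios, start=1):
        features[f"pay_avgamt{i}"] = ratio

    # Spread of the relative amounts (sample standard deviation, n - 1).
    features["pay_stdavgamt"] = stdev(ratios)

    # Mean and sample standard deviation of the status codes.
    features["pay_avg"] = mean(pay_status)
    features["pay_std"] = stdev(pay_status)
    return features


# Status codes taken from the slide; the amounts are invented for illustration.
example = engineer_payment_features(
    pay_status=[-1, 0, -1, 0, 0, 0],
    pay_amounts=[400.0, 7300.0, 2000.0, 1800.0, 140.0, 135.0],
)
```

Normalising each amount by the customer's own average makes the features comparable across customers with very different spending levels, which is one plausible reason for this design.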

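Step 1: baseline (e.g. regression)

from keras.models import Sequential
from keras.layers import Dense, Activation

input_dim = 87  # input width shown in the slide's network diagram
model = Sequential()
model.add(Dense(1, input_shape=(input_dim,)))
model.add(Activation('relu'))  # as on the slide; a logit baseline would use 'sigmoid'

Network diagram: 87 inputs → 1 output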

it’s a neural network … with no network :)

Step 2: deep learning

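from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

input_dim = 87  # input width shown in the slide's network diagram
model = Sequential()
model.add(Dense(256, input_shape=(input_dim,), activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1))
model.add(Activation('sigmoid'))  # output: estimated probability of default

Network diagram: 87 inputs → 256 → 256 → 64 → 64 → 64 → 64 → 10 → 1 output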

Step 3: compare: is deep learning better?

Shallow Logit Model: 87 inputs → 1 output

Deep Learning: 87 inputs → 256 → 256 → 64 → 64 → 64 → 64 → 10 → 1 output
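One common way to compare a shallow and a deep classifier for default prediction is the area under the ROC curve (AUC), computed from each model's predicted probabilities on held-out data. The sketch below assumes that metric, since the slide does not say which one was used. The names `roc_auc`, `scores_a`, and `scores_b` are hypothetical, and the numbers are invented toy values, not results from the demo.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U).

    labels: 0/1 ground truth (1 = default)
    scores: predicted probabilities of default, one per label
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy held-out labels and two sets of predicted probabilities (all invented).
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores_a = [0.20, 0.40, 0.35, 0.10, 0.80, 0.30, 0.50, 0.70]  # stand-in for model A
scores_b = [0.10, 0.30, 0.60, 0.20, 0.90, 0.32, 0.35, 0.80]  # stand-in for model B

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
print(f"model A AUC: {auc_a:.4f}, model B AUC: {auc_b:.4f}")
```

AUC depends only on how the models rank customers, so it suits the methodology's emphasis on the estimated probability of default rather than on a single yes/no threshold. The pairwise loop is quadratic and meant for small examples only.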

Step 4: picking the brain of our DL model

Network diagrams: shallow model (87 inputs → 1 output) and deep model (87 inputs → 256 → 256 → 64 → 64 → 64 → 64 → 10 → 1 output)

Step 5: semantic clustering

Cluster labels in the figure: Default, Very Safe, Mixed Group, Safe, Safe, Mixed Group

Hands-on with Keras and TensorFlow

Hyper-Parameters tuning

- based on scikit-learn

- 15 classifiers

- 14 feature preprocessing methods

- 4 data preprocessing methods

- 110 hyperparameters

- Supervised classification challenge: 100 different datasets

https://arxiv.org/abs/1611.03824v1

The API for banking data.

Two levels:

- Transactions

- Risk Scoring

Inspiration from the Web

Card Theft: Geo-Alerting

Clustering geolocated data using Spark and DBSCAN

How to group users’ events using machine learning and distributed computing

By Natalino Busa

Predictive APIs: Clustering Geolocated Data

@natbusa | linkedin.com: Natalino Busa

Venues and Events

Events clustering

Card Theft/Cloning: DBSCAN and Convex Hulls
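The idea of combining DBSCAN with convex hulls for card-theft alerts can be sketched as follows: cluster a user's past event locations, wrap each cluster in its convex hull, and flag a new event that falls outside every hull. This is a single-machine, pure-Python illustration, not the Spark implementation from the talk. It assumes coordinates can be treated as planar (a real system would use a geographic distance), and the names `dbscan`, `convex_hull`, `inside_hull`, `is_suspicious`, and all sample points are hypothetical.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN on 2D points; returns one label per point (-1 = noise)."""
    def neighbours(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)          # None = not visited yet
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # provisional noise
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    if len(hull) < 3:
        return p in hull
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) >= 0
               for i in range(len(hull)))


def is_suspicious(history, new_event, eps, min_pts):
    """Flag an event that falls outside the hull of every usual-location cluster."""
    labels = dbscan(history, eps, min_pts)
    for c in set(labels) - {-1}:
        hull = convex_hull([p for p, l in zip(history, labels) if l == c])
        if inside_hull(hull, new_event):
            return False
    return True


# Invented history: two dense areas plus one isolated event.
history = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (0.01, 0.01), (0.005, 0.005),
           (1.0, 1.0), (1.01, 1.0), (1.0, 1.01), (1.01, 1.01),
           (5.0, 5.0)]
print(is_suspicious(history, (3.0, 3.0), eps=0.02, min_pts=3))
print(is_suspicious(history, (0.004, 0.006), eps=0.02, min_pts=3))
```

DBSCAN is a natural fit here because it needs no preset number of clusters and leaves isolated events as noise, so a single unusual past location does not become a "safe" area.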

Kafka+Cassandra+Spark: SMACK stack, Streaming Machine Learning

● Cassandra: Fast writes; 2D Data Structure; Replicated; Tunable consistency; Multi-Data centers

● Kafka: Streaming Events; Distributed, Scalable Transport; Events are persisted; Decoupled Consumer-Producers; Topics and Partitions

● Spark: Ad-Hoc Queries; Joins, Aggregate; User Defined Functions; Machine Learning, Advanced Stats and Analytics

Spark: Unified Distributed Computing: SQL + Machine Learning + Graph Analytics

Diagram: Spark (RDDs) with Streaming, SQL, MLlib and GraphX, used for Analytics, Statistics, Data Science and Model Training. Data sources: HDFS, NoSQL, SQL. Other labels: Map-Reduce, HDFS, Kafka, Hive.

Spark and Cassandra: distributed goodness

Cassandra: Store all the data

Spark: Analyze all the data

Diagram: DC1, DC2 and DC3, each with replication factor 3; DC3 also hosts the Spark Executors. Labels: Storage, Analytics, Data.

Cassandra - Spark Connector

Cassandra: Store all the data

Spark: Distributed Data Processing, Executors and Workers

Cassandra-Spark Connector: Data locality, Reduce Shuffling, RDDs to Cassandra Partitions

DC3: replication factor 3 + Spark Executors

Cyber security in Finance

Network Intrusion Detection

The dataset contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release).

Each record consists of: time (to the nearest second), duration, source and destination computer ids, source and destination ports, protocol, number of packets, and number of bytes.

Techniques: TDA, Dimensionality Reduction

https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
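The flow record fields listed above map naturally onto a small typed structure. The sketch below is an illustration only: it assumes a comma-separated text format with the fields in the order the slide lists them, which may differ from the actual data release. The names `FlowRecord`, `parse_flow`, and `bytes_per_source`, and all sample values, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    time: int           # start time, to the nearest second
    duration: int       # flow duration (unit assumed to be seconds)
    src_computer: str   # source computer id
    dst_computer: str   # destination computer id
    src_port: str       # kept as text: ports may be anonymised labels
    dst_port: str
    protocol: str
    packets: int        # number of packets
    n_bytes: int        # number of bytes


def parse_flow(line: str) -> FlowRecord:
    """Parse one comma-separated flow line (field order is an assumption)."""
    fields = line.strip().split(",")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    t, dur, src, dst, sport, dport, proto, pkts, nbytes = fields
    return FlowRecord(int(t), int(dur), src, dst, sport, dport, proto,
                      int(pkts), int(nbytes))


def bytes_per_source(records):
    """Total bytes sent per source computer, a simple per-host feature."""
    totals = Counter()
    for r in records:
        totals[r.src_computer] += r.n_bytes
    return totals


# Invented sample lines, for illustration only.
sample = [
    "1,0,C100,C200,N1,80,6,10,5000",
    "2,1,C100,C300,N2,443,6,4,1200",
    "2,0,C200,C100,N3,22,6,2,300",
]
records = [parse_flow(line) for line in sample]
totals = bytes_per_source(records)
```

Per-host aggregates such as `totals` are one possible starting point for the feature vectors that dimensionality-reduction techniques operate on.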

AI: tools and technologies

Tools for AI and Machine (deep) Learning

… these are just a few examples …

AI: models and algorithms

AI: an ensemble of analytical methods

SQL + Graph + Text + Machine Learning + Voice/Image/Video

AI in Finance: Recap & Lessons Learned

Takeaways

● AI can be applied in Finance: YES

● Train your AI: Domain Experts + ML

● Use All Tools, All Data

Distributed computing Artificial Intelligence

Machine Learning Statistics Big/Fast Data

Streaming Computing

LinkedIn and Twitter:

@natbusa