ETL & Predictive Analytics, Kudo Codefest 2016, Kudoplex 2, 29 May 2016

Kudo Codefest: Data science, ETL, and predictive analytics to make a better product


Data Analyst

Data Engineer

Why does Kudo need a data team?

● Our data keeps growing, especially in variety
● We partner with many vendors whose data have different characteristics
● Unique user (agent) behavior, not that of a typical e-commerce user
● Specific user (agent) profile

ETL (Extract, Transform, Load)

Predictive Analytics

ETL (Extract, Transform, Load)

We need an “analytics-friendly” database that is the single source of all data in Kudo

Python packages: petl, pandas

Extract

Transform

Load
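The three steps can be sketched with pandas, one of the packages named above. This is a minimal illustration, not Kudo's actual pipeline: the sample rows, the column names, the `orders` table, and the in-memory SQLite database standing in for the analytics database are all assumptions.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical raw export from one vendor (in practice a file, API, or database).
RAW_CSV = """\
order_id,agent_id,product_name,price
1,A01,Garskin Iphone 4,25000
2,A02,  USB Fan ,40000
3,A01,Leather Case,
"""


def extract(source):
    """Extract: read the raw vendor data into a DataFrame."""
    return pd.read_csv(source)


def transform(df):
    """Transform: normalise product names and drop rows without a price."""
    out = df.copy()
    out["product_name"] = out["product_name"].str.strip().str.lower()
    out = out.dropna(subset=["price"])
    out["price"] = out["price"].astype(int)
    return out


def load(df, conn, table="orders"):
    """Load: write the cleaned rows into the analytics database."""
    df.to_sql(table, conn, if_exists="replace", index=False)


# In-memory SQLite stands in for the "analytics-friendly" database.
conn = sqlite3.connect(":memory:")
load(transform(extract(io.StringIO(RAW_CSV))), conn)

rows = conn.execute(
    "SELECT order_id, product_name, price FROM orders ORDER BY order_id"
).fetchall()
print(rows)
```

Keeping each step as its own function makes it easy to schedule and retry them separately.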

ETL is all about jobs that run periodically; we need to make sure all jobs run “pretty smoothly”

Airflow

A platform to programmatically author, schedule, and monitor our data pipelines

Supports:

• Retries
• Complex dependencies (DAG)
• Python Operator
• Email on error/retry
• Message exchange between tasks
• Web UI
• etc…
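To make "retries" and "complex dependencies (DAG)" concrete, here is a small standard-library sketch. It is not Airflow's API. It only imitates two ideas from the list: running tasks in dependency order and retrying a task that fails. The function `run_pipeline`, the task names, and the simulated failure are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks: name -> callable that receives the dict of results so far
    deps:  name -> set of task names that must finish first
    """
    results, attempts = {}, {}
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(1, max_retries + 2):
            attempts[name] = attempt
            try:
                results[name] = tasks[name](results)
                break
            except Exception:
                # Give up only after the last allowed attempt.
                if attempt == max_retries + 1:
                    raise
    return results, attempts


calls = {"transform": 0}


def extract(results):
    return [1, 2, 3]


def transform(results):
    # Simulate a temporary failure on the first attempt.
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise RuntimeError("temporary failure")
    return [x * 10 for x in results["extract"]]


def load(results):
    return len(results["transform"])


tasks = {"extract": extract, "transform": transform, "load": load}
deps = {"transform": {"extract"}, "load": {"transform"}}

results, attempts = run_pipeline(tasks, deps)
print(results)
print(attempts)
```

The shared `results` dict loosely plays the role of message exchange between tasks; a scheduler such as Airflow adds scheduling, email alerts, and the web UI on top of these ideas.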

Airflow Web UI

Predictive Analytics

Making Prophecies for the Business

Product Classification

Product (PRODUCT NAME) → Category

Preprocessing

Product → TOKENIZE → DELETE STOPWORDS → VECTORIZE

Tokenize + Delete Stopwords

Garskin Iphone 4 Texture Material – Beige Leather

garskin, iphone, 4, texture, material, beige, leather
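A minimal sketch of this step in plain Python, using the product name from the slide. The stopword list is hypothetical (the slides do not show the one actually used), and the tokenizer simply lowercases and keeps alphanumeric runs.

```python
import re

# Hypothetical stopword list; the real list is not given in the slides.
STOPWORDS = {"dan", "untuk", "dengan", "the", "for", "and"}


def tokenize(name):
    """Lowercase the product name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that carry no category information."""
    return [t for t in tokens if t not in stopwords]


tokens = remove_stopwords(
    tokenize("Garskin Iphone 4 Texture Material – Beige Leather")
)
print(tokens)
# ['garskin', 'iphone', '4', 'texture', 'material', 'beige', 'leather']
```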

0 1 0 1 0 0 0

Vectorize

VOCABULARY: handphone, iphone, samsung, leather, smartphone, xiaomi, murah

BINARY

garskin, iphone, 4, texture, material, beige, leather
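The binary vectorization can be sketched as follows, using the seven-word vocabulary from the slide. Each position is 1 if the vocabulary word appears among the tokens and 0 otherwise. The function name `vectorize` is an assumption.

```python
# Vocabulary as shown on the slide, in the same order.
VOCABULARY = [
    "handphone", "iphone", "samsung", "leather",
    "smartphone", "xiaomi", "murah",
]


def vectorize(tokens, vocabulary=VOCABULARY):
    """Return a binary vector: 1 where the vocabulary word is present."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


vector = vectorize(
    ["garskin", "iphone", "4", "texture", "material", "beige", "leather"]
)
print(vector)
# [0, 1, 0, 1, 0, 0, 0]
```

Only "iphone" and "leather" are in the vocabulary, so only those two positions are set; tokens outside the vocabulary are ignored.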

Train Data

0 1 0 1 0 0 0
1 0 1 0 0 0 0
1 1 1 0 1 0 0
1 0 1 1 1 0 0
1 1 0 0 1 1 0

Naive Bayes Model

Learning

Output

“360 Degree Rotating Quiet Usb Fan”

Elektronik (Electronics): 0.8
Fesyen (Fashion): 0.15
Perhiasan & Emas (Jewelry & Gold): 0.05
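The learning and output steps can be sketched with a small Naive Bayes classifier in plain Python. The slides do not say which variant was used; this sketch assumes a Bernoulli model with Laplace smoothing because the vectors are binary. The training matrix is the one from the "Train Data" slide, but the labels in `y` and the query vector are hypothetical, since the slides do not show them.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and smoothed P(feature = 1 | class)."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        probs = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
        model[c] = (prior, probs)
    return model


def predict_proba(model, x):
    """Return a normalised probability per category for one binary vector."""
    log_scores = {}
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for xj, p in zip(x, probs):
            score += math.log(p if xj else 1.0 - p)
        log_scores[c] = score
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp.values())
    return {c: v / total for c, v in exp.items()}


# Binary vectors from the "Train Data" slide.
X = [
    [0, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0],
]
# Hypothetical labels; the slides do not show the real ones.
y = ["Fesyen", "Elektronik", "Elektronik", "Elektronik", "Elektronik"]

model = train_bernoulli_nb(X, y)
# Hypothetical query: a product mentioning handphone, samsung, smartphone.
probs = predict_proba(model, [1, 0, 1, 0, 1, 0, 0])
best = max(probs, key=probs.get)
print(best, probs)
```

As on the slide, the output is one probability per category, and the category with the highest probability is the prediction.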

Model

Product Name

Performance

Thank You!

Psst.. we are hiring

[email protected]@kudo.co.id