Data @enbrite.ly
Mate Gulyas (Gulyás Máté), CTO & Co-Founder
@gulyasm
$171B (170 000 000 000)
$171B = 125% of Hungary's GDP
39%
Oct. 27, 1994: Web Gives Birth to Banner Ads
THERE IS FRAUD ON THE INTERNET
PROTOTYPE
WHY DO WE DO IT?
HOW DO WE DO IT?
DATA COLLECTION
ANALYZE
DATA PROCESSING
ANTI FRAUD
VIEWABILITY
BRAND SAFETY
REPORT + API
What do we do?
Product placeholder
Spark TOOLS
● 0.5-4TB of data processed daily (1-10B rows)
● Ad-hoc batch queries on 20TB of data
● 20+ node cluster
● Spent 4 months optimizing it
UNDER THE HOOD
DATA COLLECTION
The way to the access log
1. Click event attributes (created by JS tracker)
{ "session_id": "spark_meetup_jsmmmoq", "timestamp": 1456080915621, "type": "click"}
2. Base64-encoded event
eyJzZXNzaW9uX2lkIjoic3BhcmtfbWVldHVwX2pzbW1tb3EiLCJ0aW1lc3RhbXAiOjE0NTYwODA5MTU2MjEsInR5cGUiOiAiY2xpY2sifQo=
3. Access log format
TS CLIENT_IP STATUS "GET https://api.endpoint?event=eyJzZXNzaW9uX2lkIj..."
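The collection path on this slide (JSON event, Base64 payload, access log line) can be reversed on the processing side. The following is a minimal Python sketch, not the enbrite.ly implementation. It assumes the log fields are space-separated with the request in double quotes, as the format line suggests. The timestamp, client IP, and status values in `SAMPLE_LINE` are hypothetical, and the function name `parse_access_log_line` is made up for illustration. Only the Base64 payload is taken from the slide.

```python
import base64
import json
import shlex
from urllib.parse import parse_qs, urlparse

# Base64 payload copied verbatim from the slide.
PAYLOAD = "eyJzZXNzaW9uX2lkIjoic3BhcmtfbWVldHVwX2pzbW1tb3EiLCJ0aW1lc3RhbXAiOjE0NTYwODA5MTU2MjEsInR5cGUiOiAiY2xpY2sifQo="

# Hypothetical log line following: TS CLIENT_IP STATUS "GET <url>"
# (the TS, CLIENT_IP and STATUS values are invented for this example).
SAMPLE_LINE = (
    '1456080915 203.0.113.7 200 "GET https://api.endpoint?event=' + PAYLOAD + '"'
)


def parse_access_log_line(line):
    """Split one access log line and decode the embedded tracker event."""
    # shlex keeps the quoted request string together as a single field.
    ts, client_ip, status, request = shlex.split(line)
    method, url = request.split(" ", 1)

    # Pull the `event` query parameter out of the request URL.
    params = parse_qs(urlparse(url).query)
    # parse_qs turns "+" into " "; undo that in case the payload is
    # standard (not URL-safe) Base64 that was not percent-encoded.
    payload = params["event"][0].replace(" ", "+")

    # Base64 -> JSON bytes -> dict with the click event attributes.
    event = json.loads(base64.b64decode(payload))
    return {
        "ts": ts,
        "client_ip": client_ip,
        "status": int(status),
        "method": method,
        "event": event,
    }


record = parse_access_log_line(SAMPLE_LINE)
print(record["event"])
```

Carrying the event as a query parameter means an ordinary web server access log doubles as the raw event store, so the tracker endpoint itself needs no application logic.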
DATA PROCESSING
TOOLS
● AWS
● Apache Spark
● Apache Hadoop
● Luigi
● Python (pandas, scikit)
● Go
● PagerDuty
● Graphite, Grafana
● Ganglia
● Ansible
● HashiCorp stack
AWS TOOLS
INFRASTRUCTURE PROVIDER
● 16 services
● 110+ machines
● 1-4 EMR clusters (1-30 node)
● 100TB+ on S3
● Every client has separate infrastructure
Spark TOOLS
INFRASTRUCTURE PROVIDER
Luigi TOOLS
WORKFLOW ENGINE
Luigi + enbrite.ly extensions = Gabo Luigi
Tools we created GABO LUIGI
HOT MAP DETECTION
BOT TRAFFIC DETECTION
LESSONS LEARNED
● Automate EVERYTHING
● OPTIMIZATION takes a LOT OF TIME
● Data is NEVER clean
http://budapest.startupsafary.com/
THE NEXT BIG THING....
PRIVACY
MATE GULYAS
gulyasm@enbrite.ly
@gulyasm, @enbritely
THANK YOU!