Data Engineering @ Grab 15 July 2017 Geekcamp JKT



65 cities, 7 countries

>1.1 million drivers in network (largest in SEA)

>50 million downloads

#1 e-hailing in SEA

95% share of third-party taxi-hailing apps

>70% share of private cars, and growing

6 services

Infrastructure Evolution of Analytics@Grab

MySQL Replica (2013 / 2014)

1 database, ~20 tables, < 50 reports, < 10 users, 1 engineer

Redshift (2015 / 2016)

20 databases, 100s of tables, 100s of reports, < 500 users, 3 engineers

Presto + EMR + S3 (now)

~20 databases + streams, 100s of tables, > 500 reports, > 500 users, 9 engineers

Redshift for Analytics@Grab

Daily ETL after midnight

Redshift serves multiple use cases

Data Lake@Grab

Pyrois Orchestrator

Hourly ETL

Data stored as Parquet and partitioned by time

Helios Data Lake in S3

Analytics@Grab Today

Marketing Analytics

User Trust

Data Science

Helios Data Lake in S3

Data Gateway

❖ Group based ACL

❖ Custom JDBC Driver

❖ Query Parser extracts tables/columns used

❖ Uses correct cluster based on permissions

❖ Access and Query Logs
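
The gateway responsibilities listed above can be sketched roughly as follows. This is a minimal illustration and not Grab's implementation: the group, table, and cluster names are made up, the regex is a naive stand-in for a real query parser (it finds tables only, not columns), and the custom JDBC driver is out of scope.

```python
import re
from dataclasses import dataclass, field

# All group, table, and cluster names below are hypothetical.
GROUP_TABLES = {
    "marketing": {"campaigns", "bookings"},
    "data_science": {"bookings", "drivers"},
}
GROUP_CLUSTER = {"marketing": "presto-shared", "data_science": "presto-ds"}

# Naive stand-in for a SQL parser: captures the identifier after FROM or JOIN.
TABLE_RE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def extract_tables(sql: str) -> set[str]:
    """Return the lower-cased table names referenced by the query."""
    return {name.lower() for name in TABLE_RE.findall(sql)}


@dataclass
class Gateway:
    query_log: list = field(default_factory=list)

    def route(self, user: str, group: str, sql: str) -> str:
        """Check the group's ACL, log the attempt, and return the cluster to use."""
        tables = extract_tables(sql)
        denied = tables - GROUP_TABLES.get(group, set())
        allowed = group in GROUP_CLUSTER and not denied
        # Every attempt is logged, whether or not it is allowed.
        self.query_log.append((user, group, sorted(tables), allowed))
        if not allowed:
            raise PermissionError(f"{group} may not read: {sorted(denied)}")
        return GROUP_CLUSTER[group]


gateway = Gateway()
sql = "SELECT c.id FROM campaigns c JOIN bookings b ON b.campaign_id = c.id"
print(gateway.route("alice", "marketing", sql))
```

Logging before the permission check raises means denied queries also appear in the access log, which is typically what an audit trail needs.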

Future

Helios Data Lake in S3

Real time streaming

Real time monitoring

We’re just getting started.