New Metrics Engine to Help Drive UBER


t.uber.com/scala2016

Sasha Ovsankin, Uber

New Metrics Engine to Help Drive Uber

November 2016


“Transportation as reliable as running water, everywhere, for everyone”


About Me

Data & Analytics Engineer

Mathematical Physics, Lomonosov Moscow University

Contact

● sashao@uber.com
● https://linkedin.com/in/sashao
● http://t.uber.com/scala2016


What Do We Work On

Experimentation Platform: Fuel Uber’s innovation, make the software release cycle more robust and data-driven

Uber Data Platform: Cutting-edge data platforms powering Uber’s intelligence


What This Talk Is About

Building a company-wide Metrics Platform is possible and practical, and you should do it


Agenda

● Why Metrics Platform
● Technology
● Process
● Conclusion


How Do You Want Your Metrics?

Aligned

Reliable

Trusted


Uber Situation

● Over 450 cities in over 70 countries
● Lots of growth:
    ○ 1B rides by Dec 2015, 2B rides by June 2016
● Teams have a high level of independence


How do you make data-driven decisions in a business like that?


Metrics Platform = Technology + Process


Our Metrics Platform: Architecture and Process

[Architecture diagram. Labels: Engines (Spark / Hive / Real Time), Registry, Council, Web-UI, BI Tool UI, DS / Ops / Product, Definition DSL, Query DSL]


Our Metrics Platform

Easy & Powerful

Integrated

Lightweight Process


Metric Walkthrough

Metric: hours active

English description: Hours spent by drivers logged in and online in the driver app

SQL:

    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id


Metric Walkthrough, Continued

Add date:

    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'

In San Francisco:

    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    join dim.city c on c.id = ac.city_id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
      and c.name = 'San Francisco'


Metric Walkthrough, Continued #2

Group by experiment treatment:

    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    join dim.city c on c.id = ac.city_id
    join xp.user_experiment xp on xp.user_id = dr.id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
      and c.name = 'San Francisco'
      and xp.experiment_key = 'crm_driveronboarding_wcdrip'
    group by xp.treatment

Group by driver type:

    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    join dim.city c on c.id = ac.city_id
    join xp.user_experiment xp on xp.user_id = dr.id
    join model.driver dm on dm.id = dr.id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
      and c.name = 'San Francisco'
      and xp.experiment_key = 'crm_driveronboarding_wcdrip'
    group by xp.treatment, dm.type


Complicated

Unmanageable

Fragile


Anatomy of a Metric

[Diagram. Labels: Input, Preaggregation transformations, Aggregation, Post-aggregation formulas, Final Join, Results, Dimensions, Metric definitions, Filters. Result columns: dim1, dim2, m1, m2, ...]

    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
    group by dr.city
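The stages named on this slide can be illustrated on plain Scala collections. This is only a sketch under stated assumptions: the `Activity` row type, the sample data, and the function name are hypothetical and are not part of the engine described in the talk. It shows a pre-aggregation transformation (minutes to hours), two aggregations (sum and distinct count), and a post-aggregation formula (their ratio), grouped by one dimension.

```scala
object MetricAnatomy {
  // Hypothetical input row; the field names are assumptions for illustration.
  final case class Activity(driverId: Int, city: String, minutesActive: Double)

  // Hypothetical sample input.
  val input: Seq[Activity] = Seq(
    Activity(1, "San Francisco", 120.0),
    Activity(2, "San Francisco", 60.0),
    Activity(3, "Oakland", 30.0)
  )

  // Average hours active per driver, grouped by the `city` dimension.
  def avgHoursActiveByCity(rows: Seq[Activity]): Map[String, Double] =
    rows.groupBy(_.city).map { case (city, group) =>
      // Pre-aggregation transformation (minutes -> hours), then aggregation (sum).
      val totalHours = group.map(_.minutesActive / 60.0).sum
      // Second aggregation: number of distinct drivers in the group.
      val drivers = group.map(_.driverId).distinct.size
      // Post-aggregation formula: ratio of the two aggregates.
      city -> totalHours / drivers
    }
}
```

Calling `MetricAnatomy.avgHoursActiveByCity(MetricAnatomy.input)` returns one value per city, which corresponds to the "dim, metric" result columns in the diagram.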


Metric = Formula + Query

Query:

    select(core.avg_driver_hours_active)
      .where(dim_city.name === "San Francisco")
      .over(days(7) upto today)
      .groupBy(driver_model.category, user_experiment.treatment)

SQL:

    select sum(ac.minutes_active / 60) / count(*)
    from derived.driver_activity ac
    right join dim.driver dr on dr.id = ac.driver_id
    join dim.city c on c.id = ac.city_id
    where ac.timestamp >= '2016-10-01' and ac.timestamp < '2016-11-01'
      and c.name = 'San Francisco'
    ...

Formula:

    avg_driver_hours_active = sum(agg.driver_activity.minutes_active / 60) / count(dim.driver)


The Metrics DSL: Formula

    val hours_online = driver_activity.minutes_active / 60
    val all_drivers = count(dim_driver)
    val avg_driver_hours_online = sum(hours_online) / all_drivers

[Expression tree: the root "/" has children "sum" and "count"; "sum" wraps "/" over driver_activity.minutes_active and 60; "count" wraps dim_driver]
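One way a formula like this can produce an expression tree rather than a number is to overload operators on a small AST. The following is a minimal sketch of that idea, not the actual Metrics DSL: every type and helper here (`Expr`, `Column`, `Const`, `Div`, `Sum`, `Count`, `render`) is a hypothetical name introduced for illustration, and the table objects are hand-written stand-ins.

```scala
object FormulaSketch {
  // Hypothetical expression tree for metric formulas (not Uber's actual types).
  sealed trait Expr {
    def /(that: Expr): Expr = Div(this, that)
    def /(that: Double): Expr = Div(this, Const(that))
  }
  final case class Column(table: String, name: String) extends Expr
  final case class Const(value: Double) extends Expr
  final case class Div(left: Expr, right: Expr) extends Expr
  final case class Sum(arg: Expr) extends Expr
  final case class Count(table: String) extends Expr

  def sum(e: Expr): Expr = Sum(e)
  def count(table: String): Expr = Count(table)

  // Hand-written stand-ins for schema objects.
  object driver_activity {
    val minutes_active: Expr = Column("driver_activity", "minutes_active")
  }
  val dim_driver: String = "dim_driver"

  // Same shape as the formula on the slide; each line builds a tree node.
  val hours_online: Expr = driver_activity.minutes_active / 60.0
  val all_drivers: Expr = count(dim_driver)
  val avg_driver_hours_online: Expr = sum(hours_online) / all_drivers

  // Walks the tree and prints it in a SQL-like form.
  def render(e: Expr): String = e match {
    case Column(t, n) => s"$t.$n"
    case Const(v)     => v.toString
    case Div(l, r)    => s"(${render(l)} / ${render(r)})"
    case Sum(a)       => s"sum(${render(a)})"
    case Count(t)     => s"count($t)"
  }
}
```

Because the formula is data, an engine can inspect it, for example to find which tables it touches, before deciding how to execute it.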


The Metrics DSL: Query

    val query = select(avg_driver_hours_online)
      .where(dimDriver.partner_city_id === "San Francisco")
      .over(days * 7 towards today)
      .groupBy(dimDriver.partner_city_id)

    val df = engine.toDF(query)

DSL DataFrame Output
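A query of this shape can be modelled as an immutable value that each clause copies and extends. The sketch below assumes hypothetical names throughout (`Query`, `Filter`, `Col`, `Window`, and a simplified `days(7)` in place of the slide's `days * 7 towards today`); it stops at building the query value and does not attempt the `engine.toDF` step, which would need Spark.

```scala
object QuerySketch {
  // All types here are hypothetical stand-ins for the real Query DSL.
  final case class Filter(column: String, value: String)

  final case class Col(name: String) {
    // `===` builds a filter instead of comparing values.
    def ===(value: String): Filter = Filter(name, value)
  }

  final case class Window(days: Int)

  final case class Query(
      metric: String,
      filters: List[Filter] = Nil,
      window: Option[Window] = None,
      dimensions: List[Col] = Nil
  ) {
    def where(f: Filter): Query = copy(filters = filters :+ f)
    def over(w: Window): Query = copy(window = Some(w))
    def groupBy(cols: Col*): Query = copy(dimensions = dimensions ++ cols)
  }

  def select(metric: String): Query = Query(metric)
  def days(n: Int): Window = Window(n)

  val city: Col = Col("dim_driver.partner_city_id")

  // Mirrors the structure of the query on the slide.
  val query: Query =
    select("avg_driver_hours_online")
      .where(city === "San Francisco")
      .over(days(7))
      .groupBy(city)
}
```

Keeping the query as plain data means the same value could be handed to different engines (Spark, Hive, real time), each translating it in its own way.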

✔ Easy & Powerful

Integrated

Lightweight Process



The Engine Core

Company Schema Repository

Table Schemas

Foreign key Relationships

Engine Configuration

Engine Core

Query

Execution Plan
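The slide does not say how the Engine Core turns a query plus foreign key relationships into an execution plan. One plausible piece of such a step is finding a join path between the tables a query mentions. The sketch below is an assumption, not the engine's algorithm: `ForeignKey`, the sample `schema`, and `joinPath` are hypothetical, and the search is an ordinary breadth-first search that follows foreign keys in either direction.

```scala
object JoinPlanner {
  // Hypothetical foreign key: fromTable.fromColumn references toTable.toColumn.
  final case class ForeignKey(fromTable: String, fromColumn: String, toTable: String, toColumn: String)

  // Hypothetical schema repository content, loosely based on tables named in the talk.
  val schema: Seq[ForeignKey] = Seq(
    ForeignKey("driver_activity", "driver_id", "dim_driver", "id"),
    ForeignKey("driver_activity", "city_id", "dim_city", "id"),
    ForeignKey("user_experiment", "user_id", "dim_driver", "id")
  )

  // Breadth-first search for a chain of foreign keys linking `start` to `target`.
  // Returns the keys in join order, or None when the tables are not connected.
  def joinPath(start: String, target: String, fks: Seq[ForeignKey]): Option[List[ForeignKey]] = {
    @annotation.tailrec
    def loop(frontier: List[(String, List[ForeignKey])], seen: Set[String]): Option[List[ForeignKey]] =
      frontier match {
        case Nil => None
        case (table, path) :: _ if table == target => Some(path.reverse)
        case (table, path) :: rest =>
          // Follow each foreign key touching `table`, in either direction.
          val next = fks.collect {
            case fk if fk.fromTable == table && !seen(fk.toTable) => (fk.toTable, fk :: path)
            case fk if fk.toTable == table && !seen(fk.fromTable) => (fk.fromTable, fk :: path)
          }.toList
          loop(rest ++ next, seen ++ next.map(_._1))
      }
    loop(List((start, Nil)), Set(start))
  }
}
```

For example, `JoinPlanner.joinPath("driver_activity", "user_experiment", JoinPlanner.schema)` goes through `dim_driver`, which matches the joins written by hand in the SQL walkthrough.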


✔ Easy & Powerful

✔ Integrated

Lightweight Process


Our Metrics Platform: Architecture and Process

[Architecture diagram. Labels: Engines (Spark / Hive / Real Time), Registry, Council, Web-UI, BI Tool UI, DS / Ops / Product, Definition DSL, Query DSL]


Metric Creation Process


Metric Management Web UI

Video link: https://youtu.be/we3q6O4eZIg

Our Metrics Platform: Technology

✔ Easy & Powerful

✔ Integrated

✔ Lightweight Process


Users

● Experimentation
● Product groups
● Financial reporting
● Real time decision making
● Fraud detection


Future Direction

● Further adoption within Uber
● Further work on DSL
● More Engines
● Real Time
● Open Source?

Interested?

● http://t.uber.com/scala2016
● sashao@uber.com


What this talk was about

Building a company-wide Metrics Platform is possible and practical, and you should do it


The Metrics Platform Team

Contact us:

● http://t.uber.com/scala2016
● sashao@uber.com

We are hiring!

Questions?


Thank you

Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.

Image credits:

● Erik bij de Vaate
● Bernard Spragg