Large-Scale Machine Learning at Verizon Ashok N. Srivastava, Ph.D. Chief Data Scientist, Verizon

Source: mmds-data.org/presentations/2014/srivastava_mmds14.pdf


Page 1

Large-Scale Machine Learning at Verizon

Ashok N. Srivastava, Ph.D.

Chief Data Scientist, Verizon

Page 2

A Transition in Roles

• Enormous data volumes

• Massive computing infrastructure

• Significant public benefit that touches daily lives

Page 3

Organizations started to recognize they are operating with blind spots

• 1 in 3 business leaders frequently make critical decisions without the information they need

• 53% don’t have access to the information across their organization needed to do their jobs

[Chart: Factors supporting major decisions. Categories: Analytics, Personal Experience, Collective Experience. Scale: "To a little extent" to "To a great extent". Values shown: 79%, 52%, 62%]

Page 4

Mobile in the United States

200 TB metadata/day

Social Already Mobile

• 70%: smartphone consumers using social

• 58% of social media time

• 4 in 10 social users bought after sharing on

Sources: eMarketer, Jumptap & comScore, Vision Critical, Verizon internal data

Data Traffic Exploding

• 40%-45% growth projected per year

• 2,000 Terabytes per day (1 Terabyte (TB) = ~1,000 GB)

Consumers are Mobile

• 74% will game

• 41% will listen to music

• 45% will watch videos

• 21% of 2013 Black Friday ecommerce

Smartphone penetration

• 61% in 2013

• 80% in 2017

• 7% CAGR
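The 7% CAGR can be checked from the two penetration figures (61% in 2013, 80% in 2017), and the same compounding applies to the 40%-45% yearly traffic growth. A minimal sketch: the function names and the three-year projection horizon are illustrative assumptions, not taken from the slides.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


def project(value, rate, years):
    """Value after compounding `rate` once per year for `years` years."""
    return value * (1.0 + rate) ** years


# Smartphone penetration: 61% (2013) to 80% (2017).
penetration_cagr = cagr(61.0, 80.0, 2017 - 2013)

# Traffic: 2,000 TB/day growing 40%-45% per year.
# The 3-year horizon is an assumption chosen only for illustration.
traffic_low = project(2000.0, 0.40, 3)
traffic_high = project(2000.0, 0.45, 3)

print(f"Penetration CAGR: {penetration_cagr:.1%}")
print(f"Traffic after 3 years: {traffic_low:.0f} to {traffic_high:.0f} TB/day")
```

Under these assumptions the penetration figures do compound at roughly 7% per year, and daily traffic would roughly triple within three years.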

• What site?

• What page?

• Last site?

• Next site?

• From where?

• What time?

• What app?

• From who?

Page 5

Revenue Streams due to Analytics

Page 6

The Orion Cluster at Verizon

Data Feeds

• FiOS

• Other 3rd party data

• PIP and other Enterprise Data

• VZW Network, Clickstream, Time, Location Data

• Publicly available data

Big Data Infrastructure

• Hadoop Ecosystem

• Massive Parallel Processing RDBMS

• NoSQL DBMS

• Apache Projects

Reporting and Advanced Analytics

• BI Reporting & Dashboards

• Domain-Specific Rules Engine

• Anomaly & Pattern Detection

• Prediction Algorithms

• Insight Discovery

• Recommendation Engine

VZ Management Center

Web Services & APIs

Vertical Solutions

• Advertising

• Managed Network Services

• Cybersecurity (unified network)

• Network Health Management

• M2M…

Real-time & Batch Processing

Page 7

Orion today…

Page 8

Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.

Precision Market Insights

Connecting marketers with consumers, improving engagement and driving audience response

Page 9

New Challenges for Marketers

The rise of Big Data & Mobile is complicating marketing activities

Data

• Onslaught of 1st and 3rd party information

• Siloed data stores across internal groups

• Access to key information from channels

Mobile

• Burgeoning channel for customer interactions

• The amount of data from mobile is exploding

• Lack of standards for tracking and targeting

Mobile can be the solution

• Bridge Digital & Physical

• Direct customer relationship

• Context for cross-channel interactions

• Increased efficiency of customer communications

Page 10

The Online/Mobile Ad Ecosystem

BRANDS: $1.00

AGENCIES: $0.10

AD NETWORKS: $0.40

AD EXCHANGES: $0.10

ENABLER (Targeting, Data provider): $0.05

PUBLISHERS: $0.35

CONSUMER

Buy wholesale inventory for advertisers

Streamline multiple ad networks for publishers
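Read as shares of each $1.00 of brand spend (an interpretation of the diagram, not something the slide states), the amounts add up exactly. A small check, with dictionary keys that are illustrative names only:

```python
# Amounts from the slide, in cents, to avoid floating-point rounding.
brand_spend_cents = 100

shares_cents = {
    "agencies": 10,
    "ad_networks": 40,
    "ad_exchanges": 10,
    "enabler": 5,       # targeting / data provider
    "publishers": 35,
}

total_cents = sum(shares_cents.values())
# Everything that does not reach the publisher.
intermediary_cents = total_cents - shares_cents["publishers"]

print(f"Accounted for: {total_cents} of {brand_spend_cents} cents")
print(f"Kept by intermediaries: {intermediary_cents} cents")
```

On this reading, 65 cents of every brand dollar stays with intermediaries and 35 cents reaches the publisher.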

Page 11

Mobile + Big Data Solutions for Marketing

Precision enables better 1:1 understanding of customers across physical and digital contexts to drive more relevant and personalized interactions

• Location

• Demographics

• Clickstream

• App Usage

Married 35-40 year old female in DC metro interested in tennis and luxury brands; recently browsed Tory Burch shoes at m.Nordstrom.com and visits their store 3 times per month.

Page 12

Case Study: Relevant Mobile Ads Drive A Major Sports League

OBJECTIVE

Engage a targeted audience to click on a mobile banner ad, inviting them to purchase and download the NFL Premium App.

AUDIENCE

• Known interest in sports

• Previous basic app download

• Have basic app but not premium app

PRECISION MEETS THE CHALLENGE

Precision can reach a broad audience across multiple properties and devices, accurately targeting specific mobile user segments.

RESULT

Had 64% higher CTR compared to other mobile media properties

• Precision: 0.38%

• Others: 0.23%

Privacy
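The relative lift can be recomputed from the two click-through rates shown. A minimal sketch, assuming Precision is the 0.38% figure and that both rates were rounded to two decimals; the rounding bounds are an assumption used only to show that the quoted 64% is consistent with the displayed values.

```python
# Click-through rates from the slide, in percent.
precision_ctr = 0.38
others_ctr = 0.23

# Relative lift of Precision over other mobile media properties.
lift = precision_ctr / others_ctr - 1.0

# If each rate was rounded to two decimals, the true lift lies in this range.
lift_low = 0.375 / 0.235 - 1.0
lift_high = 0.385 / 0.225 - 1.0

print(f"Lift from displayed rates: {lift:.1%}")
print(f"Range allowed by rounding: {lift_low:.1%} to {lift_high:.1%}")
```

The displayed rates give a lift of about 65%, and the stated 64% falls inside the range that rounding allows.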

Page 13

Adaptive Architecture for Optimal Advertising

• Data Driven Customer Model

• Automated Decision Maker

• Observed Consumer Behavior

• Adjustment Mechanism
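One way to picture how the four components could fit together is a closed feedback loop. The slide does not specify the algorithms, so everything below is a hypothetical sketch: the class and function names, the epsilon-greedy decision rule, the smoothed click-rate model, and the simulated click rates are all assumptions.

```python
import random


class CustomerModel:
    """Data-driven customer model: a running click-rate estimate per ad."""

    def __init__(self, ads):
        self.shows = {ad: 0 for ad in ads}
        self.clicks = {ad: 0 for ad in ads}

    def estimate(self, ad):
        # Laplace-smoothed click rate, so an unseen ad starts at 0.5.
        return (self.clicks[ad] + 1) / (self.shows[ad] + 2)


def decide(model, rng, explore=0.1):
    """Automated decision maker: mostly exploit, occasionally explore."""
    ads = list(model.shows)
    if rng.random() < explore:
        return rng.choice(ads)
    return max(ads, key=model.estimate)


def adjust(model, ad, clicked):
    """Adjustment mechanism: fold the observed outcome back into the model."""
    model.shows[ad] += 1
    model.clicks[ad] += int(clicked)


def run(true_rates, rounds, seed=0):
    rng = random.Random(seed)
    model = CustomerModel(true_rates)
    for _ in range(rounds):
        ad = decide(model, rng)
        # Observed consumer behavior, simulated here from made-up rates.
        clicked = rng.random() < true_rates[ad]
        adjust(model, ad, clicked)
    return model


model = run({"ad_a": 0.05, "ad_b": 0.30}, rounds=2000)
for ad in model.shows:
    print(ad, model.shows[ad], round(model.estimate(ad), 3))
```

The point of the loop is only that decisions, observations, and model updates feed one another; any learning rule could take the place of the one assumed here.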

Page 14

Automatic Profile Discovery

Page 15

Large-scale Machine Learning with Applications to Connected Machines

Page 16

Connected Ecosystem Market Size

By 2017, in the U.S., there will be…

• 207 Million Smartphones

• Over 426 Million Ways to Interact with Customers

Source: eMarketer, August 2013 and Frost & Sullivan 2013

• Healthcare

• Finance

• Energy

• Transportation

• Distribution

• Manufacturing

Page 17

Telematics / Fleet Management Today

OEM & AFTERMARKET TELEMATICS

COMMERCIAL TELEMATICS

ENERGY | SECURITY, EDUCATION, MANUFACTURING, RETAIL, FINANCIAL SERVICES, HEALTHCARE, Others

Timeline: 2012, 2013, 2014

HUGHES ACQUISITION

• COMMERCIAL TELEMATICS: FLEET

• COMMERCIAL TELEMATICS: WORKFORCE | WORK-ORDER | ASSET MGT.

• CONSUMER TELEMATICS: AFTERMARKET | INFOTAINMENT

• CONSUMER TELEMATICS: OEM EMBEDDED

• SMART TRANSPORTATION: TRAFFIC | PARKING

• SMART TRANSPORTATION: CAR SHARING | RAIL | PUBLIC TRANSPORT | ASSET MGT. | MUNI TRANSPORT

• SMART TRANSPORTATION: AIR | MARITIME

• SMART BUILDINGS RETAIL

Making Safety and Efficiency Real

Page 18

Connected Machine Landscape (I)

Car

• Location analytics

• Comparative analytics across vehicle makes/models

• Prognostic and diagnostic vehicle health management (self-healing autonomous systems)

250M cars on the road in the US, with about 20M new cars sold each year.

Page 19

Connected Machine Landscape (II)

Car + Consumer

• Driver behavior (safety, remote management)

• Introduces uncertainty

• Need for personalization

• Autonomous control

• Consumer behavior (infotainment, advertising, etc.)

• Information retrieval

• Relevance

Over 100M mobile subscribers currently on Verizon’s network

Page 20

Connected Machine Landscape (III)

Car + Consumer + Environment

• Environmental inputs (Traffic, weather, emergency services, etc.)

• Data correlation

• Distributive decision making and optimization

100s of TB of environmental data generated daily

Page 21

Advanced Analytics for Connected Machines

Verizon has core competency in advanced analytics areas including:

• Anomaly Detection

• Diagnostics

• Predictive Analytics

• Time Series Analysis and Forecasting

• Correlations and Association Analysis

• Optimization

Page 22

Connected Machine Health Management

1. Anomaly Detection: “Is something different?” Real- and near-real-time monitoring and anomaly detection

2. Diagnosis: “What is happening?” Comprehensive and real-time view of big data relevant to diagnosis

3. Prognosis: “What will happen next and when?” Predictive Analytics

4. Mitigation: “What actions to take?” Manual intervention or autonomous and self-healing (“intelligent”) systems

Data and domain rules

Domain rules and expert opinion

Domain rules and expert opinion

Page 23

Multiple Kernel Anomaly Detection (MKAD)

• Discrete and Continuous. Primary Source: Switches, routers and other machines

• Textual. Primary Source: Maintenance Reports

• Networks. Primary Source: Network connectivity

Page 24

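Optimization problem

One-class SVM training algorithms require solving the quadratic problem

Dual form:

\[
\min_{\alpha} \; Q = \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)
\]

Subject to:

\[
0 \le \alpha_i \le \frac{1}{\ell \nu} \quad \text{(bounds on design variables)}, \qquad
\sum_i \alpha_i = 1 \quad \text{(linear equality constraint)}, \qquad
\nu \in (0, 1] \quad \text{(control parameter)}
\]

$\alpha_i$: Lagrange multipliers of the primal QP problem; $\ell$: number of training points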

Page 25

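Anomaly scores

\[
h(z) = \sum_i \alpha_i K(x_i, z) - \rho, \qquad f(z) = \operatorname{sign}\big(h(z)\big) \quad \text{(indicator)}
\]

Here $\rho$ is the offset obtained from the QP solution.

Data points with $\alpha_i > 0$ will be the support vectors

Value of h: degree of anomalousness

Sign of h: if negative, outlier; if positive, normal

Decision boundary is determined only by margin and non-margin support vectors obtained by solving the QP problem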
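The scoring step can be sketched in a few lines, assuming the usual one-class SVM score h(z) = Σ α_i K(x_i, z) − ρ with a kernel that is a convex combination of one kernel per data type. This is not the MKAD implementation: the position-match kernel for discrete sequences, the weight `eta`, the toy data, and the choice of `rho` are all assumptions. To avoid a QP solver, the sketch uses the special case ν = 1, in which the bounds and the equality constraint leave α_i = 1/ℓ as the only feasible point.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel on continuous feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def match_kernel(a, b):
    """Fraction of positions where two equal-length discrete sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def combined_kernel(p, q, eta=0.5):
    """Convex combination of the two kernels; eta is a hypothetical weight."""
    return (eta * rbf_kernel(p["cont"], q["cont"])
            + (1.0 - eta) * match_kernel(p["disc"], q["disc"]))


def raw_score(z, train, alpha):
    """sum_i alpha_i * K(x_i, z), before subtracting the offset rho."""
    return sum(a * combined_kernel(x, z) for a, x in zip(alpha, train))


# Toy training set: continuous readings plus a discrete state sequence.
train = [
    {"cont": (0.0, 0.0), "disc": "AAB"},
    {"cont": (0.1, 0.0), "disc": "AAB"},
    {"cont": (0.0, 0.1), "disc": "AAB"},
]

# nu = 1 forces alpha_i = 1 / l for every training point.
alpha = [1.0 / len(train)] * len(train)

# Assumption: place rho at the lowest training score so that no training
# point scores negative. A real solver derives rho from the QP solution.
rho = min(raw_score(x, train, alpha) for x in train)


def anomaly_score(z):
    """h(z): negative means outlier, positive means normal."""
    return raw_score(z, train, alpha) - rho


normal = {"cont": (0.05, 0.05), "disc": "AAB"}
outlier = {"cont": (3.0, 3.0), "disc": "BBA"}
h_normal = anomaly_score(normal)
h_outlier = anomaly_score(outlier)
print(h_normal, h_outlier)
```

With these toy inputs the point near the training data gets a small positive score and the distant point with a different state sequence gets a clearly negative one, which is the sign convention described above.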

Page 26

The Impact of Big Data Innovations…

[Figure: anomalies ordered by increasing degree of anomalousness. Panels: Numeric Inputs Only; Numeric and Text Inputs]

The addition of Numeric and Text Data in the new MKAD algorithm significantly improves (by as much as 7000 points) the ranking of the monitored anomalies.

Execution time increased by approximately 5%.

Page 27

Anomaly Detection using NASA MKAD Algorithm

Aviation Safety Program Annual Review 2012 | SSAT Project

Reported Exceedance

• Level 3: Speed low at touchdown

• Level 2: Flaps at questionable setting at landing

Unusual Auto Landing Configuration

Page 28

Big Data and Telematics

Page 29

Telematics

What Can We Build?

• WORKFLOW: Manage work and share updates both internally and externally

• WORKFORCE: Align resource capacity with work and direct the right resource to the right location at the right time

• FLEET: Monitor fleet location, condition and driver behavior

• INVENTORY: Monitor the location and condition of cargo

CLOUD | ANALYTICS | NETWORK

Page 30

Connected Car Solutions

Aftermarket data collectors (for all modern makes and models)

Built-in data collectors for Mercedes and other brands

Page 31

1 Billion Miles

United States

• Chicago, IL

• Houston, TX

• New York, NY

• Dallas, TX

• Detroit, MI

58% City Driving, 42% Highway

Page 32

Real-world MPG calculations are sampled from thousands of vehicles, not just a handful.

[Figure panels: Japanese OEM Vehicles; U.S. OEM Vehicles]

Page 33

Driver behavior is analyzed at multiple levels.

Driver behavior can be compared and contrasted to drivers from other OEM vehicles.

• Honda Vehicles: 32.65 mi/day

• Nissan Vehicles: 33.22 mi/day

• Mitsubishi Vehicles: 33.32 mi/day

• Ford Vehicles: 32.08 mi/day

• Volkswagen Vehicles: 33.69 mi/day

• Cadillac (GM) Vehicles: 29.24 mi/day

• Toyota Vehicles: 31.92 mi/day
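The per-make figures can be compared directly. A minimal sketch using only the values on the slide; the mean is unweighted because the number of vehicles per make is not given, so it should not be read as a fleet-wide average.

```python
# Average daily mileage per make, as listed on the slide (mi/day).
miles_per_day = {
    "Honda": 32.65,
    "Nissan": 33.22,
    "Mitsubishi": 33.32,
    "Ford": 32.08,
    "Volkswagen": 33.69,
    "Cadillac (GM)": 29.24,
    "Toyota": 31.92,
}

highest = max(miles_per_day, key=miles_per_day.get)
lowest = min(miles_per_day, key=miles_per_day.get)
spread = miles_per_day[highest] - miles_per_day[lowest]

# Unweighted mean across makes (per-make vehicle counts are unknown).
mean_across_makes = sum(miles_per_day.values()) / len(miles_per_day)

print(f"Highest: {highest}, lowest: {lowest}, spread: {spread:.2f} mi/day")
print(f"Unweighted mean across makes: {mean_across_makes:.2f} mi/day")
```

Six of the seven makes fall within about 1.8 mi/day of one another; the Cadillac (GM) figure is the one that stands apart.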

Page 34

Driver behavior is analyzed at multiple levels.

All Vehicles

Older model years are driven at significantly lower speeds:

More homogeneous driving across model years

Page 35

Prediction Interval Estimation using Bootstrap Regression

• Ensemble Learning Approach

• We have proven that the empirical quantiles of the bootstrap prediction models can be used to consistently estimate the prediction intervals.

• S. Kumar and A. N. Srivastava, under submission to NIPS 2014.
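The general idea of reading an interval off the empirical quantiles of bootstrap predictions can be illustrated as follows. The slides do not give the authors' construction, so this is a common textbook variant rather than their method: each bootstrap model is an ordinary least-squares line, and a resampled residual is added to each prediction so that the quantiles reflect observation noise as well as model uncertainty. All names, the synthetic data, and the settings are assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # degenerate resample: fall back to the mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def quantile(sorted_vals, q):
    """Nearest-rank empirical quantile of an already sorted list."""
    idx = int(round(q * (len(sorted_vals) - 1)))
    return sorted_vals[max(0, min(len(sorted_vals) - 1, idx))]


def bootstrap_interval(xs, ys, x0, n_boot=500, level=0.95, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ba, bb = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        # Bootstrap model prediction plus a resampled residual.
        preds.append(ba + bb * x0 + rng.choice(residuals))
    preds.sort()
    tail = (1.0 - level) / 2.0
    return quantile(preds, tail), quantile(preds, 1.0 - tail)


# Synthetic data: y = 2 + 0.5 x + Gaussian noise (illustrative only).
data_rng = random.Random(1)
xs = [i / 4 for i in range(40)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0.0, 1.0) for x in xs]

x_new = 5.0
a_hat, b_hat = fit_line(xs, ys)
point = a_hat + b_hat * x_new
low, high = bootstrap_interval(xs, ys, x_new)
print(f"prediction {point:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

Dropping the residual term would give an interval for the fitted mean rather than for a new observation, which is why the sketch includes it.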

Page 36

Anomaly Detection Application

S. Kumar and A. N. Srivastava, under review at NIPS 2014.

Page 37

Acknowledgements + Notes

• The Entire Verizon Big Data and Analytics Team

• My former team at NASA

• Collaborators at Stanford

We are hiring for junior and senior roles:

• Data Scientists

• Machine Learning experts

• Platform Engineers

• Analytics and visualization

• Application Developers

• Product Managers