
StampedeCon 2015 Keynote


Page 1: StampedeCon 2015 Keynote

Ken Owens, CTO, Cisco Intercloud Services, 07/15/15

How Cisco Migrated from MapReduce Jobs to Spark Jobs


Page 2: StampedeCon 2015 Keynote

Cisco and/or its affiliates. All rights reserved.Presentation_ID Cisco Public

Introduction


Trends

Page 3: StampedeCon 2015 Keynote

Introduction

Alignment to Business Outcomes

Page 4: StampedeCon 2015 Keynote

Introduction

Services vs. Legos

Page 5: StampedeCon 2015 Keynote

Introduction

Platform

Page 6: StampedeCon 2015 Keynote

Introduction

Software-Defined Disruption

Page 7: StampedeCon 2015 Keynote

The Uber Trend: Exponential Rise in Connectivity

• 30M new devices connected every week
• 78% of workloads processed in cloud DCs by 2018
• 5TB+ of data per person by 2020
• 180B mobile apps downloaded in 2015
• 277X data created by IoE devices v. end-user

Source: IDC

Page 8: StampedeCon 2015 Keynote


Exponential Trend

Linear Trend

Disruptive Stress/Opportunity

Knee of Curve

Exponential Growth Drives Opportunities

Peter Diamandis: BOLD

Page 9: StampedeCon 2015 Keynote


When Products Become Cloud-enabled, They Become 10X More Valuable

$23.19

$249.00

$18.01

$199.00

$5.99

$59.99

Page 10: StampedeCon 2015 Keynote


SaaS

PaaS IaaS

A Broader Perspective than Hybrid Cloud Is Required…

Data Center Cloud Edge / IoT

© 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public

Page 11: StampedeCon 2015 Keynote

CIOs Need to Embrace Both Traditional and Hyperscale Application Deployment

Hyperscale applications serving several thousands of users very quickly
• IoE and increasing connectivity driving the need for such workloads
• Hadoop, Mobile back-ends, Gaming, Social
• Small (~10%), yet rapidly growing percentage of applications in the Cloud

Traditional enterprise applications
• ERP, CRM, Applications that leverage traditional databases
• Majority of applications being run for/by Enterprises today

Page 12: StampedeCon 2015 Keynote


SaaS

PaaS IaaS

Application Portability and Interoperability Is the Key

Traditional Applications: ERP, Financial, Client/Server, CRM, email, …

Cloud Native Applications: IoT, BigData, Analytics, Gaming, ...

Data Center Cloud Edge / IoT


Page 13: StampedeCon 2015 Keynote

Bimodal IT Is the New Normal

45% of CIOs currently have a second fast/agile mode of operation

Traditional Mode: Requires Reliability (ITIL, CMMI, COBIT)

Nonlinear Mode: Accept Instability (DevOps, automation, reusable)

[Figure: layers labeled Systems of Differentiation, Systems of Innovation, Systems of Record; axes labeled Change and Governance]

Source: Gartner, Lydia Leong

Page 14: StampedeCon 2015 Keynote

We Must Connect the Clouds

Islands of Isolated PC LAN Networks (1990s): Multiple LANs using a multitude of protocols

The Internet: Connected using industry-standard IP protocol
• IP Based
• Open Standards

World of Isolated Clouds (2000s): Individual custom-built clouds without consistent APIs

The Intercloud: Connected for application acceleration with Open APIs
• Web-scale Architecture
• API-Driven Automation
• Open, Secure, Compliant, Hybrid IT

Page 15: StampedeCon 2015 Keynote


Use Case: Customer Interaction Analytics

Page 16: StampedeCon 2015 Keynote

Omni-Channel Customer Journeys

The customers’ interaction with Cisco across multiple touch points to get the desired business outcome.

Channels: Server Logs, Social & Chat, Mobile, Event Streams, Call Center

Interaction Touch points: S/W Download, Open Trouble Ticket, Assign Engineer, Update Trouble Ticket, Close Trouble Ticket, Resolve Trouble Ticket, Read Support Documents, View Design Documents, View Tech Documents, New Registration, Bug Search, FAQs, Contract Details, Product Details, Device Coverage

Journey: Case Resolution, Software Upgrade

Page 17: StampedeCon 2015 Keynote

Customer Interaction Analytics: From Journey to Outcome…

Customer Journeys
• Software Upgrades
• Bug Inquiry
• Software Inquiry
• Trouble Ticket Lifecycle
• Device Troubleshooting
• New Registration
• Contract Renewal

Behavioral Insights
• Customer Interest Analytics
• Customer Experience Analytics
• Resource Forecasting
• Security and Compliance

Impact
• Boost Self Service
• Real-time Content Optimization & Recommendation
• Context Based Predictive Alerts
• Implicit Personalization

Page 18: StampedeCon 2015 Keynote

Customer Interaction Analytics: Big Data Platform

Synthesize customer journey maps into behavioral insights.

[Figure: platform architecture diagram. Labels: Data Sources (Server Logs, Call Center, Mobility, Social, Event Streams); Data Ingestion; CiscoDV; Kafka; Redis; ETL; Analytics Model; Build Model; Activity Refinement; Activity Synthesis; Synthesized Insights; Real-time Processing; Batch Analytics; Insight Services; CiscoDV; Interact; Impala; Hive; Pig; ES; Zoomdata, Platfora]

Page 19: StampedeCon 2015 Keynote


AWS and CIS Intercloud Solution

Page 20: StampedeCon 2015 Keynote

AWS Platform

Component             Cloud::Hadoop        Cloud::Queries          Cloud::Streams
                      (Batch Analytics)    (Interactive Queries)   (Near Real-time Analytics)
Virtual Machines      30                   6                       5
AWS Instance Sizing   m3.2xlarge           c3.xlarge               m3.xlarge
Virtual Cores         8/VM                 4/VM                    4/VM
RAM                   30GB/VM              7.5GB/VM                15GB/VM
Disk                  1.5 TB/VM            1.5 TB/VM               1.5 TB/VM

Page 21: StampedeCon 2015 Keynote


Case for Cisco Intercloud Services for Analytics…

Cisco Security and Compliance requirements
• Workloads that deal with personally identifiable data and Cisco confidential content cannot be uploaded to AWS. Cisco internal cloud solution is a better fit.

Customer journey beyond the enterprise
• Applications are hosted on AWS
• Partner systems are hosted on AWS and other cloud providers
• Presence in AWS and other cloud services is required to support these scenarios for end-to-end customer journey insights.

Data virtualization integrated in the CIS Analytics Stack
• Connect data from multiple clouds and multiple big data platforms

Integrated visualization toolset

Page 22: StampedeCon 2015 Keynote


CIS Analytics Platform

Page 23: StampedeCon 2015 Keynote


CIS Analytics Platform Requirements

Infra Provisioning: Deploy a virtual private cloud (VPC) on CIS with compute, storage and memory requirements comparable to the current production system.

OpenStack: Icehouse OpenStack with Neutron, Nova, and Swift installed.

Big Data Ecosystem: Cloudera’s Hadoop distribution version CDH 5.1.3, ELK Stack, Apache Kafka and Apache Storm.

Data virtualization & Cloud Integration: Access to data services and data stores via Cisco Data Virtualization.

Runtime Services: Foundational PaaS capabilities including SLAs for uptime, performance, latency, data retention, issue escalation and support priorities, issue resolution, problem management, deployment process, patch management.

API Services: Provide both fine-grained and coarse-grained access to all service layers of the CIS Analytics Platform. In the hybrid cloud model it must support interoperability across platform service providers and promote the cloud concepts of extensibility and flexibility.

Page 24: StampedeCon 2015 Keynote


AWS to CIS Migration – Success Criteria

Successful synthesis of customer interaction data

Successful automation of the end-to-end data process pipeline

Build behavioral insight services

Access to data and services via data discovery and visualization tools

Meet the performance, scale and platform stability requirements

Successful deployment of CiscoDV on CIS

Connect HDFS and Hive DS with CiscoDV via Hive and Impala

Build and expose insight services for consumption by limited users

Page 25: StampedeCon 2015 Keynote

AWS and CIS Data Node Sizing Comparison: Hadoop Cluster for Batch and Query Analytics

AWS Sizing

Node Service             AWS Instance Type   vCPU   Mem   Storage   Number of Data Nodes
Data Nodes/Node Master   m3.2xlarge          8      30    2x80 GB   30

Comments: Each hadoop data node has 1500GB of EBS available for HDFS storage

CCS Sizing

Node Service             CCS Instance Type   vCPU   Mem   Storage   Number of Data Nodes
Data Nodes/Node Master   GP-2XLarge          8      32    50        35

Comments: Each hadoop data node has 1500GB of EBS available for HDFS storage. Less than AWS sizing (Storage)

Page 26: StampedeCon 2015 Keynote


Pilot Test Data

• Test performed on one day’s production data
• Total no. of records processed – 110,852,667
• Total data size – 32GB
• Total no. of M/R jobs in the data pipeline – 17
• Two test cycles
  • Cycle 1: Heterogeneous CCS nodes (vCPUs, storage, memory)
  • Cycle 2: Homogeneous CCS nodes

Page 27: StampedeCon 2015 Keynote


CIS Performance of Batch Analytics – Limited Test

Page 28: StampedeCon 2015 Keynote

Test Details by M/R job

                             Cycle 1 (CCS nodes)       Cycle 2 (CCS nodes)
Job Name                       12    18    24    30      18    24    30    35
New_cleanse                   249   176   143   117      82    67    55    51
Process_private_ip             27    14    11    10       7     5     6     6
join_web_and_ip_data          142    95    76    61      49    40    34    29
combine_ip_decorated_files     26    14    11    10       9     7     8     7
filterBotEntries               34    19    15    13      10     8     7     7
sessionize                     71    64    69    62      60    63    15    13
firstActivitiesFilter          26    15    13    10       9     8     6     6
allOtherActivitiesFilter       29    18    13    13      11     9     7     6
matchFirstActivities           21    13    11    13      13    11     8     8
buildActivities                27    15    12    10       7     6     9     9
filterBUG                       8     5     3     2       3     3     4     4
filterSEA                       8     5     3     2       3     3     4     4
filterTCO                       8     5     3     2       3     3     4     4
filterTDV                       8     5     3     2       3     3     4     4
filterWDV                       8     5     3     2       3     3     4     4
filterMOD                       8     5     3     2       3     3     4     4
filterTOOL                      8     5     3     2       3     3     4     4
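To compare the eight cluster configurations at a glance, the per-job figures from the table can be summed per column. The sketch below is illustrative only: the slide does not state the time unit, and adding the jobs up is a meaningful proxy for pipeline time only under the assumption that the 17 jobs run one after another. The names `CONFIGS`, `TIMINGS`, `totals`, and `slowest` are made up for this example.

```python
# Per-job timings transcribed from the "Test Details by M/R job" slide.
# Columns: cycle 1 on 12, 18, 24, 30 nodes, then cycle 2 on 18, 24, 30, 35 nodes.
# The unit is not stated on the slide.
CONFIGS = ["c1/12", "c1/18", "c1/24", "c1/30", "c2/18", "c2/24", "c2/30", "c2/35"]

TIMINGS = {
    "New_cleanse": [249, 176, 143, 117, 82, 67, 55, 51],
    "Process_private_ip": [27, 14, 11, 10, 7, 5, 6, 6],
    "join_web_and_ip_data": [142, 95, 76, 61, 49, 40, 34, 29],
    "combine_ip_decorated_files": [26, 14, 11, 10, 9, 7, 8, 7],
    "filterBotEntries": [34, 19, 15, 13, 10, 8, 7, 7],
    "sessionize": [71, 64, 69, 62, 60, 63, 15, 13],
    "firstActivitiesFilter": [26, 15, 13, 10, 9, 8, 6, 6],
    "allOtherActivitiesFilter": [29, 18, 13, 13, 11, 9, 7, 6],
    "matchFirstActivities": [21, 13, 11, 13, 13, 11, 8, 8],
    "buildActivities": [27, 15, 12, 10, 7, 6, 9, 9],
}
# The seven per-type filter jobs report identical timings on the slide.
for suffix in ("BUG", "SEA", "TCO", "TDV", "WDV", "MOD", "TOOL"):
    TIMINGS["filter" + suffix] = [8, 5, 3, 2, 3, 3, 4, 4]

# Sum of all job timings per configuration (assumes sequential execution).
totals = {
    cfg: sum(row[i] for row in TIMINGS.values())
    for i, cfg in enumerate(CONFIGS)
}

# The single most expensive job in each configuration.
slowest = {
    cfg: max(TIMINGS, key=lambda job: TIMINGS[job][i])
    for i, cfg in enumerate(CONFIGS)
}

for cfg in CONFIGS:
    print(f"{cfg}: total={totals[cfg]} slowest={slowest[cfg]}")
```

Read this way, the table shows `New_cleanse` dominating every configuration, and `sessionize` staying nearly flat until the 30-node run of cycle 2.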

Page 29: StampedeCon 2015 Keynote


PoC: Analytics with Spark on CIS

Existing code
• Made in Ruby with Wukong to run on Hadoop
• A history of changes and modifications
• Script-based, steps communicate via intermediary files

Goal
• Revise, rethink and reimplement with Spark on CIS
• Open for advanced cloud analytics
• Improve maintainability by moving away from aging Ruby on Hadoop

Page 30: StampedeCon 2015 Keynote

Ruby Flow

[Figure: pipeline diagram in two stages. Cleanse: logs, cleanse, private, web, decorate, cleansed. Sessionize: sessionize (cookie, time), sessioned, match 1st (IP, UA, time), build actions, merge session, PSV, add to hive; branches labeled first, others, bots, onlyBots, private, bug, tool, 1..7. Annotations: "Main computation happens here"; "global sort in time, global group by IP".]

• Pre-process log records (‘cleanse’)
• Extract HTTP sessions (‘sessionize’)
• Extract user actions, such as ‘search’, ‘download patch’, ‘open manual’, ‘open a bug’

Ruby: Scripts with temp files
• Each box on the figure is a script in a separate file
• They pipe Gb of data as input and output

Random matching of nodes to data for sessionizing
• Lots of redundant shuffling

Page 31: StampedeCon 2015 Keynote

Spark Flow

[Figure: same pipeline diagram as on the Ruby Flow slide, annotated "global partition by IP, local sort in time".]

Same flow, but each box is a Java or Scala function

No intermediate temp files
• Steps are chained by Spark, often without any need for intermediate data
• If still needed, the data is stored in memory and local disk as much as possible

Local computation
• Cleansing is computed on nodes local to data blocks (same as Ruby)
• Sessions are built per IP
  • On separate nodes, each handling a single IP range
  • Once copied to the node on partition, the data remains local
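The "global partition by IP, local sort in time" idea can be sketched without Spark. The example below is a plain-Python illustration, not the code from the talk: the record layout `(ip, timestamp, url)`, the 30-minute inactivity gap, the partition count, and all function names are assumptions made for this sketch. In Spark, the same pattern is commonly expressed with `repartitionAndSortWithinPartitions`; the slides do not say which API the team used.

```python
from collections import defaultdict
from zlib import crc32

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout; not given in the slides
NUM_PARTITIONS = 4             # assumed partition count, standing in for cluster nodes


def partition_by_ip(records, num_partitions=NUM_PARTITIONS):
    """Route each (ip, timestamp, url) record to a partition chosen by its IP.

    This is the only step that moves data between "nodes"; every record of a
    given IP ends up in the same partition.
    """
    partitions = defaultdict(list)
    for record in records:
        ip = record[0]
        partitions[crc32(ip.encode()) % num_partitions].append(record)
    return partitions


def sessionize_partition(records, gap=SESSION_GAP_SECONDS):
    """Sort one partition locally by (ip, time) and cut a session on long gaps."""
    sessions = []
    last_ip, last_ts = None, None
    for ip, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        if ip != last_ip or ts - last_ts > gap:
            sessions.append({"ip": ip, "start": ts, "end": ts, "urls": []})
        sessions[-1]["end"] = ts
        sessions[-1]["urls"].append(url)
        last_ip, last_ts = ip, ts
    return sessions


def sessionize(records):
    """Partition globally by IP, then build sessions independently per partition."""
    sessions = []
    for part in partition_by_ip(records).values():
        sessions.extend(sessionize_partition(part))
    return sessions


log = [
    ("10.0.0.1", 100, "/search"),
    ("10.0.0.2", 150, "/bug/42"),
    ("10.0.0.1", 160, "/download"),
    ("10.0.0.1", 5000, "/manual"),
]
result = sessionize(log)
for session in sorted(result, key=lambda s: (s["ip"], s["start"])):
    print(session)
```

Because each partition is sorted and scanned on its own, no cluster-wide sort is needed; the only data movement is the initial routing by IP.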

Page 32: StampedeCon 2015 Keynote


Runtime comparison

Volumes
• Logs of a single day: 52 Gb
• Total of 110 mil records
• Where 53 mil records are kept after pre-filtering
• Producing over 1 mil user actions
• Cluster of 30 nodes

Ruby Runtime: 140 min

Spark Runtime: 7 min (20 times faster)
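The "20 times faster" figure follows directly from the two runtimes, and the same numbers give a rough throughput for each implementation. This is a back-of-the-envelope sketch that uses the rounded "110 mil records" from the slide; the variable and function names are illustrative.

```python
RECORDS = 110_000_000  # "110 mil records" from the slide (rounded)
RUBY_MINUTES = 140
SPARK_MINUTES = 7


def records_per_second(records, minutes):
    """Average input records processed per second over the whole run."""
    return records / (minutes * 60)


speedup = RUBY_MINUTES / SPARK_MINUTES
ruby_rate = records_per_second(RECORDS, RUBY_MINUTES)
spark_rate = records_per_second(RECORDS, SPARK_MINUTES)

print(f"speedup: {speedup:.0f}x")
print(f"Ruby:  {ruby_rate:,.0f} records/s")
print(f"Spark: {spark_rate:,.0f} records/s")
```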

Page 33: StampedeCon 2015 Keynote


Why?

Extracting sessions means sort in time and group by IP
• Ruby: sorting in time and per-IP grouping is performed across the whole cluster (very bad, lots of IO)
• Spark is good at dealing with partitions:
  • per-IP groups are placed on different machines (partitions)
  • global sort in time is replaced by many local per-IP sorts done on machines responsible for extracting sessions for specific groups of IP addresses

Other improvements
• Avoid redundant temp files, redundant (de)-serialization of objects (comes with Java/Scala), stages keep data in memory when possible (comes with Spark)
• Cache results of user agent resolution that are heavy on regular expressions
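The last improvement, caching user agent resolution, can be illustrated with a memoized lookup. The sketch below is an assumption-laden example in Python rather than the Java/Scala code from the talk: the pattern list `_UA_PATTERNS`, the labels, and the function name `resolve_user_agent` are hypothetical, and the real rule set is not shown in the slides.

```python
import re
from functools import lru_cache

# Hypothetical patterns for illustration; the real rule set is not shown in the slides.
_UA_PATTERNS = [
    ("bot", re.compile(r"bot|crawler|spider", re.IGNORECASE)),
    ("firefox", re.compile(r"Firefox/\d+")),
    ("chrome", re.compile(r"Chrome/\d+")),
]


@lru_cache(maxsize=100_000)
def resolve_user_agent(ua):
    """Classify a user agent string; repeated strings are served from the cache."""
    for label, pattern in _UA_PATTERNS:
        if pattern.search(ua):
            return label
    return "other"


agents = [
    "Mozilla/5.0 Firefox/38.0",
    "Googlebot/2.1",
    "Mozilla/5.0 Firefox/38.0",  # repeat: answered without running any regex
]
labels = [resolve_user_agent(ua) for ua in agents]
info = resolve_user_agent.cache_info()
print(labels, info)
```

Log data typically contains far fewer distinct user agent strings than records, so most lookups become cache hits and the regular expressions run only once per distinct string.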

Page 34: StampedeCon 2015 Keynote


CiscoDV on CIS

Page 35: StampedeCon 2015 Keynote


Data Virtualization for Intercloud Analytics

Customer Benefits
• Discover data beyond the enterprise: Virtual integration that combines traditional enterprise data, Big Data stores on CIS and AWS, cloud data from SaaS providers, and Cisco Customers and Partners
• Seamless interoperability offers easy access to data across distributed data sources in the intercloud analytics platform
• Universal data governance maximizes enforcement of data security rules
• Analytics Data Hubs: Deployment flexibility to build hybrid/virtual sandboxes that enable nimble data discovery and rapid data analytics to support multiple LOBs
• Deliver data to any number of analytics tools.

Page 36: StampedeCon 2015 Keynote


Use Case 1: Get Case Interactions

Use Case Description: Number of cases opened by company X that are currently open (other variations would include cases by company, trends, etc.).

CiscoDV Value: CiscoDV enforces data security rules to restrict access on the intercloud platform to customer sensitive data.

Data Sources: SalesForce

Intercloud Solution: CIS CiscoDV service can access the “sanitized” version of CSOne data through JDBC from RIDES (SWTG CiscoDV) API.

Connection Type: DV on hybrid cloud Enterprise data store

Page 37: StampedeCon 2015 Keynote


Use Case 2: Get Customer Journey

Use Case Description: Customer interactions on the web pertaining to bug search and case submission process. Foundational data can be used to explore trends and feed into content recommendation models.

CiscoDV Value: Direct access to Data on CIS Intercloud Analytics Platform

Data Sources: SAS Analytics

Intercloud Solution: By direct network access to the Impala Server, the CIS CiscoDV server connects to the Impala Service in Hadoop also on CIS as a Data Source. SQL Queries configured in CiscoDV execute Impala queries.

Connection Type: DV on hybrid cloud VPC Big Data platform

Page 38: StampedeCon 2015 Keynote


Use Case 3: Get Bug Interactions

Use Case Description: Another foundational data service that provides a breakdown of customer exposure or interest in bugs. The service can be refined further to look at trends specific to a company or a product for further analytics.

CiscoDV Value: Real-time data federation that accesses extremely large data in the CIS Intercloud Analytics platform and joins it with Bug Data accessed via a departmental CiscoDV instance (RIDES)

Data Sources: SASA Analytics and QDDTS via RIDES

Intercloud Solution: By building on the access to the Impala Server, the DV server can join the Bug Data from the Enterprise Data Stores with the HDFS data to provide a federated view.

Connection Type: DV on hybrid cloud VPC Big Data platform and Enterprise data store

Page 39: StampedeCon 2015 Keynote


CiscoDV on Intercloud Analytics Platform (CIS)

Scenario 1: CIS Cisco DV to Cisco Enterprise Data Store

Scenario 2: CIS CiscoDV to Impala and Hive on CIS Intercloud Analytics Platform

Scenario 3: CIS Cisco DV to Hive on AWS Big Data Cluster

[Figure: deployment diagram with connections labeled Scenario 1, Scenario 2, and Scenario 3]

Page 40: StampedeCon 2015 Keynote