Cisco IT’s Hadoop Adventure with MapR

Robert Novak
Cisco Big Data Partner CSE
October 2015

With thanks to Alex Garbarini and Virendra Singh, Cisco IT

© 2015 Cisco and/or its affiliates. All rights reserved. Cisco Confidential




•  ~20 year full stack sysadmin (retired)

•  10+ year big data admin (Hadoop since 2009)

•  Cisco UCS C-Series early adopter (2011)

•  Victor Kiam of Big Data on UCS (kinda)

•  Cisco Big Data Safari Tour Guide since 2014

•  Blog at rsts11.com and Cisco Blogs

•  Twitter: @gallifreyan

•  Also found at: linkedin.com/in/rnovak


•  Motivation for Hadoop Deployment at Cisco

•  Hadoop Platform

•  Timeline

•  Key Decisions / Lessons Learned

•  Data Lake Considerations

•  Use Cases

•  The Quiz


In 2011-2012, Data Wars Were Beginning

•  Growing beyond 20 years of database-based application development

   -  Cost Structure
   -  Development Methodology & Project lifecycle
   -  Programming Model
   -  Maturity Curve of The Technology Is Different

•  FUD – Fear, Uncertainty and Doubt

•  Availability of agile, skilled workforce

•  Reuse legacy skills and apply new tools

•  Rapid pace of innovation and constantly changing industry dynamics


Cisco IT’s Hadoop Adventure (so far)

•  2011: POCs

•  July 2012: Multi-tenant Shared Platform

•  Starting 2013: Use Cases Deployment

•  2014: Enterprise Data Lake

•  Beyond: Growth & Expanding Ecosystem (…to infinity and beyond!)


Key Decisions and Rationale

•  Open Source vs Distribution: Operational Excellence, Availability, Performance, Skill Set

•  Architecture: Cisco UCS Integrated Infrastructure (scalability, sustainability, repeatable performance)

•  Ecosystem: Hive (SQL), Mahout, HBase, Spark

•  Environment Lifecycle: Production, Stage, Development & Technical POC (isolate usage by risk & development lifecycle)

•  Data Lake: Data Governance, reduce cost, increase efficiency, eliminate duplication and silos


Lessons from Technology Journey

•  Architecture and Distribution Considerations

•  Multi-tenant

•  Mission Critical Features

•  Plan For Scalability, Expect The Unexpected

•  Support: Open Source or Distribution

•  Opex Or Capex, Licenses Or Headcount, You Pay One Way Or Another

•  Leverage Skills:

   -  Components That Empower Users’ Existing Skills Like Informatica and SQL


Lessons from Technology Journey

•  Hive doesn’t fully support ANSI SQL

   -  Reusable UDFs for Hive were created

•  Workload management

   -  Cisco uses Tidal Enterprise Scheduler for workload management and error handling

•  Hadoop scales linearly

   -  Our platform grew 100% in the first year.
   -  Invest in architecture that enables rapid growth with minimal pain
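The growth lesson above lends itself to a back-of-the-envelope projection. The sketch below assumes that capacity scales linearly with node count, as the slide claims, and that the 100% first-year growth rate simply continues, which the deck does not state. The starting node count and the function name are hypothetical.

```python
import math
from typing import List


def project_nodes(start_nodes: int, yearly_growth: float, years: int) -> List[int]:
    """Return the projected node count at the end of each year, rounded up.

    Assumes linear scaling: doubling the workload means doubling the nodes.
    """
    counts = []
    nodes = float(start_nodes)
    for _ in range(years):
        nodes *= 1.0 + yearly_growth
        counts.append(math.ceil(nodes))
    return counts


if __name__ == "__main__":
    # Hypothetical starting point: 10 nodes, growing 100% per year for 3 years.
    print(project_nodes(10, 1.0, 3))
```

Even this rough arithmetic shows why the slide recommends an architecture that can add nodes without redesign: a few years of doubling multiplies the footprint several times over.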


Shared Data ⇔ Rich Analytics

Supply Chain

Engineering

Finance

Advanced Services

Marketing

Sales

IT

Cisco Services

Security

Enterprise Platform(s)


Enterprise Data Lake

•  Metadata-driven utilities to automate data ingestion

•  Access Control Driven by Metadata

•  Scalable

•  Cost effective

•  Unify resources
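To make the idea of metadata-driven ingestion and access control concrete, here is a minimal sketch. It is not Cisco IT's utility: the record fields, paths, group names, and function names are all hypothetical, chosen only to show how one catalog entry can drive both where a dataset lands and who may read it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DatasetMeta:
    """Hypothetical metadata record describing one data-lake dataset."""
    name: str
    source_uri: str       # where the data comes from
    target_path: str      # where it lands in the lake
    file_format: str      # e.g. "csv" or "json"
    allowed_groups: Set[str] = field(default_factory=set)


def build_ingestion_plan(catalog: List[DatasetMeta]) -> List[Dict[str, str]]:
    """Derive ingestion steps from metadata alone; no per-dataset code."""
    return [
        {"dataset": m.name, "from": m.source_uri,
         "to": m.target_path, "format": m.file_format}
        for m in catalog
    ]


def can_read(meta: DatasetMeta, user_groups: Set[str]) -> bool:
    """Allow access only if the user shares a group with the dataset."""
    return bool(meta.allowed_groups & user_groups)


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    DatasetMeta("orders", "jdbc:example://erp/orders", "/lake/raw/orders",
                "csv", {"finance", "supply_chain"}),
    DatasetMeta("clickstream", "file:///logs/web", "/lake/raw/clickstream",
                "json", {"marketing"}),
]

if __name__ == "__main__":
    for step in build_ingestion_plan(CATALOG):
        print(step)
    print(can_read(CATALOG[0], {"marketing"}))
```

Adding a new source then means adding a catalog record rather than writing a new job, which is the efficiency the slide points to.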


•  Migrate ETL Processing from EDW

•  Data Lake & Adhoc Data Analysis

•  Data Archiving

•  Customer Segmentation

•  Multi-Channel Scoring

•  Content Auto-tagging

•  Smart Analytics Offerings

•  Service Opportunity Identification

•  Organization Network Analytics

•  Engineering Source Code Monitoring

•  Access Control and Auditing

Data Platform Option to Reduce Cost

Marketing & Content Management

Services Risk & Compliance

Cisco IT Use Cases for Hadoop in Production


Operational Cost / Productivity

•  Distributed Name Node

•  Snapshots

•  Volume Based Disaster Recovery

•  Higher performance and fewer nodes ($)

•  More efficient use of resources, scale with data and processing

•  HBase, MapR-DB, and Hadoop on the same cluster

•  NFS (Fully Read & Write)

•  Multiple versions of components on the same cluster

Performance

High Availability
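The read/write NFS bullet above means the cluster file system can be mounted like any other NFS export, so ordinary file APIs work against it without Hadoop client libraries. A minimal sketch, assuming the cluster is mounted under a path such as /mapr/my.cluster.com; the mount point, file names, and helper name are illustrative and do not come from the deck.

```python
import os
import tempfile


def append_event(mount_root: str, relative_path: str, line: str) -> str:
    """Append one line to a file under the mount using plain file I/O."""
    full_path = os.path.join(mount_root, relative_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return full_path


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount so the sketch
    # runs anywhere; on a real cluster, pass the mount point instead.
    with tempfile.TemporaryDirectory() as fake_mount:
        path = append_event(fake_mount, "landing/events.log", "first event")
        append_event(fake_mount, "landing/events.log", "second event")
        with open(path, encoding="utf-8") as handle:
            print(handle.read())
```

Because appends and in-place writes go through the standard file interface, existing scripts and tools can feed the cluster without being rewritten.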


Data Platform Reference Architecture v3

Databases

Docs, Cases, Content, Social Media, Clickstream

Operational Intelligence

Index & Search

IT App & System Logs & Config.

Internet of Everything (IoE)

Self Service Dashboard

Rapid Business Intell.

Data Exploration

Mission Critical Operational Reports

Financial Reporting & Extract

Operational Intelligence (Splunk UI)

Real time Predictive

Data Analysis, Text Analytics

Machine Learning, Statistical Analysis (R)

Machine Data Insights (e.g. In supply chain)

SFDC

Data Sources Data Consumption

Big Data Platform

Hadoop & Spark on UCS

•  Machine Learning •  Data Archiving •  Data Science

Mission Critical Reporting

Legacy EDW •  Financial SSOTs •  Stable core •  Controlled Change

Agile Analytics

SAP HANA on UCS

•  Predictive Engine •  Real time BI

Network of Truth

(Mobile / Browser / Data Service)

Experience Toolkit

Cisco Data Virtualization (Composite) Logical Data Abstraction Layer across transactional, SaaS, Big Data & DW

Rapid Prototyping / Data Integration / Data Services

SAS

Hadoop & Spark

Data Storage and Processing

HANA

Analytics & Modeling

Customer Network, Product Usage

Customer Registry

ERP

Databases ALL other Sources

Cisco Data Virtualization (Composite)


Thank you.


Cisco Hadoop Platform – Physical Architecture

Multi UCS cluster Hadoop environment

Multi-Tenant model for PROD and DEV/Stage

Diagram labels:

•  N7K

•  Links: 8X 10 Gb/s each (80 Gb/s)

•  Cisco UCS 62XXUP Fabric Interconnects (Per Domain)

•  Cisco Nexus 2232PP 10 GE Fabric Extenders (Per Rack)

•  Cisco Unified Computing System C240 M3

•  Scalability, High Performance, Unified Management, Operational Simplicity, High Availability

Components and Details:

•  OS: RHEL 6.4

•  Distribution: MapR (M7)

•  Server (node): UCS 240 M3, 16 cores (32 with Hyper-Threading)

•  Processor: E5-2655

•  Memory/Node: 256 GB

•  Storage/Node: 24*1 TB (22 HDFS)

Production Capacity:

•  No. of Nodes: 54

•  Cores: 864 (Hyper Threading enabled)

•  Total Memory: 13824 GB

•  Storage: 1188 TB

•  No-SQL: HBASE (MapR - M7)

•  Services: ZooKeeper, CLDB, WebServer, JobTracker on 3 nodes each; File Server and TaskTracker across all nodes; Platfora on 4 nodes
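The production totals above follow directly from the per-node figures, and a short calculation makes the relationship explicit. This is a sketch: the per-node values are taken from the slide, while the usable-capacity estimate assumes 3-way replication with no compression, which is a simplification rather than a figure from the deck.

```python
NODES = 54
CORES_PER_NODE = 16        # cores per node, as listed on the slide
MEMORY_GB_PER_NODE = 256
DATA_DISKS_PER_NODE = 22   # 22 of the 24 x 1 TB drives are listed for HDFS
DISK_TB = 1
REPLICATION = 3            # assumption: 3-way replication, compression ignored


def cluster_totals(nodes: int = NODES) -> dict:
    """Compute cluster-wide totals from per-node hardware figures."""
    raw_tb = nodes * DATA_DISKS_PER_NODE * DISK_TB
    return {
        "cores": nodes * CORES_PER_NODE,
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "raw_storage_tb": raw_tb,
        "usable_tb_after_replication": raw_tb / REPLICATION,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Note that 54 nodes at 16 cores each gives the 864 figure on the slide, so that number matches the per-node core count before Hyper-Threading is applied. The 1188 TB figure is likewise raw disk, so the space available for unique data under replication is considerably smaller.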


Hadoop Lifecycles

Components | POC | DEV | QA | Production

Software
OS | RHEL 6.4 | RHEL 6.4 | RHEL 6.4 | RHEL 6.4
Hadoop Distribution | MapR M7 3.1.0 | MapR M7 3.1.0 | MapR M7 3.1.0 | MapR M7 3.1.0

Server-Cluster
Cisco UCS Servers | UCS C210 M2 | UCS C210 M2 / C240 M3 | UCS C240 M3 | UCS C240 M3
Processor | Intel® Xeon® X5675 | Intel® Xeon® X5675 | Intel® Xeon® X5675 | Intel® Xeon® E5-2655
Memory per Node | 48 GB | 48 GB / 256 GB | 256 GB | 256 GB
Storage per Node (HDFS) | 14*1 TB 7200 RPM SATA | 14*1 TB / 22*1 TB 7200 RPM SATA | 22*1 TB 7200 RPM SATA | 22*1 TB 7200 RPM SATA

Rack Level
No. of Nodes | 4 | 18 | 8 | 54
Processors/Cores | 48 | 240 | 128 | 864
Memory | 4x48 = 192 GB | 12x48 + 6x256 GB | 8x256 GB | 54x256 = 13.8 TB
Storage Capacity (3 way Replication, Compression) | 4x18 = 72 TB | 12x14 + 6x22 = 257 TB | 150 TB | 1188 TB
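The node, core, and memory rows for the four environments can be cross-checked from the server mix. The sketch below is an illustration under stated assumptions: the DEV split of 12 older and 6 newer nodes is read from the memory row ("12x48 + 6x256 GB"), and the per-node core counts (12 for a C210 M2 with two six-core X5675 processors, 16 for a C240 M3) are assumptions rather than values from this table.

```python
from typing import Dict, Tuple

# Assumed cores per node; memory per node is taken from the table.
CORES_PER_NODE = {"C210 M2": 12, "C240 M3": 16}
MEMORY_GB_PER_NODE = {"C210 M2": 48, "C240 M3": 256}

# Server mix per environment (node counts from the table).
ENVIRONMENTS: Dict[str, Dict[str, int]] = {
    "POC": {"C210 M2": 4},
    "DEV": {"C210 M2": 12, "C240 M3": 6},
    "QA": {"C240 M3": 8},
    "Production": {"C240 M3": 54},
}


def environment_totals(env: str) -> Tuple[int, int, int]:
    """Return (nodes, cores, memory in GB) for one environment."""
    mix = ENVIRONMENTS[env]
    nodes = sum(mix.values())
    cores = sum(CORES_PER_NODE[model] * n for model, n in mix.items())
    memory_gb = sum(MEMORY_GB_PER_NODE[model] * n for model, n in mix.items())
    return nodes, cores, memory_gb


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(env, environment_totals(env))
```

Under these assumptions the computed node and core counts agree with the table for all four environments, which suggests the Processors/Cores row counts cores rather than processors.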


Cisco UCS Big Data Common Platform Architecture (CPA)

A Highly Scalable Architecture Designed to Meet a Variety of Scale-Out Application Demands

•  UCS Fabric Interconnects provide high-speed, fully redundant, active-active connectivity

•  Unified fabric (single wire management)

   -  66% reduction in switch ports
   -  66% reduction in cables

•  Powered by UCS C-Series Rack servers

   -  Form factor extension to UCS blade system

•  UCS Manager

   -  Global view of the cluster
   -  Proactive monitoring of health
   -  1 Click system software management

•  UCS Central

   -  Unified management across clusters (thousands of nodes)
   -  Application isolation

Business Benefits

•  Operational Simplification: Simplified and policy-based management

•  Modular Solution: Modular framework that can scale from small to very large

•  Risk Reduction: Pre-validation, tighter integration and optimizations reduce integration and deployment risk

•  Lower TCO: Unified fabric, unified management and infrastructure optimized for performance lower TCO significantly

Architectural Benefits

•  Scalability: Modular building block, scalable up to 7.2 PB with single management domain

•  Performance: Best-in-class performance of compute and network for massively scale-out applications

•  Management and Monitoring: Unified management across clusters (thousands of nodes)

Hadoop Requirements

•  Distributed powerful computing

•  Reliable Hardware

•  Local storage in PB

•  Low Latency

•  Low Cost

•  Scalability and Performance

•  Manageability


Hadoop Use Cases: Organization (vs) Adoption Level

Adoption levels: Production, Pipeline

•  EDS

   -  iCAM
   -  Party Ranking Service
   -  Teradata ETL Offload
   -  Data Lake

•  CSTG

   -  Connected Analytics Network Deployment (CAND)
   -  Smart Call Home
   -  Cloud Consumption (Sentinel)
   -  NOS Online
   -  Network SSOT

•  Marketing

   -  Multi-Channel Scoring
   -  Automatic Qualified Leads

•  CWCS Metadata

   -  Content Auto-Tagging

•  CITS

   -  Cisco Partner Annuity Initiative
   -  Social Media Services

•  GIS

   -  Collaboration Dashboard
   -  Item, BOM & Compliance Data Analytics

•  Legal

   -  Data Warehouse Expansion

•  Supply Chain

   -  Measurement
   -  ACTS
   -  TST


Hadoop Platform Security Current State

Load Balanced

Job Submission

Edge Servers

•  Sqoop: A tool for moving data to/from non-Hadoop data stores

•  Pig: A high level data flow language

•  Hive: SQL like language to query and analyze data using MR

•  Impala: Interactive SQL tool on Hadoop

•  Mahout: Data mining algorithm using MR

•  R: Statistical & Machine Learning language

•  Oozie: A job control workflow

•  Flume: Tool to ingest/stream log data

•  TES Agent: To allow scheduled jobs to execute

CLDB MapR-FS, Job Tracker ZooKeeper

Generic User ID

Admin

ACL to limit access

Used for Authentication

Port opened for Hadoop Services (CLDB, JobTracker, File System & ZooKeeper)

Hadoop Developer/ Data Analyst

Secure Shell Login

Tableau Dashboards

Pentaho BI & DI Platform

Port opened for Hadoop Services (CLDB, JobTracker, File System & ZooKeeper)

iCAM Servers

Port opened for Hadoop Services (CLDB, JobTracker, File System & ZooKeeper)

Hadoop Admins

Business User

Replication