Designing Hadoop Infrastructure with Cisco Data Center Solutions: Blueprint for Success. Ami Ben-Amram, [email protected], Data Center Architecture Leader, Cisco. © 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential, NDA Required.

3. ami big data hadoop on ucs seminar may 2013


Page 1: 3. ami big data hadoop on ucs seminar may 2013


Page 2: 3. ami big data hadoop on ucs seminar may 2013


Hadoop: Apache Open Source Project; Manage and Process Massive Amounts of Data

NoSQL: Unstructured; Key-Value Store Database; Document Database

MPP Databases: Massively Parallel Processing; RDBMS for EDW

Cisco has partnered with leading software providers to offer a comprehensive infrastructure and management solution for Big Data.

Page 3: 3. ami big data hadoop on ucs seminar may 2013


Database

NoSQL Database

Tested and Validated Reference Architectures, Joint engineering Lab

Solution Bundles

Technical Collaterals

Apache-Hadoop reengineered

UCS is the exclusive hardware reference

Several joint engagements

MPP Column store

UCS is exclusive hardware reference

UCS is the only partner platform

Commercial, distributed key-value database

MPP row store

Apache-Hadoop software and services

Few 100 node production cluster (UCSM)

Commercial document-oriented database

Page 4: 3. ami big data hadoop on ucs seminar may 2013


Page 5: 3. ami big data hadoop on ucs seminar may 2013


Small Flows/Messaging (heartbeats, keep-alives, delay-sensitive application messaging)

Small – Medium Incast (Hadoop Shuffle, Scatter-Gather, Distributed Storage)

Large Flows (HDFS Insert, File Copy)

Large Incast (Hadoop Replication, Distributed Storage)

Page 6: 3. ami big data hadoop on ucs seminar may 2013


Many-to-Many Traffic Pattern

Map 1, Map 2, Map 3, ... Map N

Reducer 1, Reducer 2, Reducer 3, ... Reducer N

HDFS

Shuffle

Output

Replication

NameNode

JobTracker

ZooKeeper
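The many-to-many pattern on this slide comes from the shuffle: every map task may send a partition of its output to every reducer. The sketch below is only an illustration of that fan-out, not Hadoop code. The function names (`shuffle_flows`, `partition`, `route`) are hypothetical, and the byte-sum partitioner is a simple stand-in for a real hash partitioner.

```python
from collections import defaultdict


def shuffle_flows(num_mappers: int, num_reducers: int) -> int:
    """Upper bound on mapper-to-reducer transfers in one shuffle."""
    return num_mappers * num_reducers


def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key (illustrative stand-in for a hash partitioner)."""
    # Summing the UTF-8 bytes keeps the result stable across Python runs.
    return sum(key.encode("utf-8")) % num_reducers


def route(map_outputs, num_reducers):
    """Group each mapper's (key, value) pairs by (mapper, reducer) flow."""
    flows = defaultdict(list)
    for mapper_id, pairs in enumerate(map_outputs):
        for key, value in pairs:
            reducer_id = partition(key, num_reducers)
            flows[(mapper_id, reducer_id)].append((key, value))
    return dict(flows)


if __name__ == "__main__":
    # Four mappers and four reducers can produce up to 16 distinct flows.
    print(shuffle_flows(4, 4))
    print(route([[("a", 1), ("b", 1)], [("a", 1)]], num_reducers=2))
```

Because the flow count grows as mappers times reducers, many senders can converge on one reducer at the same moment, which is the incast behaviour listed on the previous slide.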

Page 7: 3. ami big data hadoop on ucs seminar may 2013

Analyze, simulated with Shakespeare WordCount [10s-20s Mbps]

Extract Transform Load (ETL), simulated with Yahoo TeraSort [larger than 1 Gbps]

Extract Transform Load (ETL), simulated with Yahoo TeraSort with output replication [2 – 4 Gbps]

Job Patterns have varying impact on network utilization

Job Pattern - network graph of data coming into one node.
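To make the "Analyze" pattern concrete, here is a minimal in-memory word count written as map, shuffle, and reduce steps. It is a sketch in plain Python rather than the Hadoop job used in the test; the function names and the tokenising regular expression are assumptions for illustration.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in re.findall(r"[a-z']+", line.lower())]


def shuffle_phase(pairs):
    """Group values by key, as the shuffle does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def wordcount(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(wordcount(["To be, or not to be"]))
```

A word count emits small (word, count) pairs and a small final result, while a sort such as TeraSort has to move the whole data set through the shuffle. That difference is one plausible reading of why the measured rates on this slide differ so much between the two job types.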

Page 8: 3. ami big data hadoop on ucs seminar may 2013


[Figure: MapReduce flow (Map 1 ... Map N, shuffle to Reducer 1 ... Reducer N, output replication to HDFS) sharing the network with HBase, where clients issue Read and Update operations to Region Servers while Major Compaction runs.]

Page 9: 3. ami big data hadoop on ucs seminar may 2013


HBase During Major Compaction

[Figure: Read/Update latency comparison of non-QoS vs. QoS policy. Y axis: Latency (us), 0 to 9000. X axis: Time. Series: UPDATE-AverageLatency(us), READ-AverageLatency(us), QoS-UPDATE-AverageLatency(us), QoS-READ-AverageLatency(us).]

~45% improvement for Read

[Figure: Switch buffer usage with a network QoS policy that prioritizes HBase Update/Read operations.]

Every 24 hours HBase wakes up and has this stampede of elephants that does this massive push into HDFS.

Page 10: 3. ami big data hadoop on ucs seminar may 2013


Validated 96 Node Hadoop Cluster

• Network

Three racks, each with 32 nodes

Distribution Layer – Nexus 7000 or Nexus 5000

ToR – FEX or Nexus 3000

2 FEX per rack

Each rack with 32 single- or dual-attached hosts

• Hadoop Framework

Apache 0.20.2

Linux 6.2

Slots – 10 Maps & 2 Reducers per node

• Compute – UCS C200 M2

Cores: 12

Processor: 2 x Intel(R) Xeon(R) CPU X5670 @ 2.93GHz

Disk: 4 x 2TB (7.2K RPM)

Network: 1G: LOM, 10G: Cisco UCS P81E
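The per-node figures above can be rolled up into cluster totals. The following sketch does only that arithmetic; the constant and function names are illustrative, and it assumes all 96 nodes are identical data nodes with the listed slot and disk counts.

```python
# Figures taken from the slide
NODES = 96
RACKS = 3
MAP_SLOTS_PER_NODE = 10
REDUCE_SLOTS_PER_NODE = 2
CORES_PER_NODE = 12
DISKS_PER_NODE = 4
DISK_TB = 2


def cluster_totals():
    """Aggregate per-node figures across the whole validated cluster."""
    return {
        "nodes_per_rack": NODES // RACKS,
        "map_slots": NODES * MAP_SLOTS_PER_NODE,
        "reduce_slots": NODES * REDUCE_SLOTS_PER_NODE,
        "cores": NODES * CORES_PER_NODE,
        "raw_disk_tb": NODES * DISKS_PER_NODE * DISK_TB,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

The raw disk figure is before any HDFS replication, so usable capacity would be lower by whatever replication factor the cluster runs with.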

[Figure: Traditional DC design, Nexus 55xx/2248. Two Nexus 5548 switches with 2248TP-E fabric extenders; Name Node (Cisco UCS C200, single NIC); Data Nodes 1 – 48 and 49 – 96 (Cisco UCS C200, single NIC).]

[Figure: Nexus 7K-N3K based topology. Two Nexus 7000 switches with two Nexus 3000 switches; Name Node (Cisco UCS C200, single NIC); Data Nodes 1 – 48 and 49 – 96 (Cisco UCS C200, single NIC).]

Page 11: 3. ami big data hadoop on ucs seminar may 2013


Page 12: 3. ami big data hadoop on ucs seminar may 2013


• Companies are often challenged by the complexities of traditional server solutions.

• Big data solutions must enable high performance and scale as the business demands.

• To meet these requirements, Cisco designed a comprehensive solution: Cisco® Common Platform Architecture (CPA) for Big Data.

• Cisco CPA for Big Data includes compute, storage, connectivity, and unified management features that enable rapid deployment, predictable performance, and reduced total cost of ownership (TCO).

• In addition to these benefits, Cisco CPA for Big Data offers unique data and management integration with enterprise applications hosted on the Cisco Unified Computing System™ (Cisco UCS®).

Page 13: 3. ami big data hadoop on ucs seminar may 2013


TECHNICAL LEADERSHIP MARKET MOMENTUM

• Unified Infrastructure

• Management Automation

• Design Flexibility

• Optimized for virtualization

• Best Cloud Infrastructure

• 61 industry benchmark world records

• $2 billion revenue run rate

• 20,000 customers: almost 50% of Fortune 500

• #2 US blade server market share by revenue

• #3 WW blade server market share by revenue

• More than 200 customers in Israel

Page 14: 3. ami big data hadoop on ucs seminar may 2013


Building Blocks

Cisco Big Data Common Platform Architecture (CPA) is a highly scalable architecture designed to meet a variety of scale-out application demands

UCS 6200 Series Fabric Interconnects: High-speed connectivity and management, integration with enterprise applications on blades

Nexus 2232 Fabric Extenders: Scalability at lower cost

UCS C240 M3 Servers: Compute, storage

UCS Manager

UCS Central

LAN, SAN, Management

Page 15: 3. ami big data hadoop on ucs seminar may 2013


Solution Bundles

Optimized for Cost, Tested and Validated for Performance and Rapid Deployments

Big Data High Performance Rack (UCS-EZ-BD-HP)

Balanced Compute and IO Bandwidth; Price-Performance Optimized

(2) UCS 96-Port 6296 Fabric Interconnect

(2) Nexus 2232 PP

(16) UCS C240 M3 Servers w/ dual Intel Xeon E5-2665 2.4 GHz Processors, 256GB of Memory, 1 x Mega RAID 9266-CV-8i Card, 24 x 1TB 7.2K SATA HDDs

Additional Racks: 2 x N2K-UCS2232PF, 16 x UCS-EZ-C240-2665

MPP High Performance Half-Rack (UCS-EZ-BD-MPP)

High Performance Compute and IO Bandwidth and IOPS (under $10K/GBPS)

(2) UCS 48-Port 6248 Fabric Interconnect

(2) Nexus 2232 PP

(8) UCS C240 M3 Servers w/ dual Intel Xeon E5-2690 2.9 GHz Processors, 256GB of Memory, 1 x Mega RAID 9266-CV-8i Card, 24 x 600GB 10K SAS HDDs

Additional Servers: UCS-EZ-C240-2690

Big Data High Capacity Rack (UCS-EZ-BD-HC)

Storage Density Optimized; Low $/TB (under $500/TB)

(2) UCS 96-Port 6296 Fabric Interconnect

(2) Nexus 2232 PP

(16) UCS C240 M3 Servers w/ dual Intel Xeon E5-2640 2.5 GHz Processors, 128GB of Memory, 1 x Mega RAID 9266-CV-8i Card, 12 x 3TB 7.2K SAS HDDs

Additional Racks: 2 x N2K-UCS2232PF, 16 x UCS-EZ-C240-2640
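The server and drive counts for each bundle give its raw disk capacity directly. The sketch below works that out in GB (using decimal units, 1TB = 1000GB) and then divides by a replication factor to estimate usable space. The replication factor of 3 is the HDFS default and is an assumption here, not a figure from the slide, and it is applied to the MPP bundle only for comparison since that bundle targets MPP databases rather than HDFS. The names `BUNDLES`, `raw_gb`, and `usable_gb` are illustrative.

```python
# (servers, drives per server, drive size in GB) taken from the bundle descriptions
BUNDLES = {
    "UCS-EZ-BD-HP": (16, 24, 1000),
    "UCS-EZ-BD-MPP": (8, 24, 600),
    "UCS-EZ-BD-HC": (16, 12, 3000),
}

# Assumption: HDFS default replication factor, not stated on the slide
REPLICATION_FACTOR = 3


def raw_gb(bundle: str) -> int:
    """Raw disk capacity of one bundle in GB."""
    servers, drives, size_gb = BUNDLES[bundle]
    return servers * drives * size_gb


def usable_gb(bundle: str, replication: int = REPLICATION_FACTOR) -> int:
    """Capacity left after storing every block `replication` times."""
    return raw_gb(bundle) // replication


if __name__ == "__main__":
    for name in BUNDLES:
        print(name, raw_gb(name), usable_gb(name))
```

The estimate ignores space reserved for the operating system, temporary shuffle data, and any RAID overhead, so real usable capacity would be somewhat lower.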

Page 16: 3. ami big data hadoop on ucs seminar may 2013


              Performance        Price-Performance  Capacity           Compute Units
              Optimized (SAS)    Optimized (SATA)   Optimized (SAS)
Server        C240 M3 (SFF)      C240 M3 (SFF)      C240 M3 (LFF)      C220 M3 (SFF)
RU            2                  2                  2                  2
CPU           E5-2690            E5-2665            E5-2640            E5-2680
Cores         16                 16                 12                 16
Memory        256GB              256GB              128GB              256GB
Disk Drives   24 x (300GB 15K,   24 x 1TB 7.2K      12 x 3TB 7.2K      External
              600GB 10K,
              900GB 15K)

NOSH

Compute

Page 17: 3. ami big data hadoop on ucs seminar may 2013


Differentiation 0:

Big Data Benefits

• Unified Management - UCS Manager

• Unified Fabric - “Single Wire Management”

• Seamless management integration and data integration

• Direct SAN access

[Figure: UCS fabric topology. Two 6200 Fabric Interconnects (Fabric A and Fabric B) with uplink ports to SAN A, SAN B, ETH 1, and ETH 2, plus OOB management. Server ports connect to Fabric Extenders (FEX A, FEX B) in Chassis 1 and Chassis 2. Half/full-width B200 compute blades and C240 rack-mount servers attach through virtualized adapters (CNA).]

Page 18: 3. ami big data hadoop on ucs seminar may 2013


Big Data

• Dozens to 100s of servers are typical

• 20–50% annual growth

UCSM Enables

• Global view of the cluster

• Proactive monitoring of health

• 1-click software, BIOS, and firmware upgrades

• 1-click BIOS settings

• 1-click tunables like jumbo frames

UCS Central Enables

• Scaling to large clusters

• Application isolation

Unified Management

A Single Unified System

For Blade and Rack Servers

C-Series Rack Optimized Servers

Differentiation 1:

Page 19: 3. ami big data hadoop on ucs seminar may 2013


Big Data Benefits

• Optimized service profile templates for CPA enable quick and consistent deployments

• One-click PowerShell script to configure CPA

LAN

SAN

• RAID settings

• Disk scrub actions

• Number of vHBAs

• HBA WWN assignments

• FC Boot Parameters

• HBA firmware

• FC Fabric assignments for HBAs

• QoS settings

• Border port assignment per vNIC

• NIC Transmit/Receive Rate Limiting

• VLAN assignments for NICs

• VLAN tagging config for NICs

• Number of vNICs

• PXE settings

• NIC firmware

• Advanced feature settings

• Remote KVM IP settings

• Call Home behavior

• Remote KVM firmware

• Server UUID

• Serial over LAN settings

• Boot order

• IPMI settings

• BIOS scrub actions

• BIOS firmware

• BIOS Settings

LAN

SAN

Traditional UCS Service Profile

Differentiation 2:

Page 20: 3. ami big data hadoop on ucs seminar may 2013


Big Data Benefits

• “Single Wire Management”

• Fully redundant active-active fabric cluster interconnect

• Can be configured for direct SAN access

Traditional Unified Fabric

10 GE Ethernet

Cisco VIC Technology

66% Fewer Switch Ports and Cables

Differentiation 3:

Page 21: 3. ami big data hadoop on ucs seminar may 2013


Cisco UCS Big Data Common Platform Architecture: Extending Enterprise Application Ecosystem to Big Data

[Figure: Data center applications on Cisco UCS B-Series Blade Servers with a SAN array, and big data applications (Hadoop, NoSQL, MPP Database) on the Cisco Big Data Common Platform Architecture using C-Series Rack-Mount Servers. The two sides share Unified Fabric, Unified Management, and Integrated Data Management, and are linked by data feeds and data integration using connectors.]

Differentiation 4:

Page 22: 3. ami big data hadoop on ucs seminar may 2013


No Additional Switching for up to 10 Racks (160 Servers)

10,000 using UCS Central

Example Configuration:

Servers Per Domain              North-Bound Bandwidth   Any Node to Any Node
(Pair of Fabric Interconnects)  (GBits/sec)             Bandwidth (GBits/sec)
160                             320                     10
144                             480                     10
128                             640                     10
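The rows above trade server count against north-bound bandwidth. One way to compare them is the oversubscription ratio: the bandwidth all servers could offer at once divided by the north-bound bandwidth. The sketch below computes that ratio and the number of uplinks each row implies. It assumes every server can drive the full 10 Gbit/s from the node-to-node column and that uplinks are 10GE ports; both are assumptions for illustration, and the function names are hypothetical.

```python
# Per-server bandwidth in Gbit/s, from the "any node to any node" column
LINK_GBPS = 10

# (servers per domain, north-bound Gbit/s) rows from the example configuration
CONFIGS = [(160, 320), (144, 480), (128, 640)]


def oversubscription(servers: int, northbound_gbps: int, link_gbps: int = LINK_GBPS) -> float:
    """Ratio of aggregate server bandwidth to north-bound bandwidth."""
    return servers * link_gbps / northbound_gbps


def uplinks_needed(northbound_gbps: int, link_gbps: int = LINK_GBPS) -> int:
    """Number of uplink ports of `link_gbps` needed for the north-bound bandwidth."""
    return northbound_gbps // link_gbps


if __name__ == "__main__":
    for servers, northbound in CONFIGS:
        ratio = oversubscription(servers, northbound)
        print(f"{servers} servers, {northbound} Gbit/s north-bound: "
              f"{ratio:.0f}:1 oversubscribed, {uplinks_needed(northbound)} uplinks")
```

Under these assumptions, servers plus uplinks add up to 192 in every row, which matches the combined port count of a pair of 96-port interconnects. That reading is an inference from the numbers, not something the slide states.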

Differentiation 5:

Page 23: 3. ami big data hadoop on ucs seminar may 2013


Page 25: 3. ami big data hadoop on ucs seminar may 2013

workload automation facilitates the flow of data

costs

Twitter

Feeds

Map Reduce

Hive

BI Analytics

SQL

Sqoop

Map Reduce

Map Reduce

Call logs

Web Clicks

Stages: Gather Data, Data Integration, Load Data, Data Analysis, Report Generation and Distribution

Web Services

SSH

DB/JDBC

ERP/CRM

Data Mover

Sqoop

MapReduce

Informatica

Hive

Sqoop

Informatica

Business Objects

Cognos

Web Services
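The five stages on this slide run in a fixed order, each consuming the output of the one before it. The sketch below shows that sequencing with a minimal runner. It is not the Cisco workload automation product or its API; `run_pipeline` and the handler mapping are hypothetical names used only to illustrate the flow.

```python
# Stage names taken from the slide, in the order shown
STAGES = [
    "Gather Data",
    "Data Integration",
    "Load Data",
    "Data Analysis",
    "Report Generation and Distribution",
]


def run_pipeline(handlers, payload):
    """Run every stage in order, passing each stage's output to the next.

    `handlers` maps a stage name to a callable. A missing handler raises
    KeyError and an exception in a handler stops the run, so later stages
    never execute on incomplete data.
    """
    completed = []
    for stage in STAGES:
        payload = handlers[stage](payload)
        completed.append(stage)
    return payload, completed


if __name__ == "__main__":
    # Each demo handler just records that its stage ran.
    demo = {stage: (lambda data, s=stage: data + [s]) for stage in STAGES}
    result, done = run_pipeline(demo, [])
    print(done)
```

In a real deployment each handler would wrap one of the tools listed on the slide, such as Sqoop for loading or Hive for analysis.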

Page 26: 3. ami big data hadoop on ucs seminar may 2013


Reports Dashboards

Analytics

OLAP

Alerts ERP Applications

DB

CRM Applications

DB

DW

DW

ERP/CRM Apps

& Databases

Data Exchange

System(s)

ETL/DW/Big Data/BI

Systems & Applications

Manages Enterprise Workloads

DW

Data Integration

Business Intelligence

Application(s)

File Drop Box FTP/SFTP/FTPS

SaaS, AWS FTP Server

DB

API Feeds

(Twitter, FB, LI etc)

Big Data

Page 27: 3. ami big data hadoop on ucs seminar may 2013


[Figure: Workflow steps with numbered callouts 1 to 4: Data Acquisition, Data Load, Analysis of Sales Data, Export to Enterprise, Generate Report.]

Page 28: 3. ami big data hadoop on ucs seminar may 2013


Integrated Cisco UCS Server Management

Integrated Network Management w/ Fabric Interconnect and Nexus Switches

Integrated Data Management

Cisco UCS B-series

Cisco UCS C-series w/ Direct Attach Storage

Data Center Applications

Big Data Applications

Cisco Workload Automation Delivers Automated Business Processing Abstraction Layer

Data Feeds

Big Data Jobs

Data Center Applications

Automated Backup and Storage In/out of Big Data Grids

Rapid error free deployment – service profile

Maintenance activities like BIOS, FW upgrade across the cluster

Monitoring the health, power

Seamless data movement

Page 29: 3. ami big data hadoop on ucs seminar may 2013

Thank you.

Page 30: 3. ami big data hadoop on ucs seminar may 2013


• Hadoop has many building blocks…At the core it is an architecture to store and process unstructured and semi-structured data…

Hadoop Distributed File System (HDFS): At the base is a self-healing clustered storage system.

Map-Reduce: Distributed data processing

PIG, Hive, Sqoop: Top-level abstractions

ETL Tools, BI Reporting, RDBMS: Top-level interfaces

HBASE: Database with real-time access

Page 31: 3. ami big data hadoop on ucs seminar may 2013


Extreme Performance: Optimized for fast query execution and unmatched data loading

Elastic Scalability: Expand capacity and performance

Highly Available: Fully redundant and reliable configuration

Unified Networking: Converged data and management plane networking

Rapidly Deployable: Pre-validated configuration, rapid deployment via service profiles

Unified Management: Power of UCS Manager to manage the compute, networking, I/O

Industry Leading Partnerships: Joint solutions with major software players

Enterprise Application Integration: Seamless integration with enterprise applications on blades