© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential NDA Required.
Designing Hadoop Infrastructure with Cisco Data Center Solutions: Blueprint for Success
Ami Ben-Amram (amib@cisco.com), Data Center Architecture Leader, Cisco
Manage and Process Massive Amounts of Data
• Hadoop: Apache open source project for unstructured data
• NoSQL: key-value store databases and document databases
• MPP databases: massively parallel processing RDBMS for the enterprise data warehouse (EDW)
Cisco has partnered with leading software providers to offer a comprehensive infrastructure and management solution for Big Data.
Partners (Hadoop, NoSQL databases, MPP databases): tested and validated reference architectures, joint engineering lab, solution bundles, technical collateral.
Partner offerings:
• Apache Hadoop, re-engineered
• Apache Hadoop software and services
• MPP column store
• MPP row store
• Commercial, distributed key-value database
• Commercial, document-oriented database
Engagement highlights:
• UCS is the exclusive hardware reference (two partners)
• UCS is the only partner platform
• Several joint engagements
• Production cluster of a few hundred nodes managed with UCSM
• Small flows/messaging (heartbeats, keep-alives, delay-sensitive application messaging)
• Small–medium incast (Hadoop shuffle, scatter-gather, distributed storage)
• Large flows (HDFS insert, file copy)
• Large incast (Hadoop replication, distributed storage)
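Incast is the pattern that fills switch buffers: many senders converge on one receiver port at the same moment. The sketch below uses a simple fluid model, which is an assumption and not a measurement from this deck: N senders each transmit a burst of B bytes at line rate toward one receiver port with the same line rate, so the port drains only B bytes while N · B arrive, and the peak queue is (N − 1) · B. Real TCP dynamics, pacing, and shared-buffer behavior will change the numbers.

```python
def incast_peak_queue_bytes(senders: int, burst_bytes: int) -> int:
    """Peak egress queue (bytes) for a synchronized incast, fluid model.

    Assumes every sender starts at the same instant, sends `burst_bytes`
    at full line rate, and the receiver port has that same line rate.
    """
    if senders < 1 or burst_bytes < 0:
        raise ValueError("need at least one sender and a non-negative burst")
    arriving = senders * burst_bytes  # bytes offered during the burst window
    drained = burst_bytes             # bytes the port can send in that window
    return arriving - drained         # = (senders - 1) * burst_bytes


# Hypothetical shuffle step: 8 mappers each send a 64 KiB burst to one reducer.
peak = incast_peak_queue_bytes(senders=8, burst_bytes=64 * 1024)
print(f"peak queue: {peak // 1024} KiB")  # 448 KiB
```

The model shows why incast scales with fan-in: doubling the number of simultaneous senders roughly doubles the buffer a single port needs.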
Many-to-Many Traffic Pattern
[Diagram: Map 1 to Map N read from HDFS and shuffle to Reducer 1 to Reducer N, which write output back to HDFS with replication; NameNode, JobTracker, and ZooKeeper coordinate the cluster]
Job patterns have varying impact on network utilization (graphs show data coming into one node):
• Analyze: simulated with Shakespeare WordCount [10s–20s Mbps]
• Extract, Transform, Load (ETL): simulated with Yahoo TeraSort [larger than 1 Gbps]
• ETL with output replication: simulated with Yahoo TeraSort with output replication [2–4 Gbps]
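The WordCount job can be modeled in a single process to show where the many-to-many shuffle comes from: every mapper may emit records for every reducer. This is a plain Python illustration, not Hadoop code; the crc32 partitioner stands in for Hadoop's hash partitioner, and the traffic matrix counts the records each mapper sends to each reducer.

```python
import zlib
from collections import Counter, defaultdict


def map_phase(split: str) -> list[tuple[str, int]]:
    # Emit (word, 1) for every word in this mapper's input split.
    return [(word.lower(), 1) for word in split.split()]


def partition(word: str, num_reducers: int) -> int:
    # Stable hash so every mapper routes a given word to the same reducer
    # (Python's built-in hash() for str is randomized per process).
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    traffic = [[0] * num_reducers for _ in map_outputs]  # records mapper -> reducer
    for m, pairs in enumerate(map_outputs):
        for word, count in pairs:
            r = partition(word, num_reducers)
            buckets[r][word].append(count)
            traffic[m][r] += 1
    return buckets, traffic


def reduce_phase(bucket):
    return {word: sum(counts) for word, counts in bucket.items()}


def word_count(splits, num_reducers=3):
    map_outputs = [map_phase(s) for s in splits]
    buckets, traffic = shuffle(map_outputs, num_reducers)
    totals = Counter()
    for bucket in buckets:
        totals.update(reduce_phase(bucket))  # adds each reducer's counts
    return totals, traffic


splits = ["to be or not to be", "that is the question", "to sleep perchance to dream"]
totals, traffic = word_count(splits)
print(totals["to"])  # 4
for m, row in enumerate(traffic):
    print(f"mapper {m} -> reducers: {row}")
```

Each row of the traffic matrix is one mapper's fan-out; each column is one reducer's fan-in, which is the incast the switch sees during the shuffle.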
[Diagram: HBase clients issue reads and updates to region servers, which also run major compactions, on the same cluster that carries MapReduce traffic (map, shuffle, output, and HDFS replication) from Map 1 to Map N and Reducer 1 to Reducer N]
HBase During Major Compaction
[Chart: read/update average latency (µs) over time, non-QoS vs. QoS policy (series: UPDATE-AverageLatency, READ-AverageLatency, QoS-UPDATE-AverageLatency, QoS-READ-AverageLatency); roughly 45% improvement in read latency with QoS. Second panel: switch buffer usage with a network QoS policy that prioritizes HBase update/read operations.]
Every 24 hours HBase wakes up for a major compaction: a stampede of elephant flows that pushes massive amounts of data into HDFS.
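The QoS result rests on a simple idea: when a compaction burst and latency-sensitive HBase operations share an egress port, giving the small flows their own, preferentially served queue keeps them from waiting behind the elephants. The toy model below assumes strict priority with two queues; the deck does not show the switch policy actually used, and real switches usually bound the priority class so it cannot starve other traffic.

```python
from collections import deque


class StrictPriorityPort:
    """Toy egress port: one queue for HBase read/update traffic (priority)
    and one for bulk traffic such as major-compaction writes (default)."""

    def __init__(self):
        self.priority = deque()
        self.default = deque()

    def enqueue(self, packet: str, is_priority: bool) -> None:
        (self.priority if is_priority else self.default).append(packet)

    def dequeue(self):
        # Always serve the priority queue first; fall back to the default queue.
        if self.priority:
            return self.priority.popleft()
        if self.default:
            return self.default.popleft()
        return None


port = StrictPriorityPort()
# A compaction burst is already queued when a client read arrives.
for i in range(4):
    port.enqueue(f"compaction-{i}", is_priority=False)
port.enqueue("hbase-read", is_priority=True)
order = [port.dequeue() for _ in range(5)]
print(order)  # ['hbase-read', 'compaction-0', 'compaction-1', 'compaction-2', 'compaction-3']
```

Without the priority queue, the read would leave the port fifth instead of first; that queuing delay is what the latency chart measures.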
Validated 96 Node Hadoop Cluster
• Network
  Three racks, each with 32 nodes
  Distribution layer: Nexus 7000 or Nexus 5000
  ToR: FEX or Nexus 3000
  2 FEX per rack
  Each rack with 32 single- or dual-attached hosts
• Hadoop Framework
  Apache 0.20.2
  Linux 6.2
  Slots: 10 maps and 2 reducers per node
• Compute: UCS C200 M2
  Cores: 12
  Processor: 2 x Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
  Disk: 4 x 2TB (7.2K RPM)
  Network: 1G: LOM, 10G: Cisco UCS P81E
[Diagram: Traditional DC design (Nexus 55xx/2248): a pair of Nexus 5548 switches with Nexus 2248TP-E fabric extenders connecting the name node and data nodes 1–48 and 49–96, all Cisco UCS C200 servers with a single NIC]
[Diagram: Nexus 7K-N3K based topology: a pair of Nexus 7000 switches with Nexus 3000 top-of-rack switches (and a Nexus 2248TP-E) connecting the name node and data nodes 1–48 and 49–96, all Cisco UCS C200 servers with a single NIC]
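The per-node figures above multiply out to cluster-wide capacity. A minimal sketch, assuming all 96 data nodes are configured identically (the name node is drawn separately in the diagrams) and HDFS's default replication factor of 3; the `ClusterSpec` class is hypothetical, and usable capacity is an upper bound that ignores OS, temporary, and shuffle space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    racks: int
    nodes_per_rack: int
    map_slots_per_node: int
    reduce_slots_per_node: int
    cores_per_node: int
    disks_per_node: int
    disk_tb: float
    hdfs_replication: int = 3  # assumed: Hadoop's default dfs.replication

    @property
    def nodes(self) -> int:
        return self.racks * self.nodes_per_rack

    def summary(self) -> dict:
        raw_tb = self.nodes * self.disks_per_node * self.disk_tb
        return {
            "data_nodes": self.nodes,
            "map_slots": self.nodes * self.map_slots_per_node,
            "reduce_slots": self.nodes * self.reduce_slots_per_node,
            "cores": self.nodes * self.cores_per_node,
            "raw_tb": raw_tb,
            # Upper bound on HDFS data: every block is stored hdfs_replication times.
            "hdfs_usable_tb": raw_tb / self.hdfs_replication,
        }


validated = ClusterSpec(racks=3, nodes_per_rack=32, map_slots_per_node=10,
                        reduce_slots_per_node=2, cores_per_node=12,
                        disks_per_node=4, disk_tb=2.0)
print(validated.summary())
```

For the validated cluster this gives 960 map slots, 192 reduce slots, 1,152 cores, 768 TB raw, and at most 256 TB of replicated HDFS data.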
• Companies are often challenged by the complexities of traditional server solutions.
• Big data solutions must enable high performance and scale as the business demands.
• To meet these requirements, Cisco designed a comprehensive solution: Cisco® Common Platform Architecture (CPA) for Big Data.
• Cisco CPA for Big Data includes compute, storage, connectivity, and unified management features that enable rapid deployment, predictable performance, and reduced total cost of ownership (TCO).
• In addition to these benefits, Cisco CPA for Big Data offers unique data and management integration with enterprise applications hosted on the Cisco Unified Computing System™ (Cisco UCS®).
Technical Leadership
• Unified infrastructure
• Management automation
• Design flexibility
• Optimized for virtualization
• Best cloud infrastructure
• 61 industry benchmark world records
Market Momentum
• $2 billion revenue run rate
• 20,000 customers, including almost 50% of the Fortune 500
• #2 in US blade server market share by revenue
• #3 in worldwide blade server market share by revenue
• More than 200 customers in Israel
Building Blocks
Cisco Common Platform Architecture (CPA) for Big Data is a highly scalable architecture designed to meet a variety of scale-out application demands.
• UCS 6200 Series Fabric Interconnects: high-speed LAN, SAN, and management connectivity; integration with enterprise applications on blades
• Nexus 2232 Fabric Extenders: scalability at lower cost
• UCS C240 M3 Servers: compute and storage
• UCS Manager
• UCS Central
Solution Bundles
Optimized for cost; tested and validated for performance and rapid deployment.
Big Data High Performance Rack (UCS-EZ-BD-HP): balanced compute and I/O bandwidth; price-performance optimized
• (2) UCS 6296 96-port Fabric Interconnects
• (2) Nexus 2232PP Fabric Extenders
• (16) UCS C240 M3 servers, each with dual Intel Xeon E5-2665 2.4 GHz processors, 256 GB of memory, 1 x MegaRAID 9266-CV-8i card, and 24 x 1 TB 7.2K SATA HDDs
• Additional racks: 2 x N2K-UCS2232PF, 16 x UCS-EZ-C240-2665
MPP High Performance Half-Rack (UCS-EZ-BD-MPP): high-performance compute, I/O bandwidth, and IOPS (under $10K/GBPS)
• (2) UCS 6248 48-port Fabric Interconnects
• (2) Nexus 2232PP Fabric Extenders
• (8) UCS C240 M3 servers, each with dual Intel Xeon E5-2690 2.9 GHz processors, 256 GB of memory, 1 x MegaRAID 9266-CV-8i card, and 24 x 600 GB 10K SAS HDDs
• Additional servers: UCS-EZ-C240-2690
Big Data High Capacity Rack (UCS-EZ-BD-HC): storage density optimized; low $/TB (under $500/TB)
• (2) UCS 6296 96-port Fabric Interconnects
• (2) Nexus 2232PP Fabric Extenders
• (16) UCS C240 M3 servers, each with dual Intel Xeon E5-2640 2.5 GHz processors, 128 GB of memory, 1 x MegaRAID 9266-CV-8i card, and 12 x 3 TB 7.2K SAS HDDs
• Additional racks: 2 x N2K-UCS2232PF, 16 x UCS-EZ-C240-2640
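Multiplying out the bundle specifications gives rack-level totals that help when comparing the three profiles. A minimal sketch: cores per server are taken from the compute-unit table in this deck (two sockets per server), and capacity is raw decimal TB with no RAID, filesystem, or replication overhead subtracted.

```python
# (servers, cores per server, memory GB per server, disks per server, TB per disk)
BUNDLES = {
    "UCS-EZ-BD-HP": (16, 16, 256, 24, 1.0),
    "UCS-EZ-BD-MPP": (8, 16, 256, 24, 0.6),
    "UCS-EZ-BD-HC": (16, 12, 128, 12, 3.0),
}


def bundle_totals(servers, cores, mem_gb, disks, disk_tb):
    return {
        "cores": servers * cores,
        "memory_gb": servers * mem_gb,
        # Raw capacity before RAID, filesystem, or HDFS replication overhead.
        "raw_tb": round(servers * disks * disk_tb, 1),
    }


for name, spec in BUNDLES.items():
    print(name, bundle_totals(*spec))
```

With 3-way HDFS replication, for example, the High Capacity rack's 576 TB raw corresponds to at most 192 TB of HDFS data, versus at most 128 TB for the High Performance rack's 384 TB raw.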
Compute Units
Profile: Performance Optimized (SAS) | Price-Performance Optimized (SATA) | Capacity Optimized (SAS) | Compute (NOSH)
Server: C240 M3 (SFF) | C240 M3 (SFF) | C240 M3 (LFF) | C220 M3 (SFF)
Rack units: 2 | 2 | 2 | 1
CPU: E5-2690 | E5-2665 | E5-2640 | E5-2680
Cores: 16 | 16 | 12 | 16
Memory: 256 GB | 256 GB | 128 GB | 256 GB
Disk drives: 24 x (300 GB 15K, 600 GB 10K, or 900 GB 15K) | 24 x 1 TB 7.2K | 12 x 3 TB 7.2K | External
Differentiation 0:
Big Data Benefits
• Unified Management - UCS Manager
• Unified Fabric - “Single Wire Management”
• Seamless management integration and data integration
• Direct SAN access
[Diagram: a pair of UCS 6200 Fabric Interconnects (Fabric A and Fabric B) with uplink ports to SAN A, SAN B, ETH 1, ETH 2, and out-of-band management; fabric extenders (FEX A and FEX B) connect half- and full-width B200 compute blades in Chassis 1 and Chassis 2, and C240 rack-mount servers, through virtualized adapters (CNAs) on server ports]
Differentiation 1: Unified Management
A single unified system for blade and rack servers, including C-Series rack-optimized servers
Big Data
• Dozens to hundreds of servers are typical
• 20–50% annual growth
UCSM Enables
• Global view of the cluster
• Proactive health monitoring
• One-click software, BIOS, and firmware upgrades
• One-click BIOS settings
• One-click tunables such as jumbo frames
UCS Central Enables
• Scaling to large clusters
• Application isolation
Differentiation 2:
Big Data Benefits
• Optimized service profile templates for CPA enable quick and consistent deployments
• One-click PowerShell script to configure CPA
Traditional UCS Service Profile (LAN and SAN settings):
• RAID settings
• Disk scrub actions
• Number of vHBAs
• HBA WWN assignments
• FC boot parameters
• HBA firmware
• FC fabric assignments for HBAs
• QoS settings
• Border port assignment per vNIC
• NIC transmit/receive rate limiting
• VLAN assignments for NICs
• VLAN tagging configuration for NICs
• Number of vNICs
• PXE settings
• NIC firmware
• Advanced feature settings
• Remote KVM IP settings
• Call Home behavior
• Remote KVM firmware
• Server UUID
• Serial over LAN settings
• Boot order
• IPMI settings
• BIOS scrub actions
• BIOS firmware
• BIOS settings
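A service profile template captures settings like those listed above once; each server then receives a profile derived from the template plus unique identities drawn from pools. The sketch below is a hypothetical Python model of that idea, not the UCS Manager object model or API: `ProfileTemplate`, `Pool`, and `instantiate` are invented names, and the VLAN, MTU, UUID, and MAC formats are illustrative.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ProfileTemplate:
    # Settings shared by every server built from the template (illustrative subset).
    name: str
    boot_order: tuple[str, ...]
    vlan: int
    mtu: int  # e.g. 9000 for jumbo frames


@dataclass(frozen=True)
class ServiceProfile:
    template: ProfileTemplate
    server: str
    uuid_suffix: str
    mac: str


class Pool:
    """Hands out unique, sequential identities formatted by `fmt`."""

    def __init__(self, fmt):
        self._fmt = fmt       # function: int -> identity string
        self._next = count(1)

    def take(self) -> str:
        return self._fmt(next(self._next))


def instantiate(template, servers, uuid_pool, mac_pool):
    # Identical settings come from the template; unique identities come
    # from the pools, so every server is configured the same way.
    return [ServiceProfile(template, s, uuid_pool.take(), mac_pool.take())
            for s in servers]


cpa = ProfileTemplate("cpa-hadoop", boot_order=("local-disk", "pxe"), vlan=100, mtu=9000)
uuids = Pool(lambda n: f"0000-{n:012X}")
macs = Pool(lambda n: f"00:25:B5:00:{n >> 8:02X}:{n & 0xFF:02X}")  # valid up to n = 65535
profiles = instantiate(cpa, [f"rack1-server{i}" for i in range(1, 17)], uuids, macs)
print(profiles[0].mac, profiles[-1].mac)  # 00:25:B5:00:00:01 00:25:B5:00:00:10
```

Changing a setting such as the MTU in one template and re-instantiating updates all 16 servers consistently, which is the operational point of the template approach.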
Differentiation 3:
Big Data Benefits
• "Single Wire Management"
• Fully redundant, active-active fabric interconnect cluster
• Can be configured for direct SAN access
[Diagram: traditional design vs. unified fabric with 10 GE Ethernet and Cisco VIC technology: 66% fewer switch ports and cables]
Differentiation 4:
Cisco UCS Big Data Common Platform Architecture: Extending Enterprise Application Ecosystem to Big Data
[Diagram: data center applications on Cisco UCS B-Series blade servers with a SAN array, and big data applications (Hadoop, NoSQL, MPP database) on the Cisco Big Data Common Platform Architecture using C-Series rack-mount servers, joined by unified fabric, unified management, and integrated data management; data integration using connectors; data feeds]
Differentiation 5:
No additional switching for up to 10 racks (160 servers); up to 10,000 servers using UCS Central
Example configuration:
Servers per domain (pair of Fabric Interconnects) | North-bound bandwidth (Gbit/s) | Any-node-to-any-node bandwidth (Gbit/s)
160 | 320 | 10
144 | 480 | 10
128 | 640 | 10
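The example configurations follow from port arithmetic. A sketch under stated assumptions that reproduce all three rows but are not spelled out in the deck: each rack holds 16 servers and one Nexus 2232PP per fabric, each 2232PP has all 8 of its 10 Gbps fabric uplinks cabled to a 96-port UCS 6296, and every remaining Fabric Interconnect port carries 10 Gbps north-bound.

```python
FI_PORTS = 96          # UCS 6296 Fabric Interconnect, one per fabric
FEX_UPLINKS = 8        # assumed: 10 Gbps uplinks from each Nexus 2232PP to its FI
SERVERS_PER_RACK = 16  # assumed: one 2232PP per fabric per rack
PORT_GBPS = 10


def domain(racks: int) -> tuple[int, int]:
    """Return (servers, north-bound Gbps across both fabrics) for `racks` racks."""
    fex_ports = racks * FEX_UPLINKS  # FI ports consumed by FEX uplinks
    if fex_ports > FI_PORTS:
        raise ValueError("not enough Fabric Interconnect ports")
    uplinks_per_fi = FI_PORTS - fex_ports  # ports left over for north-bound links
    return racks * SERVERS_PER_RACK, 2 * uplinks_per_fi * PORT_GBPS


for racks in (10, 9, 8):
    servers, northbound = domain(racks)
    print(f"{racks} racks: {servers} servers, {northbound} Gbps north-bound")
```

The trade-off is linear: each additional rack consumes 8 ports on each Fabric Interconnect, which would otherwise carry 80 Gbps north-bound per fabric, or 160 Gbps across both.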
• Tested and validated reference architectures
• Joint engineering lab
• Solution bundles
• Technical collateral
Partners: Hadoop, NoSQL databases, MPP databases
Workload automation facilitates the flow of data:
Gather data → data integration → load data → data analysis → report generation and distribution
• Data sources: call logs, web clicks, feeds
• Tools along the flow: web services, SSH, DB/JDBC, ERP/CRM, data mover, Sqoop, MapReduce, Hive, SQL, Informatica, BI analytics, Business Objects, Cognos
[Diagram: workload automation manages enterprise workloads across ERP/CRM applications and databases; data exchange systems (file drop box, FTP/SFTP/FTPS, SaaS and AWS, FTP server, API feeds such as Twitter, Facebook, and LinkedIn); ETL/DW/big data/BI systems and applications; data integration; data warehouses; and business intelligence applications delivering reports, dashboards, analytics, OLAP, and alerts]
Example workflow: data acquisition → data load → analysis of sales data → export to enterprise → generate report
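At its core, workload automation runs stages like these in dependency order and stops when one fails. A minimal sketch using Python's standard `graphlib` (Python 3.9+); the stage names mirror the example workflow, while the strictly linear dependencies and the placeholder actions are assumptions standing in for real jobs such as Sqoop imports or Hive queries. It does not model Cisco Workload Automation's own job definitions.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (assumed linear chain).
WORKFLOW = {
    "data_acquisition": set(),
    "data_load": {"data_acquisition"},
    "analysis_of_sales_data": {"data_load"},
    "export_to_enterprise": {"analysis_of_sales_data"},
    "generate_report": {"export_to_enterprise"},
}


def run(workflow, actions):
    """Run each stage only after all of its dependencies have completed.

    An action that raises stops the run, so later stages never execute.
    """
    completed = []
    for stage in TopologicalSorter(workflow).static_order():
        actions[stage]()
        completed.append(stage)
    return completed


log = []
# Placeholder actions: each just records that its stage ran.
actions = {stage: (lambda s=stage: log.append(s)) for stage in WORKFLOW}
order = run(WORKFLOW, actions)
print(" -> ".join(order))
```

Expressing the flow as a dependency graph rather than a fixed script means that stages without mutual dependencies, such as independent data feeds, could later be scheduled in parallel.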
Integrated management:
• Integrated Cisco UCS server management (Cisco UCS B-Series; Cisco UCS C-Series with direct-attached storage)
• Integrated network management with Fabric Interconnects and Nexus switches
• Integrated data management
Cisco Workload Automation delivers an automated business-processing abstraction layer across data center applications and big data applications: data feeds, big data jobs, and automated backup and storage in and out of big data grids.
• Rapid, error-free deployment with service profiles
• Maintenance activities such as BIOS and firmware upgrades across the cluster
• Health and power monitoring
• Seamless data movement
Thank you.
• Hadoop has many building blocks. At its core, it is an architecture to store and process unstructured and semi-structured data.
• Hadoop Distributed File System (HDFS): at the base, a self-healing clustered storage system
• MapReduce: distributed data processing
• Pig, Hive, Sqoop: top-level abstractions
• Top-level interfaces: ETL tools, BI reporting, RDBMS
• HBase: database with real-time access
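HDFS's self-healing comes from the NameNode tracking where each block's replicas live and scheduling new copies when a DataNode is lost. The sketch below is a simplified, single-process model of that re-replication step, assuming the default replication factor of 3 and a least-loaded placement rule; real HDFS placement is rack-aware and throttled, and `re_replicate` is a hypothetical function, not a Hadoop API.

```python
from collections import Counter

REPLICATION = 3  # assumed: HDFS default dfs.replication


def re_replicate(block_map: dict[str, set[str]], live_nodes: set[str]) -> dict[str, set[str]]:
    """Drop replicas on dead nodes, then copy each under-replicated block to
    the least-loaded live nodes that do not already hold it."""
    healed = {blk: nodes & live_nodes for blk, nodes in block_map.items()}
    load = Counter({n: 0 for n in live_nodes})
    for nodes in healed.values():
        load.update(nodes)  # count replicas currently held by each node
    for blk in sorted(healed):
        nodes = healed[blk]
        if not nodes:
            continue  # every replica lost: the block cannot be recovered
        while len(nodes) < REPLICATION:
            candidates = sorted((n for n in live_nodes if n not in nodes),
                                key=lambda n: (load[n], n))
            if not candidates:
                break  # fewer live nodes than the replication factor
            target = candidates[0]
            nodes.add(target)
            load[target] += 1
    return healed


blocks = {
    "blk_1": {"dn1", "dn2", "dn3"},
    "blk_2": {"dn2", "dn3", "dn4"},
    "blk_3": {"dn1", "dn4", "dn5"},
}
healed = re_replicate(blocks, live_nodes={"dn2", "dn3", "dn4", "dn5"})  # dn1 failed
print(healed)
```

Each recovered replica is a node-to-node copy, so a DataNode failure briefly adds replication traffic of the kind listed earlier under large incast.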
• Extreme performance: optimized for fast query execution and unmatched data loading
• Elastic scalability: expand capacity and performance
• Highly available: fully redundant and reliable configuration
• Unified networking: converged data and management plane networking
• Rapidly deployable: pre-validated configuration, rapid deployment via service profiles
• Unified management: the power of UCS Manager to manage compute, networking, and I/O
• Industry-leading partnerships: joint solutions with major software players
• Enterprise application integration: seamless integration with enterprise applications on blades
Recommended