Big Data Architecture and Deployment
Robert Feng
TSA
Agenda
• Big Data Overview
• Meet New Big Data Requirement - Cisco Common Platform Architecture
• Big Data Applications Co-exist with Enterprise Applications
• Agile Big Data Application Integration with Programmable UCS
• Hadoop Cluster Deployment Automation
• Big Data Performance Enhancement with Cisco ACI
• Summary
Big Data Overview
Big Data? So What Has Changed?
The Explosion of Unstructured Data
• More than 90% is unstructured data
• Approx. 500 quadrillion files
• Quantity doubles every 2 years
• Most unstructured data is neither stored nor analysed!
1.8 trillion gigabytes of data was created in 2011…
[Chart: GB of data (in billions), axis 0 to 10,000, for 2005, 2010 and 2015, structured data vs. unstructured data]
Source: Cloudera
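The two figures on this slide (1.8 trillion gigabytes created in 2011, quantity doubling every 2 years) can be combined into a rough projection. The sketch below is only an illustration: it assumes the doubling rate stays constant and that the 2011 figure is a yearly volume, neither of which the slide states explicitly. The function name is made up for this example.

```python
BASE_YEAR = 2011
BASE_VOLUME_ZB = 1.8        # 1.8 trillion GB = 1.8 zettabytes (decimal units)
DOUBLING_PERIOD_YEARS = 2   # "Quantity doubles every 2 years"


def projected_volume_zb(year):
    """Projected volume in zettabytes, assuming steady doubling (an assumption)."""
    doublings = (year - BASE_YEAR) / DOUBLING_PERIOD_YEARS
    return BASE_VOLUME_ZB * 2 ** doublings


if __name__ == "__main__":
    for year in (2011, 2013, 2015):
        print(f"{year}: {projected_volume_zb(year):.1f} ZB")
```

Under these assumptions the volume quadruples between 2011 and 2015.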
Three Common Big Data Architectures
NoSQL
Fast key-value store/retrieve in real time
Hadoop
Distributed batch, query, and processing platform
MPP Relational Database
Scale-out BI/DW
Hadoop Server Hardware Evolving in the Enterprise
Typical 2009 Hadoop node
• 1RU server
• 4 x 1TB 3.5” spindles
• 2 x 4-core CPU
• 1 x GE
• 24 GB RAM
• Single PSU
• Running Apache
• $
Economics favor “fat” nodes
• 6x-9x more data/node
• 3x-6x more IOPS/node
• Saturated gigabit, 10GE on the rise
• Fewer total nodes lowers licensing/support costs
• Increased significance of node and switch failure
Typical 2015 Hadoop node
• 2RU server
• 12 x 4TB 3.5” or 24 x 1TB 2.5” spindles
• 2 x 6-12 core CPU
• 2 x 10GE
• 128-256 GB RAM
• Dual PSU
• Running commercial/licensed distribution
• $$$
Hadoop Server Trends
Source: HadoopWorld session, Cloudera speaker
• Fat, dense nodes
• New features and applications (Impala, Drill, HBase, etc.) drive RAM demands
• Argues for somewhat higher-end CPA-like configs to provide a cushion for growth
Meet New Big Data Requirement - Cisco Common Platform Architecture (CPA)
Cisco UCS Common Platform Architecture (CPA) Building Blocks for Big Data
UCS 6200 Series Fabric Interconnects
Nexus 2232 Fabric Extenders (optional)
UCS Manager
UCS C220/C240 M4 Servers
LAN, SAN, Management
New UCS Reference Configurations for Big Data
Configuration          High Performance      Capacity-Optimised    Balanced
Rack size              Quarter-Rack UCS      Full Rack UCS         Full Rack UCS
Solution for           MPP, NoSQL            Hadoop                Hadoop, NoSQL
Fabric interconnects   2 x UCS 6248          2 x UCS 6296          2 x UCS 6296
Servers                8 x C220 M4 (SFF)     16 x C240 M4 (LFF)    16 x C240 M4 (SFF)
CPU                    2 x E5-2680v3         2 x E5-2620v3         2 x E5-2680v3
Memory                 256GB                 128GB                 256GB
Storage                6 x 400-GB SAS SSD    12 x 4TB 7.2K SATA    24 x 1.2TB 10K SAS
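The drive counts in the three reference configurations above translate into raw capacity per rack with simple arithmetic. The sketch below assumes the CPU, memory and drive figures are per server (the slide does not say so explicitly) and, for the usable-capacity estimate, assumes the HDFS default replication factor of 3. The function names are illustrative only.

```python
# (servers, drives per server, TB per drive), read off the reference configurations.
# Treating the drive counts as per-server values is an assumption.
CONFIGS = {
    "High Performance": (8, 6, 0.4),
    "Capacity-Optimised": (16, 12, 4.0),
    "Balanced": (16, 24, 1.2),
}


def raw_capacity_tb(servers, drives_per_server, tb_per_drive):
    """Raw disk capacity of the whole configuration in TB."""
    return servers * drives_per_server * tb_per_drive


def hdfs_usable_tb(raw_tb, replication=3):
    """Rough usable HDFS capacity, ignoring OS, temp space and other overhead."""
    return raw_tb / replication


if __name__ == "__main__":
    for name, spec in CONFIGS.items():
        raw = raw_capacity_tb(*spec)
        print(f"{name}: {raw:.1f} TB raw, ~{hdfs_usable_tb(raw):.1f} TB at 3x replication")
```

Under these assumptions the Capacity-Optimised rack holds 768 TB raw, the Balanced rack about 461 TB, and the High Performance quarter rack about 19 TB of SSD.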
UCS C3160 Dense Storage Rack Server
• Up to 360TB in 4RU
• Server node: 2 x E5-2600 v2 CPUs, 128/256GB RAM, 1GB/4GB RAID cache
• HDD: 4 rows of hot-swappable 4TB/6TB HDDs, total top load: 56 drives
• Optional disk expansion: 4 x hot-swappable, rear-load LFF 4TB/6TB HDDs
• Two 120GB SSDs (OS/boot)
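The "up to 360TB" headline follows from the drive counts on this slide: 56 top-load drives plus the 4 optional rear-load drives, all populated with 6TB disks. A minimal check, with illustrative names:

```python
TOP_LOAD_DRIVES = 56        # "Total top load: 56 drives"
REAR_EXPANSION_DRIVES = 4   # "Optional Disk Expansion 4x"


def max_raw_tb(tb_per_drive, with_expansion=True):
    """Raw capacity in TB for a fully populated chassis."""
    drives = TOP_LOAD_DRIVES + (REAR_EXPANSION_DRIVES if with_expansion else 0)
    return drives * tb_per_drive


if __name__ == "__main__":
    print(max_raw_tb(6))                        # 6TB drives, with rear expansion
    print(max_raw_tb(4))                        # 4TB drives, with rear expansion
    print(max_raw_tb(6, with_expansion=False))  # 6TB drives, top load only
```

The two 120GB SSDs are for OS/boot and are not counted here.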
Big Data Applications Co-exist with Enterprise Applications
Enterprise Data Management with Big Data
[Diagram labels: Machine, Web, Operational (OLTP), ETL, Big Data (Hadoop, etc.), MPP EDW, EDW, BI/Reports, Dashboards]
Big Data Applications co-exist with Enterprise Applications
[Diagram labels: Data Center Applications (Cisco UCS B-Series Blade Servers, SAN Array, RN FlexPod, Vblock); Big Data Applications (Cisco Big Data Common Platform Architecture Using C-Series Rack-Mount Servers; Hadoop, NoSQL, MPP DB); Unified Fabric; Unified Management; Integrated Data Management; Data Integration Using Connectors; Data Feeds]
Ability to manage and monitor enterprise applications running on blades with SAN storage and big data applications running on rack-mount servers from a single pane of glass
Cisco UCS: Physical Architecture: Rack-Mount Server as a Form Factor Extension of Blades
[Diagram: a cluster of two 6200 fabric switches (Fabric A, Fabric B) with uplink ports (SAN A, SAN B, ETH 1, ETH 2), OOB management (MGMT) ports and server ports. Chassis 1 contains fabric extenders FEX A and FEX B and half/full-width compute blades (B200) with VIC virtualized adapters. A rack-mount C240 with a VIC attaches through FEX A and FEX B, which are optional, for scalability.]
Cisco Virtual Interface Card (VIC)
PCIe x16
10GbE/FCoE
[Diagram: user-definable vNICs, Eth and FC devices numbered from 0 up to 256]
Converged Network Adapter
FCoE in hardware
Bare metal and VM deployments
Virtualize in hardware
PCIe compliant
vNIC Fabric Failover
Up to 256 distinct PCIe devices
Ethernet vNIC and FC vHBA
QoS
8 queues
vNIC bandwidth guarantees
© 2012 Cisco and/or its affiliates. All rights reserved. 18
Cisco UCS Combines Enterprise and Big Data Platform into One – Direct SAN Access
[Diagram labels: UCS Manager (Deploy, Manage, Monitor); UCS Rack-Mount Servers; UCS Blade Servers; Cisco Tidal Enterprise Scheduler; Hadoop Connectors; Big Data Ecosystem; Hadoop Nodes; SQL Nodes; SAN Arrays; Enterprise Applications; Availability; Backup; Snapshot]
Extendable to Multidata Center Implementations for Disaster Recovery and Business Continuity
http://blog.cloudera.com/blog/2015/01/how-to-deploy-apache-hadoop-clusters-like-a-boss/
UCS Fabric Failover
• Fabric provides NIC failover capabilities chosen when defining a service profile
• Avoids traditional NIC bonding in the OS
• Provides failover for both unicast and multicast traffic
• Ideal for bare metal OS deployments
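[Diagram: on a Cisco VIC 1225, the physical adapter presents virtual adapter vNIC 1 to the OS / hypervisor / VM. Two 10GE physical cables run through the FEXes to 6200-A and 6200-B (linked by L1/L2). The virtual cable from vNIC 1 terminates at vEth 1, which exists on both fabric interconnects.]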
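The failover behaviour described on this slide can be pictured with a toy model: a vNIC is pinned to one fabric, and if failover was enabled in the service profile it moves to the other fabric when its primary goes down, without the OS doing any NIC bonding. This is a plain-Python illustration with hypothetical class and field names, not UCS Manager code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VNic:
    """Toy vNIC; field names are illustrative, not UCS object properties."""
    name: str
    primary_fabric: str           # "A" or "B"
    fabric_failover: bool = True  # chosen when defining the service profile

    def active_fabric(self, fabric_up: Dict[str, bool]) -> Optional[str]:
        """Return the fabric currently carrying this vNIC, or None if it is down."""
        if fabric_up.get(self.primary_fabric, False):
            return self.primary_fabric
        if self.fabric_failover:
            standby = "B" if self.primary_fabric == "A" else "A"
            if fabric_up.get(standby, False):
                return standby
        return None


if __name__ == "__main__":
    vnic = VNic("vNIC 1", primary_fabric="A")
    print(vnic.active_fabric({"A": True, "B": True}))   # normal operation
    print(vnic.active_fabric({"A": False, "B": True}))  # fabric A has failed
```

From the operating system's point of view the adapter never changes, which is why the slide calls this ideal for bare metal deployments.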
UCS Management Administrative Parity for Blades and Rack Servers
• A major market transformation in unified server management
• No management barriers between blades and rack optimized servers
• Extending fabric computing to rack optimized servers
• Add capacity without complexity
[Diagram labels: Cisco UCS Fabric Interconnect; Cisco Fabric Extender; B-Series Blade Servers; C-Series Rack Optimized Servers; Unified Management; A Single Unified System]
UCS Stateless Computing, Benefits
• Server identity no longer has to be tied to physical server hardware
  – Profiles provide identity
  – Seamless server mobility
  – Stateless components
• Boot over network (LAN or SAN)
  – Boot order and boot devices are part of the pre-defined logical server profile
  – On-board disks can be used for temp, swap, etc.
• LAN and SAN connectivity
  – # of NICs
  – # of HBAs
[Diagram: service profile "Bob" moves from Chassis-1/Blade-1 to Chassis-9/Blade-5; both blades attach to the same SAN and LAN]
Server Name: Bob
UUID: 56 4d cd 3f 59 5b 61…
MAC: 08:00:69:02:01:FC, 08:00:69:02:01:FD, 08:00:69:02:01:FE, 08:00:69:02:01:FF
WWN: 5080020000075740
Boot Order: SAN, LAN
No infrastructure changes needed when moving a Service Profile
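The idea that identity lives in the profile rather than in the hardware can be sketched in a few lines. The model below is a toy, not UCS Manager behaviour in detail: the class names and the `associate` method are hypothetical, and the identity values are copied from the slide (the UUID is truncated in the source and kept as-is).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ServiceProfile:
    """Logical server identity; immutable so a move cannot alter it."""
    name: str
    uuid: str
    macs: Tuple[str, ...]
    wwn: str
    boot_order: Tuple[str, ...]


class UcsDomain:
    """Toy domain mapping physical slots to the profile they currently run."""

    def __init__(self, slots: Iterable[str]):
        self._slots: Dict[str, Optional[ServiceProfile]] = {s: None for s in slots}

    def associate(self, profile: ServiceProfile, slot: str) -> None:
        if self._slots[slot] is not None:
            raise ValueError(f"{slot} already has a profile")
        # Release whichever slot held this profile before attaching it elsewhere.
        for other, current in self._slots.items():
            if current is profile:
                self._slots[other] = None
        self._slots[slot] = profile

    def profile_at(self, slot: str) -> Optional[ServiceProfile]:
        return self._slots[slot]


if __name__ == "__main__":
    bob = ServiceProfile(
        name="Bob",
        uuid="56 4d cd 3f 59 5b 61…",  # truncated in the source, kept as-is
        macs=("08:00:69:02:01:FC",),
        wwn="5080020000075740",
        boot_order=("SAN", "LAN"),
    )
    domain = UcsDomain(["Chassis-1/Blade-1", "Chassis-9/Blade-5"])
    domain.associate(bob, "Chassis-1/Blade-1")
    domain.associate(bob, "Chassis-9/Blade-5")
    print(domain.profile_at("Chassis-9/Blade-5").wwn)
```

Because the MAC and WWN travel with the profile, SAN zoning and LAN configuration keyed on those values would not need to change after the move, which is the point the slide makes.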
Cisco UCS Combines Enterprise and Big Data Platform into One – Compute Resource Pooling
[Diagram labels: UCS Manager (Deploy, Manage, Monitor); UCS Rack-Mount Servers; UCS Blade Servers; Cisco Tidal Enterprise Scheduler; Hadoop Connectors; Big Data Ecosystem; Hadoop Nodes; SQL Nodes; SAN Arrays; Enterprise Applications; Availability; Backup; Snapshot]
Extendable to Multidata Center Implementations for Disaster Recovery and Business Continuity
Agile Big Data Application Integration with Programmable UCS
UCS Manager Integration with Big Data Applications
UCS Manager Integration with Cloudera Manager
– Server Infrastructure Manager: UCS Manager
– Big Data Application Manager: Cloudera Manager
Programmatic Infrastructure
[Diagram: the XML API exposes System Status, Physical Inventory and Logical Inventory to Direct access, the UCS CLI, the UCS GUI, and 3rd Party and Customer tools (Self Serve portals, Management Tools, Auditing Tools)]
Comprehensive XML API
• Single point of management – access to all domain knowledge
• Broad 3rd party integration support
• Faster custom integration for customer use cases
• Consistent data and views across ALL interfaces
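To give a feel for what a call to the XML API looks like, the sketch below builds two request documents with the Python standard library: a login (`aaaLogin`) and a class query (`configResolveClass`). It only constructs and parses XML and sends nothing over the network. In a real deployment these documents would be POSTed to the UCS Manager endpoint (commonly `https://<ucsm-host>/nuova`). The helper names, credentials and the sample response are placeholders invented for this illustration.

```python
import xml.etree.ElementTree as ET


def login_request(username, password):
    """Build an aaaLogin request body."""
    el = ET.Element("aaaLogin", {"inName": username, "inPassword": password})
    return ET.tostring(el, encoding="unicode")


def resolve_class_request(cookie, class_id, hierarchical=False):
    """Build a configResolveClass query for all objects of one class."""
    el = ET.Element("configResolveClass", {
        "cookie": cookie,
        "classId": class_id,
        "inHierarchical": "true" if hierarchical else "false",
    })
    return ET.tostring(el, encoding="unicode")


def parse_cookie(login_response):
    """Extract the session cookie from an aaaLogin response."""
    return ET.fromstring(login_response).attrib["outCookie"]


if __name__ == "__main__":
    print(login_request("admin", "placeholder-password"))
    # Hand-written stand-in for a server response, for illustration only.
    cookie = parse_cookie('<aaaLogin response="yes" outCookie="example-cookie" />')
    print(resolve_class_request(cookie, "computeRackUnit"))
```

The CLI, GUI and third-party tools shown on the slide all go through this same interface, which is why they present consistent data.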
Cisco Developer Network (DevNet)
Developer Community: developer.cisco.com/web/unifiedcomputing/home
Cisco UCS Platform Emulator (UCSPE)
Cisco UCS PowerTool PowerShell Library
Demo: Cisco UCS PowerTool
Downloads
• UCS Platform Emulator (UCSPE)
• goUCS Automation Tool
• Cisco UCS PowerTool (PowerShell Module)
• XML API, PowerShell code examples
• Microsoft SCOM Management Pack for Cisco UCS
• Microsoft SCVMM UI Extension for Cisco UCS
• Microsoft SCO Integration Pack for Cisco UCS
Documentation
• Developer Guides
• Whitepapers
• Reference Guides
Collaboration
• Blogs, videos and access to subject matter experts
• Peer to peer forums
UCS Platform Emulator (UCSPE)
Hardware Independent Integration
– Downloadable virtual machine
– Full-feature emulator for UCS Manager
– Complete support for XML API calls
– Object Browser to navigate the UCSM MIT
– Import and replicate the physical inventory of a UCS Manager
– Share physical inventories among UCS Platform Emulators
– Drag-n-drop hardware builder to create custom physical inventory
UCS Python SDK
Hardware Independent Integration
The Cisco UCS Python SDK is a Python module that helps automate all aspects of Cisco UCS management, including server, network, storage and hypervisor management.
The bulk of the Cisco UCS Python SDK works on the UCS Manager’s Management Information Tree (MIT), performing create, modify or delete actions on the Managed Objects (MOs) in the tree.
All the physical and logical components that comprise Cisco UCS are represented in a hierarchical Management Information Model (MIM), referred to as the Management Information Tree (MIT). Each node in the tree represents a Managed Object (MO), uniquely identified by its Distinguished Name (DN).
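The relationship between the tree, its managed objects and their distinguished names can be shown with a small stand-alone model. This is a toy written for this explanation, not the SDK itself: the class and function names are hypothetical, and the only idea borrowed from UCS is that a DN is the path of relative names from the root, as in `sys/chassis-1/blade-1`.

```python
class ManagedObject:
    """Toy MO: a node identified within its parent by a relative name (rn)."""

    def __init__(self, rn, parent=None):
        self.rn = rn
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[rn] = self  # create: attach to the tree

    @property
    def dn(self):
        """Distinguished name: relative names joined from the root down."""
        if self.parent is None:
            return self.rn
        parent_dn = self.parent.dn
        return f"{parent_dn}/{self.rn}" if parent_dn else self.rn

    def delete(self):
        """Delete: detach this MO (and its subtree) from its parent."""
        if self.parent is not None:
            del self.parent.children[self.rn]
            self.parent = None


def lookup(root, dn):
    """Resolve a DN to its MO by walking the tree one relative name at a time."""
    node = root
    for rn in dn.split("/"):
        node = node.children[rn]
    return node


if __name__ == "__main__":
    root = ManagedObject("")  # unnamed top of the tree
    sys_mo = ManagedObject("sys", root)
    chassis = ManagedObject("chassis-1", sys_mo)
    blade = ManagedObject("blade-1", chassis)
    print(blade.dn)
    print(lookup(root, "sys/chassis-1") is chassis)
```

Since every object is reachable by its DN, create, modify and delete operations can all be addressed the same way, whether the target is physical (a blade) or logical (a service profile).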
Hadoop Cluster Deployment Automation
UCS Director Express for Big Data
Unified Management Platform for UCS Hadoop Cluster
– Wire once, deploy Hadoop anytime
– Zero Touch Deployment of UCS and Hadoop Infrastructure
– Integrated Topology view of Hadoop Nodes and underlying Compute/Network/Storage
– Simplified and Integrated Management with reduced TCO
– Enables advanced diagnostics and monitoring
[Diagram labels: UCSD Express for Big Data (Unified Management) above Hadoop Manager and UCS Manager; UCS 6200 Series Fabric Interconnect; UCS Manager; UCS C240 M4 Series Rack Server; UCS C3160 Rack Server]
Unified Management with UCSD Express for Big Data: Programmability, Scalability and Automation
[Diagram labels: OS Profile, Cisco UCS Template, Hadoop]
Big Data Performance Enhancement with Cisco ACI
Application centric infrastructure (ACI)
ACI - a Holistic Architecture Enabling Rapid Deployment of Applications onto Networks with Scale, Security and Full Visibility
[Diagram labels: ACI; Application Centric Policy Controller; Nexus 9000 Fabric]
Case Study – Big Data Analytics
ACI Innovation Driving Application Performance
Based on common network load and link failure scenarios
Congestion Management
Network Innovations
• Dynamic Load Balancing
• Dynamic Packet Prioritization
30% reduction in application completion time
[Chart: time (s), axis 100 to 300, ACI vs. Traditional Network]
[Chart: Network Utilization, values shown: 60%, 60%, 90%]
Summary
• New big data challenges require rethinking enterprise-scale infrastructure
• Leverage UCS and Nexus 9000/ACI to integrate big data into your data center operations
Thank you