1© Copyright 2015 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved.
Modern Infrastructure for
Business Data Lake
Scale-out Converged Solutions for Analytics
Julianna DeLua, VCE
Dan Beres, EMC Isilon
AGENDA
History of Analytic Infrastructure
Why Scale-Out, Converged Solutions
Analytic Workflow
vHadoop Test Results
Customer Use Cases and Feedback
Conclusion / Next Steps
A Brief History of Analytic Infrastructure
VCE Confidential© 2015 VCE Company, LLC. All rights reserved.
2013 – Shared infrastructure? Let me know when you know “for sure” it works. In the meantime, a few industry pioneers and early adopters start POCs with EMC / VCE
2014 – Extend converged system benefits with Isilon scale-out: augment enterprise apps and data with Hadoop, Splunk, and NoSQL. Great performance!
2015 – Internet of Things initiatives accelerate. Rapid technological advancements with architectural flexibility: Vscale
ANATOMY OF A MODERN DIGITAL BUSINESS
BUSINESS DRIVERS
• New systems of engagement
• New business models
• Internet of Things
CAPABILITIES NEEDED
• Compelling, Unique User Experience/Model
• “Catch people or things in the act and affect the outcome” = $$$
• Platform
• Data
• Algorithms (Code)
• Existing Systems
ENABLED BY
• The Private/Public Cloud: “Infinite, inexpensive compute and storage”
• Agile Product Development Culture
A MAJOR PRESSING CHALLENGE
Analytics/BI
• How do we architect for agile data-driven business?
• Can we manage big, fast data?
• Value driven
CIO
• Meet future business needs while simplifying and taking cost out of legacy?
• Avoid lock-in again?
• People and organization
CEO/CMO
• How do we become an agile, digital business?
• Anticipate and delight customers?
• Partner collaboration
• Where/how do we start?
Typical Customer Pains and Technical Problems
CUSTOMER PAINS
• Long cycles for accessing and sharing information locked in unstructured data.
• Cannot rapidly create value via technology-enabled XaaS.
• Need to explicitly demonstrate security, compliance, and governance.
• Inability to plan a system progression that combines structured and unstructured data, exacerbating silos of appliances and hardware.
• Insufficient posture against outages and peak periods of IT use.
• Escalating deployment, management, and maintenance costs for growing data.
TECHNICAL PROBLEMS
• Sub-optimal environment: data locked in high volume, variety, or velocity.
• Lack of service enablement: difficulties in optimizing a virtualized, multi-tenant service approach.
• Compliance/security exposure: lack of encryption, exposure, and data loss.
• Limited standardization: not using data center standards.
• Downtime/SLA issues: not readily configurable to handle mixed workloads.
• System utilization: inefficient islands of storage and systems, inability to reuse data for multiple solutions.
CONVERSATIONS LEAD TO PLATFORM EVOLUTION
Conversations
Downtime and response time issues missing business SLA
• Increased flash use
• Continuous need for migration
• Network scale points
• Data mobility
• Hadoop, Splunk, PaaS, Cassandra, MongoDB, Legacy DB
• Aggregate/disaggregate pool of resources
• Control required for application proliferation
Faster time to drive value from innovation, multitude of applications
• Mobile and social offers
• Turn 360 degree insight to customer acquisitions
• Fulfillment, inventory and customer management
AWS is costing too much but business wants faster go live and flexibility
VCE VSCALE™ ARCHITECTURE
FLEXIBLE SCALE-OUT THROUGH EXPANDED MULTI-SYSTEM ARCHITECTURE
VCE VSCALE™ FABRIC
• MPP DB
• Hadoop PROD & DR
• In memory DB
• BI / DW
• Enterprise App - SAP
• Microsoft Email, collaboration
• Hadoop POC
• Pivotal Cloud Foundry
• Video Surveillance
Edge & Central Analytics Workflow
Isilon OneFS
• Easy to Grow, Manage & Administer
• Additional Clients to More Content
• Multiprotocol Access to Same Data
[Diagram labels: Swift, HTTP, RAN, DAV, FTP, HDFS, NFS, SMB, SyncIQ, Glance, Log, Oracle, Mediation, App Server, External WAN, Internal WAN]
vHadoop+Isilon Install & Deployment Guide
“Fix These Problems….Prove it Out!”
Expensive and Won’t Scale
- Hundreds of Servers to support less than 2PB Usable Storage (1:7 ratio)
- “We have a guy with shopping carts walking down the rows replacing parts”
- Additional Staging Area for Data before Ingesting into Hadoop
- Can’t Scale Storage without Compute: Locked & Not Elastic
Lacks Enterprise Features
- No Cost Effective Data Redundancy
- Limited File-system Security, only Simple Authentication
- Multiple Points of Failure
- Maintaining Hadoop “PODs” involves significant downtime
Time To Results
- Requires Significant time to ingest and copy Data
- Building Production Hadoop “PODs” can take months
- Network Infrastructure Saturation & Expense
EMC Isilon Enabled Workflows
[Diagram labels: NFS, SMB, SWIFT, HDFS, RAN, FTP]
EMC Isilon Enabled Hadoop
[Diagram labels: HDFS, SMB, NFS, HTTP, FTP; name node; data node; nodeinfo; node reply; file; MapReduce; Original Data; Isilon OneFS; Compute; Data; 1X]
Hadoop Test Findings
Created and tuned Hadoop VMs to maximize Throughput
- >90% Utilization of CPUs for Compute
- Memory footprint reduced (MEM Page sharing across VMs)
- Hadoop 2.0 with YARN does not need FLASH for HDFS
Incremental testing to validate Scalability
- Validated 2:1 ratio Compute Node to Isilon Node (can also support 3:1)
- 2 VMs per Compute Node for Optimal Performance on Dual Socket
- Linear Scalability in performance by incrementally adding more compute
Validated Enterprise/Production Ready
- Security Greater with AD Authorization and Access
- No need to anonymize data
- Whitepaper Created
- Deployment & Upgrade Of Hardware and Software in hours not days/weeks
- Validated reduced data-center footprint & environmentals with UCS Blade Servers, vHadoop & Isilon
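The ratios in these findings lend themselves to a quick sizing calculation. The following is a minimal sketch, assuming the validated 2:1 compute-to-Isilon ratio and 2 VMs per compute node reported on the slide. The function name `size_cluster` and its parameters are hypothetical and are not part of any EMC or VCE tool.

```python
def size_cluster(isilon_nodes, compute_per_isilon=2, vms_per_compute=2):
    """Return (compute_nodes, hadoop_vms) for a given Isilon node count.

    Defaults follow the slide: 2 compute nodes per Isilon node (3 is also
    reported as supported) and 2 VMs per dual-socket compute node.
    """
    if isilon_nodes < 1:
        raise ValueError("isilon_nodes must be at least 1")
    compute_nodes = isilon_nodes * compute_per_isilon
    hadoop_vms = compute_nodes * vms_per_compute
    return compute_nodes, hadoop_vms


if __name__ == "__main__":
    # Compare the validated 2:1 ratio with the supported 3:1 ratio.
    for ratio in (2, 3):
        compute, vms = size_cluster(8, compute_per_isilon=ratio)
        print(f"8 Isilon nodes at {ratio}:1 -> {compute} compute nodes, {vms} VMs")
```

The 8-node starting point is an arbitrary illustration. Real sizing would also depend on workload and capacity targets that the slide does not cover.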
1TB Hadoop Job Cycle Comparison
Isilon Significantly Reduces Time To Results
Traditional Hadoop+DAS
17:32  30:18  20:50  20:50
TTR: 89.3 Minutes!
Isilon Enabled vHadoop
18:51

Terasort Test on 1TB    DAS      Isilon   Benefit
MB/s Per Node           55.00    85.00    55%
Compute Min             30.18    18.51    -39%
TTR Min                 89.30    18.51    -79%

Isilon Advantages
• Eliminates All Data Movement
• Allows for Virtualized Compute
• Significantly Less Cost
• 79% Faster TTR!
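The slide's figures can be cross-checked with a few lines of arithmetic. This sketch assumes the four Traditional Hadoop+DAS stage times and the Isilon time are minutes:seconds, and that the table prints the same values with a decimal point (30.18 for 30:18). Both readings are assumptions, and the helper names are hypothetical.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


# Stage times as printed on the slide (the stage names are not given there).
das_stages = ["17:32", "30:18", "20:50", "20:50"]
das_total = sum(to_seconds(t) for t in das_stages)
isilon_total = to_seconds("18:51")

if __name__ == "__main__":
    print("DAS TTR (min, sec):", divmod(das_total, 60))
    print("TTR change: %.1f%%" % percent_change(das_total, isilon_total))
    print("MB/s per node change: %.1f%%" % percent_change(55.00, 85.00))
    # The table's Compute row, taking 30.18 and 18.51 as decimal minutes.
    print("Compute change: %.1f%%" % percent_change(30.18, 18.51))
```

Under the minutes:seconds reading, the DAS stages sum to 89:30, which matches the 89.3-minute TTR, and the TTR reduction rounds to 79%. The Compute row's -39% is reproduced only when 30.18 and 18.51 are treated as decimal minutes; read as minutes:seconds, the same row works out to about -38%.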
EMC Isilon – Only Security Compliant Datastore for Hadoop
Highly resilient architecture
- Robust data protection options (DR, Snapshots, SyncIQ)
- Clustered Multi-Point Name Node with Kerberos
- SEC 17a-4 compliant WORM
- Hadoop multi-tenancy with dedicated network and access zones
Hadoop on Isilon provides full ACLs for NFS, SMB, and HDFS
- Each file/directory has an Access Control List (ACL) consisting of one or more Access Control Entries (ACE).
- Each ACE assigns a set of permissions (read, write, delete) to a specific security identifier (user or group).
- Deny ACEs remove permissions and override any “Allow ACEs”
Standard Hadoop only provides basic Unix-type “Simple” permissions
- Effective permissions are determined based on the file owner (single user, single group, other/world)
- Read and/or write permissions can be assigned to the owner, the group, and “everyone else”
- What do you do when you need to assign read access to multiple groups (A, B & C)?
- What do you do when you need to assign read access to group A and read+write access to group B?
- How do you maintain permissions when files are copied from Windows NTFS shares?
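The ACL model described on this slide can be illustrated with a small sketch. It is a deliberately simplified model, assuming only what the slide states (each ACE grants or denies permissions to one user or group, and Deny ACEs override Allow ACEs). It is not the OneFS implementation and ignores ordering and inheritance rules that real ACL evaluation has. The class, function, and group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ace:
    principal: str            # user or group the entry applies to
    permissions: frozenset    # e.g. frozenset({"read", "write"})
    deny: bool = False        # True for a Deny ACE


def effective_permissions(acl, principals):
    """Union of allowed permissions minus anything a matching Deny ACE removes."""
    allowed, denied = set(), set()
    for ace in acl:
        if ace.principal in principals:
            (denied if ace.deny else allowed).update(ace.permissions)
    return allowed - denied


# Hypothetical ACL for the slide's questions: read for groups A, B and C,
# read+write for group B, plus a Deny ACE for a "contractors" group.
acl = [
    Ace("group_a", frozenset({"read"})),
    Ace("group_b", frozenset({"read", "write"})),
    Ace("group_c", frozenset({"read"})),
    Ace("contractors", frozenset({"read", "write"}), deny=True),
]

if __name__ == "__main__":
    print(effective_permissions(acl, {"alice", "group_a"}))
    print(effective_permissions(acl, {"bob", "group_b"}))
    print(effective_permissions(acl, {"carol", "group_c", "contractors"}))
```

A single owner/group/other permission triple cannot express this combination, because it has room for only one group per file.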
Supporting Documentation
HCFS Certification: Process Detail
Certification Step    Duration
Partner Prep
Partner defines HDP test matrix (platforms, HDP components, HDFS APIs, HDP version and partner product version)
Partner provides sample product to Hortonworks so Engineering and Field teams are familiar with partner technology
Testing
HDFS Test Suite training - at Hortonworks HQ and online
Partner deploys, runs, analyzes, and reports HDFS Test Suite with technical support from Hortonworks
HDP Core Test Suite (MapReduce, YARN, Tez, HBase, Hive and Pig) training – at Hortonworks HQ and online
Partner deploys, runs, analyzes, and reports HDP Core Test Suite with technical support from Hortonworks
Partner deploys, runs, analyzes, and reports on remaining HDP Component Test Suites with technical support from Hortonworks
Testing time allocation
Documentation
Joint review of test suite execution results
Hortonworks creates functional gap analysis document; needs partner sign-off
Documentation time allocation
Validation
Hortonworks validates test suite execution results and certifies HCFS for specified HDP version and partner product version
Total certification time allocation 90-180 days
Scale-out Isilon for Scale-out Hadoop
Compute Nodes
Isilon is a scale-out system; Hadoop HDFS is partially similar
HDFS on Isilon functions as a Parallel file system
Each compute node performs I/O on every Isilon node in the Rack
I/O bandwidth and storage capacity can be increased linearly simply by adding Isilon nodes
Compute can be increased or decreased on the fly and can easily be virtualized
With a mesh network that is faster than the disks, data locality is irrelevant
Isilon Nodes
Hadoop Architecture – Traditional DAS
Dozens of Hadoop Racks Require Significant Investment in Network Infrastructure
[Diagram labels: Core Ethernet Switch; Rack Ethernet Switch; Compute; Shuffle+HDFS; SATA; 10 Gbps; 10+ Gbps]
The ratio of compute and disk space/performance is fixed.
Non-local HDFS I/O (30-90% of HDFS I/O) will go through Ethernet.
Local disk usage is shared between shuffle I/O (60% of all I/O during terasort) and HDFS I/O.
Core Network Switches Are Additional Cost for Hadoop+DAS (more Network traffic required)
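The percentages on this slide imply roughly how much of a job's total I/O crosses Ethernet as HDFS traffic in the DAS layout. This is a rough sketch, assuming that shuffle I/O stays on local disk and that all remaining I/O is HDFS I/O. That split is an assumption for illustration, and the function name is hypothetical.

```python
def hdfs_ethernet_share(shuffle_share, nonlocal_share):
    """Fraction of all I/O that is non-local HDFS I/O crossing Ethernet."""
    for value in (shuffle_share, nonlocal_share):
        if not 0.0 <= value <= 1.0:
            raise ValueError("shares must be between 0 and 1")
    hdfs_share = 1.0 - shuffle_share  # I/O that is not shuffle
    return hdfs_share * nonlocal_share


if __name__ == "__main__":
    # Slide figures: shuffle is 60% of all I/O during terasort,
    # and 30-90% of HDFS I/O is non-local.
    low = hdfs_ethernet_share(0.60, 0.30)
    high = hdfs_ethernet_share(0.60, 0.90)
    print(f"{low:.0%} to {high:.0%} of all I/O")
```

With the slide's figures this comes to roughly 12% to 36% of all I/O. Shuffle traffic that itself crosses the network is not counted here.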
Hadoop Architecture – Isilon for HDFS
Reduced traffic across the Core Ethernet switch: HDFS traffic will only travel within a rack and across IB.
[Diagram labels: Core Ethernet Switch; Rack Ethernet Switch; Isilon InfiniBand Switch; IB; Compute; Shuffle; SATA; Isilon HDFS; 10 Gbps; 10+ Gbps (used only for MR copy phase)]
The number of compute and Isilon nodes can be adjusted independently to achieve the optimal ratio of compute and I/O bandwidth.
HDFS I/O ALWAYS comes through a rack-local Isilon node, which collects data blocks from all other Isilon nodes across the InfiniBand fabric.
Shuffle I/O (65% of all I/O during terasort) remains on local storage.
Traditional Hadoop - Layers
Isilon+Hadoop – NO Layers
ESG LAB REVIEW – VBLOCK SYSTEMS WITH VCE TECHNOLOGY EXTENSION FOR EMC ISILON
• Objectives
• Underscore business challenges and opportunities for progressing to enterprise Hadoop
• Establish requirements to be ready for production – Extensibility, Governance, Security, Availability, Performance and Multi-Use
• Perform benchmarks on Vblock System 340 with EMC Isilon using the TeraGen suite
“By leveraging an industry-proven Integrated computing platform (ICP) in VCE Vblock Systems and combining it with EMC Isilon and VMware vSphere Big Data Extensions, organizations get a fully integrated platform that meets and grows with their big data and analytics requirements.”
- Tony Palmer, Senior Lab Analyst, ESG
ESG LAB OBSERVATION ON TERAGEN BENCHMARKS
[Chart: Comparing Performance of Traditional Hadoop to VCE Vblock System with EMC Isilon (TeraSort Suite). X axis: TeraGen, TeraSort, TeraValidate. Y axis: Job Duration (seconds), 0 to 1,400. Series: 16 Traditional Hadoop Nodes (combined Compute and DAS); 16 VCE Compute Nodes and EMC Isilon Storage.]
VCE CUSTOMER BENEFITS
VCE LOWERS OPERATIONAL COSTS
[Chart: Cost before and after Vblock System deployment. Categories: IT Staff, Facilities, Infrastructure. Values shown: 41%, 13%, 38%.]
IDC Research Study of VCE Customers, September 2013
GAS AND UTILITY LEADER
Drive to Clean energy transformation while managing cost and risk
• Situation
  • Largest provider of gas and electric energy in the US. Innovate to drive a clean, sustainable future. Better management of costs and risks using predictive models. Operational improvement and compliance management. Expected data growth and application complexity with smart meter data management.
• Solution
  • Vblock System 340 to be used for private and public cloud in the hybrid cloud model, keeping custom applications and sensitive data in-house while pushing others to public. Initiated with Pivotal to become a software-led company with Pivotal CF.
• Anticipated Business Benefits
  - Increase agility for application deployment using Platform as a Service (PaaS) and big data solution
  - Support 600+ new applications planned annually, faster and at lower cost
  - Improve disaster recovery readiness and data protection
  - Lower costs and detect issues by enabling field personnel
  - Increased customer satisfaction including cost savings via meter data
Differentiators: Suited to Hybrid Cloud Model and future expansion (upgrades and scaling). Extending VCE-Pivotal-EMC relationship while being open to tap eco-system
FOOD AND BEVERAGE GIANT
Financial reporting and marketing analysis
Back to Private Cloud
• Situation
  • Global food and beverage conglomerate looking to accelerate financial reporting and reflect customer behaviors. Seeking a better alternative to a third-party cloud-based model. Operational improvement and customer intimacy with leading brand recognition throughout the world. Data loading, processing and end-user impact are crucial.
• Solution
  • Use Vblock System for a shuffle and extend with VCE technology extension for EMC Isilon to run Pivotal Hadoop and HAWQ. For Pivotal Greenplum, use VCE technology extension for compute (Cisco C240). Bring some of the core applications to the corporate IT.
• Anticipated Business Benefits
  - Streamline financial reporting process for goods coming from multiple geographies while keeping up to date and supporting broadening user queries
  - Exploit mobile applications for customer preferences and inventory management
  - Support product launches and marketing campaigns based on consumption logs, brand preferences and social media
  - Improve disaster recovery readiness and data protection
  - Start with one project, gain momentum while ensuring readiness for the future
Differentiators: Ability to match architecture to workloads. Reuse existing environment. Extensible for future growth
WHY VCE AND EMC FOR SCALE-OUT CONVERGED ANALYTIC SOLUTION?
• Adaptable, modular, and mission critical
• Incremental scaling with your demand from the broad VCE and EMC portfolio
• Pre-tested, validated and certified by EMC and VCE
• Exploit end-to-end analytics on the SAME VCE and EMC platform
• Take advantage of broadening EMC partner eco-system
• Contact your EMC or VCE representatives
• Contact: EMC - [email protected], VCE - [email protected]