Confidential © 2014 Actian Corporation
SQL + Hadoop: The High Performance Advantage
Turn Hadoop into a High Performance Analytics Platform
Emma McGrattan, Actian
Jim Hare, Actian
8 July 2014
Agenda
1. Introduction
2. Hadoop Challenges
3. Actian Analytics Platform – Hadoop SQL Edition
4. Industrialized, High Performance SQL in Hadoop
5. Questions
All lines are muted
To ask a question, use Chat or Q&A panel
Recording will be made available
We’ll be running a few polling questions
Actian is Delivering Transformational Value
$140M Revenues + Profitable
10,000+ Customers
Global Presence: 8 worldwide offices, 7x24 multinational support model
“Actian is now very powerfully positioned in the big data and analytics markets.” Robin Bloor
“Actian has assembled all of the next generation IPs into a single analytics platform, allowing users a level of flexibility in data interaction that competitors have not been able to match.” siliconANGLE
Big Data Offers Significant Opportunities
Personalized Experience
New Products/Services
Reduce Risk
Predictive Analytics
Improve Decision-Making
Many Data Sources
Low Cost Storage
…But only for those who embrace it
Enter Hadoop as the Big Data Enabler for Low Cost Storage
DW Offload
Landing Zone
Data Reservoir
?
But It Isn’t Easy with Hadoop
Batch performance
Time to Value
Expensive Skills
Siloed Data Access
Data preparation
Hadoop Complexity Forcing Organizations to Move Data in order to Analyze it
DW Offload
Landing Zone
Hadoop Data Reservoir
Data Management
Analytics Processing
Visualization & Data Science Workbench
Result: duplicate storage & infrastructure costs, more IT resources, network bandwidth usage, and complexity
Data Transfer
CIOs Challenged by Big Data Costs
One in three CIOs pays between 21 and 30 cents per gigabyte per month. Translation: it costs a company $3.12 million per year to store 500,000 gigabytes at an average cost of 26 cents per gigabyte per month.
Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html
-- CIO Insight
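The cost translation on this slide is a monthly rate multiplied by volume and by twelve months. A minimal sketch of that arithmetic follows; the helper name is hypothetical and the rate is kept in whole cents to avoid floating-point rounding. Note that at 26 cents per gigabyte per month, 500,000 gigabytes works out to $1.56 million per year, while the quoted $3.12 million corresponds to 1,000,000 gigabytes at the same rate.

```python
def annual_storage_cost_dollars(gigabytes, cents_per_gb_month):
    """Yearly storage cost in whole dollars: volume x monthly rate x 12 months."""
    total_cents = gigabytes * cents_per_gb_month * 12
    return total_cents // 100


# Figures taken from the slide: 26 cents per gigabyte per month.
cost_500k_gb = annual_storage_cost_dollars(500_000, 26)
cost_1m_gb = annual_storage_cost_dollars(1_000_000, 26)

print(f"500,000 GB:   ${cost_500k_gb:,} per year")
print(f"1,000,000 GB: ${cost_1m_gb:,} per year")
```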
CIOs Challenged by Types of Big Data
73% of CIOs say up to 50% of their data will be unstructured within two years.
Source: http://www.cioinsight.com/it-strategy/storage/slideshows/cios-challenged-by-big-data-costs.html
-- CIO Insight
Instead, what if you could move the analytic processing to the Hadoop data?
Data Science Workbench
Analytic Processing
Data Management
… And transform Hadoop from a data lake into a high performance, fully functional analytics platform
SQL User Access
Introducing the Actian Analytics Platform – Hadoop SQL Edition
What is it?
Patented X100 vector processing engine plus visual data and analytics workflow, all running natively in Hadoop via YARN
Turns Hadoop into a High-Performance, Fully-Functional Analytics Database
How is this unique?
Highest performing, most industrialized SQL access to Hadoop data
Only end-to-end analytic processing natively in Hadoop
Most consumable, accessible, manageable Hadoop analytics
What does this mean to you?
Removes all barriers for business access to big data analytics
Enables SQL users with no constraints on Hadoop data
Accelerates time to value
The Industry’s Abuzz – about Actian!
“Deploying on Hadoop enables the Actian Analytics Platform to scale to massively parallel scale without having to modify the underlying engine. For Actian, Hadoop is a means to an end; it provides an opening for Actian to introduce a fast SQL engine that operates at scale.”
Tony Baer, Principal Analyst, Software, Ovum
“Actian’s platform now makes Hadoop data repositories accessible to the entire enterprise by empowering millions of business-savvy SQL users and business analysts to conduct advanced analytics directly on data in the Hadoop Distributed File System (HDFS). Companies investing in Hadoop now can broaden the scope of data discovery, increase the accuracy of decisions, and speed time to value.”
Daniel Gutierrez, Inside Big Data
“The latest version of the Actian Analytics Platform provides end-to-end analytic processing natively in Hadoop. This will make the Hadoop Big Data framework more accessible by offering high-performance ELT (extract, load and transform) and SQL analytics on Hadoop with no need for MapReduce skills. This is a big deal because data scientists with Hadoop skills are in short supply, while SQL skills are relatively abundant.”
Libraries of Analytics
Hadoop
Connections to Access Any Data
Actian Analytics Platform – Hadoop SQL Edition
Visual Data and Analytic Workbench
High Performance Data Flow Engine
Industrialized SQL Analytics Database Natively in Hadoop
Removes all barriers for business access to big data analytics
Business Processes
Users
Machines
Applications
Expansive Connectivity Data Blending & Enrichment Discovery Data Science Analytics Operational BI
Enterprise Data
Machine Data
Social Data
Data Warehouse
SaaS Data
Amazon Redshift
Actian Analytics Platform – Hadoop SQL Edition
Lightning fast and industrial strength SQL in Hadoop – Up to 30X faster than Impala
Full end-to-end analytic processing platform - all native in Hadoop
Packaged with “real world” solution blueprints
Visual Data Science & Analytics Workbench
• Drag/drop interface with 100s of data prep and analytic functions
• Connect, blend, & enrich data and perform discovery & data science
• Build and test predictive models
• Running on top of a high performance data flow engine
• All natively within Hadoop via YARN
MapReduce
Coding
Ubiquitous Skills
■ 1 Million+ SQL Users
■ $ Lower cost
■ Easy to find, in most companies
■ Embedded in the business
Specialty Skills
■ 150K MapReduce Programmers
■ $$$ Expensive
■ 170K Shortage, hard to find
■ Separate from the business
Unleash millions of business-savvy SQL users with no constraints on Hadoop data
Actian Analytics PlatformTM
Analyze ActConnect+
Actian Analytics Platform = 25 Minutes
Workflow steps: Log Reader, Filter Rows, Group, Load Vector, k-Means
Coding MapReduce = 4 Weeks
Log Reader, Filter Rows, Group, Load Vector: MapReduce code for each step
Avro Writer: MapReduce code
k-Means: MapReduce code
Accelerate time to value and turn Hadoop data into transformational value
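The first three workflow steps named on this slide (Log Reader, Filter Rows, Group) can be pictured as small composable stages. The sketch below is plain Python with a made-up log format and hypothetical function names; it is not Actian DataFlow's API, and it leaves out the Load Vector and k-Means steps.

```python
from collections import defaultdict


def log_reader(lines):
    """Parse 'user,status,bytes' lines (an assumed log format) into dicts."""
    for line in lines:
        user, status, size = line.strip().split(",")
        yield {"user": user, "status": int(status), "bytes": int(size)}


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate returns True."""
    return (row for row in rows if predicate(row))


def group(rows, key, value):
    """Sum the `value` field for each distinct `key` field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Illustrative data only.
sample_log = ["alice,200,512", "bob,500,0", "alice,200,1024", "bob,200,256"]

ok_rows = filter_rows(log_reader(sample_log), lambda r: r["status"] == 200)
bytes_per_user = group(ok_rows, key="user", value="bytes")
print(bytes_per_user)
```

Each stage consumes the output of the previous one, which is the property a visual workflow tool exposes as connected boxes.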
Vendor Approaches to “SQL on Hadoop”
“marketing jobs”: SQL Outside Hadoop
• Connector approach
• MPP DB need 2 clusters
• Expensive, hard to manage
“wrapped legacy”: Mature but non-Integrated
• Legacy engine (e.g. Postgres) + top layer
• Store data outside HDFS (local files)
• Separate Failover Management (tools)
“from scratch”: Integrated but Immature
• No trickle updates
• Immature/poor optimizers + engines
• I18N, security, workload mgmt, access control?
“SQL on Hadoop” Vendor Landscape
[Chart: Maturity (SQL support, ACID, reliability, security, connectivity, performance) plotted against Hadoop Integration (Low to Native). Labeled regions: “marketing jobs”, “wrapped legacy”, “from scratch”, and “Mature & Integrated”.]
Actian Vector Hadoop Edition
Actian Analytics Platform Hadoop SQL Edition
Actian Analytics Platform
NameNode
DataNode DataNode
DataNode DataNode
DataNode DataNode
DataNode DataNode
Standard SQL Interfaces
Connect: Connect to any data via Actian DataConnect
Prepare: Prepare, enrich, and analyze any data with Actian DataFlow
Orchestrate: Manage dataflow across the entire analytic process
Running natively in Hadoop via YARN
6 POINTS OF INNOVATION:
Vector Processing
On Chip Cache
Fast Real-time Updates
Smart Compression
Storage Indexes
Multi-Core Parallelism
NEXT GENERATION DATABASE TECHNOLOGY:
Columnar
Compressed
Storage Indexes
Actian Vector – Unmatched Innovation
[Chart: Time / Cycles to Process versus Data Processed for DISK, RAM, and CHIP. Chart labels: 10GB, 2-3GB, 40-400MB; 2-20, 150-250, Millions.]
• Vector Processing: Single Instruction Multiple Data
• 2nd Gen Column Store: Limit I/O; efficient real time updates
• Smarter Compression: Maximize throughput; vectorized decompression
• Exploiting Chip Cache: Process data on chip – not in RAM
• Multi-core Parallelism: Maximize system resource utilization…
• Storage Indexes: Quickly identify candidate data blocks; minimize IO
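The storage index idea on this slide (quickly identify candidate data blocks, minimize IO) is commonly implemented by keeping a minimum and maximum value per block of a column. The sketch below shows that general technique in plain Python; the class and function names are hypothetical and it makes no claim about how Actian Vector implements it internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One block of a column, plus the min/max summary kept in the index."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column, block_size):
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_greater_than(blocks, threshold):
    """Return values > threshold, skipping blocks whose max rules them out."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.max_value <= threshold:
            continue  # no value in this block can match, so it is never read
        blocks_read += 1
        matches.extend(v for v in block.values if v > threshold)
    return matches, blocks_read


column = list(range(1, 101))            # illustrative data: 1..100
blocks = build_blocks(column, block_size=10)
matches, blocks_read = scan_greater_than(blocks, threshold=85)
print(f"read {blocks_read} of {len(blocks)} blocks, found {len(matches)} values")
```

The benefit depends on how well the data is clustered: on sorted or nearly sorted columns most blocks are skipped, while on randomly ordered data the min/max ranges overlap and few blocks can be ruled out.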
TPC-H 1TB – Faster, Less Hardware
Fastest TPC-H QphH@1TB Benchmark (non-clustered). Source: www.tpc.org
QphH by system:
Actian Vector 445,529
Actian Vector 436,788
SQL Server 219,888
Oracle 209,534
Oracle 201,487
SQL Server 173,962
Sybase IQ 164,747
Oracle 140,181
SQL Server 134,117
Result dates shown on the chart: June ’12, May ’11, Aug ’11, June ’11, Sept ’11, Apr ’11, Dec ’10, Apr ’10, Dec ’11
Hardware costs shown on the chart (excluding discounts): $57,146, $1,229,968, $460,869, $2,402,706, $753,392, $278,527, $85,621, $1,249,967, $258,880
Actian Analytics Platform – Hadoop SQL Edition
Transform Hadoop into a High Performance Analytics Platform
[Architecture diagram: Standard SQL Interfaces and a Visual Data & Analytics Workflow on top of Hadoop (YARN, HDFS), with a NameNode and several DataNodes, each with HDFS storage and an X100 engine. Workflow labels: Read, Load, Actian Vector, Blend & Enrich, Data Science & Analytics.]
High Performance, Industrialized SQL Database
High Performance, Parallelized Data Flow Engine
HDFS
• Original file format
• Standard block replication
Vector
• Column-based blocks
• Compressed
• Partitioned
Replicated Vector
• >=3 Replicated Copies of Vector Blocks
• Leveraged to co-locate data with various join keys
History of the TPC-DS Comparison
TPC-DS Benchmark Components
Operational Systems
Refresh Process Ad-hoc Reporting Queries
User Queries
DSS Database
TPC-DS
Reports
Store
Web
Catalog
Inventory
Promotions
Set of Files
ETL
Actian Hadoop SQL Performance
“Impala Subset” of TPC-DS Queries at Scale Factor 3000 (3TB): Speedup vs Impala
[Bar chart: Speedup Factor (axis 0 to 18), Impala vs Actian, for queries Q3, Q7, Q19, Q27, Q34, Q42, Q43, Q46, Q52, Q53, Q55, Q59, Q63, Q65, Q68, Q73, Q79, Q89, Q98.]
16x avg. speedup
Background to the “Impala Subset” of the TPC-DS benchmark can be found here: http://blog.cloudera.com/blog/2014/01/impala-performance-dbms-class-speed/
Both executed on the same hardware and software environment: 5-node cluster with 64GB of RAM per node and 12x2TB hard disks.
Most Industrialized SQL in Hadoop
Comprehensive – covers full analytic process: data blending & enrichment, discovery & data science, analytics & operational BI
Accessible – standard ANSI SQL to support standard BI tools; plus key advanced analytics including cube, grouping sets and windowing functions
Optimized – mature, proven planner and optimizer; optimal use of every node, CPU, memory, and cache
Secure – native DBMS security including authentication, user and role-based security, data protection, and encryption
Reliable – fully ACID-compliant with multi-version read consistency, plus system-wide failover protection
Manageable – resources managed automatically in Hadoop via YARN
Consumable – now usable by millions of users with every SQL tool and application on the planet
Scalable – unlimited expansion to handle extreme #s of users, nodes, data
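To make the "windowing functions" item concrete, here is a small sketch of a standard SQL window function computing a running total per group. It uses Python's built-in sqlite3 module purely as a stand-in SQL engine (window functions need SQLite 3.25 or later); the table, columns, and data are invented for illustration, and nothing here is specific to Actian's product.

```python
import sqlite3

# In-memory database with a made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 100), ("east", 2, 150), ("west", 1, 80), ("west", 2, 120)],
)

# SUM(...) OVER (...) keeps every input row and adds a per-region running total,
# unlike GROUP BY, which would collapse each region to a single row.
rows = conn.execute(
    """
    SELECT region, month, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```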
Actian Director for Management
Actian Analytics Platform – Hadoop SQL Edition
Industrialized, High-Performance SQL in Hadoop
Only end-to-end analytic processing natively in Hadoop
Highest performing, most industrialized SQL in Hadoop
Removes all barriers for business access to big data analytics
Unleashes millions of business-savvy SQL users on Hadoop data
Outperforms Cloudera’s Impala by up to 30x
Actian transforms Hadoop from a data lake into a high-performance analytics platform.
Transform Hadoop – Transform your Business
Get started today! www.actian.com/hadoop
Pre-register for an evaluation copy of Actian’s SQL in Hadoop: bigdata.actian.com/sql-in-hadoop
Register for a Sand Hill Hadoop Survey Results webinar on July 24, 2014: bigdata.actian.com/SandHill-Hadoop-Results