Utilizing Big Data Analytics with Hadoop
Fern Halper @fhalper
TDWI Research Director for Advanced Analytics
April 17, 2014
Sponsor
Speakers
Fern Halper, Research Director for Advanced Analytics, TDWI
Tapan Patel, Product Marketing Manager, SAS
Agenda
• The evolving big data ecosystem
• Status of big data, analytics, and Hadoop
• Considerations for getting started
New TDWI Checklist
• Free to download
• http://tdwi.org/research/list/tdwi-checklist-reports.aspx
An evolving ecosystem
Hadoop
Big data
Advanced Analytics
in-memory
Examining the pieces: Big Data
Social
M2M/IoT
Text
Mobile/Location Volume
Formats
70% of those respondents
using or currently using predictive
analytics are utilizing big data
(source: TDWI Predictive Analytics Best Practices Report, 2014)
Examining the pieces: Analytics
The Analytics Spectrum
Excel
Dashboards and Reports
Other BI
Visualization
Advanced Analytics
Advanced Analytics
Advanced analytics provides algorithms for
complex analysis of either structured or unstructured
data. It includes sophisticated statistical models,
machine learning, text analytics, advanced
visualization, and other advanced
data mining techniques.
Examining the pieces: Hadoop
• HDFS/MapReduce
• Schema on read
• Ecosystem of tools
• Commercial distributions
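The first two bullets can be illustrated with a small, single-process sketch. It is not Hadoop code: the record layout, the sample lines, and every function name below are assumptions made for illustration. Raw lines are stored untouched, a schema is applied only when a line is read, and the work is split into map, shuffle, and reduce steps in the style of MapReduce.

```python
from collections import defaultdict

# Raw records kept exactly as they arrived (as files might sit in HDFS).
# The "date,channel,count" layout is a made-up example, not a real dataset.
RAW_LINES = [
    "2014-04-17,web,120",
    "2014-04-17,mobile,80",
    "2014-04-18,web,200",
    "malformed line",
]


def parse(line):
    """Schema on read: impose (date, channel, count) only when reading."""
    parts = line.split(",")
    if len(parts) != 3 or not parts[2].isdigit():
        return None  # the line does not fit the schema we chose for this job
    return {"date": parts[0], "channel": parts[1], "count": int(parts[2])}


def map_phase(lines):
    """Map: emit (key, value) pairs from each readable record."""
    for line in lines:
        record = parse(line)
        if record is not None:
            yield record["channel"], record["count"]


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group of values to a single result."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines):
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(run_job(RAW_LINES))
```

In a real cluster the map and reduce steps would run in parallel across nodes; here they run in sequence only to show the shape of the computation.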
In-memory analytics
• Performance
• Interactivity
Status: Evolving architectures
Source: (TDWI Evolving Data Warehouse Architectures In the Age of Big Data, 2014) n=1688 responses
What technical issues or practices are driving change in your DW architecture?
Select all that apply.
Status: Big data pieces
Status: Analytics pieces
Considerations
• Defining the problem
• Data preparation
• Analyzing the data
• Making it work (i.e., the team)
• Governance
Data preparation
• ETL vs. ELT
• Data quality
• Metadata
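The ETL vs. ELT distinction can be sketched with Python's built-in sqlite3 module standing in for the target platform. This is a minimal illustration under stated assumptions, not a recommended pipeline: the sample rows, table names, and cleaning rules are all hypothetical. In ETL the data is cleaned before it is loaded; in ELT the raw data is loaded first and transformed inside the store.

```python
import sqlite3

# Hypothetical raw input: untrimmed names and ages stored as text.
RAW = [("  Alice ", "42"), ("BOB", "n/a"), ("carol", "17")]


def clean(name, age):
    """Transformation rules applied outside the database (the 'T' in ETL)."""
    return name.strip().upper(), int(age) if age.isdigit() else None


def etl(conn, rows):
    """Extract, Transform, Load: clean in Python, then load the clean rows."""
    conn.execute("CREATE TABLE customers_etl (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO customers_etl VALUES (?, ?)",
        [clean(name, age) for name, age in rows],
    )


def elt(conn, rows):
    """Extract, Load, Transform: load raw rows, then transform with SQL."""
    conn.execute("CREATE TABLE raw_customers (name TEXT, age TEXT)")
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE customers_elt AS
        SELECT UPPER(TRIM(name)) AS name,
               CASE WHEN age <> '' AND age NOT GLOB '*[^0-9]*'
                    THEN CAST(age AS INTEGER) END AS age
        FROM raw_customers
        """
    )


def fetch(conn, table):
    return conn.execute(f"SELECT name, age FROM {table} ORDER BY name").fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    etl(conn, RAW)
    elt(conn, RAW)
    return fetch(conn, "customers_etl"), fetch(conn, "customers_elt")


if __name__ == "__main__":
    print(demo())
```

Both paths end with the same clean table. The difference is where the transformation runs and whether the raw data is still available in the store afterwards.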
Data exploration
• Query
• Visualization
• Descriptive statistics
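As a small illustration of the descriptive statistics step, the sketch below summarizes one numeric column with Python's standard statistics module. The sample values and the describe helper are assumptions made for this example; on real big data the same summaries would be computed by the platform rather than in a single Python process.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
    for name, value in describe(sample).items():
        print(f"{name}: {value}")
```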
Analysis
• Data mining
– Supervised
– Unsupervised
• Other analytics
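The difference between supervised and unsupervised data mining can be shown with two deliberately simple one-dimensional techniques, chosen here only for illustration: a nearest-centroid classifier, which learns from labeled examples, and a basic k-means loop, which finds groups without labels. The data, labels, and function names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def fit_nearest_centroid(points, labels):
    """Supervised: labels are known, so learn one centroid per label."""
    by_label = {}
    for x, y in zip(points, labels):
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Assign a new point to the label whose centroid is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def kmeans_1d(points, k=2, iterations=10):
    """Unsupervised: no labels, so discover k cluster centers from the data."""
    ordered = sorted(points)
    step = max(1, len(ordered) // k)
    centers = ordered[::step][:k]  # simple spread-out starting centers
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: abs(centers[i] - x))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


if __name__ == "__main__":
    points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
    labels = ["low", "low", "low", "high", "high", "high"]
    model = fit_nearest_centroid(points, labels)
    print(predict(model, 1.5), predict(model, 4.0))
    print(kmeans_1d(points, k=2))
```

The supervised model needs the labels column to train; the k-means loop sees only the points and recovers two groups on its own.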
Operationalize
• Business process
• In-database scoring
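In-database scoring means the model is taken to the data instead of the data being extracted to the model. A minimal sketch, assuming a linear model with made-up coefficients and using Python's built-in sqlite3 as a stand-in for the production database (the table, columns, and values are hypothetical):

```python
import sqlite3

# Hypothetical coefficients from a model developed elsewhere.
COEFFS = {"intercept": -1.0, "visits": 0.5, "spend": 0.01}


def score_in_database(conn):
    """Compute the score with SQL so that rows never leave the database."""
    sql = (
        "SELECT id, ? + ? * visits + ? * spend AS score "
        "FROM customers ORDER BY id"
    )
    params = (COEFFS["intercept"], COEFFS["visits"], COEFFS["spend"])
    return conn.execute(sql, params).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, visits INTEGER, spend REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 2, 100.0), (2, 10, 50.0)],
    )
    return score_in_database(conn)


if __name__ == "__main__":
    for customer_id, score in demo():
        print(customer_id, score)
```

Only the scores are returned to the caller; the customer rows stay where they are stored, which is the point of scoring in the database.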
Skills
• Computing
• Analytic modeling
• Creative thinker
• Communicator
Big Data:
The Big Data Maturity Model
Poll Question
Are you making use of Hadoop for advanced analytics?
• Yes
• No, but we’re thinking about it
• No, and no plans to do so
• Don’t know
Copyright © 2014, SAS Institute Inc. All rights reserved.
UTILIZING BIG DATA ANALYTICS
WITH HADOOP
TAPAN PATEL, PRODUCT MARKETING MANAGER, SAS
DATA TO DECISION LIFECYCLE
COMPETITIVE ADVANTAGE
PREPARE DATA
EXPLORE DATA
DEVELOP MODELS
DEPLOY & MONITOR
ACCESS TO HADOOP
HADOOP
Hive QL
SAS SERVER
1. Push some of SAS processing to Hadoop
Key Offerings:
• SAS/Access to Hadoop
• SAS/Access to Cloudera Impala
EMBEDDED PROCESS FRAMEWORK
HADOOP
SAS Data Step & DS2
SAS SERVER
2. Push SAS processing to Hadoop with MapReduce
Key Offerings:
• SAS Scoring Accelerator for Hadoop
• SAS Data Quality Accelerator for Hadoop
• SAS Code Accelerator for Hadoop
• SAS Data Management
SAS® IN-MEMORY ANALYTICS AND HADOOP
3. In-memory processing; use Hadoop for storage persistence and commodity computing
SAS® LASR ANALYTIC SERVER
SAS® IN-MEMORY
HADOOP
WEB CLIENTS
APPLICATIONS
ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social
Data Discovery and Visualization
Statistics and Predictive Analytics
Data Management
Text Analytics
SAS® VISUAL STATISTICS
INTERACTIVE PREDICTIVE ANALYTICS
EXPLORE AND DISCOVER
PREDICT AND REFINE
DEPLOY AND MONITOR
SAS® IN-MEMORY STATISTICS FOR HADOOP
WHAT IS IT
• Provides a single interactive programming environment for Hadoop to perform:
– analytical data manipulation
– variable transformations
– exploratory analysis
– statistical modeling and machine learning
– integrated modeling comparison and scoring
• Takes advantage of distributed in-memory computing optimized for analytical workloads
MANIPULATE DATA
EXPLORE DATA
DEVELOP MODELS
SCORE
SAS® IN-MEMORY STATISTICS FOR HADOOP
PRODUCT DEMONSTRATION
Questions?
Download a free copy of the report
• Download the report as a PDF file at:
http://tdwi.org/research/2014/03/checklist-utilizing-big-data-analytics-with-hadoop
Feel free to distribute the PDF file of any TDWI Checklist Report
Contact Information
If you have further questions or comments:
Fern Halper, TDWI [email protected]
Tapan Patel, SAS [email protected]