9/18/13
Copyright Intelligent Business Strategies 2013 - All Rights Reserved 1
Big Data Multi-Platform Analytics
Mike Ferguson, Managing Director, Intelligent Business Strategies
Big Data Meet-up, London, September 2013
About Mike Ferguson
Mike Ferguson is Managing Director of Intelligent Business Strategies Limited. As an analyst and consultant he specializes in business intelligence, big data, data management and enterprise business integration. With over 32 years of IT experience, Mike has consulted for dozens of companies, spoken at events all over the world and written numerous articles. Formerly he was a principal and co-founder of Codd and Date Europe Limited – the inventors of the Relational Model, a Chief Architect at Teradata on the Teradata DBMS and European Managing Director of DataBase Associates.
www.intelligentbusiness.biz
Twitter: @mikeferguson1
Tel/Fax (+44)1625 520700
Topics
New business requirements driving Big Data multi-platform analytics
Deeper insight example – social media multi-platform analytics
Beyond the data warehouse – new Big Data analytical workloads
The danger of siloed analytics
The need for a multi-platform analytical ecosystem
Multi-platform analytical ecosystem components and integration
Platform integration between RDBMSs and Hadoop – SQL MapReduce
Multi-platform data access and global optimisation
Conclusions
The Traditional Data Warehouse Architecture Has Long Provided the Platform for Analysis and Reporting
Diagram: operational systems feed a data integration / data quality (DQ) platform that loads the data warehouse (DW) and data marts; BI tools deliver reports and analytics through a web portal to employees, partners and customers, supporting customer, risk, financial, operations and supply chain intelligence.
Business today has new requirements……
Customer Growth, Retention and Loyalty Are Top Of The Agenda – PwC 16th Annual CEO Survey 2013
Source: www.pwc.com
CEOs Are Focused on Growth and Improving Operational Effectiveness Over the Next 12 Months
Source: PwC 16th Annual CEO Survey 2013, www.pwc.com
Why Is Customer Insight So Important? - New Competition On the Web Is Changing Customer Behaviour
Kayak.com (Travel)
Gocompare.com (Insurance)
9flats.com (Accommodation)
Understanding Customer Behaviour Especially on the Web is an Even Bigger Challenge for ALL Businesses
On the web the customer is king
New competitors are exploiting the web to give the customer more choice
Understanding online customer behaviour is now mission critical to customer retention and growth
• Clickstream data is a new source of data for analysis
The web is also a place where the customer has a voice
• Social media sites, e.g. Facebook and Twitter (which has 1 billion tweets every 72 hours)
• Review web sites
• Sentiment and influence are new sources of data for analysis
Social communication has spawned social networks that change continuously and need to be understood
How Will You Differentiate Your Company and Compete in a Continuously Changing Online World?
“Take all the information that helps you make markets so you can go somewhere where there is no one else”
ThinkMarketing Conference, Paris, November 2012
“We currently make 1m offers and about 10,000 private offers… I see the number of private offers climbing as we go forward and learn more”
Vittorio Colao, CEO Vodafone, ThinkMarketing Conference, Paris, November 2012
Key marketing questions
• When is the trigger point, i.e. the right moment to make the offer?
• What is the event that will maximize the probability?
It is at this point that you want to make the offer
Business Requirements Are Driving Technology Adoption
Deeper insight for
• Customer intimacy and engagement
• Risk reduction
• Optimization of operations
Reduce time to value
• Lower latency data
• Streaming analytics and operational BI
• Automated data discovery and rapid data integration
• Exploratory analytics to discover value in new data sources
• Agile BI and model development
• Data virtualization to simplify data access
• Self-service BI – data discovery and visualisation
Proactive analytics and actionable insight
• Role-based, on any device, to improve effectiveness and reach targets
Deeper Insight - Big Data Analytics
Popular Types of Data That Businesses Now Want to Analyse
Web data
• Clickstream data, e-commerce logs
• Social network data, e.g. Twitter
Semi-structured data, e.g. e-mail
Unstructured content
• How much is TEXT worth to you?
Sensor data
• Temperature, light, vibration, location, liquid flow, pressure, RFIDs
Vertical industry structured transaction data
• E.g. telecom call data records, retail
Source: Analytics: The Real-World Use of Big Data, Saïd Business School Oxford and IBM, October 2012
Deepening Customer Insight Example – Social Media Big Data Analytics
Two examples of social media analytics
Sentiment analysis
• Twitter
• Facebook
• Review web sites
• Internal CRM systems
• …
Social graph (network) analytics
• Relationships
• Influencers
• Network value analysis
Sentiment Analytics and Social Media
Opportunities
Listen to the “voice of the customer”
Better customer segmentation
Increase customer satisfaction
Increase employee satisfaction
Increase growth
Challenges
Emoticons ( :-) :-< :0) )
Twitter hashtags #bigdata
“Yoda” speak
Slang / vernacular / abbreviations
Sarcasm
Ambiguity
Spam
Multiple languages
Source: KPIApps
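To make the challenges above concrete, here is a minimal lexicon-based polarity scorer. It is a sketch, not KPIApps' method: the tiny `EMOTICONS` and `WORDS` lexicons and the `normalise` and `polarity` functions are hypothetical. It shows why emoticons, hashtags and elongated slang ("loooove") need normalising before scoring, and its last example shows sarcasm slipping through.

```python
import re

# Hypothetical, deliberately tiny lexicons used only for this illustration.
EMOTICONS = {":-)": 1, ":)": 1, ":-(": -1, ":(": -1, ":-<": -1}
WORDS = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}

# Match a simple emoticon first, otherwise an optional '#' followed by a word.
TOKEN_RE = re.compile(r":-?[()<]|#?\w+")


def normalise(token: str) -> str:
    """Lower-case, drop a leading '#' and squash runs of 3+ repeated characters."""
    token = token.lower().lstrip("#")
    return re.sub(r"(\w)\1{2,}", r"\1", token)


def polarity(text: str) -> int:
    """Sum lexicon scores: > 0 positive, < 0 negative, 0 neutral or unknown."""
    score = 0
    for token in TOKEN_RE.findall(text):
        if token in EMOTICONS:
            score += EMOTICONS[token]
        else:
            score += WORDS.get(normalise(token), 0)
    return score


print(polarity("I loooove this phone :-)"))  # 2
print(polarity("#awful service :("))         # -2
print(polarity("great, another delay"))      # 1: the sarcasm is missed
```

Real systems replace hand-written lexicons with large, domain-tuned resources or trained models; even so, sarcasm, ambiguity and multiple languages remain hard.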
Sentiment Analytics – The Process
Components: social media aggregators, social data platforms, text analysis engines, customer engagement management
Process steps: collect / clean; clean / integrate / analyse / index; analyse / share
Data sources: Twitter Firehose, MySpace, Klout, Amazon, Facebook, reddit, Flickr, YouTube, bit.ly, CRM applications
Advanced, "beyond polarity" sentiment classification looks at emotional states such as "angry", "sad" and "happy".
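A "beyond polarity" classifier can be sketched as a lookup of emotion cues. The sketch below assumes a hypothetical hand-built `EMOTION_LEXICON` and returns the dominant emotion, falling back to "neutral" when there is no cue or two emotions tie; production text analysis engines typically use trained models rather than a word list.

```python
import re
from collections import Counter

# Hypothetical emotion lexicon; real engines use trained models or curated resources.
EMOTION_LEXICON = {
    "angry": "angry", "annoyed": "angry", "furious": "angry",
    "sad": "sad", "disappointed": "sad", "miss": "sad",
    "happy": "happy", "delighted": "happy", "thrilled": "happy",
}


def classify_emotion(text: str) -> str:
    """Return the dominant emotion cue in the text, or 'neutral' if none or a tie."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    if not counts:
        return "neutral"
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:
        return "neutral"  # two emotions equally strong: no clear signal
    return top


print(classify_emotion("So annoyed, totally furious with the delay"))  # angry
print(classify_emotion("Delighted with the upgrade"))                  # happy
print(classify_emotion("The parcel arrived"))                          # neutral
```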
Search On Hadoop Allows Quick Indexing of Newly Loaded Multi-Structured Data e.g. Twitter Text
Diagram: data from social data platforms lands in HDFS files; a MapReduce index-building application uses massively parallel MapReduce to build a partitioned search index (one index per partition) that BI tools, applications and mashups can query.
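The index-building step can be simulated in-process to show its MapReduce shape: map emits (term, document) pairs, the shuffle routes each term to a partition, and each reducer builds one index partition. This is an illustrative sketch with hypothetical function names, not any product's implementation; partitioning by term is only one choice, and distributed search engines commonly partition by document instead.

```python
import re
import zlib
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(re.findall(r"#?\w+", text.lower())):
        yield term, doc_id


def partition_for(term, num_partitions):
    """Route a term to a partition; crc32 is stable across runs, unlike str hash()."""
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def build_partitioned_index(docs, num_partitions=3):
    """Shuffle map output by partition, then reduce each partition into postings lists."""
    shuffled = [defaultdict(set) for _ in range(num_partitions)]
    for doc_id, text in docs.items():
        for term, d in map_phase(doc_id, text):
            shuffled[partition_for(term, num_partitions)][term].add(d)
    # Reduce: every partition becomes an independent index of term -> sorted doc ids.
    return [{term: sorted(ids) for term, ids in part.items()} for part in shuffled]


def search(partitions, term):
    """Look the term up only in the partition that owns it."""
    term = term.lower()
    return partitions[partition_for(term, len(partitions))].get(term, [])


docs = {1: "Loving #bigdata on Hadoop", 2: "Hadoop search is fast", 3: "no match here"}
index = build_partitioned_index(docs)
print(search(index, "hadoop"))    # [1, 2]
print(search(index, "#bigdata"))  # [1]
```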
Search On Hadoop - Data Scientists Can Quickly Explore Newly Loaded Multi-Structured Social Data
Diagram: BI tools, applications and mashups search the partitioned index built over HDFS files loaded from social data platforms.
Product examples
• Attivio
• Connexica
• HP Autonomy IDOL
• IBI WebFOCUS Magnify
• IBM InfoSphere Data Explorer
• LucidWorks Big Data
• Oracle Endeca
• Quid
• Splunk
Scaling ETL Transformations by Generating Pig, Hive or 3GL MapReduce for In-Hadoop ELT Processing
Data cleansing and integration tool: Extract → Parse → Clean → Transform → Analyse → Load → Insights
Option 1: the ETL tool generates HiveQL (HQL), or converts the SQL it generates into HiveQL
Option 2: the ETL tool generates Pig Latin (the compiler converts every transform into a MapReduce job)
Option 3: the ETL tool generates 3GL MapReduce code
Note: generating native MapReduce code instead of HiveQL or Pig Latin would likely perform faster because there is no need to translate into MapReduce
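Option 1 can be illustrated with a tiny code generator. The sketch assumes a hypothetical transform specification (target table, source table, a mapping of output columns to expressions, optional filter) and renders it as a HiveQL `INSERT OVERWRITE TABLE … SELECT` statement; it is not how any particular ETL tool works internally, and the table and column names are made up.

```python
def generate_hiveql(target, source, columns, where=None):
    """Render a column-mapping transform as a HiveQL INSERT OVERWRITE ... SELECT.

    `columns` maps each output column name to a HiveQL expression over `source`.
    """
    select_list = ",\n  ".join(f"{expr} AS {alias}" for alias, expr in columns.items())
    sql = f"INSERT OVERWRITE TABLE {target}\nSELECT\n  {select_list}\nFROM {source}"
    if where:
        sql += f"\nWHERE {where}"
    return sql + ";"


print(generate_hiveql(
    "tweets_clean",
    "tweets_raw",
    {"tweet_id": "id", "user_name": "lower(trim(screen_name))"},
    where="lang = 'en'",
))
```

Hive then compiles the generated statement into MapReduce jobs, which is the translation step the note above refers to.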
Big Data Quality - Talend StandardizeRow Function Can Be Used For Data Quality on Text
Uses the open source ANTLR parser generator to parse text
Data Analysis - Processing Text Is a Key Part of Sentiment Analytics
Text analytics is used to derive data from unstructured content
Deriving sentiment and more from Twitter data
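Deriving data from tweet text usually starts by pulling out structured fields. This sketch, with hypothetical names, extracts hashtags, @mentions and URLs using regular expressions. URLs are removed first so that a `#fragment` inside a link is not mistaken for a hashtag, and the lookbehind stops e-mail addresses from counting as mentions.

```python
import re
from dataclasses import dataclass, field
from typing import List

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


@dataclass
class TweetFeatures:
    hashtags: List[str] = field(default_factory=list)
    mentions: List[str] = field(default_factory=list)
    urls: List[str] = field(default_factory=list)


def extract_features(text: str) -> TweetFeatures:
    """Turn free-form tweet text into structured fields ready to load into a table."""
    urls = URL_RE.findall(text)
    rest = URL_RE.sub(" ", text)  # drop URLs so '#' or '@' inside links is ignored
    return TweetFeatures(
        hashtags=[h.lower() for h in HASHTAG_RE.findall(rest)],
        mentions=[m.lower() for m in MENTION_RE.findall(rest)],
        urls=urls,
    )


print(extract_features("Loving the new @Acme phone! #BigData #happy http://t.co/abc123"))
```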
Sentiment Analytics - Using Hadoop MapReduce To Analyse and Score Sentiment
Diagram: data from social data platforms and social media aggregators is stored in HDFS files; a MapReduce sentiment-scoring application writes scored sentiment to HDFS files exposed as Hive tables, which text analysis and customer engagement management tools then query, analyse and share.
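The sentiment-scoring job can be sketched as a mapper and a reducer, with Hadoop's shuffle simulated by a dictionary. Everything here is an assumption for illustration: the `LEXICON`, aggregating by hashtag, and the tab-delimited output, chosen because a Hive table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'` could read files in that shape.

```python
import re
from collections import defaultdict

# Hypothetical scoring lexicon used only for this sketch.
LEXICON = {"love": 1, "great": 1, "fast": 1, "hate": -1, "slow": -1, "broken": -1}


def mapper(tweet_id, text):
    """Map: score one tweet and emit (hashtag, score) for each hashtag it contains."""
    lowered = text.lower()
    score = sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", lowered))
    for tag in re.findall(r"#(\w+)", lowered):
        yield tag, score


def reducer(tag, scores):
    """Reduce: aggregate all scores for one hashtag into a tab-delimited row."""
    scores = list(scores)
    return f"{tag}\t{len(scores)}\t{sum(scores) / len(scores):.2f}"


def run_job(tweets):
    """Run map, a simulated shuffle/sort, then reduce; return the output rows."""
    grouped = defaultdict(list)  # stands in for Hadoop's shuffle and sort
    for tweet_id, text in tweets:
        for key, value in mapper(tweet_id, text):
            grouped[key].append(value)
    return [reducer(key, values) for key, values in sorted(grouped.items())]


for row in run_job([
    (1, "I love my #phonex, so fast"),
    (2, "#phonex screen broken, hate it"),
    (3, "great #support"),
]):
    print(row)
```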
Can Also Use MapReduce-Based BI Tools to Analyse Hadoop Data – E.g. Karmasphere
Karmasphere
Platfora
Platfora brings data into memory after MapReduce processing
Analysing Results of MapReduce Analytics Using Self-Service BI Tool Hadoop Connectivity via Hive
Diagram: a business analyst uses a self-service BI tool (data discovery & visualisation, dashboard or analytical workflow server) that combines personal & office data, predictive models, transaction systems, the DW and HDFS / HBase / Hive, and collaborates with the community (publish / share; consume / enhance / re-publish)
Hive interface: SQL is converted to HiveQL, which is in turn converted into MapReduce programs
Self-service BI tool Direct Access to Big Data via Hive Or Impala - E.g. Tableau 8 Connectivity to Hadoop
Impala connector
An Alternative Is Self-Service BI Tool Hadoop Connectivity via an In-Hadoop Analytical Server
Diagram: a business analyst uses a self-service BI tool (data discovery & visualisation server) that combines personal & office data, predictive models, transaction systems and the DW, and reaches HDFS / HBase / Hive through an in-Hadoop, in-memory analytical server; results are published and shared with the community
SQL is converted to HiveQL, which is in turn converted into MapReduce programs
SAS Are Pushing Their In-Memory LASR Server Into Hadoop With ‘LASR-Ready’ Big Data in HDAT Files
Diagram: SAS Visual Analytics and the SAS client sit alongside the EDW; multiple HDAT files feed the LASR Server
HDAT files are loaded directly into LASR in parallel without any need for transformation
The LASR Server can run in a Hadoop cluster
Text Analysis Can Lead to Further Analysis - Twitter Data Can Include Sentiment AND a Social Graph
Several steps to analyse the data are therefore necessary
• Derive a social graph from analysing social network text
• Move the graph to a NoSQL graph DBMS using data integration
• Analyse the graph for influencers and relationships (Who are the influencers? How valuable are they?) to yield new insights
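The first and last steps can be sketched without a graph DBMS: build a directed graph from @mentions (an edge from the author to each user mentioned), then rank users with PageRank as one simple influence measure. The functions are hypothetical; a graph DBMS would store the graph and supply its own algorithms, and PageRank stands in for whatever influence model a team actually chooses.

```python
import re


def build_mention_graph(tweets):
    """Directed graph as a dict: author -> set of users they @mention."""
    graph = {}
    for author, text in tweets:
        source = author.lower()
        graph.setdefault(source, set())
        for target in re.findall(r"(?<!\w)@(\w+)", text):
            target = target.lower()
            if target != source:
                graph[source].add(target)
                graph.setdefault(target, set())  # mentioned users become nodes too
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank of nodes with no out-links is spread evenly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


tweets = [
    ("alice", "Great talk @carol"),
    ("bob", "@carol agreed, and @alice too"),
    ("dave", "thanks @carol"),
    ("carol", "hello world"),
]
ranks = pagerank(build_mention_graph(tweets))
print(max(ranks, key=ranks.get))  # carol
```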
Improving Customer Insight May Require Different Platforms Optimised For Different Analytical Workloads
Diagram: data integration (DI) moves data between a data warehouse RDBMS (EDW, DW & marts), an advanced analytics platform for multi-structured data, and a NoSQL graph DB holding the social graph. Each platform produces new insights that BI tools deliver to business analysts, who collaborate with the community (publish / share; consume / enhance / re-publish / act).
Big Data workloads result in multiple platforms now being needed for analytical processing
The Changing Landscape – We Now Have Different Platforms Optimised for Different Analytical Workloads
Big Data workloads result in multiple platforms now being needed for analytical processing
• Streaming data: real-time stream processing & decision management
• Hadoop data store (advanced analytics on multi-structured data): investigative analysis, data refinery
• Data warehouse RDBMS (EDW, DW & marts): traditional query, reporting & analysis
• NoSQL DBMS, e.g. graph DB: graph analysis
• Analytical RDBMS / DW appliance (advanced analytics on structured data): data mining, model development
Different Platforms Optimised For Different Analytical Workloads – E.g. Oil And Gas
Example oil and gas workloads by platform:
• Streaming data: real-time sensor data analysis of drilling, equipment monitoring, well integrity, pipeline flows, market trade monitoring
• Hadoop data store (advanced analytics on multi-structured data): seismic analysis, well integrity, pipeline analysis, data refinery
• Data warehouse RDBMS (EDW, DW & marts): financial reporting, spend analysis, production reporting, field service maintenance…
• Analytical RDBMS / DW appliance (advanced analytics on structured data): production forecasting, equipment failure prediction (model development)
• NoSQL DBMS, e.g. graph DB: shipment route optimization
• Also shown: financial planning
The New Requirement Is Cross Platform Analytics
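Diagram: data and analytics now span the EDW, DW & marts, a DW appliance (advanced analytics on structured data), a NoSQL DB (e.g. graph DB), an advanced analytics platform for multi-structured data, streaming data with real-time (RT) analytics, and SaaS BI. How do you analyse across all of them?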
Cross-Platform Analytics - Using ETL Processing with Embedded Analytics
• Support parsing and extraction of data from multi-structured data sources
• Help automate analysis and consumption of data
• Support analytical processing across multiple analytical platforms
Pipeline: Extract → Parse → Clean → Transform → Analyse → Load → Insights
Diagram: a three-step flow (Step 1, Step 2, Step 3) spanning a NoSQL DB (e.g. graph DB) and the EDW
Cross-Platform Analytics - SQL On Hadoop Initiatives
Hadoop distribution vendor: SQL on Hadoop initiative
• AMPLab (UC Berkeley): Shark on Spark
• Apache Hadoop: Hive
• Cloudera: Impala
• Hortonworks: Stinger
• IBM: Big SQL
• MapR: Apache Drill
• Microsoft: PolyBase
• Pivotal: HAWQ
• Teradata: SQL-H
Several initiatives are being implemented to bypass MapReduce
Other vendor: product
• CitusDB: CitusDB
• JethroData: JethroData
• Splice Machine: Splice Machine SQL Engine
• Attivio: Active Intelligence Engine
Other NewSQL database vendors and search vendors also have products
Cross-Platform Analytics – Invoking Hadoop Analytics Via SQL From An Analytical RDBMS
Diagram: SQL and XQuery requests to the RDBMS invoke polymorphic table function(s) that reach into HDFS / HBase / Hive
• The RDBMS optimizer handles transparent access to multiple analytical platforms on behalf of the user
• Teradata Aster Big Data Appliance has Aster and Hortonworks HDP in the same box
• SQL-H enables Aster-managed communication with Hadoop nodes, reading just the data needed from Hadoop for SQL queries and SQL-MapReduce functions in Aster
• SQL-H leverages the metadata library of HCatalog
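Reading "just the data needed" can be illustrated with partition pruning and column projection driven by table metadata. The sketch assumes a hypothetical in-memory `CATALOG`, loosely modelled on the schema and partition information HCatalog holds; it does not use the HCatalog or SQL-H APIs, and in a real system pruning skips whole HDFS files rather than in-memory lists.

```python
# Hypothetical catalog entry: schema, partition key and per-partition rows.
CATALOG = {
    "web_clicks": {
        "columns": ["user_id", "url", "referrer"],
        "partition_key": "dt",
        "partitions": {
            "2013-09-16": [("u1", "/home", "google"), ("u2", "/buy", "twitter")],
            "2013-09-17": [("u1", "/buy", "email")],
        },
    }
}


def read_needed(table, columns, partition_values):
    """Read only the requested partitions and columns; return (rows, partitions read)."""
    meta = CATALOG[table]
    positions = [meta["columns"].index(c) for c in columns]
    rows, partitions_read = [], 0
    for value, data in meta["partitions"].items():
        if value not in partition_values:
            continue  # partition pruning: skip data the query cannot need
        partitions_read += 1
        rows.extend(tuple(r[i] for i in positions) for r in data)  # column projection
    return rows, partitions_read


print(read_needed("web_clicks", ["user_id", "url"], {"2013-09-17"}))
# ([('u1', '/buy')], 1)
```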
Teradata Aster Discovery Portfolio – 4 Types of Pre-Built SQL-MapReduce Functions Can Be Nested in a SINGLE SQL Statement
Function types: data acquisition, data preparation, data analysis and data visualization, nested as SELECT * FROM data visualization function ( … )
The output of each function is fed into the next, all in a single statement
This means the data scientist need only know what the functions do and the parameters they take, not how they work
Single SQL statement (abridged on the slide):
SELECT * FROM nPathViz(
  ON SELECT * FROM nPath (
    ON (SELECT * FROM SESSIONIZE (
      ON SELECT * FROM LOAD_FROM_TD_HADOOP)
    PARTITION BY sba_id
    SYMBOLS ( event LIKE '%REFERRAL%' AS START_EVENT, )
    RESULT (…) ) n;
Includes a partnership with Attensity for Sentiment Analytics
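What `SESSIONIZE` and `nPath` contribute to the statement above can be approximated in plain Python: split each user's clicks into sessions after an inactivity timeout, then keep sessions whose first event matches `'%REFERRAL%'`. The 30-minute default timeout, the `(user, timestamp, event)` tuple layout and the function names are assumptions; Aster's functions are far more general (nPath matches regular-expression patterns over symbols).

```python
from itertools import groupby


def sessionize(events, timeout=1800):
    """Tag each (user, ts, event) with a per-user session number.

    A new session starts when the gap since the user's previous event exceeds `timeout` seconds.
    """
    out = []
    ordered = sorted(events, key=lambda e: (e[0], e[1]))
    for user, user_events in groupby(ordered, key=lambda e: e[0]):
        session, last_ts = 0, None
        for _, ts, event in user_events:
            if last_ts is not None and ts - last_ts > timeout:
                session += 1
            out.append((user, session, ts, event))
            last_ts = ts
    return out


def referral_paths(sessionized):
    """nPath-like step: event paths of sessions whose first event contains 'REFERRAL'."""
    paths = []
    for (user, session), rows in groupby(sessionized, key=lambda r: (r[0], r[1])):
        path = [event for _, _, _, event in rows]
        if "REFERRAL" in path[0]:  # mirrors event LIKE '%REFERRAL%' AS START_EVENT
            paths.append((user, session, path))
    return paths


events = [
    ("u1", 0, "SEARCH_REFERRAL"), ("u1", 60, "VIEW_PRODUCT"), ("u1", 120, "PURCHASE"),
    ("u1", 5000, "VIEW_HOME"), ("u2", 10, "VIEW_HOME"), ("u2", 30, "EMAIL_REFERRAL"),
]
# Nesting the calls mirrors nesting the functions in one SQL statement.
print(referral_paths(sessionize(events)))
# [('u1', 0, ['SEARCH_REFERRAL', 'VIEW_PRODUCT', 'PURCHASE'])]
```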
Teradata Aster Discovery Portfolio – Teradata Aster Visualization Module Example
Source: Teradata
Cross Platform Analytics: Data Virtualisation Simplifies Access & Optimises Queries Across The Entire Analytical Ecosystem
Diagram: analytical tools and applications reach the whole ecosystem through a single data virtualization and optimisation layer. Beneath it sit stream processing (analytical models) over machine-generated, market, sensor and social network streaming data; the EDW, DW & marts; an MDM system supporting create, read, update and delete (CRUD) for customer, product and asset master data; a DW appliance (advanced analytics on structured data); an advanced analytics platform for multi-structured data; and a NoSQL DB (e.g. graph DB). CRM, ERP and SCM systems feed the ecosystem, and data management tools (including discovery, profiling, cleansing, integration) span all platforms.
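A data virtualisation layer presents several stores as one. As a small stand-in, the sketch below uses SQLite's `ATTACH DATABASE` to put a warehouse-style sales table and a separately stored sentiment table behind a single view, similar in spirit to a sales dashboard with in-context sentiment. Table names and data are hypothetical, and real virtualisation products also optimise and push work down to each source, which this sketch does not attempt.

```python
import sqlite3


def build_virtual_layer():
    """Join two independent in-memory stores behind one view (illustrative only)."""
    conn = sqlite3.connect(":memory:")                     # stands in for the data warehouse
    conn.execute("ATTACH DATABASE ':memory:' AS social")   # stands in for a big data store
    conn.execute("CREATE TABLE main.sales (product TEXT, revenue REAL)")
    conn.execute("CREATE TABLE social.sentiment (product TEXT, score REAL)")
    conn.executemany("INSERT INTO main.sales VALUES (?, ?)",
                     [("lockers", 1200.0), ("desks", 800.0), ("lockers", 300.0)])
    conn.executemany("INSERT INTO social.sentiment VALUES (?, ?)",
                     [("lockers", -0.5), ("lockers", 0.0), ("desks", 0.75)])
    # A TEMP view is used because it may reference tables in any attached database.
    conn.execute("""
        CREATE TEMP VIEW product_insight AS
        SELECT s.product, s.revenue, t.avg_score
        FROM (SELECT product, SUM(revenue) AS revenue
              FROM main.sales GROUP BY product) AS s
        JOIN (SELECT product, AVG(score) AS avg_score
              FROM social.sentiment GROUP BY product) AS t
          ON s.product = t.product
    """)
    return conn


conn = build_virtual_layer()
print(conn.execute("SELECT * FROM product_insight ORDER BY product").fetchall())
# [('desks', 800.0, 0.75), ('lockers', 1500.0, -0.25)]
```

The consumer queries one view and never needs to know that the two tables live in different databases, which is the simplification the slide attributes to data virtualisation.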
Data Virtualization – Cisco Composite Integrates “Big Data” with Enterprise Data
Wider “Big Data” access
• Hadoop/Hive connectivity
• Netezza enhancements
• Sybase IQ and HP Vertica Performance Plus™ connectivity
• Additional “Big Data” integration support
Diagram: the Composite Data Virtualization Platform sends optimized SQL to Netezza and SAP; in the Hadoop/Hive example, its optimizer passes queries to Hive, which runs them as MapReduce.
Cross Platform Analytics – Product Sales Dashboard With In-Context Sentiment
Screenshot: overall sentiment, plus sentiment for Tennsco Lockers shown on click-through
The Big Data Multi-Platform Enterprise Analytical Ecosystem – Integrated Technology Yields Real Value
Succeeding With Big Data - People In Different Roles Need to Work Together To Deliver Business Value
Data Scientist: exploratory analysis, model producer
Business Analyst / Business Manager: model consumer, data visualisation, information producer
• Build reports
• Build and publish dashboards
Information consumer: decision maker, action taker
Conclusions – Critical Success Factors
Identify candidate big data analytics use cases that yield real business value
Align use cases with business strategy objectives and priorities
Identify the types of big data analytical workloads you need
Assign workloads to the best platforms to run them on
Understand skills needed
Integrate Big Data platforms with your existing analytical environment
Extend the use of data management across entire ecosystem
Simplify access to multiple data stores to hide complexity from business users
Network with companies that have experience of using Big Data technologies
Thank You!