DESCRIPTION
Presentation on Big Data given at Collaborate 2014 #c14lv
1 Global MarketingConfidential
REMINDER
Check in on the COLLABORATE mobile app
207: Surviving and thriving in the big data revolution
Guy Harrison
Executive Director, R&D
Information Management Group
Dell Software
3 Software Group
Introductions
Web: guyharrison.net
Email: guy.harrison@software.dell.com
Twitter: @guyharrison
Google Plus: https://www.google.com/+GuyHarrison1
4 Software Group
5 Software Group
6 Software Group
7 Software Group
8 Software Group
Dell and Quest – a brief history
9 Software Group
But Seriously
10 Software Group
What is Big Data?
11 Software Group
Three or Four "V"s
Volume: Terabytes, Petabytes, Exabytes, Zettabytes
Variety: Structured, Unstructured, Human Generated, Machine Generated
Velocity: User populations x Transaction rates x Machine data
Value: Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Data means more
1993: Generated internally; key to operational efficiency
2013: Generated externally; key to competitive advantage; source of product innovation; changing our lives
22 Software Group
Big Data is the culmination of cloud, social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail?
25 Software Group
Prevalence of Showrooming
[Bar chart: showrooming prevalence (Pct, axis 0 to 70) by category: Consumer Electronics, Home Improvement]
Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming: Selection, Stock, Faster, Cheaper
Defences: Dynamic Pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover: $45.99
Amazon Kindle: $39.99
Say "screw you bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Layer diagram: Google Applications; Map Reduce and BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start fans out to many parallel Map tasks, whose outputs feed a Reduce step]
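A minimal single-process sketch of the map, shuffle and reduce phases pictured on this slide, using word count as the example. The function names and sample input are illustrative assumptions, not taken from the presentation, and a real Hadoop job would spread the mappers across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Each document stands in for one input split handled by one mapper.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    splits = ["big data big ideas", "data beats ideas"]
    print(word_count(splits))
```

Because every mapper works only on its own split, the map phase parallelises trivially; only the shuffle needs to move data between workers.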
68 Software Group
Multi-stage Map-Reduce
[Diagram: data in HDFS passes through successive groups of MAPPER tasks (SCAN, SORT and AGGREGATE stages), ending in a REDUCE step that returns results to the Client]
69 Software Group
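Schema on Read vs Schema on Write
[Diagram comparing two pipelines]
Schema on Write: Data is extracted, transformed by code (cleansed, normalized, aggregated) and loaded into a Data Warehouse, then analysed and utilized
Schema on Read: Data is loaded into Hadoop as-is; code cleanses and analyses it when it is read, then it is utilized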
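A small sketch of the difference between the two approaches on this slide, assuming JSON log lines with hypothetical `user` and `amount` fields (neither the format nor the field names come from the presentation). With schema on write, rows that do not fit are rejected at load time; with schema on read, the raw lines are kept and each query decides how to interpret them.

```python
import json

# Hypothetical raw input: the third record lacks a "user" field.
RAW = [
    '{"user": "jane", "amount": "12.5"}',
    '{"user": "dick", "amount": 3}',
    '{"amount": 4, "channel": "web"}',
]


def load_with_schema(raw_lines):
    """Schema on write: shape and validate records before storing them."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        # Rows that do not fit the fixed schema are dropped at load time.
        if "user" in record and "amount" in record:
            table.append((record["user"], float(record["amount"])))
    return table


def total_by_user(raw_store):
    """Schema on read: store raw lines, impose structure inside the query."""
    totals = {}
    for line in raw_store:
        record = json.loads(line)
        user = record.get("user", "unknown")
        totals[user] = totals.get(user, 0.0) + float(record.get("amount", 0))
    return totals


if __name__ == "__main__":
    print(load_with_schema(RAW))
    print(total_by_user(RAW))
```

The schema-on-read query still sees the record without a user, which is the flexibility the slide attributes to Hadoop; the cost is that every query carries its own cleansing logic.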
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a HADOOP CLIENT (Java, Pig, Hive) talks to the JOB TRACKER for MAP REDUCE (distributed processing) and to the NAME NODE, with a SECONDARY NAME NODE, for HDFS (distributed storage); each worker runs a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0: YARN (Yet Another Resource Negotiator)
[Diagram: a HADOOP CLIENT (Java, Pig, Hive), a RESOURCE MANAGER, an APPLICATION MASTER, and several NODE MANAGERs each hosting a CONTAINER]
77 Software Group
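Tez (Hindi for "fast")
[Diagram: on one side, Job 1, Job 2 and Job 3 each run MAP and REDUCE stages and exchange data through HDFS; on the other, the same MAP and REDUCE stages run as a single Job 1 over HDFS]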
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks]
Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
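The wide, sparse row layout on this slide can be sketched with nested dictionaries: one entry per row key, holding only the columns that actually have a value. This is an illustration of the data model using the slide's figures, not HBase client code; the helper name `to_wide_rows` is hypothetical and the dots in the site names are assumed.

```python
# (name, site, counter) triples as listed in the first table on the slide.
VISITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(visits):
    """Pivot triples into one sparse row per name, one column per site."""
    rows = {}
    for name, site, counter in visits:
        # Each site becomes a column; sites never visited have no cell at all.
        rows.setdefault(name, {})[site] = counter
    return rows


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(VISITS).items():
        print(row_key, columns)
```

Looking up everything about one person is then a single row read, where the normalized relational layout needs a three-way join.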
80 Software Group
Hive
81 Software Group
82 Software Group
SQL → JAVA → RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase / MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
[Bar chart, axis $0 to $6000: Exadata $4911, Hadoop $750]
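As a quick check of what the two per-terabyte figures imply, the sketch below computes the cost ratio and the hardware cost at a given capacity. It uses only the slide's two numbers; the 100 TB capacity and the helper name `hardware_cost` are illustrative assumptions.

```python
# USD per terabyte, hardware only, as quoted on the slide.
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def hardware_cost(platform, terabytes):
    """Hardware cost in USD for the given capacity on one platform."""
    return COST_PER_TB[platform] * terabytes


# How many times more expensive Exadata is per terabyte.
ratio = COST_PER_TB["Exadata"] / COST_PER_TB["Hadoop"]

if __name__ == "__main__":
    print(f"Exadata costs about {ratio:.1f}x as much per TB")
    for platform in COST_PER_TB:
        print(platform, hardware_cost(platform, 100))
```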
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Big Data Analytics, AKA Data Science
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with fitted trend line, axes running from about 0 to 120: f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
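The first technique in the list, linear regression, fits a trend line of the form f(x) = slope * x + intercept, like the one shown on the chart. Below is a minimal ordinary-least-squares sketch in plain Python; the function names and the sample points are illustrative assumptions and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    xs = [0, 10, 20, 30]
    ys = [1, 21, 41, 61]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept, predict(slope, intercept, 40))
```

The prediction step is the "extrapolate from existing data into the future" part: once the line is fitted, any new x yields an estimate.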
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
121 Software Group
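Supervised Machine Learning
[Flow diagram: Raw Data from Existing Business is cleaned and split into a Training Set and a Validation Set; the Training Set produces a Candidate Model, which is validated against the Validation Set to become the Production Model; the Production Model turns New Data from New Business into a Prediction]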
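The supervised learning flow on this slide (split the cleaned data, train a candidate model, validate it, then predict on new data) can be sketched as follows. The single-feature threshold model, the synthetic data and all function names are assumptions made for illustration; the presentation does not specify a model.

```python
import random


def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle cleaned rows and divide them into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training):
    """Candidate model: the midpoint between the two class means."""
    positives = [x for x, label in training if label]
    negatives = [x for x, label in training if not label]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2


def accuracy(threshold, rows):
    """Share of rows where the threshold rule agrees with the known label."""
    hits = sum((x > threshold) == label for x, label in rows)
    return hits / len(rows)


if __name__ == "__main__":
    # Synthetic labelled data: the label is True when the feature is 50 or more.
    data = [(x, x >= 50) for x in range(0, 100, 5)]
    training, validation = split(data)
    model = train_threshold(training)
    print("threshold:", model)
    print("validation accuracy:", accuracy(model, validation))
    print("prediction for new value 72:", 72 > model)
```

Holding the validation set back from training is what lets the accuracy figure say something about how the model will behave on new business data.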
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram components: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage
Dell products: TOAD & Shareplex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage
Dell products: STATISTICA, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
207Surviving and thriving in the big data revolution
Guy Harrison
Executive Director RampDInformation management group
3 Software Group
Introductions
Web guyharrisonnet Email guyharrisonsoftwaredellcom Twitter guyharrisonGoogle Plus httpswwwgooglecom+GuyHarrison1
4 Software Group
5 Software Group
6 Software Group
7 Software Group
8 Software Group
Dell and Quest ndash a brief history
9 Software Group
But Seriously
10 Software Group
What is Big Data
11 Software Group
Three or Four ldquoVrdquos
VolumeTerabytesPetabytesExabytesZetabytes
VarietyStructuredUnstructuredHuman GeneratedMachine Generated
VelocityUser populations xTransaction rates xMachine data
Value Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
3 Software Group
Introductions
Web guyharrisonnet Email guyharrisonsoftwaredellcom Twitter guyharrisonGoogle Plus httpswwwgooglecom+GuyHarrison1
4 Software Group
5 Software Group
6 Software Group
7 Software Group
8 Software Group
Dell and Quest ndash a brief history
9 Software Group
But Seriously
10 Software Group
What is Big Data
11 Software Group
Three or Four ldquoVrdquos
VolumeTerabytesPetabytesExabytesZetabytes
VarietyStructuredUnstructuredHuman GeneratedMachine Generated
VelocityUser populations xTransaction rates xMachine data
Value Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
HBase Data Model
Flat table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
Normalized tables:
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
HBase rows (one row per name, one column per site):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
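The move from the flat table to HBase-style rows is essentially a pivot into sparse, wide rows. A minimal sketch in Python, using nested dicts as a stand-in for column families (the function name is hypothetical, and the site names are written out as full domain names):

```python
# Rows of the flat (Name, Site, Counter) table shown on the slide.
flat_rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot flat rows into one sparse row per name, with one column per site."""
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter
    return wide

wide = to_wide_rows(flat_rows)
print(sorted(wide["Dick"]))  # columns present in Dick's row
print(sorted(wide["Jane"]))  # Jane's row has a different set of columns
```

Each row carries only the columns it actually has, which is why the two rows on the slide show different column headings.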
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS.]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
YARN, EC2
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – Machine Learning
Map Reduce
Shark (SQL), Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
[Chart: Exadata $4911, Hadoop $750]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data points with a fitted trend line, f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
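The simplest of these techniques, linear regression, fits a straight line to existing points and extends it to predict new ones. A minimal sketch using ordinary least squares in plain Python (the sample points are made up, not the data behind the chart):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept

# Made-up points lying exactly on y = 2x + 1.
xs = [0, 10, 20, 30]
ys = [1, 21, 41, 61]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(slope, intercept, 40))
```

Real data will not sit exactly on a line, so the fitted slope and intercept are estimates, as in the long decimals of a spreadsheet trend line.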
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
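One very simple way to build such a model is a nearest-centroid classifier: average the labelled examples of each class, then give new data the label of the closest average. The sketch below is only one of many possible classifiers, and the two features (links and exclamation marks per message) are hypothetical.

```python
def train_centroids(examples):
    """examples: list of (features, label). Returns the mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((c - f) ** 2 for c, f in zip(centroids[label], features))
    return min(centroids, key=distance)

# Hypothetical features: (links in message, exclamation marks).
training = [
    ((8, 6), "spam"), ((7, 9), "spam"),
    ((0, 1), "ham"), ((1, 0), "ham"),
]
centroids = train_centroids(training)
print(classify(centroids, (6, 7)), classify(centroids, (1, 1)))
```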
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
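A common clustering algorithm is k-means, sketched below in plain Python. The choice of k-means, the starting centroids and the sample points (imagined here as visits per month and basket size) are assumptions for illustration; no labels are supplied, and the groups emerge from the data alone.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical shoppers: (visits per month, items per basket).
points = [(1, 2), (2, 1), (1, 1), (9, 9), (10, 8), (9, 10)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(clusters)
```

With these points the two groups separate cleanly; on real data the result depends on the starting centroids and on the number of clusters requested.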
121 Software Group
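Supervised Machine Learning
[Diagram: Raw Data from existing business is cleaned and split into a Training Set and a Validation Set; the Training Set produces a Candidate Model, which is validated against the Validation Set to become the Production Model; the Production Model turns New Data from new business into a Prediction.]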
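The workflow in the diagram can be sketched end to end in a few lines. Everything specific here is an assumption for illustration: the data (monthly spend versus renewal), the simple threshold model, the 25% validation fraction and the 0.8 acceptance score are not from the slides.

```python
import random

def split(rows, validation_fraction=0.25, seed=0):
    """Shuffle cleaned rows and split them into training and validation sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]

def train_threshold(training):
    """Candidate model: predict True when x exceeds the midpoint of the class means."""
    pos = [x for x, y in training if y]
    neg = [x for x, y in training if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where the threshold rule matches the known outcome."""
    return sum((x > threshold) == y for x, y in rows) / len(rows)

# Hypothetical cleaned history: (monthly spend, did the customer renew?).
history = [(x, x >= 50) for x in range(0, 100, 5)]
training, validation = split(history)
candidate = train_threshold(training)
score = accuracy(candidate, validation)
# Promote the candidate to production only if it validates well (0.8 is arbitrary).
production = candidate if score >= 0.8 else None
print(len(training), len(validation), score)
```

The key point is that the validation rows are never used for training, so the score is an honest estimate of how the model will behave on new data.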
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search optimization
• Recommendation systems
• Security
  – Vulnerability
  – Penetration detection
• Fraud detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster; outputs: batched HDFS file copy, audit change data, HBase real-time replication.]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings: Server and Storage, STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming: Selection, Stock, Faster, Cheaper
Defences: Dynamic pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram: Google Applications on top of Map Reduce and BigTable, which sit on the Google File System (GFS).]
67 Software Group
Map Reduce
[Diagram: Start fans out to many parallel Map tasks, whose outputs feed Reduce.]
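The pattern can be sketched with the classic word count, run in a single Python process. This is an illustration of the programming model only, not Google's or Hadoop's implementation; the function names and sample inputs are made up, and a real framework would run the map calls in parallel on many machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    intermediate = []
    for document in documents:          # each split could be mapped on a different node
        intermediate.extend(map_phase(document))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

splits = ["big data big", "data revolution"]
print(map_reduce(splits))
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can be spread across as many nodes as there are splits and keys.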
68 Software Group
Multi-stage Map-Reduce
[Diagram: data in HDFS is read by a first stage of mappers (scan, sort), passed to a second stage of mappers (aggregate), and then to a reduce step that returns results to the client.]
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
5 Software Group
6 Software Group
7 Software Group
8 Software Group
Dell and Quest ndash a brief history
9 Software Group
But Seriously
10 Software Group
What is Big Data
11 Software Group
Three or Four ldquoVrdquos
VolumeTerabytesPetabytesExabytesZetabytes
VarietyStructuredUnstructuredHuman GeneratedMachine Generated
VelocityUser populations xTransaction rates xMachine data
Value Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
6 Software Group
7 Software Group
8 Software Group
Dell and Quest ndash a brief history
9 Software Group
But Seriously
10 Software Group
What is Big Data
11 Software Group
Three or Four ldquoVrdquos
VolumeTerabytesPetabytesExabytesZetabytes
VarietyStructuredUnstructuredHuman GeneratedMachine Generated
VelocityUser populations xTransaction rates xMachine data
Value Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Data means more
1993:
• Generated internally
• Key to operational efficiency
2013:
• Generated externally
• Key to competitive advantage
• Source of product innovation
• Changing our lives
22 Software Group
Big Data is the culmination of cloud, social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail?
25 Software Group
Prevalence of Showrooming
[Bar chart: prevalence of showrooming by category (Consumer Electronics, Home Improvement); axis 0 to 70 Pct]
Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
34 Software Group
Why showrooming?
• Selection
• Stock
• Faster
• Cheaper
Defences
• Dynamic pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
42 Software Group
Willy Bowman
Nationality: German
Don't mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
• Amazon softcover: $45.99
• Amazon Kindle: $39.99
Say "screw you bookseller" to buy Kindle version
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
50 Software Group
Brain Control
53 Software Group
Muze
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
66 Software Group
Google Software Architecture
• Google Applications
• Map Reduce, BigTable
• Google File System (GFS)
67 Software Group
Map Reduce
[Diagram: Start, then many Map tasks running in parallel, then Reduce]
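The map-then-reduce flow pictured on this slide can be sketched in a few lines of Python. This is a single-process illustration rather than Hadoop code: the word-count task, the sample input and the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are all assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # Each document stands in for a split handled by a separate mapper.
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input splits.
splits = ["big data big deal", "data beats opinion"]
counts = map_reduce(splits)
print(counts)
```

In a real cluster the map calls would run on many nodes at once, which is what the wall of Map boxes on the slide suggests; only the grouping and reduce steps need to see data from every mapper.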
68 Software Group
Multi-stage Map-Reduce
[Diagram: data in HDFS passes through a first group of mappers (scan, sort), a second group of mappers (aggregate), and a final reduce step that returns results to the client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram, Schema on Write: Data, Extract, Transform (Cleanse, Normalize, Aggregate, Code), Load, Data Warehouse, Analyse, Utilize]
[Diagram, Schema on Read: Data, Load, Hadoop, then Cleanse, Code, Analyse, Utilize]
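The difference between the two approaches can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the JSON log lines, the field names (`user`, `site`, `hits`) and the function names are invented for the example and do not come from the deck.

```python
import json

# Hypothetical raw input: one good record, one incomplete, one malformed.
RAW_EVENTS = [
    '{"user": "dick", "site": "Ebay", "hits": "3"}',
    '{"user": "jane", "site": "Google"}',
    'not even json',
]


def load_schema_on_write(lines):
    """Validate and convert at load time; off-schema rows are rejected."""
    table, rejected = [], []
    for line in lines:
        try:
            rec = json.loads(line)
            table.append({"user": str(rec["user"]),
                          "site": str(rec["site"]),
                          "hits": int(rec["hits"])})
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
    return table, rejected


def load_schema_on_read(lines):
    """Store the raw lines untouched; no structure is imposed yet."""
    return list(lines)


def query_sites(raw_lines):
    """Apply structure only at query time, skipping unparseable lines."""
    sites = []
    for line in raw_lines:
        try:
            sites.append(json.loads(line)["site"])
        except (ValueError, KeyError, TypeError):
            continue
    return sites


table, rejected = load_schema_on_write(RAW_EVENTS)
sites = query_sites(load_schema_on_read(RAW_EVENTS))
print(len(table), len(rejected), sites)
```

Schema on write loses the incomplete record at load time, while schema on read keeps everything and lets each query decide how much structure it needs.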
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16PB disk
• 64 TB of RAM
• 32000 Cores
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a Hadoop client (Java, Pig, Hive) talks to the Map Reduce layer (distributed processing, coordinated by the Job Tracker) and to HDFS (distributed storage, coordinated by the Name Node and Secondary Name Node); every worker node runs a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: a Hadoop client (Java, Pig, Hive), a Resource Manager, an Application Master, and Node Managers each hosting Containers]
77 Software Group
Tez (Hindi for "fast")
[Diagram: a chain of map and reduce stages over HDFS, shown as three separate Map Reduce jobs (Job 1, Job 2, Job 3) and as a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks. RDBMS side: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. HBase side: Tables, MemStore, WA Log, HFiles, HDFS, Disks]
79 Software Group
Hbase Data Model

Source data:
Name  Site            Counter
Dick  Ebay            507018
Dick  Google          690414
Jane  Google          716426
Dick  Facebook        723649
Jane  Facebook        643261
Jane  ILoveLarrycom   856767
Dick  MadBillFanscom  675230

Relational (normalized) model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarrycom
5       MadBillFanscom

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one row per name, one column per site visited):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFanscom
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarrycom
2   Jane  716426  643261                     856767
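The wide-row layout on this slide can be imitated with nested dictionaries. The sketch below is an assumption-laden illustration, not HBase client code: `FACTS` repeats the name, site and counter values from the slide, and `to_wide_rows` is a hypothetical helper that builds one sparse row per name.

```python
from collections import defaultdict

# (name, site, counter) values as listed in the slide's source table.
FACTS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarrycom", 856767),
    ("Dick", "MadBillFanscom", 675230),
]


def to_wide_rows(facts):
    """Build one row per name, with a column only for sites that exist."""
    rows = defaultdict(dict)
    for name, site, counter in facts:
        rows[name][site] = counter
    return dict(rows)


rows = to_wide_rows(FACTS)
for name, columns in rows.items():
    print(name, columns)
```

Each row carries only the columns it needs, so Jane has no Ebay column at all rather than a null, which is the point of the sparse, column-per-site layout.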
80 Software Group
Hive
82 Software Group
[Diagram: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP moves the CUSTOMERS and PRODUCTS tables between the RDBMS and HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
89 Software Group
Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
[Bar chart: bars for Exadata and Hadoop on an axis from $0 to $6,000; values shown: $4,911 and $750]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48GB RAM per node (864GB total)
  - 2x6 Core CPU per node (216 total)
  - 12x2TB HDD per node (216 spindles, 432 TB)
  - 40Gb/s Infiniband between nodes
  - 10Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
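The appliance totals follow directly from the per-node figures, and a short calculation makes the arithmetic explicit. The constant names below are assumptions introduced for the example; the input values are the per-node numbers quoted on the slide.

```python
# Per-node figures quoted on the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6      # two six-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2

totals = {
    "ram_gb": NODES * RAM_GB_PER_NODE,
    "cores": NODES * CORES_PER_NODE,
    "spindles": NODES * DISKS_PER_NODE,
    "raw_disk_tb": NODES * DISKS_PER_NODE * TB_PER_DISK,
}

for name, value in totals.items():
    print(f"{name}: {value}")
```

With these inputs the rack comes to 864 GB of RAM, 216 cores, 216 spindles and 432 TB of raw disk.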
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: Programs that evolve with "experience"
• Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: Programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with fitted trend line; x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
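A trend line of the form f(x) = slope * x + intercept, like the one shown on this slide, can be obtained with ordinary least squares. The sketch below uses only the standard library; the sample points and the names `fit_line` and `predict` are assumptions made up for the illustration, not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
model = fit_line(xs, ys)
print(model, predict(model, 10))
```

Real data would scatter around the line, in which case the same function returns the slope and intercept that minimize the squared error.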
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
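Grouping data without predefined labels can be illustrated with k-means, one common clustering technique (the slide does not name a specific algorithm, so this choice is an assumption). The shopper data, described here as hypothetical (visits per month, items per basket) pairs, and the helper names are invented for the example.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, k, iterations=20):
    centroids = list(points[:k])  # naive start: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, clusters


# Hypothetical shoppers: (visits per month, items per basket).
SHOPPERS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(SHOPPERS, k=2)
print(clusters)
```

No labels are supplied; the two groups emerge from the distances alone, which is what distinguishes clustering from classification.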
121 Software Group
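Supervised Machine Learning
[Diagram: Raw Data is cleaned and split into a Training Set and a Validation Set; the training set produces a Candidate Model, which is validated and becomes the Production Model; New Data (new business and existing business) is fed to the production model to produce a Prediction]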
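The train, validate and predict cycle in this diagram can be sketched with a deliberately tiny model. Everything specific here is an assumption for illustration: the churn data (support calls last month, label), the nearest-mean classifier standing in for the "model", and the function names.

```python
import random


def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle labelled rows and split them into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: the mean feature value observed for each label."""
    sums, counts = {}, {}
    for feature, label in training_set:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, feature):
    """Pick the label whose mean is closest to the new feature value."""
    return min(model, key=lambda label: abs(feature - model[label]))


def accuracy(model, validation_set):
    hits = sum(predict(model, f) == label for f, label in validation_set)
    return hits / len(validation_set)


# Hypothetical history: (support calls last month, outcome).
HISTORY = [(0, "stay"), (1, "stay"), (2, "stay"), (1, "stay"),
           (8, "churn"), (9, "churn"), (7, "churn"), (10, "churn")]

training_set, validation_set = split(HISTORY)
candidate = train(training_set)
score = accuracy(candidate, validation_set)
production = candidate if score >= 0.9 else None  # promote only if it validates
print(score, predict(candidate, 9))
```

The validation set is held back from training so that the score reflects data the model has not seen, which is the reason the diagram splits the cleaned data in two.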
122 Software Group
Unsupervised learning
inmaps.linkedin.com
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster; targets: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
132 Software Group
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: TOAD & Shareplex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: STATISTICA, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
135 Software Group
Data Visualization
136 Software Group
Live scoring - integration into operational systems
137 Software Group
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
8 Software Group
Dell and Quest ndash a brief history
9 Software Group
But Seriously
10 Software Group
What is Big Data
11 Software Group
Three or Four ldquoVrdquos
VolumeTerabytesPetabytesExabytesZetabytes
VarietyStructuredUnstructuredHuman GeneratedMachine Generated
VelocityUser populations xTransaction rates xMachine data
Value Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
Machine Learning: Programs that evolve with “experience”
Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
Chart: linear trend line fitted to the data, f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
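A trend line of the form f(x) = slope * x + intercept, like the one on this slide, can be fitted with ordinary least squares. The sketch below is a minimal pure-Python version. The data points are invented for illustration and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that roughly follow y = x
xs = [10, 20, 30, 40, 50]
ys = [12, 19, 33, 38, 52]

slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.2f} x + {intercept:.2f}")

# Extrapolate from existing data to an unseen x value
print(slope * 60 + intercept)
```

The last line is the "predictive" step: once the line is fitted, it is used to estimate a value outside the observed range.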
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
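One simple way to classify new data is a k-nearest-neighbour vote: label a new record by the majority label of the most similar known records. The sketch below applies that idea to churn risk. The two features and all of the training rows are hypothetical, and k-nearest-neighbour is just one of many possible techniques, chosen here for brevity.

```python
import math
from collections import Counter


def knn_classify(training, point, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (months as a customer, support calls last quarter)
training = [
    ((36, 0), "stay"),
    ((48, 1), "stay"),
    ((30, 1), "stay"),
    ((3, 5), "churn"),
    ((6, 4), "churn"),
    ((2, 6), "churn"),
]

print(knn_classify(training, (40, 1)))  # long-standing, few calls
print(knn_classify(training, (4, 5)))   # new customer, many calls
```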
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
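To make "grouping without a pre-existing scheme" concrete, here is a minimal k-means sketch (Lloyd's algorithm) in pure Python. The basket features and starting centroids are invented for illustration. A real basket analysis would use far richer data and often different techniques.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical baskets: (fresh produce items, snack items)
baskets = [(8, 1), (9, 2), (7, 1), (1, 9), (2, 8), (1, 7)]

# Start from two of the baskets as initial centroids (an arbitrary choice)
centroids, clusters = kmeans(baskets, [baskets[0], baskets[3]])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from how close the baskets are to each other.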
121 Software Group
Supervised Machine Learning
Raw Data
Clean
Validate
Model
Candidate Model
Training Set
Validation Set
Production Model
New Data
New Business
Existing Business
Prediction
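The workflow in this diagram can be sketched end to end: split cleaned data into a training set and a validation set, fit a candidate model on the first, check it against the second, and only then use it for predictions on new data. Everything below is an assumption made for illustration: the synthetic data, the 75/25 split, the straight-line model, and the `MAX_ERROR` acceptance threshold.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    """Average absolute prediction error of `model` over (x, y) rows."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


# Synthetic "cleaned" data: y is roughly 2x + 1 with a little noise
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(20)]
rng.shuffle(data)

# Hold back a quarter of the rows for validation
split = int(len(data) * 0.75)
training_set, validation_set = data[:split], data[split:]

# Train a candidate model, then validate it on data it has not seen
candidate = fit_line([x for x, _ in training_set], [y for _, y in training_set])
error = mean_abs_error(candidate, validation_set)

MAX_ERROR = 1.0  # hypothetical acceptance threshold
production_model = candidate if error <= MAX_ERROR else None

# Score new data only with a model that passed validation
if production_model is not None:
    slope, intercept = production_model
    print(slope * 25 + intercept)
```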
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit Change Data
HBase real-time replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell’s offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
– Mobility
– Cloud
– Hadoop
– Big Data Analytics
• Where is the data?
– Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
– Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
– Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
– SQOOP
– Hive
– Pig
11 Software Group
Three or Four ldquoVrdquos
VolumeTerabytesPetabytesExabytesZetabytes
VarietyStructuredUnstructuredHuman GeneratedMachine Generated
VelocityUser populations xTransaction rates xMachine data
Value Competitive or Collective advantage
12 Software Group
Instead - the industrial Revolution of data
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
[Diagram: Start, many parallel Map tasks, Reduce]
Map Reduce
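As a rough illustration of the Map Reduce flow on this slide, the sketch below runs the three steps (map, shuffle, reduce) in memory for a word count. It is a minimal sketch, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real cluster would spread the map and reduce calls across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data analytics"]))
```

Each call to map_phase depends only on its own line, which is why the map step can run in parallel on many machines.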
68 Software Group
[Diagram: HDFS feeds a stage of mappers (scan, sort), then a stage of mappers (aggregate), then a reduce step that returns results to the client]
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Schema on Write: Data → Extract, Transform, Load (code to cleanse, normalize, aggregate) → Data Warehouse → Analyse → Utilize
Schema on Read: Data → Load → Hadoop → Code (cleanse, analyse) → Utilize
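To make the schema-on-read idea concrete, the sketch below stores raw records untouched and only applies a structure when they are read. It is a minimal sketch under assumptions: the sample records, the field names (user, site, hits) and the helper names are hypothetical, and plain JSON strings stand in for files loaded into Hadoop.

```python
import json

# Raw data is loaded as-is; nothing is validated or transformed at load time.
RAW_EVENTS = [
    '{"user": "dick", "site": "Ebay", "hits": 3}',
    '{"user": "jane", "site": "Google"}',  # record with no "hits" field
    'not json at all',                     # malformed record
]


def read_with_schema(raw_lines):
    """Apply the schema at read time: parse, cleanse and default each record."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleansing happens when reading, not when loading
        if not isinstance(record, dict):
            continue
        yield {
            "user": str(record.get("user", "unknown")),
            "site": str(record.get("site", "unknown")),
            "hits": int(record.get("hits", 0)),
        }


def hits_per_user(raw_lines):
    """Analyse the data through the read-time schema."""
    totals = {}
    for row in read_with_schema(raw_lines):
        totals[row["user"]] = totals.get(row["user"], 0) + row["hits"]
    return totals


if __name__ == "__main__":
    print(hits_per_user(RAW_EVENTS))
```

With schema on write, the cleansing in read_with_schema would instead run once, before the data reached the warehouse.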
70 Software Group
Hadoop: Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 1.0 Architecture
[Diagram: Hadoop client (Java, Pig, Hive); Map Reduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and a Secondary Name Node; worker nodes each running a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0: YARN
[Diagram: Hadoop client (Java, Pig, Hive), Resource Manager, Application Master, and three Node Managers each hosting a container]
Yet Another Resource Negotiator
77 Software Group
Tez (Hindi for “fast”)
[Diagram: with classic Map Reduce, Job 1, Job 2 and Job 3 each run their own map and reduce steps through HDFS; with Tez the same map and reduce steps run as a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks
79 Software Group
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767

HBase Data Model
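The wide, sparse rows of the HBase data model can be sketched with nested dictionaries: one entry per row key, holding only the columns that row actually uses. This is a minimal in-memory illustration using the sample counters from the slide; the function name to_wide_rows is hypothetical and no HBase client API is involved.

```python
# (name, site, counter) rows as they would appear in a relational table.
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (name, site, counter) rows into one sparse row per name."""
    wide = {}
    for name, site, counter in rows:
        # Each site becomes a column that exists only in rows that use it.
        wide.setdefault(name, {})[site] = counter
    return wide


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(ROWS).items():
        print(row_key, columns)
```

Dick's row has an Ebay column and Jane's does not, which is the point of the model: columns are per row, so absent values cost nothing to store.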
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAVA
RESULTS
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: Flume loads web logs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase / MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (Hardware only)
Exadata: $4,911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
– 48 GB RAM per node (864 GB total)
– 2x6 Core CPU per node (216 total)
– 12x2TB HDD per node (216 spindles, 864 TB)
– 40 Gb/s InfiniBand between nodes
– 10 Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
Machine Learning: Programs that evolve with “experience”
Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with a fitted trend line, both axes running from 0 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
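The trend line on this slide is the kind of result an ordinary least squares fit produces. The sketch below fits a line to a handful of points and uses it to extrapolate; the function names fit_line and predict and the sample points are hypothetical, and the slide's own data set is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line([0, 10, 20, 30], [1, 21, 41, 61])
    print(slope, intercept, predict(slope, intercept, 40))
```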
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
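One common way to group data without a pre-existing scheme is k-means. The sketch below runs it on one-dimensional values, which could stand for something like basket totals. It is a minimal sketch: the function names, the sample values and the starting centroids are all hypothetical, and k-means is only one of several clustering techniques.

```python
def assign(values, centroids):
    """Place each value in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for value in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
        clusters[nearest].append(value)
    return clusters


def kmeans_1d(values, centroids, iterations=10):
    """K-means on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = assign(values, centroids)
        # Move each centroid to the mean of its members; leave empty ones in place.
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, assign(values, centroids)


if __name__ == "__main__":
    print(kmeans_1d([1, 2, 3, 20, 21, 22], [0, 10]))
```

No labels are supplied anywhere: the two groups emerge from the values alone.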
121 Software Group
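Supervised Machine Learning
[Diagram: raw data is cleaned and split into a training set and a validation set; the training set builds a candidate model, the validation set validates it, and the production model turns new data (new business, existing business) into predictions]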
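The supervised learning flow on this slide (train on one set, validate on another, then predict on new data) can be sketched with a very small classifier. The example below learns a single threshold on one feature. Everything in it is an illustrative assumption: the feature (say, support calls per customer), the labels (churned or not), the data and the function names.

```python
# (feature, label) pairs; the label is True when the customer churned.
TRAINING_SET = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]
VALIDATION_SET = [(1, False), (7, True), (3, False), (9, True)]


def train_stump(training_set):
    """Pick the threshold that classifies the most training examples correctly."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in training_set}):
        correct = sum((x >= threshold) == label for x, label in training_set)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict(threshold, x):
    """Classify a new observation with the trained threshold."""
    return x >= threshold


def accuracy(threshold, labelled):
    """Share of labelled examples the model classifies correctly."""
    hits = sum(predict(threshold, x) == label for x, label in labelled)
    return hits / len(labelled)


if __name__ == "__main__":
    model = train_stump(TRAINING_SET)
    print(model, accuracy(model, VALIDATION_SET), predict(model, 6))
```

Keeping the validation set out of training is what makes its accuracy a fair estimate of how the model will behave on new data.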
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Redo-logs
Change Data Capture
JMS Queue
Hadoop Poster
Batched HDFS File Copy
Audit Change Data
HBase Real-Time replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell’s offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
– Mobility
– Cloud
– Hadoop
– Big Data Analytics
• Where is the data?
– Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
– Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
– Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
– SQOOP
– Hive
– Pig
13 Software Group
14 Software Group
15 Software Group
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming?
• Selection
• Stock
• Faster
• Cheaper
Defences
• Dynamic pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover: $45.99
Amazon Kindle: $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram: Google Applications; Map Reduce and BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
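The Map Reduce pattern pictured on this slide, many independent map tasks feeding a reduce step, can be illustrated with a single-process sketch. The word-count task, the function names and the sample input are assumptions chosen for illustration; a real Hadoop job would distribute the map and reduce calls across a cluster.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(documents):
    """Run every map task, group the output by key, then reduce each key."""
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())

# Hypothetical input splits; each would go to a separate mapper.
splits = ["big data big", "data revolution"]
counts = map_reduce(splits)
print(counts)
```

Because each map call looks at only its own split, the map tasks can run in parallel on different nodes; only the grouped output has to be brought together for the reduce.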
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS, mappers (scan/sort), mappers (aggregate), reduce, client]
69 Software Group
Schema on Read vs Schema on Write
Schema on Write: Data, Extract, Transform, Load, Data Warehouse, Utilize (Analyse, Aggregate, Normalize, Cleanse, Code)
Schema on Read: Data, Load, Hadoop, Utilize (Analyse, Cleanse, Code)
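The contrast between schema on write and schema on read can be sketched in a few lines. This is an illustrative toy, not how a data warehouse or Hadoop is implemented: the JSON records, field names and function names are all hypothetical.

```python
import json

# Hypothetical raw input: one well-formed record, one missing a field, one junk line.
RAW_EVENTS = [
    '{"user": "dick", "site": "Ebay", "hits": 3}',
    '{"user": "jane", "site": "Google"}',
    'not json at all',
]

def load_schema_on_write(lines):
    """Validate and shape every record before storing it; reject the rest."""
    table = []
    for line in lines:
        try:
            rec = json.loads(line)
            table.append((rec["user"], rec["site"], int(rec["hits"])))
        except (ValueError, KeyError):
            continue  # record does not fit the schema, so it is never stored
    return table

def load_schema_on_read(lines):
    """Store the raw lines untouched; no structure is imposed at load time."""
    return list(lines)

def query_hits(raw_store):
    """Apply a schema only when reading, tolerating missing fields."""
    total = 0
    for line in raw_store:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # unparseable line is skipped by this query only
        total += int(rec.get("hits", 0))
    return total

warehouse = load_schema_on_write(RAW_EVENTS)
lake = load_schema_on_read(RAW_EVENTS)
print(len(warehouse), "rows stored with schema on write")
print(len(lake), "raw lines stored with schema on read")
print("total hits computed at read time:", query_hits(lake))
```

With schema on write, the cleansing cost is paid once at load and nonconforming data is lost; with schema on read, everything is kept and each query decides how to interpret it.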
70 Software Group
Hadoop: Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: Hadoop client (Java, Pig, Hive); Map Reduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and Secondary Name Node; worker nodes each running a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop client (Java, Pig, Hive); Resource Manager; Node Managers, each hosting Containers; Application Master]
77 Software Group
Tez (Hindi for "fast")
[Diagram: three chained Map Reduce jobs (Job 1, Job 2, Job 3) reading and writing HDFS, compared with a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: Oracle database components (ASM, Datafiles, Buffer Cache, Tables, Redo, Log Buffer, Disks) alongside HBase components (HDFS, HFiles, MemStore, Tables, WA Log, Disks)]
79 Software Group
HBase Data Model
Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
Relational (normalized) model:
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
HBase model (one wide row per name, with different columns per row):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
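The move from one row per (name, site) pair to one wide, sparse row per name can be sketched with plain dictionaries. This is only an analogy for the data model shown on the slide; it does not use the HBase client API, and the function name is hypothetical.

```python
from collections import defaultdict

# Rows from the "Name, Site, Counter" table on the slide.
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot (name, site, counter) rows into one sparse row per name.

    Each site becomes a column that exists only in the rows that use it.
    """
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter
    return dict(wide)

wide_rows = to_wide_rows(visits)
for name, columns in wide_rows.items():
    print(name, columns)
```

Dick's row has no ILoveLarry.com column and Jane's has no Ebay column; nothing is stored for the missing cells, which is what makes very wide, sparse tables practical.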
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL, Java, Results]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: Web logs loaded into HDFS by Flume; RDBMS tables (CUSTOMERS, PRODUCTS) loaded into HDFS by SQOOP]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
• Yarn, EC2
• Mesos - heterogeneous cluster manager
• Tachyon - in-memory file system
• Spark - memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib - Machine Learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Chart: Exadata vs Hadoop, $/TB (hardware only): Exadata $4911, Hadoop $750]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6 core CPU per node (216 total)
  - 12x2TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with fitted line f(x) = 0.971521231456065 x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
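A fitted line of the form f(x) = slope * x + intercept, like the one on this slide, comes from an ordinary least-squares fit. The sketch below shows the calculation on made-up points; the data and the function name are hypothetical and are not the values behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations that lie exactly on y = x + 1.
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]

slope, intercept = fit_line(xs, ys)
forecast = slope * 120 + intercept  # extrapolate beyond the observed range
print(slope, intercept, forecast)
```

The last line is the "predictive" part: once the line is fitted to existing data, it is extended to values of x that have not been observed yet.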
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
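Grouping data without a pre-existing classification scheme can be illustrated with k-means, one common clustering technique (the slide does not name a specific algorithm). The one-dimensional basket totals, starting centroids and function name below are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Simple k-means on numbers: assign each value to its nearest centroid,
    then move each centroid to the mean of its assigned values."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical shopping-basket totals with no labels attached.
basket_totals = [4, 5, 6, 48, 50, 52]
centroids, clusters = k_means_1d(basket_totals, centroids=[0.0, 100.0])
print(centroids)
print(clusters)
```

No one told the algorithm that there are "small" and "large" baskets; the two groups emerge from the data, which is the difference from the classification case.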
16 Software Group
17 Software Group
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPUs per node (216 cores total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
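The appliance totals can be checked by multiplying out the per-node figures. The snippet below is a simple arithmetic sketch that uses only the numbers listed on the slide; the variable names are made up for the example.

```python
# Per-node figures as listed on the slide.
nodes = 18
ram_gb_per_node = 48
cores_per_node = 2 * 6   # two 6-core CPUs
disks_per_node = 12
tb_per_disk = 2

total_ram_gb = nodes * ram_gb_per_node
total_cores = nodes * cores_per_node
total_spindles = nodes * disks_per_node
raw_disk_tb = total_spindles * tb_per_disk

print(total_ram_gb, total_cores, total_spindles, raw_disk_tb)
```

The RAM, core, and spindle totals agree with the slide. With the listed 12 x 2 TB drives the raw disk comes to 432 TB rather than the 864 TB shown, so one of those two disk figures was probably mistyped.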
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Scatter plot with a fitted trend line; x axis from 0 to 120, y axis from -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
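A trend line like the one on this slide is typically produced by ordinary least squares. The sketch below fits a straight line in plain Python; the sample points are made up for illustration and are not the data behind the slide's chart, and `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x/y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
xs = [0, 20, 40, 60, 80, 100]
ys = [2 * x + 1 for x in xs]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)


def predict(x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept
```

Once the slope and intercept are known, prediction is just evaluating the line at an x value that has not been observed yet.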
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
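As a rough illustration of classification, the sketch below trains a nearest-centroid classifier, one of the simplest possible models, and uses it to label a new message. The two features and all training values are hypothetical; real spam filters use far richer features and models.

```python
def train_centroids(samples):
    """samples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest to the new data point."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))


# Hypothetical features: (links per message, exclamation marks per message).
training = [
    ((8, 6), "spam"),
    ((7, 9), "spam"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
]

centroids = train_centroids(training)
print(classify(centroids, (6, 7)))
print(classify(centroids, (1, 1)))
```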
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
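Clustering can be illustrated with a tiny k-means loop. This sketch works on single numbers only, assumes the starting centroids are supplied by the caller, and uses hypothetical basket totals; it is meant to show the idea of grouping without labels, not to be a production algorithm.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means on plain numbers; `centroids` are the starting guesses."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical basket totals in dollars; no labels are provided.
basket_totals = [5, 7, 6, 52, 48, 50]
centers, groups = kmeans_1d(basket_totals, centroids=[0, 100])
print(centers)
print(groups)
```

No classification scheme is given up front; the two groups (small and large baskets) emerge from the data.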
121 Software Group
Supervised Machine Learning
[Diagram: Raw data → Clean → Training set and Validation set; Training set → Candidate model → Validate (against the Validation set) → Production model; New data (new business, existing business) → Production model → Prediction]
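The supervised learning workflow on this slide can be sketched end to end in a few lines. The example below assumes a single numeric feature and a deliberately simple threshold model; the data (a monthly complaint count and whether the customer churned) is invented, and the function names are made up for the illustration.

```python
import random


def split(records, validation_fraction=0.25, seed=42):
    """Shuffle and split cleaned data into a training set and a validation set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training):
    """Candidate model: the midpoint between the two class means."""
    pos = [x for x, label in training if label]
    neg = [x for x, label in training if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, records):
    """Fraction of records whose predicted label matches the known label."""
    hits = sum((x > threshold) == label for x, label in records)
    return hits / len(records)


# Invented data: (complaints per month, churned?).
data = [(x, x >= 5) for x in range(10)]

training_set, validation_set = split(data)
candidate = train_threshold(training_set)        # train on the training set
score = accuracy(candidate, validation_set)      # validate on held-out data
prediction = 8 > candidate                       # score a new, unseen customer
print(candidate, score, prediction)
```

The candidate model is only promoted to production if its score on the validation set, which it never saw during training, is good enough.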
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs → Change Data Capture → JMS queue → Hadoop poster; outputs: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Products: STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming?
Selection
Stock
Faster
Cheaper
Defences
Dynamic pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram: a Start step fans out to many parallel Map tasks, whose outputs feed a single Reduce step]
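The map and reduce steps in this diagram can be illustrated with the classic word count, written here in plain single-process Python. This is a sketch of the programming model only: it assumes two tiny in-memory input splits and does not use Hadoop or any distributed framework, and the function names are chosen for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


# Each string stands in for one input split handled by its own mapper.
splits = ["big data big insight", "data beats opinion"]

mapped = [pair for doc in splits for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each mapper only looks at its own split, the map calls could run on different machines; only the grouping step needs to bring matching keys together.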
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS → eight mappers (scan/sort) → four mappers (aggregate) → reduce → client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram. Schema on Write: Data → Extract, Transform, Load (code to cleanse, normalize, aggregate) → Data Warehouse → Analyse → Utilize. Schema on Read: Data → Load → Hadoop → code to cleanse and analyse → Utilize]
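The schema-on-read idea can be sketched as follows: raw records are stored exactly as they arrive, and structure is only applied by the code that reads them. The log lines and field names below are hypothetical, and a plain list stands in for files in HDFS.

```python
import json

# Raw events are kept exactly as they arrived (hypothetical log lines).
raw_store = [
    '{"user": "dick", "site": "Ebay", "hits": 3}',
    '{"user": "jane", "site": "Google"}',   # a field is missing
    'not even json',                         # a malformed record
]


def read_with_schema(lines):
    """Apply structure at read time, cleansing records as they are consumed."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleansing happens in the reader, not the loader
        if not isinstance(record, dict):
            continue
        yield {
            "user": record.get("user", "unknown"),
            "site": record.get("site", "unknown"),
            "hits": int(record.get("hits", 0)),
        }


parsed = list(read_with_schema(raw_store))
for row in parsed:
    print(row)
```

With schema on write, the malformed line would have been rejected or repaired before loading; here it is loaded untouched and a different reader could later interpret the same raw data in a different way.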
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
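Dividing the cluster totals by the node count gives a feel for how ordinary each machine is. The arithmetic below assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and resources spread evenly across nodes, neither of which the slide states.

```python
# Cluster totals as listed on the slide.
nodes = 4000
disk_pb = 16
ram_tb = 64
cores = 32000

# Assumption: decimal units and an even spread across all nodes.
disk_per_node_tb = disk_pb * 1000 / nodes
ram_per_node_gb = ram_tb * 1000 / nodes
cores_per_node = cores / nodes

print(disk_per_node_tb, ram_per_node_gb, cores_per_node)
```

Under those assumptions each node is a commodity server with about 4 TB of disk, 16 GB of RAM, and 8 cores; the scale comes from the node count rather than from any single machine.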
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a Hadoop client (Java, Pig, Hive) works with the Job Tracker (MapReduce, distributed processing) and the Name Node (HDFS, distributed storage), with a Secondary Name Node alongside; each worker node runs a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: a Hadoop client (Java, Pig, Hive) talks to the Resource Manager; each Node Manager hosts containers, and an Application Master coordinates the job]
77 Software Group
Tez (Hindi for "fast")
[Diagram: map and reduce stages chained as three separate MapReduce jobs (Job 1, Job 2, Job 3) over HDFS, contrasted with the same stages run as a single job (Job 1)]
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
18 Software Group
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
19 Software Group
20 Software Group
21 Software Group
Generated internally
Key to operational efficiency
1993
Generated externally
Key to competitive advantage
Source of product innovation
Changing our lives
2013
Data means more
22 Software Group
Big Data is the culmination of cloud social and mobile
23 Software Group
Not all upside
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS feeds successive banks of MAPPER tasks; stage labels SCAN, SORT, AGGREGATE and REDUCE; results return to the Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram, two pipelines]
Schema on Write: Data; Extract, Transform, Load; Data Warehouse; steps labelled Cleanse, Normalize, Aggregate, Analyse, Code; Utilize
Schema on Read: Data; Load; Hadoop; steps labelled Cleanse, Analyse, Code; Utilize
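The difference between the two approaches is when structure is imposed. A minimal sketch, assuming hypothetical JSON log records and invented function names: the schema-on-write loader types and filters the data before storing it, while the schema-on-read loader stores raw lines and applies structure only when a question is asked.

```python
import json

RAW_EVENTS = [  # hypothetical raw records, as they might arrive from a log
    '{"user": "jane", "amount": "12.50", "channel": "web"}',
    '{"user": "dick", "amount": "n/a"}',
    '{"user": "jane", "amount": "7.25", "coupon": "SPRING"}',
]


def load_schema_on_write(lines):
    """Cleanse and type the data before storing it; drop what does not fit."""
    table = []
    for line in lines:
        record = json.loads(line)
        try:
            table.append((record["user"], float(record["amount"])))
        except (KeyError, ValueError):
            continue  # the fixed schema has no place for this record
    return table


def load_schema_on_read(lines):
    """Store the raw lines untouched; no structure is imposed at load time."""
    return list(lines)


def total_by_user(raw_lines):
    """Apply a schema only at query time."""
    totals = {}
    for line in raw_lines:
        record = json.loads(line)
        try:
            amount = float(record.get("amount", ""))
        except ValueError:
            continue
        totals[record["user"]] = totals.get(record["user"], 0.0) + amount
    return totals


def coupons_used(raw_lines):
    """A later question about a field the fixed schema never kept."""
    records = (json.loads(line) for line in raw_lines)
    return [record["coupon"] for record in records if "coupon" in record]
```

The trade-off shown here is that the typed table is cheaper to query but has already discarded the `coupon` field, whereas the raw store can still answer `coupons_used` at the cost of parsing on every read.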
70 Software Group
Hadoop: Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a Hadoop client (Java, Pig, Hive) talks to the Job Tracker (Map Reduce, distributed processing) and the Name Node (HDFS, distributed storage), with a Secondary Name Node; each worker node runs a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop client (Java, Pig, Hive); Resource Manager; Node Managers, each hosting Containers; Application Master]
77 Software Group
Tez¹
¹ Hindi for "fast"
[Diagram: map and reduce stages over HDFS, shown as three separate jobs (Job 1, Job 2, Job 3) and as a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks]
Oracle-style stack: Tables; Buffer Cache and Log Buffer; Datafiles and Redo; ASM; Disks
HBase stack: Tables; MemStore and WA Log; HFiles; HDFS; Disks
79 Software Group
HBase Data Model
Flat table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
Normalized (relational) tables:
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
HBase rows (one row per name, with columns only for the sites that name has):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
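The three layouts on this slide (flat, normalized, and one wide row per name) can be derived from the same data. The sketch below uses plain Python dictionaries to stand in for tables; it does not use any HBase API, the function names are invented for illustration, and the site names are written with dots on the assumption that the extraction dropped them.

```python
FLAT_ROWS = [  # (name, site, counter), as in the first table on the slide
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def normalize(rows):
    """Relational form: two lookup tables plus a fact table of id pairs."""
    name_ids, site_ids, facts = {}, {}, []
    for name, site, counter in rows:
        name_id = name_ids.setdefault(name, len(name_ids) + 1)
        site_id = site_ids.setdefault(site, len(site_ids) + 1)
        facts.append((name_id, site_id, counter))
    return name_ids, site_ids, facts


def to_wide_rows(rows):
    """Wide-row form: one row per name, one column per site actually seen."""
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter
    return wide


if __name__ == "__main__":
    for name, columns in to_wide_rows(FLAT_ROWS).items():
        print(name, columns)
```

In the wide-row form each row carries only the columns it needs, so Jane's row has no Ebay column at all rather than a null; that sparseness is what the "(other columns)" placeholder on the slide points at.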
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP moves RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
• Mesos – heterogeneous cluster manager
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib – Machine Learning
• Shark (SQL)
• BlinkDB
Also shown in the diagram: YARN, EC2, Map Reduce, Hive (SQL)
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
Exadata: $4,911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2x6 core CPU per node (216 total)
  – 12x2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
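The rack totals follow from multiplying the per-node figures by 18 nodes. A small sketch of that arithmetic, with constant names invented for illustration: it reproduces the quoted RAM, core and spindle totals. For disk, 12 x 2 TB x 18 gives 432 TB rather than the quoted 864 TB, so either the disk size or the total on the slide may be off.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6   # two six-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2          # as listed on the slide


def rack_totals(nodes=NODES):
    """Multiply the per-node figures out to whole-rack totals."""
    spindles = nodes * DISKS_PER_NODE
    return {
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "spindles": spindles,
        "raw_disk_tb": spindles * TB_PER_DISK,
    }


if __name__ == "__main__":
    for key, value in rack_totals().items():
        print(f"{key}: {value}")
```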
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Big Data Analytics, AKA Data Science:
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
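The first technique in the list, linear regression, fits a trend line of the form f(x) = slope * x + intercept. A minimal sketch using ordinary least squares in plain Python follows; the sample points are hypothetical and are not the data behind the chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    xs = [0, 10, 20, 30]   # hypothetical observations
    ys = [1, 21, 41, 61]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept, predict(slope, intercept, 40))
```

Extrapolating with `predict` beyond the range of the observed x values is exactly the "into the future" step, and it is only as reliable as the assumption that the trend stays linear.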
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
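One simple way to build such a model is a nearest-centroid classifier, shown here as an illustration rather than as the method the talk had in mind. The churn features, labels and function names below are hypothetical.

```python
import math


def train_centroids(examples):
    """examples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign new data to the label whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical features: (support calls per month, months since last purchase)
    labelled = [
        ([1, 1], "stay"), ([2, 1], "stay"),
        ([8, 9], "churn"), ([9, 8], "churn"),
    ]
    model = train_centroids(labelled)
    print(classify(model, [7, 7]))
```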
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
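Grouping without labels can be illustrated with k-means, one common clustering algorithm (the slide does not name a specific one). This sketch clusters hypothetical basket totals in one dimension and assumes k is at least 2.

```python
def k_means_1d(values, k, iterations=20):
    """Plain k-means on numbers; returns (centres, clusters)."""
    ordered = sorted(values)
    # Spread the starting centres evenly across the sorted values (k >= 2).
    centres = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        clusters = [[] for _ in centres]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # Update step: each centre moves to the mean of its members.
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


if __name__ == "__main__":
    basket_totals = [5, 50, 6, 52, 7, 54]  # hypothetical spend per basket
    print(k_means_1d(basket_totals, k=2))
```

No labels are supplied anywhere; the two groups emerge from the data alone, which is the contrast with classification.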
121 Software Group
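Supervised Machine Learning
[Diagram: Raw Data from Existing Business is cleaned and split into a Training Set and a Validation Set; the Training Set produces a Candidate Model, which is validated against the Validation Set and promoted to a Production Model; New Data from New Business goes through the Production Model to produce a Prediction]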
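The workflow of training on one set, validating on another, then predicting for new data can be sketched as below. The one-feature threshold model, the deterministic split and the sample data are all assumptions made to keep the example short.

```python
def split(records, every=4):
    """Hold out every fourth record for validation; train on the rest."""
    training = [r for i, r in enumerate(records) if i % every != every - 1]
    validation = [r for i, r in enumerate(records) if i % every == every - 1]
    return training, validation


def train_threshold(training_set):
    """Candidate model: a cut-off halfway between the two class means."""
    positives = [x for x, label in training_set if label]
    negatives = [x for x, label in training_set if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    return x > threshold


def accuracy(threshold, validation_set):
    """Share of held-out records the candidate model gets right."""
    hits = sum(predict(threshold, x) == label for x, label in validation_set)
    return hits / len(validation_set)


if __name__ == "__main__":
    # Hypothetical existing-business data: (feature value, known outcome)
    existing = [(x, x >= 50) for x in range(0, 100, 5)]
    training_set, validation_set = split(existing)
    candidate = train_threshold(training_set)
    if accuracy(candidate, validation_set) >= 0.9:  # promotion bar is arbitrary
        production = candidate
        print(predict(production, 90))  # prediction for new data
```

Keeping the validation records out of training is what makes the accuracy figure meaningful as a check before the model is promoted.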
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security
  – Vulnerability
  – Penetration Detection
• Fraud Detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs; Change Data Capture; JMS Queue; Hadoop Poster; outputs labelled batched HDFS file copy, audit change data, and HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings shown: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings shown: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
34 Software Group
Why showrooming?
Selection
Stock
Faster
Cheaper
Defences
Dynamic pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy Kindle version
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
50 Software Group
Brain Control
53 Software Group
Muze
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
66 Software Group
Google Software Architecture
Google Applications
MapReduce, BigTable
Google File System (GFS)
67 Software Group
Map Reduce
[Diagram: a Start node, many parallel Map tasks, and a single Reduce step]
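The Map Reduce pattern on this slide can be illustrated with a small single-process sketch. This is not Hadoop or Google code: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count task are assumptions chosen for illustration, and a real framework would run the map calls on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Shuffle: group emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every split, then shuffle, then reduce each key."""
    mapped = []
    for doc in documents:  # each call could run on a different node
        mapped.extend(map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data big ideas", "data beats ideas"]
    print(map_reduce(splits))
```

The map step never needs to see more than one split, which is what lets the work spread across many machines.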
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS, mapper stages labelled Scan, Sort and Aggregate, a Reduce step, and the Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram. Schema on Write: Data, Extract, Transform, Load, Code, Cleanse, Normalize, Aggregate, Data Warehouse, Analyse, Utilize. Schema on Read: Data, Load, Hadoop, Code, Cleanse, Analyse, Utilize.]
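A rough sketch of the schema-on-read idea: the raw text is stored untouched, and field names, types and cleansing are applied only when a query reads it. The log layout, field names and sample values below are invented for illustration and do not come from the slides.

```python
import csv
import io

# Hypothetical raw log, stored exactly as it arrived (no schema enforced on load).
RAW_LOG = """2014-04-08,alice,view,19.99
2014-04-08,bob,purchase,not-a-number
2014-04-09,alice,purchase,45.50
"""


def read_with_schema(raw_text):
    """Apply field names and types at read time, cleansing bad values as we go."""
    rows = []
    for date, user, action, amount in csv.reader(io.StringIO(raw_text)):
        try:
            value = float(amount)
        except ValueError:
            value = None  # bad data is handled by the reader, not the loader
        rows.append({"date": date, "user": user, "action": action, "amount": value})
    return rows


def total_purchases(rows):
    """Sum the amounts of purchase rows that survived cleansing."""
    return sum(
        r["amount"]
        for r in rows
        if r["action"] == "purchase" and r["amount"] is not None
    )


if __name__ == "__main__":
    print(total_purchases(read_with_schema(RAW_LOG)))
```

With schema on write, the malformed second row would have to be fixed or rejected before loading; here the decision is deferred to whoever reads the data.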
70 Software Group
Hadoop: Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
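Dividing the cluster totals by the node count shows how ordinary each machine is. The sketch below assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread across nodes, neither of which the slide states.

```python
# Cluster totals as quoted on the slide.
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32000


def per_node():
    """Average resources per node, assuming decimal units and an even spread."""
    return {
        "disk_tb": DISK_PB * 1000 / NODES,
        "ram_gb": RAM_TB * 1000 / NODES,
        "cores": CORES / NODES,
    }


if __name__ == "__main__":
    print(per_node())
```

Under those assumptions each node works out to roughly 4 TB of disk, 16 GB of RAM and 8 cores.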
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
HADOOP CLIENT (JAVA, PIG, HIVE)
MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER
HDFS (DISTRIBUTED STORAGE): NAME NODE, SECONDARY NAME NODE
DATA NODE / TASK TRACKER (one pair per worker node; 15 shown in the diagram)
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
HADOOP CLIENT (JAVA, PIG, HIVE)
RESOURCE MANAGER
APPLICATION MASTER
NODE MANAGER / CONTAINER (one pair per node; 3 shown in the diagram)
77 Software Group
Tez (Hindi for "fast")
[Diagram: Map and Reduce steps reading from and writing to HDFS, shown once as three chained jobs (Job 1, Job 2, Job 3) and once as a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks]
Oracle: ASM, Datafiles, Buffer Cache, Tables, Redo, Log Buffer, Disks
HBase: HDFS, HFiles, MemStore, Tables, WA Log, Disks
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
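The wide-row layout in this data model can be mimicked with a dictionary of dictionaries: one entry per row key, holding only the columns that row actually has. This is a plain-Python illustration of the idea, not the HBase client API, and `to_wide_rows` is a hypothetical helper name.

```python
from collections import defaultdict

# (name, site, counter) rows taken from the slide's first table.
HITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (row key, column, value) triples into one sparse row per key."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter  # the column exists only if there is a value
    return dict(wide)


if __name__ == "__main__":
    for name, columns in to_wide_rows(HITS).items():
        print(name, columns)
```

Jane's row simply has no Ebay column, whereas a fixed relational schema would need a NULL or a separate join table.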
80 Software Group
Hive
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
Paraccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
YARN, EC2
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL), Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
Exadata: $4911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPU per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
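The rack totals follow directly from the per-node figures, and multiplying them out is a quick sanity check. The sketch below uses only the per-node numbers quoted on the slide; the constant and function names are illustrative. Note that 12 x 2 TB across 18 nodes comes to 432 TB of raw disk.

```python
# Per-node figures as quoted on the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6  # two six-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2


def rack_totals():
    """Multiply the per-node figures out to whole-rack totals."""
    return {
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "cores": NODES * CORES_PER_NODE,
        "spindles": NODES * DISKS_PER_NODE,
        "raw_tb": NODES * DISKS_PER_NODE * TB_PER_DISK,
    }


if __name__ == "__main__":
    print(rack_totals())
```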
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: x axis 0 to 120, y axis -20 to 120, with a fitted line]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
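The first technique in this list, simple linear regression, produces a trend line of the form f(x) = slope * x + intercept. A minimal sketch using the ordinary least-squares formulas is shown here; the sample points are invented, and this is not a claim about how the slide's chart was produced.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    model = fit_line([0, 10, 20, 30], [1, 21, 41, 61])
    print(model, predict(model, 40))
```

Prediction is then just evaluating the fitted line at an x value that has not been observed yet.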
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
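One of the simplest ways to build such a model is a nearest-centroid classifier: average the labelled examples for each class, then assign new data to the closest average. The two spam features below (exclamation marks and links per message) and all sample values are assumptions made up for this sketch.

```python
def train_centroids(samples):
    """samples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    # Hypothetical features: (exclamation marks, links) per message.
    training = [((5, 3), "spam"), ((4, 4), "spam"), ((0, 1), "ham"), ((1, 0), "ham")]
    model = train_centroids(training)
    print(classify(model, (4, 2)), classify(model, (0, 0)))
```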
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
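k-means is one common clustering algorithm, and a compact version shows what grouping without a pre-existing scheme means in practice. The starting centroids are passed in so the run is deterministic, and the sample points (imagined as items per basket and spend) are invented for illustration.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    # Hypothetical baskets: (items, spend)
    baskets = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    print(k_means(baskets, [(0, 0), (10, 10)]))
```

No labels are supplied; the two groups emerge purely from how close the points are to each other.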
121 Software Group
Supervised Machine Learning
[Diagram labels: Existing Business, Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Business, New Data, Prediction]
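The training-set and validation-set steps of this workflow can be sketched in a few lines: hold back part of the historical data, fit a candidate model on the rest, and check it against the held-back part before promoting it to production. The threshold model, the 25% hold-out and the sample data are all assumptions for illustration.

```python
import random


def split(records, validation_fraction=0.25, seed=42):
    """Shuffle historical records and hold back a fraction for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training):
    """Candidate model: a cut-off halfway between the two class means."""
    pos = [x for x, label in training if label]
    neg = [x for x, label in training if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, records):
    """Share of records where the prediction (x > threshold) matches the label."""
    hits = sum((x > threshold) == label for x, label in records)
    return hits / len(records)


if __name__ == "__main__":
    history = [(x, x > 50) for x in (10, 20, 30, 40, 60, 70, 80, 90)]
    training_set, validation_set = split(history)
    candidate = train_threshold(training_set)
    print(candidate, accuracy(candidate, validation_set))
```

Only a candidate that scores well on data it never saw during training would go on to make predictions for new business.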
122 Software Group
Unsupervised learning
inmaps.linkedin.com
124 Software Group
Big Data Analytics / Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster; targets: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
24 Software Group
Will Big Data kill retail
25 Software Group
Prevalence of Showrooming
Consumer Electronics
Home Improvement
0 10 20 30 40 50 60 70
Pct
Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013
26 Software Group
27 Software Group
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products: Server and Storage, STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group
135 Software Group
Data Visualization
136 Software Group
Live scoring – integration into operational systems
137 Software Group
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming?
• Selection
• Stock
• Faster
• Cheaper
Defences
• Dynamic pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram: layered stack, top to bottom: Google Applications; Map Reduce and BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start, then many Map tasks running in parallel, feeding a Reduce step]
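The map and reduce steps can be sketched in a few lines using the classic word count. This is a single-process illustration in plain Python, not Hadoop code; the function names and the two input strings are made up, and the `shuffle` step stands in for the grouping a real framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for one input split handled by one mapper
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data big deal", "data science"])
```

Because each call to `map_phase` depends only on its own split, the map work can be spread across as many machines as there are splits, which is what the wall of Map boxes in the diagram conveys.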
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS, MAPPER (many, in parallel), SCAN, SORT, AGGREGATE, REDUCE, Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram: two pipelines. Schema on Write: Data, Extract, Transform, Load, Data Warehouse, with Analyse, Aggregate, Normalize, Cleanse and Code steps, then Utilize. Schema on Read: Data, Load, Hadoop, with Analyse, Cleanse and Code steps, then Utilize.]
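The difference between the two approaches can be sketched as follows. With schema on write, records are validated and converted when they are loaded; with schema on read, the raw records are stored as they arrive and structure is applied only when a query runs. The JSON events, field names and function names below are hypothetical.

```python
import json

RAW_EVENTS = [
    '{"user": "dick", "site": "ebay", "hits": "3"}',
    '{"user": "jane", "site": "google"}',  # no "hits" field
]


def load_schema_on_write(lines):
    """Schema on write: validate and convert at load time; reject rows that do not fit."""
    table = []
    for line in lines:
        record = json.loads(line)
        if {"user", "site", "hits"} <= record.keys():
            table.append((record["user"], record["site"], int(record["hits"])))
    return table


def load_schema_on_read(lines):
    """Schema on read: store the raw lines untouched; no structure is imposed yet."""
    return list(lines)


def query_hits(raw_store, default=0):
    """Apply structure only when the data is read, deciding how to treat gaps at query time."""
    total = 0
    for line in raw_store:
        record = json.loads(line)
        total += int(record.get("hits", default))
    return total


warehouse = load_schema_on_write(RAW_EVENTS)
raw_store = load_schema_on_read(RAW_EVENTS)
```

The warehouse load silently loses the incomplete record, whereas the raw store keeps it and lets each query choose its own interpretation, at the cost of parsing on every read.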
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: HADOOP CLIENT (Java, Pig, Hive); MAP REDUCE (distributed processing) with a JOB TRACKER; HDFS (distributed storage) with a NAME NODE and SECONDARY NAME NODE; many worker nodes, each running a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (Java, Pig, Hive), RESOURCE MANAGER, APPLICATION MASTER, NODE MANAGER (three, each with a CONTAINER)]
77 Software Group
Tez (Hindi for "fast")
[Diagram: MAP and REDUCE stages chained through HDFS as separate jobs (Job 1, Job 2, Job 3), alongside the same stages run as a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: two stacks side by side. Left: Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. Right: Table, MemStore, WA Log, HFile, HDFS, Disks.]
79 Software Group
HBase Data Model

Name   Site              Counter
Dick   Ebay              507018
Dick   Google            690414
Jane   Google            716426
Dick   Facebook          723649
Jane   Facebook          643261
Jane   ILoveLarry.com    856767
Dick   MadBillFans.com   675230

NameId   Name
1        Dick
2        Jane

SiteId   SiteName
1        Ebay
2        Google
3        Facebook
4        ILoveLarry.com
5        MadBillFans.com

NameId   SiteId   Counter
1        1        507018
1        2        690414
2        2        716426
1        3        723649
2        3        643261
2        4        856767
1        5        675230

Id   Name   Ebay     Google   Facebook   (other columns)   MadBillFans.com
1    Dick   507018   690414   723649                       675230

Id   Name   Google   Facebook   (other columns)   ILoveLarry.com
2    Jane   716426   643261                       856767
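The last two tables on the slide show each person as one wide, sparse row in which every site is its own column. A rough way to picture that model in code is a dictionary of dictionaries, as sketched below. This is only an in-memory analogy, not the HBase client API; the function name is made up, and the site names are written with their dots restored, which is an assumption about the original slide.

```python
# Rows from the first table on the slide: (name, site, counter)
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (name, site, counter) rows into one sparse row per name."""
    wide = {}
    for name, site, counter in rows:
        # Each site becomes a column; absent sites simply have no entry
        wide.setdefault(name, {})[site] = counter
    return wide


wide = to_wide_rows(ROWS)
```

Jane's row has no "Ebay" entry at all rather than a NULL, which is how a sparse column layout avoids paying for columns a row does not use.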
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
Paraccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only). Exadata: $4911; Hadoop: $750]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48GB RAM per node (864GB total)
  – 2x6 Core CPU per node (216 total)
  – 12x2TB HDD per node (216 spindles, 864 TB)
  – 40Gb/s Infiniband between nodes
  – 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
28 Software Group
29 Software Group
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming?
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram: Start, a large set of parallel Map tasks, Reduce]
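The Map Reduce pattern on this slide can be sketched in a few lines. This is a minimal single-process illustration under stated assumptions, not Hadoop or Google code: the word-count task and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each document stands in for a split handled by a separate mapper.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(mapped))
```

In a real cluster the mappers run in parallel on different nodes and the shuffle moves data across the network; here everything runs in one process so the three phases are easy to follow.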
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS, MAPPER (repeated), SCAN, SORT, AGGREGATE, REDUCE, Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram]
Schema on Write: Data, Extract, Load, Transform, Data Warehouse, Utilize (steps shown: Analyse, Aggregate, Normalize, Cleanse, Code)
Schema on Read: Data, Load, Hadoop, Utilize (steps shown: Analyse, Cleanse, Code)
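One way to picture the difference between the two approaches is the sketch below. It is an illustration only: the sample records, field names and function names are hypothetical, and plain Python lists stand in for a data warehouse and for Hadoop storage.

```python
import json
from typing import Dict, List

# Hypothetical raw input: one JSON document per line, not all complete.
RAW_EVENTS = [
    '{"user": "dick", "site": "Ebay", "hits": "3"}',
    '{"user": "jane", "site": "Google"}',
]


def load_schema_on_write(lines: List[str]) -> List[Dict[str, object]]:
    """Cleanse and type every record before it is stored."""
    table = []
    for line in lines:
        record = json.loads(line)
        table.append({
            "user": str(record["user"]),
            "site": str(record["site"]),
            "hits": int(record.get("hits", 0)),  # structure fixed at load time
        })
    return table


def load_schema_on_read(lines: List[str]) -> List[str]:
    """Store the raw lines untouched; no structure is imposed yet."""
    return list(lines)


def total_hits_on_read(raw_store: List[str]) -> int:
    """Apply the schema only when the question is asked."""
    return sum(int(json.loads(line).get("hits", 0)) for line in raw_store)
```

With schema on write the cost of cleansing is paid once, up front; with schema on read loading is trivial and each query decides how to interpret the raw data.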
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram]
HADOOP CLIENT (JAVA, PIG, HIVE)
MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER, with a TASK TRACKER on each worker node
HDFS (DISTRIBUTED STORAGE): NAME NODE and SECONDARY NAME NODE, with a DATA NODE on each worker node
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE), RESOURCE MANAGER, APPLICATION MASTER, three NODE MANAGERs each with a CONTAINER]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS, MAP and REDUCE steps grouped into Job 1, Job 2 and Job 3, HDFS, and a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: Oracle and HBase storage components side by side]
Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
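The move from normalized relational tables to wide, sparse rows can be sketched with plain Python dictionaries. This is an illustration of the idea only and does not use the HBase API. The data is the slide's sample data, with two assumptions: the Google rows carry SiteId 2 (matching the site lookup table), and the site names are written with their dots restored. The function name `to_wide_rows` is hypothetical.

```python
from typing import Dict, List, Tuple

# Normalized relational form, using the slide's sample data.
names: Dict[int, str] = {1: "Dick", 2: "Jane"}
sites: Dict[int, str] = {
    1: "Ebay", 2: "Google", 3: "Facebook",
    4: "ILoveLarry.com", 5: "MadBillFans.com",
}
counters: List[Tuple[int, int, int]] = [  # (NameId, SiteId, Counter)
    (1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
    (2, 3, 643261), (2, 4, 856767), (1, 5, 675230),
]


def to_wide_rows() -> Dict[int, Dict[str, object]]:
    """Pivot the join table into one sparse row per Id.

    Each row only holds the columns it actually has a value for,
    so different rows end up with different column sets.
    """
    rows: Dict[int, Dict[str, object]] = {
        name_id: {"Name": name} for name_id, name in names.items()
    }
    for name_id, site_id, counter in counters:
        rows[name_id][sites[site_id]] = counter
    return rows
```

Row 1 ends up with Ebay, Google, Facebook and MadBillFans.com columns while row 2 has Google, Facebook and ILoveLarry.com; neither row stores empty cells for the sites it never visited.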
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill
Aster
Greenplum (Pivotal HD)
ParAccel
Hadapt
Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: WebLogs -> FLUME -> HDFS; RDBMS (CUSTOMERS, PRODUCTS) -> SQOOP -> HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (Hardware only)
Exadata: $4,911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPU per node (216 total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
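The cluster totals quoted on this slide follow from simple per-node arithmetic, sketched below. The constant names are hypothetical; the per-node figures are the ones on the slide. Note that with the quoted 12 x 2 TB disks per node the raw capacity works out to 432 TB rather than the 864 TB shown, so one of the two disk figures on the slide is presumably off.

```python
# Per-node figures as quoted on the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6      # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2

# Cluster-wide totals derived from the per-node figures.
total_ram_gb = NODES * RAM_GB_PER_NODE          # RAM across the rack
total_cores = NODES * CORES_PER_NODE            # CPU cores across the rack
total_spindles = NODES * DISKS_PER_NODE         # individual disks
total_raw_tb = total_spindles * TB_PER_DISK     # raw disk capacity

print(f"RAM: {total_ram_gb} GB, cores: {total_cores}, "
      f"spindles: {total_spindles}, raw disk: {total_raw_tb} TB")
```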
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data points with a fitted linear trend line]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
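The first technique in the list, linear regression, produces a trend line of the form f(x) = slope * x + intercept like the one on the chart. A minimal sketch using ordinary least squares follows; the function names and the sample numbers in the usage note are hypothetical and are not the data behind the slide's chart.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers a slope of 2 and an intercept of 1, and `predict` then extends that line to values of x that were not in the original data.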
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
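Clustering can be illustrated with k-means, one common technique among several; the slide does not name a specific algorithm, so this choice is an assumption. The sketch groups two-dimensional points without any labels. The function names and the deterministic choice of starting centroids (the first k points) are simplifications made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to the point (squared distance)."""
    distances = [
        (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2 for c in centroids
    ]
    return distances.index(min(distances))


def k_means(points: List[Point], k: int, iterations: int = 10) -> List[int]:
    """Return a cluster index for each point."""
    centroids = list(points[:k])  # deterministic start: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for cluster in range(k):
            members = [p for p, lab in zip(points, labels) if lab == cluster]
            if members:
                centroids[cluster] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels
```

Given points that form two obvious groups, such as `[(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]` with `k=2`, the points near (1, 1) end up in one cluster and the points near (9, 9) in the other, with no classification scheme supplied in advance.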
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Data (New Business, Existing Business), Prediction]
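The supervised learning workflow on this slide (split the cleaned data into a training set and a validation set, build a candidate model, validate it, then use it to predict on new data) can be sketched as follows. Everything here is hypothetical: the sample data, the "stay"/"churn" labels, and the choice of a nearest-centroid classifier as the model are assumptions made to keep the example small.

```python
from typing import Dict, List, Tuple

Example = Tuple[List[float], str]  # (features, label)

# Hypothetical cleaned data: two features per customer and a known outcome.
DATA: List[Example] = [
    ([1.0, 1.0], "stay"), ([9.0, 9.0], "churn"),
    ([2.0, 1.0], "stay"), ([8.0, 9.0], "churn"),
    ([1.0, 2.0], "stay"), ([9.0, 8.0], "churn"),
    ([2.0, 2.0], "stay"), ([8.0, 8.0], "churn"),
]


def split(data: List[Example], training_fraction: float = 0.75):
    """Divide the data into a training set and a validation set."""
    cut = int(len(data) * training_fraction)
    return data[:cut], data[cut:]


def train(training_set: List[Example]) -> Dict[str, List[float]]:
    """Candidate model: the mean feature vector (centroid) of each label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in training_set:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(model: Dict[str, List[float]], features: List[float]) -> str:
    """Predict the label whose centroid is closest to the features."""
    def distance(centroid: List[float]) -> float:
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


def validate(model: Dict[str, List[float]], validation_set: List[Example]) -> float:
    """Fraction of held-back examples the candidate model gets right."""
    correct = sum(
        1 for features, label in validation_set
        if predict(model, features) == label
    )
    return correct / len(validation_set)
```

A candidate model that scores well in `validate` on data it has never seen is the one that would be promoted to production and applied to new data with `predict`.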
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Products shown:
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Products shown:
STATISTICA
Server and Storage
TOAD & Shareplex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
30 Software Group
Some novel defences
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
bull Shelf assortment optimization
bull In store offers
bull Customer entertainment
bull Checkout anywhere
bull Relationship management
bull Customer analytics
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – machine learning
Shark (SQL)
BlinkDB
Also shown in the diagram: YARN, EC2, Map Reduce, Hive (SQL)
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
Exadata: $4,911
Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: fitted trend line; x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
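A trend line of the form f(x) = a x + b, the first technique in the list above, can be fitted with ordinary least squares. The sketch below uses made-up sample points, not the data behind the chart, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data lying exactly on y = x + 1 (an assumption for the demo).
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]

slope, intercept = fit_line(xs, ys)
forecast = slope * 120 + intercept   # extrapolate beyond the observed range
```

Extrapolating to x = 120 is the "predictive" step: the fitted line is applied to a value outside the data it was fitted on.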
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
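As one simple way to build such a model, the sketch below trains a nearest-centroid classifier on labelled examples and then labels new data. The churn features (months inactive, support tickets) and all names are hypothetical; the slide does not prescribe a particular algorithm.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(labelled):
    """Build the model: one centroid per class label."""
    by_label = {}
    for features, label in labelled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def classify(model, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def distance_sq(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: distance_sq(model[label]))


# Hypothetical training data: (months_inactive, support_tickets) -> outcome.
training = [
    ((0, 1), "stays"),
    ((1, 0), "stays"),
    ((6, 5), "churns"),
    ((7, 4), "churns"),
]
model = train(training)

at_risk = classify(model, (5, 5))
loyal = classify(model, (1, 1))
```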
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
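Grouping without predefined labels can be illustrated with k-means on a single numeric feature. This is a sketch under assumptions: the basket totals are invented, the starting centers are chosen by hand, and `k_means_1d` is a hypothetical helper name.

```python
def k_means_1d(values, centers, iterations=10):
    """Basic k-means on one-dimensional data with caller-supplied start centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical basket totals: no labels are supplied, only the values.
basket_totals = [5, 6, 7, 50, 52, 54]
centers, clusters = k_means_1d(basket_totals, centers=[0.0, 60.0])
```

The algorithm is never told which baskets are "small" or "large"; the two groups emerge from the data.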
121 Software Group
Supervised Machine Learning
[Diagram: Existing Business → Raw Data → Clean → Training Set → Model → Candidate Model → Validate (against the Validation Set) → Production Model; New Business → New Data → Production Model → Prediction]
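The supervised learning flow (split the cleaned data, fit candidate models on the training set, validate them, promote the best to production, then predict on new data) can be sketched as below. The data, the two candidate models and all names are assumptions made for illustration.

```python
import random


def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle cleaned rows and split them into training and validation sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def fit_mean(train_rows):
    """Candidate 1: always predict the average target seen in training."""
    mean_y = sum(y for _, y in train_rows) / len(train_rows)
    return lambda x: mean_y


def fit_line(train_rows):
    """Candidate 2: least squares line through the training data."""
    n = len(train_rows)
    mean_x = sum(x for x, _ in train_rows) / n
    mean_y = sum(y for _, y in train_rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train_rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train_rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, rows):
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)


# Hypothetical cleaned history from existing business: y = 2x + 1.
clean_rows = [(x, 2 * x + 1) for x in range(20)]
training_set, validation_set = split(clean_rows)

candidates = {"mean": fit_mean(training_set), "line": fit_line(training_set)}
scores = {name: mean_squared_error(m, validation_set) for name, m in candidates.items()}

# Promote the candidate with the lowest validation error to production.
production_name = min(scores, key=scores.get)
production_model = candidates[production_name]
prediction = production_model(100)   # new data from new business
```

Scoring on the held-out validation set, rather than on the training set, is what lets the candidates be compared on data they have not seen.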
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics / Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo-logs → Change Data Capture → JMS Queue → Hadoop Poster]
Hadoop Poster outputs: batched HDFS file copy, audit change data, HBase real-time replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage, STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group
135 Software Group
Data Visualization
136 Software Group
Live scoring – integration into operational systems
137 Software Group
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
31 Software Group
Web analytics for retail
32 Software Group
Connected Store
• Shelf assortment optimization
• In store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics
33 Software Group
34 Software Group
Why showrooming?
Selection, Stock, Faster, Cheaper
Defences
Dynamic pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover $45.99
Amazon Kindle $39.99
Say "screw you bookseller" to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram: from Start, work fans out to many parallel Map tasks whose outputs are combined by Reduce]
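The map and reduce phases can be sketched in a single process with the classic word count. This is a minimal in-memory illustration, not Hadoop code: the function names are hypothetical, and the grouping step stands in for the shuffle that the framework performs between the two phases.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    # Each line could be mapped on a different node; here they run in sequence.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = map_reduce(["big data big cluster", "data node"])
```

Because each call to `map_phase` depends only on its own line, the map work can be spread across as many machines as there are input splits.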
68 Software Group
Multi-stage Map-Reduce
[Diagram: MAPPERs read from HDFS in a SCAN/SORT stage, further MAPPERs feed an AGGREGATE stage, and a final REDUCE returns the result to the Client]
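Chaining stages means the output of one map-reduce job becomes the input of the next. The sketch below runs two stages in memory over a made-up hit log: stage 1 totals hits per (user, site) and stage 2 picks each user's busiest site. `run_job` and the mapper and reducer names are hypothetical.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one map-reduce job in memory: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Stage 1: total hits per (user, site).
def hits_mapper(record):
    user, site, hits = record
    yield (user, site), hits


def sum_reducer(key, values):
    return key, sum(values)


# Stage 2: consumes stage 1 output and keeps the busiest site per user.
def per_user_mapper(record):
    (user, site), total = record
    yield user, (total, site)


def busiest_reducer(user, values):
    total, site = max(values)
    return user, site, total


log = [
    ("dick", "ebay", 3),
    ("dick", "google", 5),
    ("jane", "google", 2),
    ("dick", "ebay", 4),
    ("jane", "facebook", 1),
]
stage1 = run_job(log, hits_mapper, sum_reducer)
stage2 = run_job(stage1, per_user_mapper, busiest_reducer)
```

In a real cluster the intermediate `stage1` result would be written to HDFS between jobs, which is the overhead that motivates single-job execution engines.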
69 Software Group
Schema on Read vs Schema on Write
Schema on Write: Data → Extract, Transform, Load (Cleanse, Normalize, Aggregate, Code) → Data Warehouse → Analyse → Utilize
Schema on Read: Data → Load → Hadoop → Cleanse, Code, Analyse → Utilize
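Schema on read can be illustrated by storing raw records untouched and imposing structure only when they are queried. The records, field names and `read_with_schema` helper below are hypothetical; a schema-on-write system would instead reject or transform the bad records at load time.

```python
import json

# Raw events are kept exactly as they arrived: nothing is validated on load.
raw_store = [
    '{"user": "dick", "site": "ebay", "hits": 3}',
    '{"user": "jane", "site": "google"}',   # missing "hits" field
    'not even json',                        # malformed record
]


def read_with_schema(raw_lines):
    """Apply the schema at query time, skipping records that cannot be parsed."""
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                        # cleansing happens on read
        if not isinstance(event, dict):
            continue
        yield {
            "user": str(event.get("user", "unknown")),
            "site": str(event.get("site", "unknown")),
            "hits": int(event.get("hits", 0)),   # default for missing values
        }


rows = list(read_with_schema(raw_store))
total_hits = sum(row["hits"] for row in rows)
```

The trade-off is that loading is cheap and nothing is lost, but every reader has to carry the cleansing and typing logic.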
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
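Dividing the cluster totals by the node count shows how modest each machine is. The arithmetic below assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread across nodes, neither of which the slide states.

```python
# Cluster totals as quoted on the slide.
nodes = 4000
disk_tb = 16 * 1000    # 16 PB expressed in TB (decimal units assumed)
ram_gb = 64 * 1000     # 64 TB expressed in GB (decimal units assumed)
cores = 32000

# Per-node averages, assuming resources are spread evenly.
per_node = {
    "disk_tb": disk_tb / nodes,
    "ram_gb": ram_gb / nodes,
    "cores": cores / nodes,
}
```

Under those assumptions each node averages 4 TB of disk, 16 GB of RAM and 8 cores, which is commodity-server territory.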
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
33 Software Group
34 Software Group
Why showrooming
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences
35 Software Group
Itrsquos not enough to lay out products on tables
bull Online has significant advantages
bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection
bull Only big data analytics can provide these advantages
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram comparing two storage stacks. One stack: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. HBase stack: Tables, MemStore, WA Log, HFiles, HDFS, Disks.]
79 Software Group
HBase Data Model

Source rows:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized) form:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase form (one wide, sparse row per name):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
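The wide, sparse row layout shown on this slide can be sketched with plain dictionaries. This is only an illustration of the data model, not the HBase client API; the `to_wide_rows` helper is an assumed name, and the data is the slide's own example.

```python
from typing import Dict, List, Tuple

# Rows from the slide's first table: (name, site, counter).
VISITS: List[Tuple[str, str, int]] = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(visits: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Pivot (name, site, counter) rows into one sparse row per name.

    Each site becomes a column that exists only in the rows that use it.
    """
    wide: Dict[str, Dict[str, int]] = {}
    for name, site, counter in visits:
        wide.setdefault(name, {})[site] = counter
    return wide


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(VISITS).items():
        print(row_key, columns)
```

Because absent columns take no space, Jane's row simply has no Ebay entry rather than a NULL.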
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
• YARN, EC2
• Mesos: heterogeneous cluster manager
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib: Machine Learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
87 Software Group
Meanwhile, back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
[Bar chart, axis $0 to $6,000]
• Exadata: $4,911
• Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data points with a fitted line, f(x) = 0.971521231456065 x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
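The fitted line on this slide is the kind of result an ordinary least-squares fit produces. The following is a minimal sketch of that calculation for a single input variable; the function names and the sample data are assumptions for illustration and are not the data behind the slide's chart.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept, predict(slope, intercept, 10))
```

The other techniques in the list (curve fitting, multivariate models, time series) change the form of the model but keep the same idea of fitting to existing data and extrapolating.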
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
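As one possible illustration of grouping data without predefined classes, here is a minimal one-dimensional k-means sketch. k-means is a choice made here, not something the slide names, and the basket totals and starting centroids are hypothetical.

```python
from typing import List


def k_means_1d(values: List[float], centroids: List[float],
               iterations: int = 10) -> List[float]:
    """Lloyd's algorithm in one dimension, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters: List[List[float]] = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


if __name__ == "__main__":
    basket_totals = [1, 2, 3, 10, 11, 12]  # hypothetical basket sizes
    print(k_means_1d(basket_totals, centroids=[0, 5]))
```

No labels are supplied; the two groups emerge from the data alone, which is the point of the clustering bullet above.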
121 Software Group
Supervised Machine Learning
[Diagram labels: Existing Business; Raw Data; Clean; Training Set; Validation Set; Model; Candidate Model; Validate; Production Model; New Business; New Data; Prediction]
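The training set, validation set and prediction steps in this diagram can be sketched with a deliberately simple classifier. Everything here is an assumption for illustration: the threshold model, the every-fourth-record split and the sample data are not taken from the slide.

```python
from typing import List, Tuple

Example = Tuple[float, int]  # (feature value, label 0 or 1)


def split(examples: List[Example],
          validation_every: int = 4) -> Tuple[List[Example], List[Example]]:
    """Hold out every Nth example for validation; train on the rest."""
    training = [e for i, e in enumerate(examples) if i % validation_every != 0]
    validation = [e for i, e in enumerate(examples) if i % validation_every == 0]
    return training, validation


def train_threshold(training: List[Example]) -> float:
    """Candidate model: the midpoint between the two class means."""
    zeros = [x for x, y in training if y == 0]
    ones = [x for x, y in training if y == 1]
    if not zeros or not ones:
        raise ValueError("training set must contain both classes")
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold: float, x: float) -> int:
    """Production use: classify new data with the trained threshold."""
    return 1 if x > threshold else 0


def accuracy(threshold: float, examples: List[Example]) -> float:
    """Validate: fraction of held-out examples classified correctly."""
    correct = sum(1 for x, y in examples if predict(threshold, x) == y)
    return correct / len(examples)


if __name__ == "__main__":
    data = [(1, 0), (2, 0), (3, 0), (4, 0), (8, 1), (9, 1), (10, 1), (11, 1)]
    train, valid = split(data)
    model = train_threshold(train)
    print(model, accuracy(model, valid), predict(model, 7.5))
```

Only a candidate model that scores well on the held-out validation set would be promoted to production and applied to new data.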
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics / Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage, STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring: integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
34 Software Group
Why showrooming?
• Selection
• Stock
• Faster
• Cheaper
Defences
• Dynamic Pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization
35 Software Group
It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages
36 Software Group
There's a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don't Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover: $45.99
Amazon Kindle: $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
36 Software Group
Therersquos a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
SecurityFinance
Government
Science
Healthcare
Insurance
Telecom
Advertising
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
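The last two tables show the idea: each row carries only the columns it actually has. A rough way to picture that is a dictionary of dictionaries. The sketch below uses plain Python rather than the HBase client API, and the `to_wide_rows` helper is a hypothetical name used for illustration.

```python
from collections import defaultdict

# The narrow (name, site, counter) rows from the slide.
HITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot narrow rows into one sparse wide row per name.

    Each site becomes a column, and a row stores only the columns
    that have a value, so no NULL padding is needed.
    """
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter
    return dict(wide)

wide = to_wide_rows(HITS)
```

Here `wide["Jane"]` has no `"Ebay"` entry at all, which mirrors the way Jane's row on the slide simply omits that column.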
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads web logs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
• Mesos – heterogeneous cluster manager
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – machine learning
• Shark (SQL)
• BlinkDB
[Diagram also shows Map Reduce, Hive (SQL), YARN and EC2]
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
Exadata: $4911
Hadoop: $750
[Bar chart; axis runs from $0 to $6000]
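The gap is easier to appreciate as a ratio. The sketch below assumes the larger figure ($4911) belongs to Exadata and the smaller ($750) to Hadoop, and that both are hardware-only dollars per terabyte as the chart title says; the 100 TB store is a made-up example size.

```python
exadata_per_tb = 4911  # USD per TB, hardware only (assumed to be the Exadata bar)
hadoop_per_tb = 750    # USD per TB, hardware only (assumed to be the Hadoop bar)

ratio = exadata_per_tb / hadoop_per_tb  # roughly 6.5x

# Hypothetical 100 TB store, hardware cost only.
store_tb = 100
exadata_cost = exadata_per_tb * store_tb
hadoop_cost = hadoop_per_tb * store_tb
```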
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
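The totals in brackets can be rechecked from the per-node figures. In this quick sketch the RAM, core and spindle totals match the slide, while 12 x 2 TB drives per node works out to 432 TB of raw disk rather than 864 TB, so one of the two disk figures on the slide may be off. Treat this as arithmetic on the slide's numbers, not as a statement of the appliance's actual specification.

```python
nodes = 18
ram_gb_per_node = 48
cores_per_node = 2 * 6   # two 6-core CPUs
disks_per_node = 12
tb_per_disk = 2          # as printed on the slide

total_ram_gb = nodes * ram_gb_per_node      # 864, matches the slide
total_cores = nodes * cores_per_node        # 216, matches the slide
total_spindles = nodes * disks_per_node     # 216, matches the slide
raw_disk_tb = total_spindles * tb_per_disk  # 432, slide says 864 TB
```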
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data with fitted line f(x) = 0.971521231456065x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
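The first technique in the list, linear regression, produces exactly the kind of f(x) = slope * x + intercept line shown on the chart. A minimal ordinary-least-squares sketch in plain Python follows; the function names and the sample data in the usage note are assumptions, not the data behind the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor.

    Returns (slope, intercept) minimising the squared vertical error.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers a slope of 2 and an intercept of 1, and `predict` then extends that line beyond the observed range.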
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
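To make "grouping without a pre-existing scheme" concrete, here is a small k-means sketch on one-dimensional data. k-means is swapped in as a generic clustering technique; it is not basket analysis, and the function name, starting centroids and sample points in the usage note are assumptions for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Very small k-means for 1-D data.

    Repeats two steps: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```

Given `kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5])`, the points settle into a low group and a high group without any labels being supplied.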
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Data (New Business, Existing Business), Prediction]
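The workflow in the diagram (split cleaned data into a training set and a validation set, fit a candidate model, validate it, then use it on new data) can be sketched as follows. The threshold "model" is deliberately trivial, and every name here is a hypothetical stand-in rather than part of any particular library.

```python
import random

def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle the cleaned rows and split them into training and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]

def train(training_set):
    """Candidate model: a threshold halfway between the two class means.

    Each row is (value, label) with a boolean label.
    """
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Production use: classify a new, unlabelled value."""
    return x > threshold

def accuracy(threshold, validation_set):
    """Validate: fraction of held-out rows the model labels correctly."""
    correct = sum(predict(threshold, x) == label for x, label in validation_set)
    return correct / len(validation_set)
```

A candidate model would only be promoted to production if its accuracy on the validation set, which it never saw during training, is acceptable.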
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs → Change Data Capture → JMS Queue → Hadoop Poster; outputs shown: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
37 Software Group
The Revolution is not over yet
38 Software Group
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality: German
Don’t Mention the WAR
43 Software Group
Buying choices
Oracle Performance Survival Guide
• Amazon softcover: $45.99
• Amazon Kindle: $39.99
Say “screw you, bookseller” to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
“Siri, call me an ambulance”
From now on I’ll call you ‘An Ambulance’. OK?
“I want to jump off a bridge”
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram: Google Applications; Map Reduce and BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start → many parallel Map tasks → Reduce]
39 Software Group
40 Software Group
41 Software Group
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
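The rack totals can be checked from the per-node figures with a few lines of arithmetic. This is only a sanity check on the numbers as listed; the variable names are made up for the sketch. The RAM, core and spindle totals come out as quoted, but 216 spindles at 2 TB each gives 432 TB of raw disk rather than 864 TB, so one of the two disk figures is probably off.

```python
# Per-node figures as listed on the slide.
nodes = 18
ram_gb_per_node = 48
cores_per_node = 2 * 6        # two 6-core CPUs
disks_per_node = 12
tb_per_disk = 2

total_ram_gb = nodes * ram_gb_per_node          # RAM across the rack
total_cores = nodes * cores_per_node            # cores across the rack
total_spindles = nodes * disks_per_node         # disk drives across the rack
total_raw_tb = total_spindles * tb_per_disk     # raw disk capacity
```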
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: data points with a linear trend line; x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
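A trend line like the one on the chart comes from a least-squares fit. The sketch below shows ordinary least squares for a single variable in plain Python; fit_line and the sample points are hypothetical and are not the data behind the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1
xs = [0, 10, 20, 30]
ys = [1, 21, 41, 61]
slope, intercept = fit_line(xs, ys)

# Extrapolate from existing data to a value not yet seen
prediction = slope * 40 + intercept
```

The prediction step is the "extrapolate from existing data into the future" part: once slope and intercept are known, any new x yields an estimate.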
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
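To make the idea concrete, here is a minimal sketch of one very simple classifier, nearest centroid, applied to a made-up churn example. The slide does not name an algorithm; the features, labels and function names here are all assumptions for illustration.

```python
def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns label -> mean vector."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((c - f) ** 2 for c, f in zip(centroids[label], features))
    return min(centroids, key=dist)

# Hypothetical training data: (logins per month, support tickets) -> label
training = [
    ([30, 0], "stay"), ([25, 1], "stay"),
    ([2, 4], "churn"), ([4, 6], "churn"),
]
model = train_centroids(training)
label_active = classify(model, [28, 1])   # a frequent, trouble-free user
label_quiet = classify(model, [3, 5])     # a rare user with many tickets
```

The model is built from labelled examples and then applied to customers it has never seen, which is the pattern behind spam filters and churn scoring alike.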
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
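A short sketch of k-means, one common clustering algorithm, shows what grouping without labels looks like. The slide does not say which algorithm it has in mind; kmeans_1d, the starting centroids and the basket totals are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on numbers. `centroids` holds the starting guesses."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical basket totals in dollars; no labels are supplied
basket_totals = [5, 7, 6, 52, 48, 50]
centers, groups = kmeans_1d(basket_totals, centroids=[0, 100])
```

Nobody told the algorithm that there are "small" and "large" baskets; the two groups emerge from the data alone.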
121 Software Group
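Supervised Machine Learning
[Diagram: Raw Data -> Clean -> Training Set and Validation Set; Training Set -> Model -> Candidate Model -> Validate (against the Validation Set) -> Production Model; New Data (New Business, Existing Business) -> Production Model -> Prediction]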
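The workflow in the diagram can be sketched end to end with a deliberately trivial model: a single cut-off value learned from the training set, checked against the validation set, and only then used on new data. All names, data and the 0.9 acceptance level are assumptions for this sketch.

```python
def fit_threshold(training_set):
    """Candidate model: pick the cut-off that best separates the two labels."""
    best_cut, best_correct = None, -1
    for cut, _ in training_set:
        correct = sum((x >= cut) == label for x, label in training_set)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def accuracy(cut, data):
    """Fraction of examples the cut-off classifies correctly."""
    return sum((x >= cut) == label for x, label in data) / len(data)

# Hypothetical cleaned data: (monthly spend, became a repeat customer?)
cleaned = [(5, False), (8, False), (12, False), (20, True),
           (25, True), (31, True), (9, False), (22, True)]

training_set, validation_set = cleaned[:6], cleaned[6:]
candidate = fit_threshold(training_set)
validation_accuracy = accuracy(candidate, validation_set)

# Promote the candidate to production only if it validates well enough.
production_model = candidate if validation_accuracy >= 0.9 else None

# Score a new customer who spends 27 a month.
prediction = production_model is not None and 27 >= production_model
```

Holding back a validation set is the key design choice: the model is judged on data it was not trained on before it touches new business.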
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS file copy; Audit change data; HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Dell offerings:
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Dell offerings:
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
42 Software Group
Willy Bowman
Nationality German
Don't Mention the WAR
43 Software Group
Buying choices
Amazon softcover $45.99
Oracle Performance Survival Guide
Amazon Kindle $39.99
Say "screw you, bookseller" to buy Kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
50 Software Group
Brain Control
53 Software Group
Muze
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Map Reduce
[Diagram: Start fans out to many parallel Map tasks, whose outputs feed a Reduce step]
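The pattern in the diagram can be sketched in a few lines with the classic word-count example. This runs in a single process purely for illustration; on a cluster each input split would go to a separate mapper and the grouping step would be done by the framework. The function names and input text are hypothetical.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)

# Hypothetical input splits; on a cluster each would go to a separate mapper.
splits = ["big data big cluster", "data beats algorithms", "big wins"]

mapped = [pair for split in splits for pair in map_phase(split)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call looks at only its own split, the map work can be spread over as many machines as there are splits.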
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS -> mappers (scan/sort) -> mappers (aggregate) -> reduce -> client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram, Schema on Write: Data -> Extract, Transform, Load (cleanse, normalize, aggregate, code) -> Data Warehouse -> Analyse -> Utilize]
[Diagram, Schema on Read: Data -> Load -> Hadoop -> Cleanse, Code, Analyse -> Utilize]
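The difference between the two approaches can be sketched as follows: with schema on write, records are forced into a fixed shape before they are stored; with schema on read, the raw records are stored as they arrive and each query decides how to interpret them. SCHEMA, conform and the sample records are assumptions made up for this sketch.

```python
import json

SCHEMA = {"user": str, "amount": float}   # hypothetical two-field schema

def conform(record):
    """Coerce a raw record to SCHEMA, raising if a field is missing or invalid."""
    return {field: cast(record[field]) for field, cast in SCHEMA.items()}

raw_lines = [
    '{"user": "dick", "amount": "12.50"}',
    '{"user": "jane", "amount": "7", "coupon": "SPRING"}',
]

# Schema on write: conform every record before it is stored.
warehouse = [conform(json.loads(line)) for line in raw_lines]

# Schema on read: store the raw lines untouched, interpret them per query.
raw_store = list(raw_lines)

def total_amount(store):
    """A query that only cares about the amount field."""
    return sum(float(json.loads(line)["amount"]) for line in store)

def coupons_used(store):
    """A later query on a field the original schema never anticipated."""
    parsed = (json.loads(line) for line in store)
    return [record["coupon"] for record in parsed if "coupon" in record]
```

The coupon field is lost in the warehouse copy because the schema had no place for it, while the raw store can still answer the coupon question later.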
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
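Dividing the cluster totals by the node count shows how ordinary each machine is. This is simple arithmetic on the figures above, assuming decimal units (1 PB = 1,000 TB); the variable names are made up for the sketch.

```python
nodes = 4000
disk_tb = 16 * 1000      # 16 PB expressed in TB (decimal units assumed)
ram_gb = 64 * 1000       # 64 TB expressed in GB (decimal units assumed)
cores = 32000

disk_tb_per_node = disk_tb / nodes     # average disk per node, in TB
ram_gb_per_node = ram_gb / nodes       # average RAM per node, in GB
cores_per_node = cores / nodes         # average cores per node
```

That works out to roughly 4 TB of disk, 16 GB of RAM and 8 cores per node: commodity servers, with the scale coming from the node count.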
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a Hadoop client (Java, Pig, Hive) talks to the Job Tracker for Map Reduce (distributed processing) and to the Name Node and Secondary Name Node for HDFS (distributed storage); every worker node runs both a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop client (Java, Pig, Hive), Resource Manager, and Node Managers hosting Containers and the Application Master]
77 Software Group
Tez (Hindi for "fast")
[Diagram: three chained Map Reduce jobs (Job 1, Job 2, Job 3) on HDFS, compared with a single job (Job 1) on HDFS]
42 Software Group
Willy Bowman
Nationality German
Donrsquot Mention the WAR
43 Software Group
Buying choices
Amazon softcover $4599
Oracle Performance Survival Guide
Amazon Kindle $3999
Say ldquoscrew you booksellerrdquo to buy kindle version
44 Software Group
45 Software Group
Data Input
46 Software Group
Siri
From now on Irsquoll call you lsquoAn Ambulancersquo OK
ldquoSiri call me an ambulancerdquo
I found 14 bridges nearby
ldquoI want to jump off a bridgerdquo
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
124 Software Group
Big Data Analytics / Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
132 Software Group
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Products shown:
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Products shown:
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group
135 Software Group
Data Visualization
136 Software Group
Live scoring – integration into operational systems
137 Software Group
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
43 Software Group
Buying choices
Oracle Performance Survival Guide
Amazon softcover: $45.99
Amazon Kindle: $39.99
Say "screw you, bookseller" to buy the Kindle version
45 Software Group
Data Input
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
50 Software Group
Brain Control
53 Software Group
Muze
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
66 Software Group
Google Software Architecture
[Diagram: Google Applications on top of Map Reduce and BigTable, on top of Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: a Start step fans out to many parallel Map tasks, whose outputs feed a Reduce step]
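The fan-out of Map tasks into a Reduce step can be imitated on one machine. The sketch below runs the classic word-count example through separate map, shuffle and reduce functions. It only illustrates the programming model: a real framework such as Hadoop distributes these phases across nodes, and the function names here are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for an input split handled by its own mapper.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```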
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS; MAPPER tasks (SCAN/SORT); MAPPER tasks (AGGREGATE); REDUCE; Client]
69 Software Group
Schema on Read vs Schema on Write
Schema on Write: Data → Extract, Transform, Load (analyse, aggregate, normalize, cleanse, code) → Data Warehouse → Utilize
Schema on Read: Data → Load → Hadoop → analyse, cleanse, code → Utilize
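The contrast on this slide can be sketched in a few lines: with schema on write, records are parsed and validated before they are stored, while with schema on read the raw input is stored as-is and structure is applied only when a question is asked. The JSON record layout and the function names below are assumptions made for illustration.

```python
import json


def load_schema_on_write(raw_lines):
    """Parse and shape every record at load time; bad input fails the load."""
    table = []
    for line in raw_lines:
        record = json.loads(line)  # raises ValueError on malformed input
        table.append({"user": str(record["user"]), "amount": float(record["amount"])})
    return table


def load_schema_on_read(raw_lines):
    """Store the raw lines untouched; no structure is imposed yet."""
    return list(raw_lines)


def total_amount_on_read(stored_lines):
    """Apply a schema at query time, skipping lines that do not fit it."""
    total = 0.0
    for line in stored_lines:
        try:
            total += float(json.loads(line)["amount"])
        except (ValueError, KeyError, TypeError):
            continue  # this query ignores records it cannot interpret
    return total


if __name__ == "__main__":
    raw = [
        '{"user": "a", "amount": "10.5"}',
        "not json at all",
        '{"user": "b"}',
        '{"user": "c", "amount": 4}',
    ]
    stored = load_schema_on_read(raw)        # always succeeds
    print(total_amount_on_read(stored))      # structure applied when queried
    try:
        load_schema_on_write(raw)            # rejects the malformed line up front
    except ValueError as err:
        print("load rejected:", err)
```

The trade-off is where the cleansing effort lands: up front for every record, or later and only for the questions actually asked.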
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: a Hadoop client (Java, Pig, Hive) works with Map Reduce (distributed processing), coordinated by the Job Tracker, and with HDFS (distributed storage), coordinated by the Name Node and Secondary Name Node; each worker node runs a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
[Diagram labels: Hadoop client (Java, Pig, Hive); Resource Manager; Node Managers, each hosting a Container; Application Master]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS; MAP and REDUCE stages grouped as Job 1, Job 2 and Job 3; HDFS; a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: an Oracle-style storage stack (Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks) shown alongside the HBase stack (Tables, MemStore, WA Log, HFiles, HDFS, Disks)]
79 Software Group
HBase Data Model

Flat data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized) model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one wide, sparse row per name):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
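The shift from one row per (name, site) pair to one wide, sparse row per name can be sketched with nested dictionaries. This is a simplification for illustration: real HBase also organizes columns into column families and keeps timestamped versions of each cell, and the function name `to_wide_rows` is hypothetical.

```python
def to_wide_rows(visits):
    """Turn (name, site, counter) records into one sparse row per name.

    Each row only holds the columns that actually have a value, so two
    rows in the same table can have entirely different column sets.
    """
    rows = {}
    for name, site, counter in visits:
        rows.setdefault(name, {})[site] = counter
    return rows


if __name__ == "__main__":
    # The flat data from the slide.
    visits = [
        ("Dick", "Ebay", 507018),
        ("Dick", "Google", 690414),
        ("Jane", "Google", 716426),
        ("Dick", "Facebook", 723649),
        ("Jane", "Facebook", 643261),
        ("Jane", "ILoveLarry.com", 856767),
        ("Dick", "MadBillFans.com", 675230),
    ]
    for name, columns in to_wide_rows(visits).items():
        print(name, columns)
```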
80 Software Group
Hive
82 Software Group
[Diagram labels: SQL; JAVA; RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: web logs are loaded into HDFS through FLUME; RDBMS tables (CUSTOMERS, PRODUCTS) are loaded into HDFS through SQOOP]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores; 100 TB SAS or 336 TB SATA; plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
Exadata: $4,911
Hadoop: $750
46 Software Group
Siri
"Siri, call me an ambulance"
From now on I'll call you 'An Ambulance'. OK?
"I want to jump off a bridge"
I found 14 bridges nearby
48 Software Group
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram, layers from top to bottom: Google Applications; Map Reduce and BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start, then many Map tasks running in parallel, then Reduce]
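The Map Reduce pattern on this slide can be sketched in a few lines. This is a single-process illustration only, assuming the classic word-count example (which is not taken from the slides); the function names are hypothetical and a real framework would run the map calls on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different node; here we loop serially.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

The point of the split is that map calls are independent of each other, so adding nodes adds throughput; only the shuffle needs to move data between nodes.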
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS; a bank of MAPPER tasks (SCAN, SORT); a second bank of MAPPER tasks (AGGREGATE); REDUCE; Client]
69 Software Group
Schema on Read vs Schema on Write
Schema on Write: Data -> Extract, Transform (cleanse, normalize, aggregate, code), Load -> Data Warehouse -> Analyse -> Utilize
Schema on Read: Data -> Load -> Hadoop -> Analyse (cleanse, code) -> Utilize
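A rough way to see the difference between the two approaches is to load the same raw records both ways. The sketch below is illustrative only: the record layout, field names and function names are all assumptions, not anything from the slides.

```python
import json

# Hypothetical raw events; the second and third do not match the expected shape.
RAW_EVENTS = [
    '{"user": "dick", "site": "Ebay", "hits": 3}',
    '{"user": "jane", "site": "Google"}',
    '{"user": "jane", "clicks": 7}',
]


def load_schema_on_write(raw_lines):
    """Validate and normalize BEFORE storing; nonconforming rows are rejected."""
    table, rejected = [], []
    for line in raw_lines:
        rec = json.loads(line)
        if {"user", "site", "hits"} <= rec.keys():
            table.append((rec["user"], rec["site"], int(rec["hits"])))
        else:
            rejected.append(line)
    return table, rejected


def load_schema_on_read(raw_lines):
    """Store everything untouched; no structure is imposed at load time."""
    return list(raw_lines)


def query_hits_per_user(stored_lines):
    """Apply the schema only at query time, tolerating missing fields."""
    totals = {}
    for line in stored_lines:
        rec = json.loads(line)
        totals[rec["user"]] = totals.get(rec["user"], 0) + rec.get("hits", 0)
    return totals


warehouse, rejected = load_schema_on_write(RAW_EVENTS)
lake = load_schema_on_read(RAW_EVENTS)
totals = query_hits_per_user(lake)
print(warehouse, len(rejected), totals)
```

With schema on write the cleansing cost is paid once at load time and unexpected data is lost or parked; with schema on read nothing is thrown away, but every query has to cope with the mess.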
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: the HADOOP CLIENT (JAVA, PIG, HIVE) works with the JOB TRACKER for MAP REDUCE (distributed processing) and with the NAME NODE and SECONDARY NAME NODE for HDFS (distributed storage); every worker machine runs both a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
[Diagram: the HADOOP CLIENT (JAVA, PIG, HIVE) talks to the RESOURCE MANAGER; each worker runs a NODE MANAGER hosting CONTAINERs, with an APPLICATION MASTER coordinating the job]
77 Software Group
Tez (Hindi for "fast")
[Diagram: a chain of MAP and REDUCE stages shown twice, once split across Job 1, Job 2 and Job 3 and once as a single Job 1, reading from and writing to HDFS]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: two storage stacks shown side by side for comparison. First stack: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. Second stack: Tables, MemStore, WA Log, HFile, HDFS, Disks]
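The roles of the WA Log, MemStore and HFile can be illustrated with a toy model. This is a simplified sketch, not the HBase API: the class, its methods and the flush threshold are all hypothetical, and real HBase adds regions, compaction, versioning and much more.

```python
class ToyRegion:
    """Toy model of a log-then-buffer-then-flush write path; NOT the HBase API."""

    def __init__(self, flush_threshold=2):
        self.wal = []          # write-ahead log: append-only, used for recovery
        self.memstore = {}     # in-memory buffer of recent writes
        self.hfiles = []       # immutable sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # 1. record the change in the log
        self.memstore[row_key] = value      # 2. buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the buffer out as one sorted, immutable file
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row_key):
        # Newest data wins: check memory first, then files from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


region = ToyRegion()
region.put("dick", 1)
region.put("jane", 2)   # reaches the threshold, so the buffer is flushed
region.put("dick", 3)   # newer value stays in memory for now
print(region.get("dick"), region.get("jane"), len(region.hfiles))
```

Because files are never updated in place, writes are sequential, which suits a file system like HDFS that does not support random updates.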
79 Software Group
HBase Data Model

Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one row per name, one column per site):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
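The wide-row layout can be mimicked with a dictionary of dictionaries. The sketch below uses the sample values from the slide; the function name is hypothetical and plain Python dicts stand in for HBase rows and columns.

```python
# (name, site, counter) triples, as in the slide's source data.
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot triples into one row per name with one column per site.

    Columns that a row does not have are simply absent; nothing is stored
    for them, which is what makes very wide, sparse tables affordable.
    """
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter
    return wide


wide = to_wide_rows(ROWS)
print(wide["Jane"])
```

Fetching everything about one name is then a single row lookup rather than a join across three tables.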
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL -> JAVA -> RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: WebLogs -> FLUME -> HDFS; RDBMS tables (CUSTOMERS, PRODUCTS) -> SQOOP -> HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
• Yarn, EC2
• Mesos: heterogeneous cluster manager
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib: machine learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (hardware only)
[Bar chart, axis $0 to $6000. Bars: Exadata, Hadoop. Data labels: $4911, $750]
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPU per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
wwworaclecomusbigdataindexhtml
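The appliance totals are just the per-node figures multiplied across the 18 nodes. The quick check below also estimates usable HDFS capacity, under the assumption (not stated on the slide) that the HDFS default replication factor of 3 is in use.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6        # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2
HDFS_REPLICATION = 3          # assumption: HDFS default, not from the slide

total_ram_gb = NODES * RAM_GB_PER_NODE
total_cores = NODES * CORES_PER_NODE
total_spindles = NODES * DISKS_PER_NODE
raw_storage_tb = total_spindles * TB_PER_DISK
usable_storage_tb = raw_storage_tb / HDFS_REPLICATION

print(total_ram_gb, total_cores, total_spindles, raw_storage_tb, usable_storage_tb)
```

Raw capacity is the headline number; with three copies of every block, roughly a third of it is available for distinct data.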
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics, AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart with fitted line: x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
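A fitted line of the form f(x) = slope * x + intercept comes from ordinary least squares. The sketch below shows the calculation in plain Python; the sample points are made up for illustration and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical observations that lie exactly on y = x + 1.
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]

slope, intercept = fit_line(xs, ys)
forecast = predict(slope, intercept, 120)
print(slope, intercept, forecast)
```

The other techniques on the list follow the same idea (fit a model to existing data, then evaluate it at new inputs) with more flexible model shapes.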
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
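As a minimal illustration of classification, the sketch below trains a nearest-centroid classifier on labelled examples and applies it to new data. The technique is chosen for brevity, not because the slides name it, and the features (links and exclamation marks per message) and training values are invented.

```python
def train_centroids(samples):
    """samples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


# Hypothetical features per message: (number of links, exclamation marks).
TRAINING = [((5, 8), "spam"), ((4, 6), "spam"), ((0, 1), "ham"), ((1, 0), "ham")]

centroids = train_centroids(TRAINING)
print(classify(centroids, (6, 7)), classify(centroids, (0, 0)))
```

Production spam filters use far richer features and models, but the shape is the same: learn from labelled history, then label what arrives next.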
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
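Grouping without predefined labels can be shown with a tiny one-dimensional k-means. This is a sketch under stated assumptions: k-means is one common clustering method (the slides do not name one), and the spend figures and starting centres are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on numbers: assign to nearest centre, then recompute centres."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical basket values; no labels are supplied.
spend = [10, 12, 11, 95, 100, 105]

centers, clusters = kmeans_1d(spend, centers=[0.0, 50.0])
print(centers, clusters)
```

The algorithm finds the "small basket" and "large basket" groups on its own; deciding what those groups mean is left to the analyst.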
121 Software Group
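Supervised Machine Learning
[Diagram: Raw Data from Existing Business is cleaned and split into a Training Set and a Validation Set; the Training Set is used to model a Candidate Model, which is validated against the Validation Set to become the Production Model; New Data from New Business is fed to the Production Model to produce a Prediction]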
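The train, validate, promote workflow can be walked through end to end with a deliberately simple model. Everything here is assumed for illustration: the churn data, the single-feature threshold model, the 0.9 accuracy bar and all function names.

```python
def split(rows, validation_every=4):
    """Hold out every Nth row for validation (deterministic for the demo)."""
    training = [r for i, r in enumerate(rows) if i % validation_every != 0]
    validation = [r for i, r in enumerate(rows) if i % validation_every == 0]
    return training, validation


def train_threshold(training):
    """Candidate model: midpoint between the two class means of one feature."""
    pos = [x for x, label in training if label]
    neg = [x for x, label in training if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, rows):
    """Fraction of rows where 'feature above threshold' matches the label."""
    hits = sum((x > threshold) == label for x, label in rows)
    return hits / len(rows)


# Hypothetical cleaned data: (days since last purchase, churned?).
ROWS = [(5, False), (7, False), (60, True), (3, False),
        (75, True), (90, True), (10, False), (80, True)]

training, validation = split(ROWS)
candidate = train_threshold(training)
score = accuracy(candidate, validation)

# Promote the candidate only if it holds up on data it has not seen.
production_model = candidate if score >= 0.9 else None
prediction = production_model is not None and 50 > production_model
print(candidate, score, prediction)
```

Keeping the validation rows out of training is the essential step: a model scored on its own training data will always look better than it is.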
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security
  - Vulnerability
  - Penetration Detection
• Fraud Detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo-logs -> Change Data Capture -> JMS Queue -> Hadoop Poster -> Batched HDFS File Copy, Audit Change Data, HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: STATISTICA, Server and Storage, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
49 Software Group
50 Software Group
Brain Control
51 Software Group
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
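Tez (Hindi for "fast")
[Diagram: with plain Map-Reduce, the MAP and REDUCE stages run as three chained jobs (Job 1, Job 2, Job 3) that pass data through HDFS; with Tez the same stages run as a single job (Job 1)]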
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: two storage architectures side by side]
Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, Write Ahead Log, HFiles, HDFS, Disks
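The roles of the Write Ahead Log, MemStore and HFile in the diagram can be illustrated with a toy write path. This is a heavily simplified sketch, not HBase code: the class name `TinyRegion`, the `flush_threshold` parameter and the in-memory lists standing in for files on HDFS are all assumptions, and real HBase adds column families, compaction and much more.

```python
class TinyRegion:
    """Toy model of a log-structured write path: log, buffer in memory, flush to immutable files."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # stands in for the Write Ahead Log
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable sorted files, newest last
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        """Log the write first for durability, then apply it in memory."""
        self.wal.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffered data out as one sorted, immutable file and clear the buffer."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal = []  # flushed entries no longer need to be replayed after a crash

    def get(self, key):
        """Check memory first, then files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for stored_key, value in hfile:
                if stored_key == key:
                    return value
        return None
```

The parallel with the Oracle side of the slide is that the MemStore plays the part of the buffer cache and the Write Ahead Log the part of redo, while the files on disk are never updated in place.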
79 Software Group
HBase Data Model

Source data:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized) model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one wide, sparse row per name, one column per site):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
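The step from the (name, site, counter) rows to the wide, sparse rows on this slide can be sketched in Python. The sketch uses nested dictionaries as a stand-in for an HBase row with dynamic columns; the function name `to_wide_rows` is hypothetical, and only a subset of the slide's rows is used.

```python
from collections import defaultdict

# A subset of the slide's source rows: (name, site, counter).
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
]


def to_wide_rows(rows):
    """Pivot one-row-per-visit data into one row per name with a column per site.

    A column exists in a row only if that name has a value for that site,
    so rows can have different sets of columns.
    """
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter
    return dict(wide)


if __name__ == "__main__":
    for name, columns in to_wide_rows(ROWS).items():
        print(name, columns)
```

Jane's row simply has no Ebay column, which is the point of the sparse model: absent values cost nothing, and new sites need no schema change.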
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
• YARN, EC2
• Mesos: heterogeneous cluster manager
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib: machine learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
• 64 cores, 576 GB RAM
Storage servers
• 112 cores
• 100 TB SAS or 336 TB SATA
• plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop: $/TB (hardware only)
• Exadata: $4911
• Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPUs per node (216 cores total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Big Data Analytics (AKA Data Science)
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154; both axes run to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
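The simplest technique in this list, linear regression, fits a straight line f(x) = slope * x + intercept to existing points and extrapolates from it. A minimal ordinary least squares sketch in plain Python follows; the function names and the sample data are assumptions made for the example, not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    model = fit_line([0, 10, 20, 30], [1, 11, 21, 31])
    print(model, predict(model, 40))
```

The other bullets generalize the same idea: curve fitting replaces the straight line, multivariate regression adds more inputs, and logistic regression predicts a probability rather than a quantity.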
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
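One common way to group data without predefined classes is k-means, shown here as a sketch. The slide does not name an algorithm, so k-means is a choice made for this example; the function name, the 2-D points and the caller-supplied starting centroids are likewise assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from caller-supplied centroids.

    Returns the final centroids and the clusters from the last assignment step.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

No labels are supplied anywhere: the two groups emerge only from the distances between points, which is what separates clustering from classification.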
121 Software Group
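Supervised Machine Learning
[Diagram: Raw Data is cleaned and split into a Training Set and a Validation Set; the Training Set produces a Candidate Model, which is validated against the Validation Set to become the Production Model; the Production Model turns New Data into a Prediction. Other labels: Existing Business, New Business]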
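The training set, validation set and production model stages in this diagram can be sketched end to end in Python. Everything specific here is an assumption for illustration: the one-feature threshold classifier, the deterministic every-fourth-row split, the 0.9 accuracy bar and the sample data.

```python
def split(rows, every=4):
    """Hold out every `every`-th row for validation; train on the rest."""
    validation = rows[::every]
    training = [row for i, row in enumerate(rows) if i % every != 0]
    return training, validation


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    positives = [x for x, label in training_set if label]
    negatives = [x for x, label in training_set if not label]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    """Classify a new value with the trained threshold."""
    return x > threshold


def accuracy(threshold, rows):
    """Fraction of rows whose prediction matches the known label."""
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)


if __name__ == "__main__":
    # Existing business: labeled history (value, outcome).
    history = [(x, x >= 50) for x in range(0, 100, 5)]
    training_set, validation_set = split(history)
    candidate = train(training_set)
    # Promote the candidate only if it holds up on data it has not seen.
    if accuracy(candidate, validation_set) >= 0.9:
        production_model = candidate
        print(predict(production_model, 72))  # new business: an unlabeled value
```

The validation step is what keeps a model that merely memorized its training data from reaching production.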
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster; outputs: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring: integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
52 Software Group
53 Software Group
Muze
54 Software Group
55 Software Group
56 Software Group
The instrumented human
bull Bluetooth Personal Area Network
bull 3GWiFi Wide Area Network
bull GPSbull Storage
bull Pulse temp monitor
bull Silent alarmsbull Pedometer sleep
monitoring
bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention
monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
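As one illustration of grouping data without predefined classes, here is a small k-means sketch. K-means is a common clustering technique chosen for this example, not necessarily the one behind the slide; the points and the naive "first k points" initialisation are assumptions.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly moving centroids."""
    centroids = list(points[:k])  # naive deterministic start (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared distance).
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical two-dimensional data with two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are supplied; the groups emerge from the distances alone, which is what distinguishes clustering from classification.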
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Candidate Model, Validate, Production Model, New Data, New Business, Existing Business, Prediction]
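The workflow in this diagram can be sketched end to end in a few lines. Everything specific here is an assumption made for illustration: the churn-style data, the 25% validation split, the 0.9 accuracy bar, and the very simple threshold model standing in for a real learning algorithm.

```python
import random


def clean(rows):
    """Drop records with a missing feature or a missing label."""
    return [r for r in rows if r[0] is not None and r[1] is not None]


def split(rows, validation_fraction=0.25, seed=0):
    """Shuffle, then divide into a training set and a validation set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    return x > threshold


def accuracy(threshold, rows):
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)


# Hypothetical raw data: (support calls per month, customer churned?).
raw_data = [(0, False), (1, False), (2, False), (1, False), (8, True),
            (9, True), (7, True), (10, True), (None, True), (3, None)]

training_set, validation_set = split(clean(raw_data))
candidate = train(training_set)
# Promote the candidate to production only if it validates well enough.
production_model = candidate if accuracy(candidate, validation_set) >= 0.9 else None
print(production_model, predict(production_model, 9))
```

The validation set is held back from training so that the accuracy check measures how the model behaves on data it has not seen.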
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics
Data Science
• Search Optimization
• Recommendation Systems
• Security
  - Vulnerability
  - Penetration Detection
• Fraud Detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time Replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Products: STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.
134 Software Group
135 Software Group
Data Visualization
136 Software Group
Live scoring - integration into operational systems
137 Software Group
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
53 Software Group
Muze
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
66 Software Group
Google Software Architecture
[Diagram labels: Google Applications, Map Reduce, BigTable, Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
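The map and reduce phases pictured here can be mimicked on a single machine. This is a toy word count in plain Python, not Hadoop or Google code: the function names are illustrative, and a real cluster would run the map calls in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data", "Big Data analytics", "data"])
print(counts)
```

Because each map call sees only its own record and each reduce call sees only one key, both phases can be spread over as many machines as there is data.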
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS, MAPPER (many), SCAN, SORT, AGGREGATE, REDUCE, Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels, Schema on Write: Data, Extract, Transform, Load, Data Warehouse, Cleanse, Normalize, Code, Aggregate, Analyse, Utilize]
[Diagram labels, Schema on Read: Data, Load, Hadoop, Cleanse, Code, Analyse, Utilize]
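The difference between the two approaches is where the structure gets enforced. The sketch below contrasts them with in-memory lists; the three-field record layout, the sample lines and the function names are all assumptions made for illustration.

```python
RAW_LINES = ["1,Dick,507018", "2,Jane,not-a-number", "3,Sam"]


def parse(line):
    """Apply the schema (id:int, name:str, counter:int) to one raw line."""
    fields = line.split(",")
    if len(fields) != 3:
        raise ValueError("expected 3 fields")
    return {"id": int(fields[0]), "name": fields[1], "counter": int(fields[2])}


def load_schema_on_write(lines):
    """Warehouse style: validate at load time, reject rows that do not fit."""
    table, rejected = [], []
    for line in lines:
        try:
            table.append(parse(line))
        except ValueError:
            rejected.append(line)
    return table, rejected


def load_schema_on_read(lines):
    """Hadoop style: keep every raw line untouched."""
    return list(lines)


def query_names(raw_store):
    """Structure is imposed only now, and only for the field this query needs."""
    return [line.split(",")[1] for line in raw_store if line.count(",") >= 1]


table, rejected = load_schema_on_write(RAW_LINES)
print(len(table), rejected)
print(query_names(load_schema_on_read(RAW_LINES)))
```

Schema on write loses the two malformed rows at load time, while schema on read keeps them and lets each query decide how much structure it requires.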
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
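Dividing the cluster totals by the node count gives a feel for how modest each machine is. This is simple arithmetic on the slide's figures, assuming decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread across nodes.

```python
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32000

# Assumes decimal units and identical nodes.
disk_tb_per_node = DISK_PB * 1000 / NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
cores_per_node = CORES / NODES

print(disk_tb_per_node, ram_gb_per_node, cores_per_node)
```

Under those assumptions each node works out to about 4 TB of disk, 16 GB of RAM and 8 cores, which is commodity-class hardware.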
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with NAME NODE and SECONDARY NAME NODE; many worker nodes, each labelled DATA NODE and TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE), RESOURCE MANAGER, APPLICATION MASTER, NODE MANAGER (several), CONTAINER (several)]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS, MAP, REDUCE, Job 1, Job 2, Job 3]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: Oracle and HBase storage layers side by side]
• Oracle: Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
• HBase: Table, MemStore, WA Log, HFile, HDFS, Disks
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
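The wide, sparse rows at the bottom of this slide can be imitated with a dictionary of dictionaries. This is only a stand-in for the idea, not the HBase API: real HBase adds column families, timestamps and row-key ordering, none of which are modelled here.

```python
# The flat (name, site, counter) rows from the slide.
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot flat rows into one sparse row per name, one column per site."""
    table = {}
    for name, site, counter in rows:
        table.setdefault(name, {})[site] = counter
    return table


wide = to_wide_rows(visits)
print(wide["Jane"])
```

Each row holds only the columns it actually has, so Jane's row carries no Ebay entry at all rather than a NULL.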
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram labels: WebLogs, FLUME, HDFS, RDBMS, CUSTOMERS, PRODUCTS, SQOOP]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
• Yarn, EC2
• Mesos - heterogeneous cluster manager
• Tachyon - in-memory file system
• Spark - memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib - Machine Learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
56 Software Group
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/attention monitor
57 Software Group
The instrumented world
58 Software Group
All of which accelerates what we call Big Data
59 Software Group
Big Database technologies
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
• Google Applications
• Map Reduce, BigTable
• Google File System (GFS)
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
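As a rough illustration of the Map and Reduce steps named on this slide, here is a minimal single-process sketch of a word count in Python. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration only; a real Hadoop job would spread these steps across many nodes rather than run them in one process.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the results."""
    pairs = []
    for doc in documents:  # each document stands in for one Map task
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "big database technologies"]))
```

The map step never needs to see more than its own split, which is what makes it possible to run many Map tasks in parallel before a single Reduce.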
68 Software Group
Multi-stage Map-Reduce
[Diagram: Client; HDFS; groups of Mapper tasks labelled Scan, Sort and Aggregate; Reduce]
69 Software Group
Schema on Read vs Schema on Write
[Diagram: two pipelines.
Schema on Write: Data, Extract / Transform / Load, Data Warehouse; steps shown: Cleanse, Normalize, Aggregate, Code, Analyse, Utilize.
Schema on Read: Data, Load, Hadoop; steps shown: Cleanse, Code, Analyse, Utilize.]
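The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, field names and function names (`load_schema_on_write`, `load_schema_on_read`, `query_hits`) are hypothetical and do not come from the slides.

```python
import json

# Hypothetical raw input: one JSON document per line, not all well formed.
RAW_EVENTS = [
    '{"user": "dick", "site": "ebay", "hits": 3}',
    '{"user": "jane", "site": "google"}',  # the "hits" field is missing
]


def load_schema_on_write(lines):
    """Shape and validate each record before storing it.

    Records that do not fit the fixed schema are rejected at load time.
    """
    table = []
    for line in lines:
        record = json.loads(line)
        if not {"user", "site", "hits"} <= set(record):
            continue  # rejected during the transform step
        table.append((record["user"], record["site"], int(record["hits"])))
    return table


def load_schema_on_read(lines):
    """Store the raw lines untouched; no structure is imposed at load time."""
    return list(lines)


def query_hits(raw_store):
    """Apply a schema only when reading, with a default for missing fields."""
    for line in raw_store:
        record = json.loads(line)
        yield record["user"], record.get("hits", 0)
```

In the first style the incomplete record never reaches the store; in the second it is kept, and each query decides how to interpret it.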
70 Software Group
Hadoop: Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: Hadoop client (Java, Pig, Hive); Map Reduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and a Secondary Name Node; many worker nodes, each labelled Data Node / Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop client (Java, Pig, Hive); Resource Manager; Application Master; several Node Managers, each with a Container]
77 Software Group
Tez (Hindi for "fast")
[Diagram: a chain of Map and Reduce steps over HDFS labelled Job 1, Job 2 and Job 3, shown alongside the same steps over HDFS labelled as a single Job 1]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: two storage stacks side by side.
Oracle-style stack: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks.
HBase stack: Tables, MemStore, WA Log, HFiles, HDFS, Disks.]
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
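The wide-row layout in the last two tables can be mimicked with nested dictionaries. The sketch below is plain Python and uses no HBase client API; the function name `to_wide_rows` is an assumption for illustration, and the data is the sample from the slide.

```python
from collections import defaultdict

# (name, site, counter) rows as they would appear in a single flat table.
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot flat rows into one sparse row per name.

    The site name itself becomes the column name, so each row only
    carries the columns it actually has a value for.
    """
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter
    return dict(table)


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(ROWS).items():
        print(row_key, columns)
```

Note that the row for Jane has no Ebay entry at all, rather than an Ebay column holding a null.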
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL, JAVA, RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
• Mesos - heterogeneous cluster manager
• Tachyon - in-memory file system
• Spark - memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib - machine learning
• Shark (SQL)
• BlinkDB
• Also shown: Map Reduce, Hive (SQL), Yarn, EC2
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop: $/TB (hardware only)
• Exadata: $4911
• Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPU per node (216 total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Big Data Analytics, AKA Data Science
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with a fitted line, f(x) = 0.971521231456065 x + 0.71906459527154; x axis from 0 to 120, y axis from -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
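The first item in this list, linear regression, is the technique behind the fitted line in the chart. A minimal ordinary-least-squares sketch in plain Python follows; the function names `fit_line` and `predict` and the sample points in the usage note are assumptions for illustration, not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept
```

For example, `fit_line([0, 10, 20, 30], [1, 21, 41, 61])` recovers a slope of 2 and an intercept of 1, and `predict` then extends that line to values of x outside the observed range.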
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
121 Software Group
Supervised Machine Learning
[Diagram: Raw Data, Clean; Training Set and Validation Set; Model, Validate, Candidate Model; Production Model; New Data, Prediction; Existing Business, New Business]
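The stages named in this diagram (training set, validation set, candidate model, prediction on new data) can be sketched with a deliberately simple one-feature classifier. Everything here is a hypothetical illustration: the labelled data, the threshold "model" and the function names are assumptions, not the method used in any product mentioned in the deck.

```python
import random

# Hypothetical labelled history: (feature value, label).
LABELLED = [(x, False) for x in range(0, 10)] + [(x, True) for x in range(30, 40)]


def split(records, validation_fraction=0.25, seed=0):
    """Shuffle labelled records and split them into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    positives = [x for x, label in training_set if label]
    negatives = [x for x, label in training_set if not label]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return (mean_pos + mean_neg) / 2


def predict(threshold, x):
    """Classify a new value with the trained threshold."""
    return x > threshold


def accuracy(threshold, validation_set):
    """Fraction of held-out records the candidate model labels correctly."""
    hits = sum(predict(threshold, x) == label for x, label in validation_set)
    return hits / len(validation_set)
```

A candidate model would only be promoted to production if its accuracy on the validation set, which it never saw during training, is acceptable.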
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security
  - Vulnerability
  - Penetration Detection
• Fraud Detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster; targets shown: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete...
Key components to build end-to-end BI/Analytics solutions
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings shown: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell offerings shown: STATISTICA, Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
60 Software Group
Pioneers of Big Data
61 Software Group
62 Software Group
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google Software Architecture
[Diagram layers: Google Applications; Map Reduce and BigTable; Google File System (GFS)]
67 Software Group
Map Reduce
[Diagram: a Start step fans out to many parallel Map tasks, whose output feeds a Reduce step]
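The Map Reduce pattern on this slide can be illustrated with a small, single-process sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count task are assumptions chosen for illustration, and a real cluster would run the map calls in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map over every record, then shuffle, then reduce per key."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["big data", "Big Hadoop", "data data"]))
```

Because each map call looks at one record only, the map step can be spread over as many workers as there are input splits; only the shuffle needs to bring values for the same key together.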
68 Software Group
Multi-stage Map-Reduce
[Diagram labels: HDFS; Mapper (repeated); Scan; Sort; Aggregate; Reduce; Client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram labels, Schema on Write: Data; Extract, Transform, Load; Cleanse; Normalize; Aggregate; Data Warehouse; Code; Analyse; Utilize]
[Diagram labels, Schema on Read: Data; Load; Hadoop; Code; Cleanse; Analyse; Utilize]
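A minimal sketch of the schema-on-read idea, assuming raw records are landed as JSON lines and a schema is supplied only when the data is queried. The function name `read_with_schema` and the field names are hypothetical; the point is that loading does no validation, while each reader decides which fields and types it needs.

```python
import json


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time; skip records that do not fit it.

    raw_lines: records stored exactly as they arrived (no checks at load time).
    schema: mapping of field name -> casting function, chosen by the reader.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            # Malformed line, missing field, or value that cannot be cast.
            continue


if __name__ == "__main__":
    raw = ['{"user": "jane", "hits": "3"}', "not json", '{"user": "dick"}']
    print(list(read_with_schema(raw, {"user": str, "hits": int})))
    print(list(read_with_schema(raw, {"user": str})))
```

The same raw lines serve both queries; with schema on write, the record lacking `hits` would have had to be rejected or repaired before loading.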
70 Software Group
Hadoop: Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram labels: Hadoop Client (Java, Pig, Hive); Map Reduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and a Secondary Name Node; worker nodes, each running a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0: YARN
Yet Another Resource Negotiator
[Diagram labels: Hadoop Client (Java, Pig, Hive); Resource Manager; Node Managers, each hosting Containers; Application Master]
77 Software Group
Tez [1]
[1] Hindi for “fast”
[Diagram: Map and Reduce steps over HDFS, drawn once as three jobs (Job 1, Job 2, Job 3) and once as a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram labels, relational database: Tables; Buffer Cache; Log Buffer; Redo; Datafiles; ASM; Disks]
[Diagram labels, HBase: Tables; MemStore; WA Log (write-ahead log); HFile; HDFS; Disks]
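The HBase labels above (MemStore, write-ahead log, HFile) suggest a write path that can be sketched as a toy model. This is not the HBase API: the class `ToyRegion`, its methods, and the `flush_threshold` parameter are assumptions made for illustration, and real HBase adds compaction, region splitting, and HDFS replication.

```python
class ToyRegion:
    """Toy model of a log-then-buffer-then-flush write path."""

    def __init__(self, flush_threshold=2):
        self.wal = []        # write-ahead log: append-only record of every write
        self.memstore = {}   # in-memory buffer holding the most recent writes
        self.hfiles = []     # immutable flushed files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))   # log first, so the write can be replayed
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffer out as a sorted, immutable file and start a new buffer."""
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        """Check the buffer first, then flushed files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            if key in hfile:
                return hfile[key]
        return None
```

In this sketch the MemStore plays the role the buffer cache plays on the relational side, and the write-ahead log plays the role of redo.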
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
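The move from the relational rows to the wide, sparse HBase-style rows can be sketched as a pivot. The function name `to_wide_rows` is hypothetical and plain dictionaries stand in for HBase rows; the data is taken from the slide.

```python
COUNTERS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(counters):
    """Pivot (name, site, counter) triples into one sparse row per name.

    Each row only carries the columns it actually has a value for,
    so different rows can have different column sets.
    """
    rows = {}
    for name, site, counter in counters:
        rows.setdefault(name, {})[site] = counter
    return rows


if __name__ == "__main__":
    for name, columns in to_wide_rows(COUNTERS).items():
        print(name, columns)
```

Jane's row has no Ebay column at all, rather than an Ebay column holding a null, which is the property the two wide rows on the slide illustrate.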
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL; Java; Results]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: Flume loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
• Mesos: heterogeneous cluster manager
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• Shark (SQL)
• BlinkDB
[Other diagram labels: Yarn; EC2; Map Reduce; Hive (SQL)]
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop: $/TB (hardware only)
• Exadata: $4911
• Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6-core CPU per node (216 total)
  - 12x2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
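The rack totals can be checked by multiplying out the per-node figures. The constant names below are only for illustration. Taking the per-node numbers as stated, RAM, cores and spindles match the totals on the slide, while 216 spindles of 2 TB come to 432 TB rather than the 864 TB shown, so one of those two disk figures may differ from the actual configuration.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6      # two 6-core CPUs per node
DISKS_PER_NODE = 12
TB_PER_DISK = 2

total_ram_gb = NODES * RAM_GB_PER_NODE        # total RAM across the rack
total_cores = NODES * CORES_PER_NODE          # total CPU cores
total_spindles = NODES * DISKS_PER_NODE       # total disk drives
raw_disk_tb = total_spindles * TB_PER_DISK    # raw capacity before replication

if __name__ == "__main__":
    print(total_ram_gb, total_cores, total_spindles, raw_disk_tb)
```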
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
Big Data Analytics, AKA Data Science
• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
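A trend line of the form f(x) = slope · x + intercept, like the one on the chart, can be fitted with ordinary least squares. The sketch below uses only the standard library; the function name `fit_line` and the sample points are assumptions, not the data behind the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(f"f(x) = {slope} x + {intercept}")
    print("prediction for x = 10:", slope * 10 + intercept)
```

Once the two coefficients are known, predicting a value outside the observed range is a single multiplication and addition, which is the sense in which the model extrapolates from existing data.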
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
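One very simple way to build a model that classifies new data is a nearest-centroid classifier: average the labelled examples per class, then assign a new record to the closest average. This is only one of many classification techniques and is not named on the slide; the churn features (support calls, months since last login) and all values below are made up for illustration.

```python
def train_centroids(examples):
    """examples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical training data: (support calls, months since last login) -> label
TRAINING = [
    ((5, 4), "churn"),
    ((6, 5), "churn"),
    ((0, 0), "stay"),
    ((1, 1), "stay"),
]

if __name__ == "__main__":
    model = train_centroids(TRAINING)
    print(classify(model, (5, 5)))
    print(classify(model, (1, 0)))
```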
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
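Grouping data without predefined classes can be sketched with k-means, a common clustering algorithm that the slide does not name. The version below is deliberately naive: it seeds the centroids with the first k points and runs a fixed number of iterations, and the sample points are hypothetical.

```python
def kmeans(points, k, iterations=10):
    """Naive k-means: returns (centroids, cluster index for each point)."""
    centroids = [list(p) for p in points[:k]]   # deterministic, simplistic seeding
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


if __name__ == "__main__":
    points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
    centroids, assignment = kmeans(points, k=2)
    print(centroids)
    print(assignment)
```

No labels are supplied anywhere: the two groups emerge from the distances alone, which is what separates clustering from classification.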
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data; Clean; Training Set; Candidate Model; Validation Set; Validate Model; Production Model; Existing Business; New Business; New Data; Prediction]
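The stages named in the diagram (clean, split into training and validation sets, fit a candidate model, validate it, then use it on new data) can be sketched end to end. All function names, the alternating split, the error threshold, and the sample data are assumptions; the model is a plain least-squares line standing in for whatever algorithm is actually used.

```python
def clean(raw_rows):
    """Drop rows with missing values."""
    return [(x, y) for x, y in raw_rows if x is not None and y is not None]


def fit_candidate(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def mean_abs_error(model, rows):
    return sum(abs(predict(model, x) - y) for x, y in rows) / len(rows)


def build_production_model(raw_rows, max_error):
    """Train on one half, validate on the other, promote only if good enough."""
    rows = clean(raw_rows)
    training, validation = rows[0::2], rows[1::2]
    candidate = fit_candidate(training)
    error = mean_abs_error(candidate, validation)
    if error > max_error:
        raise ValueError(f"candidate rejected: validation error {error:.3f}")
    return candidate


if __name__ == "__main__":
    raw = [(0, 1), (1, 3), (2, None), (3, 7), (4, 9), (5, 11)]
    model = build_production_model(raw, max_error=0.5)
    print("prediction for new data x = 10:", predict(model, 10))
```

Keeping the validation rows out of training is what makes the error figure a fair estimate of how the model will behave on new data.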
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security
  - Vulnerability
  - Penetration Detection
• Fraud Detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: Server and Storage; TOAD & SharePlex; TOAD BI; Boomi; Kitenga
To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
Dell products shown: STATISTICA; Server and Storage; TOAD & SharePlex; TOAD BI; Boomi; Kitenga
Dell + StatSoft = a strong, end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring: integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
63 Software Group
64 Software Group
65 Software Group
66 Software Group
Google File System (GFS)
Map Reduce BigTable
Google Applications
Google Software Architecture
67 Software Group
Start ReduceMapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
MapMap
Map Reduce
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE) talks to the JOB TRACKER for MAP REDUCE (DISTRIBUTED PROCESSING) and to the NAME NODE and SECONDARY NAME NODE for HDFS (DISTRIBUTED STORAGE); each worker node runs a DATA NODE and a TASK TRACKER]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE) -> RESOURCE MANAGER -> NODE MANAGERs hosting CONTAINERs and the APPLICATION MASTER]
77 Software Group
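Tez (Hindi for "fast")
[Diagram: with Map Reduce, Job 1, Job 2 and Job 3 each run MAP and REDUCE steps and pass data through HDFS; with Tez, the same MAP and REDUCE steps run as a single Job 1 between HDFS input and output]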
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: Oracle and HBase storage stacks side by side]
Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks
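The write path implied by the HBase side of the diagram (write-ahead log, MemStore, HFile) can be sketched as a toy log-structured store. This is a simplified illustration with hypothetical names, not the HBase API; real HBase adds regions, compaction and recovery from the log.

```python
class TinyStore:
    """Toy log-structured store; names are illustrative, not the HBase API."""

    def __init__(self, flush_threshold=2):
        self.wal = []        # write-ahead log: appended before anything else
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memstore out as one sorted, immutable file."""
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):  # newest file wins
            for k, v in hfile:
                if k == key:
                    return v
        return None

store = TinyStore()
store.put("dick:ebay", 507018)
store.put("jane:google", 716426)  # second put reaches the threshold and flushes
store.put("dick:ebay", 507019)    # newer value stays in the memstore
print(store.get("dick:ebay"), store.get("jane:google"))
```

The parallel with the Oracle side is loose but useful: the log plays the role of redo, and the MemStore plays the role of the buffer cache.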
79 Software Group
HBase Data Model
Single table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
Normalized tables:
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
HBase rows (each row carries only the columns it needs):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
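The step from the relational rows to the wide HBase rows amounts to a pivot in which each row keeps only the columns it actually has. A minimal sketch using nested dictionaries follows; the dictionary layout is an assumption for illustration, not how HBase stores data on disk.

```python
# (name, site, counter) rows from the slide's first table.
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot (name, site, counter) triples into one sparse row per name."""
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter
    return wide

wide = to_wide_rows(ROWS)
print(wide["Jane"])
```

Jane's row has no Ebay column at all, rather than a NULL, which is the point of the sparse layout.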
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL -> JAVA -> RESULTS]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos - heterogeneous cluster manager
Tachyon - in-memory file system
Spark - memory-optimized distributed execution
Spark Streaming
MLbase, MLlib - Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop $/TB (Hardware only)
Exadata: $4911
Hadoop: $750
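The two chart values can be turned into a ratio and a rough cost for a given volume. The 100 TB volume below is a hypothetical example, and the figures cover hardware only, as the slide notes.

```python
EXADATA_PER_TB = 4911   # dollars per TB, from the chart
HADOOP_PER_TB = 750     # dollars per TB, from the chart

ratio = EXADATA_PER_TB / HADOOP_PER_TB

VOLUME_TB = 100  # hypothetical data volume, chosen for illustration
exadata_cost = VOLUME_TB * EXADATA_PER_TB
hadoop_cost = VOLUME_TB * HADOOP_PER_TB

print(round(ratio, 2), exadata_cost, hadoop_cost, exadata_cost - hadoop_cost)
```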
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
 - 48GB RAM per node (864GB total)
 - 2x6 Core CPU per node (216 total)
 - 12x2TB HDD per node (216 spindles, 864 TB)
 - 40Gb/s Infiniband between nodes
 - 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
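Multiplying out the per-node figures is a quick way to check the rack totals. In this sketch the RAM, core and spindle totals match the slide, while 12 disks of 2 TB across 18 nodes comes to 432 TB rather than the 864 TB shown; the sketch does not try to say which of the slide's disk figures is the intended one.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6      # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2             # as written on the slide

total_ram_gb = NODES * RAM_GB_PER_NODE
total_cores = NODES * CORES_PER_NODE
total_spindles = NODES * DISKS_PER_NODE
total_disk_tb = total_spindles * TB_PER_DISK

print(total_ram_gb, total_cores, total_spindles, total_disk_tb)
```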
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: Programs that evolve with "experience"
Collective Intelligence: Programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: linear trend line; x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
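A trend line like the one on this slide comes from an ordinary least-squares fit. The sketch below uses the closed-form slope and intercept; the sample points are made up for illustration and are not the data behind the chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations, roughly along y = x.
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 19, 41, 58, 82, 99]
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept

print(slope, intercept, predict(120))
```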
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
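As one small instance of classification, here is a nearest-centroid classifier applied to churn risk. The technique, the two features and the training rows are all assumptions for illustration; the slide does not name a specific algorithm.

```python
def train_centroids(samples):
    """samples: list of (features, label). Returns the mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical features: [support calls per month, logins per week].
TRAINING = [
    ([1, 10], "stays"), ([2, 12], "stays"),
    ([9, 1], "churns"), ([8, 2], "churns"),
]
centroids = train_centroids(TRAINING)
label = classify(centroids, [7, 3])
print(label)
```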
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
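Grouping without predefined labels can be illustrated with k-means, used here as an assumed example since the slide does not name an algorithm. The points are made up, and the initial centroids are simply the first k points so that the run is repeatable.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points are used as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters

POINTS = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centroids, clusters = kmeans(POINTS, k=2)
print(clusters)
```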
121 Software Group
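Supervised Machine Learning
[Diagram: Raw Data -> Clean -> Training Set and Validation Set; Training Set -> Model -> Candidate Model -> Validate against Validation Set -> Production Model; New Data (New Business, Existing Business) -> Production Model -> Prediction]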
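The training set and validation set in the diagram can be sketched as a simple hold-out split: a candidate model is built on one part of the data and scored on the part it has not seen. The majority-label "model", the data and the 25% hold-out are placeholders chosen to keep the example short.

```python
import random

def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def train(training_set):
    """Candidate model: always predict the majority label seen in training."""
    labels = [label for _, label in training_set]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

def accuracy(model, validation_set):
    """Share of validation rows the model labels correctly."""
    hits = sum(1 for features, label in validation_set if model(features) == label)
    return hits / len(validation_set)

# Made-up data: every fourth row is labelled "bad".
DATA = [([i], "good" if i % 4 else "bad") for i in range(20)]
training_set, validation_set = split(DATA)
candidate = train(training_set)
score = accuracy(candidate, validation_set)
print(len(training_set), len(validation_set), score)
```

A candidate would be promoted to production only if its validation score is acceptable; new data is then fed to the production model for predictions.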
122 Software Group
Unsupervised learning
inmaps.linkedin.com
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security: Vulnerability, Penetration Detection
• Fraud Detection
• CRM: Churn, Defaults
• Medical: Risk analysis, Diagnosis, Prognosis
• Game optimization
• Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: Redo-logs -> Change Data Capture -> JMS Queue -> Hadoop Poster -> HBase real-time replication, batched HDFS file copy, audit change data]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage
Dell products: TOAD & Shareplex, TOAD BI, Boomi, Kitenga, Server and Storage
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage
Dell products: TOAD & Shareplex, TOAD BI, Boomi, Kitenga, STATISTICA, Server and Storage
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
 - Mobility
 - Cloud
 - Hadoop
 - Big Data Analytics
• Where is the data?
 - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
 - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
 - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
 - SQOOP
 - Hive
 - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
67 Software Group
Map Reduce
[Diagram: Start, many parallel Map tasks, Reduce]
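To make the Map and Reduce steps concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the word-count task are assumptions chosen for illustration, and a real cluster would run the Map tasks in parallel on separate nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Group emitted values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    mapped = []
    for doc in documents:  # each document stands in for one Map task
        mapped.extend(map_phase(doc))
    return reduce_phase(shuffle(mapped))


# Hypothetical input splits, invented for this sketch.
splits = ["big data big", "data revolution"]
counts = word_count(splits)
print(counts)
```

The point of the split into phases is that every Map call is independent of the others, so the work can be spread across as many nodes as there are input splits.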
68 Software Group
Multi-stage Map-Reduce
[Diagram: HDFS; mappers (scan, sort); mappers (aggregate); reduce; client]
69 Software Group
Schema on Read vs Schema on Write
[Diagram: Schema on Write: Data, Extract, Transform (Cleanse, Normalize, Aggregate, Code), Load, Data Warehouse, Analyse, Utilize. Schema on Read: Data, Load, Hadoop, Cleanse, Code, Analyse, Utilize.]
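The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only: the JSON record layout, the field names (`user`, `clicks`, `site`) and the function names are assumptions, not part of the talk.

```python
import json


def load_schema_on_write(raw_lines):
    """Schema on write: validate and shape every record before storing it."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        # Fields outside the fixed schema (such as "site") are dropped at load time.
        table.append({"user": str(record["user"]), "clicks": int(record["clicks"])})
    return table


def load_schema_on_read(raw_lines):
    """Schema on read: store the raw lines untouched."""
    return list(raw_lines)


def total_clicks_on_read(stored_lines):
    """Interpret the raw data only when a question is asked of it."""
    total = 0
    for line in stored_lines:
        record = json.loads(line)
        total += int(record.get("clicks", 0))  # a missing field is tolerated here
    return total


# Hypothetical raw input, invented for this sketch.
raw = [
    '{"user": "dick", "clicks": "3"}',
    '{"user": "jane", "clicks": 4, "site": "ebay"}',
]
warehouse = load_schema_on_write(raw)
lake = load_schema_on_read(raw)
print(warehouse, total_clicks_on_read(lake))
```

In the schema-on-read version nothing is discarded at load time, so a later analysis can still use the `site` field that the schema-on-write loader threw away.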
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
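Dividing the cluster totals by the node count shows how ordinary each machine is. The sketch below is simple arithmetic on the figures above and assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB); the variable names are made up for the example.

```python
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32000

# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB).
disk_tb_per_node = DISK_PB * 1000 / NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
cores_per_node = CORES / NODES

print(disk_tb_per_node, "TB disk per node")
print(ram_gb_per_node, "GB RAM per node")
print(cores_per_node, "cores per node")
```

Under those assumptions each node works out to roughly 4 TB of disk, 16 GB of RAM and 8 cores, that is, commodity hardware rather than a specialised appliance.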
74 Software Group
Hadoop ecosystem
• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
[Diagram: Hadoop client (Java, Pig, Hive); Map Reduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and a Secondary Name Node; many worker nodes, each running a Data Node and a Task Tracker]
76 Software Group
Hadoop 2.0 YARN
Yet Another Resource Negotiator
[Diagram: Hadoop client (Java, Pig, Hive); Resource Manager; Node Managers, each hosting Containers; Application Master]
77 Software Group
Tez (Hindi for “fast”)
[Diagram: Map and Reduce stages over HDFS, run as three separate jobs (Job 1, Job 2, Job 3) and as a single job (Job 1)]
78 Software Group
HBase
A real-time database built on Hadoop
[Diagram: an Oracle-style stack (Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks) beside the HBase stack (Tables, MemStore, WA Log, HFiles, HDFS, Disks)]
79 Software Group
HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
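The wide-row layout can be pictured as a dictionary of dictionaries, where each row holds only the columns it actually has. The Python sketch below pivots the name/site/counter rows from the slide into that shape. It is a simplification for illustration: the function name `to_wide_rows` is made up, and real HBase also has column families and cell versions, which are ignored here.

```python
# One tuple per (name, site) pair, as in the relational form on the slide.
rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (name, site, counter) tuples into one sparse row per name."""
    wide = {}
    for name, site, counter in rows:
        # Each site becomes a column; a row only stores columns it has values for.
        wide.setdefault(name, {})[site] = counter
    return wide


wide = to_wide_rows(rows)
for name, columns in wide.items():
    print(name, columns)
```

Note that Jane's row has no Ebay column at all, rather than an Ebay column holding a null, which is what makes very wide, sparse tables practical.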
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram: SQL, Java, Results]
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram: Flume loads web logs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
• YARN, EC2
• Mesos – heterogeneous cluster manager
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – machine learning
• Map Reduce
• Shark (SQL), Hive (SQL)
• BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
Exadata vs Hadoop, $/TB (hardware only)
• Exadata: $4911
• Hadoop: $750
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPUs per node (216 cores total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: scatter plot with fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
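The first technique in the list, linear regression, fits a straight line through existing points and extends it to values not yet seen. A minimal sketch using the standard least-squares formulas follows; the data points are invented for the example (they lie exactly on y = x + 1) and are not the data behind the chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations, chosen to lie on the line y = x + 1.
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


print(slope, intercept, predict(120))
```

Calling `predict(120)` is the "extrapolate into the future" step: 120 lies beyond every observed x, so the answer is only as good as the assumption that the trend continues.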
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
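One common way to group data without predefined labels is k-means, sketched below in plain Python. k-means is named here as an example technique, not as the method the talk has in mind; the points and starting centroids are invented, and a production version would pick starting centroids more carefully and stop once assignments no longer change.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge purely from how close the points are to one another.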
121 Software Group
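Supervised Machine Learning
[Diagram: raw data is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set to become the production model; the production model turns new data (new business, existing business) into predictions]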
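The train, validate and promote cycle can be sketched end to end in a few lines. Everything specific here is an assumption made for illustration: the churn data, the nearest-mean classifier standing in for the model, the 25% hold-out and the 0.9 accuracy threshold.

```python
def split(records, validation_fraction=0.25):
    """Hold back the last part of the cleaned data as a validation set."""
    cut = int(len(records) * (1 - validation_fraction))
    return records[:cut], records[cut:]


def train(training_set):
    """Candidate model: the mean feature value seen for each label."""
    sums, counts = {}, {}
    for feature, label in training_set:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, feature):
    """Pick the label whose mean is closest to the new feature value."""
    return min(model, key=lambda label: abs(feature - model[label]))


def accuracy(model, validation_set):
    hits = sum(1 for feature, label in validation_set if predict(model, feature) == label)
    return hits / len(validation_set)


# Hypothetical cleaned data: (support calls per month, outcome).
cleaned = [(1, "stay"), (9, "churn"), (2, "stay"), (8, "churn"),
           (0, "stay"), (10, "churn"), (3, "stay"), (7, "churn")]

training_set, validation_set = split(cleaned)
candidate = train(training_set)
score = accuracy(candidate, validation_set)

# Promote the candidate only if it clears an (assumed) accuracy threshold.
production = candidate if score >= 0.9 else None
print(score, predict(production, 6))
```

The validation set is never shown to `train`, which is what makes the accuracy score a fair estimate of how the model will behave on new business.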
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
• Search Optimization
• Recommendation Systems
• Security
  – Vulnerability
  – Penetration Detection
• Fraud Detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram: redo logs, Change Data Capture, JMS Queue, Hadoop Poster; outputs: batched HDFS file copy, audit change data, HBase real-time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell’s offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
[Diagram: products shown against these components: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga]
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage
[Diagram: products shown against these components: Server and Storage, STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga]
Dell + StatSoft = a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
68 Software Group
HDFS
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
MAPPER
SCANSORT
MAPPER
MAPPER
MAPPER
MAPPER
AGGREGATE
REDUCEClient
Multi-stage Map-Reduce
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
69 Software Group
Schema on Read vs Schema on Write
Data
Analyse
Aggregate
Normalize
Cleanse
CodeExtract
Load Transform Data Warehouse
Data LoadHadoop
Analyse
Cleanse
Code
Utilize
Schema on Write
Schema on Read
Utilize
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
C
14
LV
C1
4LV
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
70 Software Group
Hadoop Open Source Map-Reduce Stack
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
72 Software Group
73 Software Group
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
Chart: data points with a fitted line, f(x) = 0.971521231456065 x + 0.71906459527154 (x axis 0 to 120, y axis -20 to 120)
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
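As a rough illustration of the first technique in the list, the sketch below fits a straight line by ordinary least squares and uses it to extrapolate. The sample points are made up for the example and are not the data behind the slide's chart; `fit_line` and `predict` are hypothetical helper names.

```python
# Ordinary least-squares fit of y = slope * x + intercept (illustrative sketch).

def fit_line(xs, ys):
    """Return (slope, intercept) minimising squared error for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate from the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up sample points lying exactly on y = x + 1.
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 41, 61, 81, 101]
model = fit_line(xs, ys)
forecast = predict(model, 120)
```

Real data would not sit exactly on a line, so the fitted slope and intercept would only approximate the trend, much like the coefficients shown on the chart.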
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
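One way to picture grouping without a pre-existing scheme is k-means, sketched below on one-dimensional data. The slide does not name an algorithm, so k-means is an assumption chosen for brevity, and the basket totals and starting centres are invented for the example.

```python
# Minimal 1-D k-means (Lloyd's algorithm); an illustrative sketch only.

def kmeans_1d(values, centers, iterations=10):
    """Return (centers, clusters) after assigning each value to its nearest centre."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical basket totals; no labels are supplied up front.
basket_totals = [5, 7, 6, 48, 52, 50, 8, 49]
centers, clusters = kmeans_1d(basket_totals, centers=[0, 100])
```

The groups emerge from the data alone: the small baskets and the large baskets end up in separate clusters even though nothing told the code which was which.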
121 Software Group
Supervised Machine Learning
Diagram labels: Raw Data, Clean, Training Set, Candidate Model, Validation Set, Validate, Production Model, New Data (New Business, Existing Business), Prediction
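The flow in this diagram (split labelled data, train a candidate model, validate it, then score new data with the production model) can be sketched as follows. Everything here is an assumption made for illustration: the churn data is invented, the model is a single threshold, and the 0.9 promotion bar is arbitrary.

```python
# Sketch of a supervised-learning flow: train, validate, promote, predict.

def split(records, every=4):
    """Hold out every `every`-th record for validation; train on the rest."""
    training = [r for i, r in enumerate(records) if i % every != 0]
    validation = [r for i, r in enumerate(records) if i % every == 0]
    return training, validation


def train_threshold(training):
    """Candidate model: the cut-off on the feature that best separates the labels."""
    best_cut, best_correct = None, -1
    for cut, _ in training:
        correct = sum((x >= cut) == label for x, label in training)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def accuracy(cut, records):
    """Fraction of records whose label matches the threshold rule."""
    return sum((x >= cut) == label for x, label in records) / len(records)


# (support calls last month, churned?) -- hypothetical existing-business data
existing = [(0, False), (1, False), (5, True), (2, False), (7, True), (6, True),
            (1, False), (8, True), (3, False), (9, True), (0, False), (6, True)]

training, validation = split(existing)
candidate = train_threshold(training)
validation_accuracy = accuracy(candidate, validation)

# Promote to production only if validation is good enough, then score new data.
production = candidate if validation_accuracy >= 0.9 else None
prediction = production is not None and 7 >= production
```

Keeping the validation records out of training is the point of the split: the candidate is judged on data it has not seen before it is trusted with new business.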
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security: Vulnerability, Penetration Detection
Fraud Detection
CRM: Churn, Defaults
Medical: Risk analysis, Diagnosis, Prognosis
Game optimization
Advertising: Targeting, Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
Diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase real-time replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Products: TOAD & Shareplex, TOAD BI, Boomi, Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Products: STATISTICA, TOAD & Shareplex, TOAD BI, Boomi, Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring - integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
71 Software Group
Hadoop at Yahoo
Yahoo Hadoop cluster
• 4000 nodes
• 16 PB disk
• 64 TB of RAM
• 32000 cores
72 Software Group
73 Software Group
74 Software Group
Hadoop ecosystem
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
75 Software Group
Hadoop 1.0 Architecture
Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with NAME NODE and SECONDARY NAME NODE; multiple worker nodes, each a DATA NODE plus TASK TRACKER
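The map and reduce phases that this architecture spreads across task trackers can be imitated in a single process. The sketch below is a plain-Python word count, not the Hadoop API: `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical names, and the input lines are made up.

```python
# Single-process imitation of the MapReduce pattern (word count).
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(["big data big cluster", "data node task tracker"])
```

In a real cluster the map calls would run on the nodes that hold each block of input, and the grouped keys would be partitioned across reducers; here the three steps simply run one after another.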
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; multiple NODE MANAGERs, each hosting a CONTAINER
77 Software Group
Tez (Hindi for "fast")
Diagram labels: HDFS; MAP and REDUCE stages grouped into Job 1, Job 2 and Job 3; HDFS; the same MAP and REDUCE stages grouped into a single Job 1
78 Software Group
HBase
A real-time database built on Hadoop
Diagram labels (Oracle side): Table, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
Diagram labels (HBase side): Table, MemStore, WA Log, HFile, HDFS, Disks
79 Software Group
HBase Data Model
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
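The step from the relational tables to the wide rows can be sketched as a small transformation. This is plain Python rather than the HBase client API; `to_wide_rows` is a hypothetical helper, and the site ids follow the SiteId lookup table (Ebay 1, Google 2, Facebook 3, and so on).

```python
# Pivot normalised (NameId, SiteId, Counter) rows into one wide row per person.
from collections import defaultdict

names = {1: "Dick", 2: "Jane"}
sites = {1: "Ebay", 2: "Google", 3: "Facebook",
         4: "ILoveLarry.com", 5: "MadBillFans.com"}
hits = [(1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
        (2, 3, 643261), (2, 4, 856767), (1, 5, 675230)]


def to_wide_rows(names, sites, hits):
    """Return {row id: {"Name": ..., "counters": {site name: counter}}}."""
    counters = defaultdict(dict)
    for name_id, site_id, counter in hits:
        # Each site becomes a column that exists only in rows that have a value.
        counters[name_id][sites[site_id]] = counter
    return {
        row_id: {"Name": names[row_id], "counters": columns}
        for row_id, columns in counters.items()
    }


wide = to_wide_rows(names, sites, hits)
```

Each row carries only the columns it actually uses, so Jane's row has no Ebay column at all rather than an empty one, which is the sparse layout the two wide rows on the slide illustrate.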
74 Software Group
Hadoop File System (HDFS)
Map Reduce YARNHbase
(Database)ZooKeeper(Locking)
SQOOP(RDBMS loader)
Hive(Query)
Pig(Scripting)
Flume(Log Loader)
Oozie (Workflow manager)
Hadoop ecosystem
75 Software Group
Hadoop 10 Architecture
MAP REDUCE (DISTRIBUTED PROCESSING)
HADOOP CLIENT (JAVA PIG HIVE)
HDFS (DISTRIBUTED
STORAGE)
JOB TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
SECONDARY NAME NODE
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
DATA NODE TASK TRACKER
76 Software Group
Hadoop 20 YARN
APPLICATION MASTER
NODE MANAGER
CONTAINER
RESOURCE MANAGER
NODE MANAGER
CONTAINER
NODE MANAGER
CONTAINER
HADOOP CLIENT (JAVA PIG HIVE)
Yet Another Resource Negotiator
77 Software Group
Tez1
1Hindi for ldquofastrdquo
HDFS
MAP
REDUCE
MAP
MAP
REDUCE
MAP
MAP
REDUCE
MAP
Job 2Job 1
Job 3
HDFS
Job 1
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Plot: linear fit f(x) = 0.971521231456065 x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
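The fitted line on the chart is the kind of result ordinary least squares produces. As a minimal sketch of the first technique in the list, the function below fits a straight line to a handful of made-up points (not the data behind the slide) and extrapolates one step beyond them.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept

# Made-up points that lie exactly on y = 2x + 1.
xs = [0, 10, 20, 30, 40]
ys = [1, 21, 41, 61, 81]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(slope, intercept, 50))
```

Real data will not sit exactly on a line, so the fit is a best guess, and predictions far outside the observed range deserve extra caution.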
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
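To make "grouping without a pre-existing scheme" concrete, here is a small k-means sketch. K-means is one common clustering algorithm chosen here for illustration; the slide does not name a specific method, and the points and starting centroids are invented.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                ))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid alone
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters

# Two obvious groups of invented points; no labels are supplied.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The algorithm is told only how many groups to look for, not what they mean; interpreting the clusters is left to the analyst.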
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data; Clean; Validate; Model; Candidate Model; Training Set; Validation Set; Production Model; New Data; New Business; Existing Business; Prediction]
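The workflow in the diagram (split cleaned data into a training set and a validation set, build a candidate model, validate it, then score new data) can be sketched in a few lines. Everything here is hypothetical: the data, the one-feature threshold "model" and the function names are stand-ins for whatever a real project would use.

```python
import random

def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle labelled rows and split them into training and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]

def train(training_set):
    """Candidate model: a threshold halfway between the two class means."""
    pos = [x for x, label in training_set if label]
    neg = [x for x, label in training_set if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Classify a single value with the trained threshold."""
    return x > threshold

def accuracy(threshold, rows):
    """Fraction of rows the model labels correctly."""
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)

# Invented labelled history: (score, label), where the label is True above 50.
history = [(x, x > 50) for x in range(0, 100, 5)]
training_set, validation_set = split(history)
model = train(training_set)
print("validation accuracy:", accuracy(model, validation_set))
print("prediction for new data:", predict(model, 72))
```

Holding back a validation set is the point of the exercise: the candidate model is judged on rows it never saw during training before it is promoted to production.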
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs; Change Data Capture; JMS Queue; Hadoop Poster; Batched HDFS File Copy; Audit Change Data; HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key components to build end-to-end BI/Analytics solutions
Dell's offering was not complete...
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft = a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring: integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig
75 Software Group
Hadoop 1.0 Architecture
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); MAP REDUCE (DISTRIBUTED PROCESSING) with JOB TRACKER; HDFS (DISTRIBUTED STORAGE) with NAME NODE and SECONDARY NAME NODE; many worker nodes, each a DATA NODE plus TASK TRACKER]
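The MapReduce layer in this architecture runs user code in two phases with a grouping step in between. The sketch below imitates that flow in a single Python process with the classic word count; it is an illustration of the programming model only, not Hadoop's Java API, and the function names are invented.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big", "data revolution"])
print(counts)
```

On a real cluster the mappers and reducers run as tasks on the worker nodes next to the data blocks they read, which is what the paired data node and task tracker boxes in the diagram represent.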
76 Software Group
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
[Diagram: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; APPLICATION MASTER; NODE MANAGERs, each hosting CONTAINERs]
77 Software Group
Tez (Hindi for "fast")
[Diagram labels: HDFS; MAP and REDUCE stages grouped as Job 1, Job 2, Job 3; HDFS; Job 1]
78 Software Group
HBase
A Real time database built on Hadoop
[Diagram, Oracle side: Table, Table; Buffer Cache; Log Buffer; Redo; Datafiles; ASM; Disks]
[Diagram, HBase side: Table, Table; MemStore; WA Log; HFile, HFile; HDFS; Disks]
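The HBase half of the diagram names three pieces: a write-ahead log, an in-memory MemStore, and HFiles on HDFS. A toy model of how a write moves through them is sketched below. The class, its flush threshold and its method names are invented for illustration and bear no relation to the real HBase client API.

```python
class TinyRegion:
    """Toy write path: log first, buffer in memory, flush to sorted files."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # stands in for the write-ahead log
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable, sorted files produced by flushes
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # record the change before buffering it
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffered rows out as one sorted file and clear the buffer."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row_key):
        """Check the buffer first, then the files from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None

region = TinyRegion(flush_threshold=2)
region.put("dick", "a")
region.put("jane", "b")   # second write reaches the threshold and triggers a flush
region.put("dick", "c")   # newer value stays in the buffer for now
print(region.get("dick"), region.get("jane"), len(region.hfiles))
```

The parallel with the Oracle side of the diagram is loose but useful: the log plays the role of redo, the MemStore that of the buffer cache, and HFiles that of datafiles.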
79 Software Group
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
HBase Data Model
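The last two tables on this slide show the same counters stored as one wide, sparse row per person, with the site name acting as the column. The sketch below performs that pivot on the slide's own rows; plain Python dictionaries stand in for HBase rows here, and the function name is illustrative.

```python
from collections import defaultdict

# The denormalised rows from the slide: (name, site, counter).
rows = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot (name, site, counter) into one sparse row per name."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter  # the site becomes a column of that row
    return dict(wide)

wide = to_wide_rows(rows)
print(wide["Jane"])
```

Each row carries only the columns it actually has, so Jane's row has no Ebay entry at all rather than a NULL, which is the property the slide is illustrating.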
78 Software Group
HBase
A Real time database built on Hadoop
ASM
Datafiles
Buffer Cache
Table Table
Redo
Disks
LogBuffe
r
HDFS
HFile
MemStore
Table Table
WA Log
Disks
HFile
79 Software Group
Name Site Counter
Dick Ebay 507018
Dick Google 690414
Jane Google 716426
Dick Facebook 723649
Jane Facebook 643261
Jane ILoveLarrycom 856767
Dick MadBillFanscom 675230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarrycom
5 MadBillFanscom
NameId SiteId Counter
1 1 507018
1 3 690414
2 3 716426
1 3 723649
2 3 643261
2 4 856767
1 5 675230
Id Name Ebay Google Facebook (other columns) MadBillFanscom
1 Dick 507018 690414 723649 675230
Id Name Google Facebook (other columns) ILoveLarrycom
2 Jane 716426 643261 856767
Hbase Data Model
80 Software Group
Hive
81 Software Group
82 Software Group
SQL
JAV
A
RES
ULT
S
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
79 Software Group
HBase Data Model
Denormalized table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230
Relational tables:
NameId  Name
1       Dick
2       Jane
SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com
NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230
HBase rows (each row stores only the columns it has):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230
Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
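The contrast on this slide is between a relational layout and HBase's wide-row layout, where each row key carries only the columns it actually has. A minimal Python sketch of that pivot follows, using plain dictionaries as a stand-in for an HBase table. The function name `to_wide_rows` is hypothetical, and this is not the HBase client API.

```python
from collections import defaultdict

# (name, site, counter) tuples taken from the denormalized table on the slide.
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (row key, column, value) tuples into one sparse dict per row key."""
    wide = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter
    return dict(wide)


wide_rows = to_wide_rows(visits)
# Each row key holds a different set of columns; absent columns take no space.
for name, columns in wide_rows.items():
    print(name, columns)
```

Dick's row ends up with four site columns and Jane's with three, which is the sparse shape the slide's last two tables illustrate.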
80 Software Group
Hive
81 Software Group
82 Software Group
[Diagram labels: SQL; JAVA; RESULTS]
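Hive accepts a SQL-like query and compiles it into MapReduce jobs that run as Java on the cluster. The sketch below imitates, in plain Python, the map, shuffle and reduce phases that a query such as `SELECT site, SUM(counter) FROM visits GROUP BY site` would roughly correspond to. The table name, column names and data are assumptions for illustration, and none of this is Hive's actual generated code.

```python
from collections import defaultdict

# Hypothetical input rows: (site, counter).
visits = [("Ebay", 5), ("Google", 7), ("Google", 3), ("Facebook", 4)]


def map_phase(rows):
    # Emit one (key, value) pair per input row, keyed by the GROUP BY column.
    for site, counter in rows:
        yield site, counter


def shuffle_phase(pairs):
    # Collect all values that share a key, as happens between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Apply the aggregate (SUM) to each key's list of values.
    return {key: sum(values) for key, values in grouped.items()}


totals = reduce_phase(shuffle_phase(map_phase(visits)))
print(totals)
```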
83 Software Group
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram labels: WebLogs; FLUME; HDFS; SQOOP; RDBMS (CUSTOMERS, PRODUCTS)]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Chart: Exadata vs Hadoop, $/TB (hardware only): Exadata $4911, Hadoop $750]
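Taking the chart's figures as $4911 per TB for Exadata and $750 per TB for Hadoop (hardware only), the gap can be worked through as a rough sketch. The helper `hardware_cost` and the 100 TB example size are assumptions for illustration; software, staffing and usable-versus-raw capacity are ignored.

```python
# Hardware-only cost per terabyte as quoted on the slide (US dollars).
cost_per_tb = {"Exadata": 4911, "Hadoop": 750}

# How many times more expensive Exadata is per TB.
ratio = cost_per_tb["Exadata"] / cost_per_tb["Hadoop"]


def hardware_cost(platform, terabytes):
    """Hardware cost implied by the per-TB figure for a given capacity."""
    return cost_per_tb[platform] * terabytes


difference_100tb = hardware_cost("Exadata", 100) - hardware_cost("Hadoop", 100)
print(f"Exadata costs about {ratio:.1f}x as much per TB")
print(f"Difference at 100 TB: ${difference_100tb}")
```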
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 432 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
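The rack totals follow from multiplying out the per-node figures. A short sketch of that arithmetic, using only the numbers on the slide, is below. Note that 216 spindles at 2 TB each come to 432 TB of raw disk.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6   # two 6-core CPUs per node
DISKS_PER_NODE = 12
TB_PER_DISK = 2

total_ram_gb = NODES * RAM_GB_PER_NODE        # 864 GB
total_cores = NODES * CORES_PER_NODE          # 216 cores
total_spindles = NODES * DISKS_PER_NODE       # 216 spindles
raw_storage_tb = total_spindles * TB_PER_DISK # 432 TB raw

print(total_ram_gb, total_cores, total_spindles, raw_storage_tb)
```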
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: fitted trend line; x axis 0 to 120, y axis -20 to 120]
f(x) = 0.971521231456065 x + 0.71906459527154
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
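The trend-line equation on this slide has the form f(x) = slope · x + intercept, the output of a simple linear regression. A minimal ordinary least squares sketch in plain Python follows. The data points are made up for illustration and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


# Made-up observations that roughly follow y = x.
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 38, 62, 79, 101]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.3f} x + {intercept:.3f}")
print(predict(slope, intercept, 120))
```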
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
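As one possible illustration of building a model from labeled examples and using it to classify new data, here is a nearest-centroid classifier in plain Python. The technique, the two features and the training data are all assumptions chosen for brevity; the slide does not name a specific algorithm.

```python
def train_centroids(samples):
    """samples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical features per message: (number of links, number of exclamation marks).
training = [((8, 6), "spam"), ((7, 9), "spam"), ((1, 0), "ham"), ((0, 1), "ham")]
model = train_centroids(training)
print(classify(model, (6, 7)))
print(classify(model, (1, 1)))
```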
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
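Grouping without predefined labels can be sketched with a tiny k-means on one-dimensional data. The choice of k-means, the starting centres and the basket totals below are assumptions for illustration; the slide does not specify an algorithm.

```python
def kmeans_1d(values, k, iterations=10):
    """Tiny k-means on 1-D data; centres start at evenly spaced sorted values."""
    ordered = sorted(values)
    step = max(1, len(ordered) // k)
    centres = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (unchanged if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters


# Made-up basket totals: a group of small baskets and a group of large ones.
basket_totals = [5, 7, 6, 52, 48, 50]
centres, clusters = kmeans_1d(basket_totals, k=2)
print(centres, clusters)
```

No labels are supplied anywhere; the two groups emerge from the distances alone.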
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data; Clean; Training Set; Validation Set; Model; Candidate Model; Validate; Production Model; New Data; Prediction; Existing Business; New Business]
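The stages named in this diagram (clean raw data, split into training and validation sets, build a candidate model, validate it, promote it to production, predict on new data) can be sketched end to end. Everything below is a toy under stated assumptions: the data is made up, the model is a single threshold, and the 0.9 accuracy bar for promotion is arbitrary.

```python
def clean(raw):
    # Drop records with a missing feature or label.
    return [(x, y) for x, y in raw if x is not None and y is not None]


def split(records, train_fraction=0.75):
    # First part becomes the training set, the remainder the validation set.
    cut = int(len(records) * train_fraction)
    return records[:cut], records[cut:]


def train(training_set):
    # Candidate model: a threshold halfway between the two class means.
    pos = [x for x, y in training_set if y == 1]
    neg = [x for x, y in training_set if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    return 1 if x >= threshold else 0


def accuracy(threshold, validation_set):
    hits = sum(predict(threshold, x) == y for x, y in validation_set)
    return hits / len(validation_set)


# Made-up (feature, label) records, including one dirty record.
raw_data = [(1, 0), (9, 1), (2, 0), (None, 1), (8, 1), (3, 0), (7, 1), (2, 0), (9, 1)]
training_set, validation_set = split(clean(raw_data))
candidate = train(training_set)
score = accuracy(candidate, validation_set)
# Promote the candidate only if it validates well enough (arbitrary bar).
production_model = candidate if score >= 0.9 else None
if production_model is not None:
    print(predict(production_model, 6))
```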
83 Software Group
Other SQL-like Hadoop Interfaces
Cloudera Impala
MapR Drill Aster
Greenplumb (Pivotal HD) Paraccel Hadapt
Oracle SQL Connector for
Hadoop (External Table interface to
HDFS)
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
CUSTOMERS
WebLogs
PRODUCTS
HDFS
RDBMS
FLUME
SQOOP
86 Software Group
Berkeley Data Analytic Stack (BDAS)
Yarn Yarn EC2 Yarn
Mesos ndash heterogeneous cluster manager
Tachyon ndash in memory File system
Spark ndash memory optimized distributed execution
Spark Streaming
Mlbase Mlib ndash Machine Learning
Map Reduce
Shark (SQL) Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software GroupConfidential
Dell acquires Statsoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD amp Shareplex
TOAD BI
Boomi
Kitenga
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition
134 Software GroupConfidentialConfidential13
4
135 Software GroupConfidentialConfidential
Data Visualization
135
136 Software GroupConfidentialConfidential
Live scoring ndash integration into operational systems
136
137 Software GroupConfidentialConfidential
Industry and cross-industry packaged solutions
137
138 Software Group
For your business
bull How could data and algorithms transform your business
bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics
bull Where is the datandash Start collecting now
139 Software Group
For your career bull Hadoop and NoSQL creates
strong career opportunities for DBAs and developersndash Demand will exceed supply for
the foreseeable future
bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that
statistics textbook and play with R (maybe Oracle Enterprise R)
bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
84 Software Group
Pig
Pig Latin
SQL or Hive QL
85 Software Group
Flume and SQOOP
[Diagram labels: WebLogs; FLUME; HDFS; SQOOP; RDBMS; CUSTOMERS; PRODUCTS]
86 Software Group
Berkeley Data Analytics Stack (BDAS)
YARN, EC2
Mesos – heterogeneous cluster manager
Tachyon – in-memory file system
Spark – memory-optimized distributed execution
Spark Streaming
MLbase, MLlib – machine learning
MapReduce
Shark (SQL), Hive (SQL)
BlinkDB
87 Software Group
Meanwhile back at the Death Star
88 Software Group
89 Software Group
Oracle Exadata (X-2)
Database servers: 64 cores, 576 GB RAM
Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
90 Software Group
Economies
[Chart: Exadata vs Hadoop, $/TB (hardware only). Exadata: $4,911; Hadoop: $750]
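The two figures on this chart lend themselves to a quick back-of-the-envelope comparison. The sketch below assumes that $4911 is the Exadata bar and $750 is the Hadoop bar (the order in which labels and values appear on the slide); the 100 TB data set is a hypothetical size chosen only for illustration.

```python
# Hardware-only cost per terabyte, as read from the chart.
# Assumption: $4911 belongs to Exadata and $750 to Hadoop.
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def hardware_cost(platform: str, terabytes: float) -> float:
    """Return the hardware-only cost in dollars for `terabytes` of data."""
    return COST_PER_TB[platform] * terabytes


# How many times more expensive is Exadata per terabyte?
ratio = COST_PER_TB["Exadata"] / COST_PER_TB["Hadoop"]

# Hypothetical 100 TB data set, chosen only for illustration.
example_tb = 100
exadata_cost = hardware_cost("Exadata", example_tb)
hadoop_cost = hardware_cost("Hadoop", example_tb)

print(f"Exadata / Hadoop cost ratio per TB: {ratio:.1f}x")
print(f"{example_tb} TB on Exadata: ${exadata_cost:,.0f}")
print(f"{example_tb} TB on Hadoop:  ${hadoop_cost:,.0f}")
```

The comparison covers hardware only, as the chart title states; software licences, staffing and operations are outside this sketch.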
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPU per node (216 total)
  – 12 x 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing
www.oracle.com/us/bigdata/index.html
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through "Big Data analytics"
Machine Learning: programs that evolve with "experience"
Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
Predictive Analytics: programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
112 Software Group
Artificial Intelligence Strikes back
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
[Chart: fitted trend line, f(x) = 0.971521231456065 x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
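Linear regression, the first technique listed on this slide, fits a straight line of the form f(x) = slope * x + intercept and then extrapolates from it. A minimal sketch of ordinary least squares in plain Python follows; the sample points are invented for illustration and are not the data behind the chart.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical observations, invented for this example.
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 19, 41, 58, 80, 99]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept


print(f"f(x) = {slope:.4f} x + {intercept:.4f}")
print(f"prediction for x = 120: {predict(120):.1f}")
```

The other techniques in the list (curve fitting, multivariate models, time series, logistic regression, CART) follow the same pattern of fitting a model to existing data and applying it to unseen inputs, with different model shapes.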
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
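The classification idea on this slide (build a model from labelled examples, then use it to label new data) can be sketched with a small naive Bayes spam filter. This is only one of many possible classifiers, chosen here for brevity; the training messages are hypothetical and far too few for real use.

```python
import math
from collections import Counter

# Hypothetical labelled training messages, invented for this example.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap prize offer win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and agenda", "ham"),
]


def train(examples):
    """Count words per class and messages per class."""
    word_counts = {}          # label -> Counter of words
    class_counts = Counter()  # label -> number of messages
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return the most probable label under multinomial naive Bayes."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log prior plus the sum of log likelihoods, with add-one smoothing
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("win a cash prize", model))
print(classify("agenda for the meeting", model))
```

The same train-then-classify shape applies to the other examples on the slide, such as churn risk or customer value, with customer attributes in place of words.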
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
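Clustering as described here needs no labels: the algorithm finds the groups itself. A minimal sketch using k-means, one common clustering algorithm (the slide does not name a specific one), is shown below. The basket data is hypothetical, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignment


# Hypothetical (items per basket, spend in dollars) pairs, invented here.
baskets = [(2, 15), (25, 210), (3, 22), (4, 18), (28, 190), (30, 240)]

centroids, labels = kmeans(baskets, k=2)
print(centroids)
print(labels)
```

In practice the features would usually be scaled first, since an unscaled dollar amount dominates the distance calculation, and the number of clusters would be chosen by experiment.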
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data; Clean; Training Set; Model; Candidate Model; Validation Set; Validate; Production Model; New Data; Prediction; Existing Business; New Business]
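The supervised learning workflow named on this slide (training set, candidate model, validation set, production model, prediction on new data) can be sketched end to end in a few lines. Everything concrete below is an assumption made for illustration: the churn records, the single-threshold model, the every-fourth-record hold-out and the 0.8 acceptance bar.

```python
# Hypothetical cleaned records: (monthly support calls, churned?).
records = [(0, False), (1, False), (1, False), (2, False), (2, False),
           (3, False), (5, True), (6, True), (6, True), (7, True),
           (8, True), (9, True)]


def split(data, every=4):
    """Hold out every `every`-th record for validation.

    A real pipeline would shuffle randomly; this keeps the sketch deterministic.
    """
    validation = data[every - 1::every]
    training = [r for i, r in enumerate(data) if (i + 1) % every != 0]
    return training, validation


def accuracy(threshold, data):
    """Fraction of records where 'calls >= threshold' matches the label."""
    hits = sum((calls >= threshold) == churned for calls, churned in data)
    return hits / len(data)


def train(training_set):
    """Candidate model: the call-count threshold that best fits the training set."""
    candidates = sorted({calls for calls, _ in training_set})
    return max(candidates, key=lambda t: accuracy(t, training_set))


def predict(model, calls):
    """Score new data with a trained threshold model."""
    return calls >= model


training_set, validation_set = split(records)
candidate = train(training_set)
validation_accuracy = accuracy(candidate, validation_set)

# Promote the candidate to production only if it validates well enough
# (0.8 is an arbitrary bar for this sketch).
production_model = candidate if validation_accuracy >= 0.8 else None

print(candidate, validation_accuracy)
if production_model is not None:
    print(predict(production_model, 7))
```

Keeping the validation set separate from the training set is the point of the workflow: the candidate is judged on records it has not seen before it is trusted with new data.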
122 Software Group
inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics / Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
89 Software Group
Oracle Exadata (X-2)
Database servers
64 cores 576 GB RAM
Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD
90 Software Group
Economies
Exadata
Hadoop
$0 $1000 $2000 $3000 $4000 $5000 $6000
$4911
$750
Exadata vs Hadoop $$TB (Hardware only)
93 Software Group
Oracle Big Data Appliance
bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre
bull Competitive Pricingwwworaclecomusbigdataindexhtml
94 Software Group
Big Data Appliance Software
bull Cloudera Enterprise
bull Oracle Enterprise R
bull Oracle NoSQL
bull Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through ldquoBig Data analyticsrdquo Machine
LearningPrograms that evolve with ldquoexperiencerdquo
Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent
Predictive AnalyticsPrograms that extrapolate from existing data into the future
Big Data AnalyticsAKA Data Science
96 Software Group
Collective Intelligence
97 Software Group
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software GroupConfidential
Key co
mponents
to b
uild
end-
to-e
nd B
IA
naly
tics
solu
tions
Dellrsquos offering was not completehellip
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
To address the demands that mid-market customers face, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – Sqoop
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
90 Software Group
Economies
[Bar chart: Exadata vs Hadoop, $/TB (hardware only); axis from $0 to $6,000]
Exadata: $4,911 per TB
Hadoop: $750 per TB
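A quick back-of-the-envelope comparison, reading the chart as Exadata at $4,911 per TB and Hadoop at $750 per TB (hardware only). The 100 TB workload and the `hardware_cost` helper are hypothetical, chosen only to show the arithmetic.

```python
# USD per TB, hardware only, as read from the chart.
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def hardware_cost(platform, terabytes):
    """Hardware cost in USD for a given raw capacity."""
    return COST_PER_TB[platform] * terabytes


ratio = COST_PER_TB["Exadata"] / COST_PER_TB["Hadoop"]
print(f"Exadata costs about {ratio:.1f}x as much per TB")

# Hypothetical 100 TB workload.
for platform in COST_PER_TB:
    print(platform, hardware_cost(platform, 100))
```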
93 Software Group
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48GB RAM per node (864GB total)
  – 2x6 Core CPU per node (216 total)
  – 12x2TB HDD per node (216 spindles, 864 TB)
  – 40Gb/s InfiniBand between nodes
  – 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
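The rack totals can be checked from the per-node figures. This sketch uses only the numbers listed on the slide. The RAM, core and spindle totals agree, but the disk figures as listed multiply out to 432 TB rather than 864 TB, so either the drive size or the storage total on the slide appears to be misstated.

```python
NODES = 18

ram_gb = NODES * 48      # 48 GB RAM per node
cores = NODES * 2 * 6    # two 6-core CPUs per node
spindles = NODES * 12    # 12 drives per node
raw_tb = spindles * 2    # 2 TB per drive, as listed

print("RAM (GB):", ram_gb)
print("Cores:", cores)
print("Spindles:", spindles)
print("Raw disk (TB):", raw_tb)
```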
94 Software Group
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
95 Software Group
Generating competitive advantage through “Big Data analytics”
Machine Learning: Programs that evolve with “experience”
Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics: AKA Data Science
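One small example of a program that uses inputs from "crowds" to seem intelligent is a "bought together" recommender built purely from co-occurrence counts. This is a sketch under stated assumptions: the baskets are hypothetical, and `co_occurrence` and `recommend` are names invented for the example.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs


def recommend(pairs, item, top_n=2):
    """Items most often bought together with `item`, most frequent first."""
    scores = Counter({b: n for (a, b), n in pairs.items() if a == item})
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical purchase history from many shoppers.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
pairs = co_occurrence(baskets)
print(recommend(pairs, "bread", top_n=1))
print(recommend(pairs, "jam"))
```

The program knows nothing about food; the suggestions come entirely from what other shoppers did.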
96 Software Group
Collective Intelligence
105 Software Group
Google Flu Trends
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
98 Software Group
99 Software Group
100 Software Group
101 Software Group
102 Software Group
103 Software Group
104 Software Group
105 Software Group
Google Flu Trends
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell’s offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
To address the demands that mid-market customers face, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile appWe appreciate your feedback and insight
This box will have simplified instructions about how to complete the session evaluation online
106 Software Group
107 Software Group
Collective Intelligence outsmarts Artificial Intelligence
108 Software Group
109 Software Group
110 Software Group
111 Software Group
112 Software Group
Artificial Intelligence Strikes back
113 Software Group
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
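To make the spam-detection example concrete, here is a minimal sketch of a classifier. It uses naive Bayes with Laplace smoothing, which is one common choice and not something the slide specifies; the training messages and the function names `train` and `classify` are invented for illustration.

```python
import math
from collections import Counter


def train(messages):
    """messages: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = {}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_msgs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # prior probability of the label
        score = math.log(label_counts[label] / total_msgs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled training messages
training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(training)
label_spam = classify("win a cheap prize", *model)
label_ham = classify("monday meeting agenda", *model)
print(label_spam, label_ham)
```

The same pattern (learn from labelled examples, then label new records) applies to churn risk or customer value, with customer attributes in place of words.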
120 Software Group
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
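A small sketch of grouping data without predefined labels follows. It uses k-means, which is an assumption on my part since the slide names no algorithm, and the 2-D points are hypothetical (think of two numeric features per customer). Seeding the centroids with the first k points keeps the run deterministic.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared distance)
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster; keep it if the cluster is empty
        centroids = [
            (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c))
            if c
            else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical data: two visibly separate groups
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels were supplied; the two groups emerge only from the distances between points.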
121 Software Group
Supervised Machine Learning
[Diagram labels: Raw Data, Clean, Training Set, Validation Set, Model, Validate, Candidate Model, Production Model, New Data, Prediction, New Business, Existing Business]
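The stages named in the diagram can be sketched end to end in a few lines. This is only one plausible reading of the flow (clean, split, train a candidate, validate, promote, predict); the data, the 0.5 error threshold, and every function name here are assumptions for illustration.

```python
import random
from statistics import mean


def clean(rows):
    """Drop records with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def split(rows, validation_fraction=0.25, seed=0):
    """Shuffle and split rows into (training set, validation set)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set):
    """Fit y = slope * x + intercept by least squares."""
    xs = [x for x, _ in training_set]
    ys = [y for _, y in training_set]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in training_set)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def validate(model, validation_set):
    """Mean absolute error of the model on held-out data."""
    slope, intercept = model
    return mean([abs(slope * x + intercept - y) for x, y in validation_set])


# Hypothetical raw data: y = 3x + 2, plus two incomplete records
raw = [(x, 3 * x + 2) for x in range(20)] + [(None, 5), (7, None)]

data = clean(raw)
training_set, validation_set = split(data)
candidate = train(training_set)
error = validate(candidate, validation_set)

# Promote the candidate to production only if it validates well (assumed threshold)
production = candidate if error < 0.5 else None
prediction = None
if production is not None:
    prediction = production[0] * 100 + production[1]  # score new data, x = 100
print(error, prediction)
```

Holding back a validation set is what lets the candidate model be judged on data it has not seen before it is used on new business.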
122 Software Group
Inmaps.linkedin.com
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase Real-Time replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.
114 Software Group
115 Software Group
116 Software Group
117 Software Group
Watson is big data AI
118 Software Group
Predictive Analytics
0 20 40 60 80 100 120
-20
0
20
40
60
80
100
120
f(x) = 0971521231456065 x + 071906459527154
bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART
119 Software Group
Classificationbull Create a model that
identifiesclassifies new data
bull Spam detection churn risk customer value
120 Software Group
Clusteringbull Group data without a
pre-existing classification scheme
bull For instance basket analysis
121 Software Group
SupervisedMachine Learning
Raw Data Clean
Validate
Model
Candidate
ModelTraining Set
Validation Set
Production
ModelNew Data
New Business
Existing Business
Prediction
122 Software Group
Inmapslinkedincom
Unsupervised learning
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
125 Software Group
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
129 Software Group
SharePlex® for Hadoop
[Diagram labels: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit Change Data, HBase RealTime replication]
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Key components to build end-to-end BI/Analytics solutions
Dell's offering was not complete…
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
To address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Key components to build end-to-end BI/Analytics solutions
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
123 Software Group
124 Software Group
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Securitybull Vulnerabili
tybull Penetratio
n Detection
Fraud Detection
CRMbull Churn bull Defaults
Medicalbull Risk
analysisbull Diagnosisbull Prognosis
Game optimization
Advertisingbull Targetingbull Tailoring
125 Software Group
Data Science is hard
bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD
bull Small-medium businesses need help to compete
bull Data scientists to the rescue
126 Software Group
Data Scientists to the rescue
127 Software Group
Kitenga Analytics Suite
128 Software Group
Toad for Hadoop
httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx
129 Software Group
SharePlexreg for Hadoop
Redo-logs
Change Data Capture
JMS Queue Hadoop Poster
BatchedHDFS File Copy Audit Change
Data
HBase RealTime replication
130 Software Group
Toad BI Suite
131 Software Group
132 Software Group Confidential
Dell's offering was not complete…
Key components to build end-to-end BI Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities
133 Software Group Confidential
Dell acquires StatSoft
Key components to build end-to-end BI Analytics solutions
Data Integration
Database Management
Advanced Analytics
Business Intelligence
Server and Storage
STATISTICA
Server and Storage
TOAD & SharePlex
TOAD BI
Boomi
Kitenga
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition
134 Software Group Confidential
135 Software Group Confidential
Data Visualization
136 Software Group Confidential
Live scoring – integration into operational systems
137 Software Group Confidential
Industry and cross-industry packaged solutions
138 Software Group
For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
– Mobility
– Cloud
– Hadoop
– Big Data Analytics
• Where is the data?
– Start collecting now
139 Software Group
For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
– Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
– Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
– Sqoop
– Hive
– Pig
Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.