To empower data scientists, you need a Data Science Lab. This session is presented by Dr. Sharon Kirkham, director of the Kognitio Analytics Center of Excellence and a data scientist with expertise in this topic. Join the session as she explores the kind of environment required to imagine, create, experiment, develop and grow cutting-edge Big Data applications that add value.
The Data Science Lab: Enabling Flexible,
Complex Analytics on a Single Platform
Follow the conversation on Twitter: @Kognitio #DataSci
• Thank you for joining today’s session!
• The web briefing will start momentarily.
Slides available NOW at www.slideshare.net/kognitio
Teleconference: Use your computer, or call:
US: +1 631 267 4890
Toll-Free: 1-855-299-5224
Passcode: 841 203 797
Other global dial-in numbers available at: https://kognitio.webex.com/kognitio/globalcallin.php
Web Briefing - The Data Science Lab: Enabling Flexible, Complex Analytics
Today’s call will use the WebEx Q & A feature
• The Data Science Lab: Enabling Flexibility
• Demonstrations
• Summary, Question & Answer Session
Presenters:
- Dr. Sharon Kirkham, Data Scientist
- Michael Hiskey, Product Evangelist
July 25, 2013
The Data Science Lab
Use Case Scenarios:
1. Data Accessibility
• Hadoop
• Data Mash-Up
2. Analytical Productivity
• MPP in-memory code execution
• R scripts with MPP
3. “Graduate” Projects to B.A.U.
• Data Science and the Business
POLL
Flexible Platform for Big Data Analytics
[Diagram: an Analytical Platform Layer offering flexible data access, flexible processing, and flexible deployment options. Labels shown: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Cloud Storage, Kognitio Storage, Near-line Storage (optional), Reporting, All BI Tools, All OLAP Clients, Excel.]
Mature Business Intelligence & Reporting
Numbers, tables, charts, indicators
…accessed with ease and simplicity
Historical information, latency
BI tools have plateaued
Decision Support
Advanced analytics and data science
More math…a lot more math
The Analytical Enterprise
Business Analyst
Systems Admin
Data Scientist
Sexiest job of the 21st Century?
Key: “Graduation”
• Projects will need to easily graduate from the Data Science Lab and become part of Business as Usual
Data scientists are in demand:
Telling a story with data
Build, tune and run complex data projects
Dealing with big data from multiple sources
Must overcome IT bottlenecks
Source: http://www.emc.com/microsites/bigdata/infographic.htm
Scenario 1: Data Accessibility
“… this exercise is to identify if improvements in data preparation can make a significant difference to the productivity and earning capacity of our analytics team”
- Global Digital marketing analytics firm
source: http://newvantage.com/wp-content/uploads/2012/12/NVP-Big-Data-Survey-Themes-Trends.pdf
POLL
SQL querying on Hadoop
Summary: Data Accessibility
Kognitio Hadoop Integration
• Map/Reduce agent dynamically executes on all Hadoop nodes
• Query passes selections, relevant predicates to the agents
• Data filtering & projection locally on each node
• Data filtered as it is read from file(s)
• Only data of interest is transferred and loaded into memory via parallel load streams
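The filter-at-source idea in the bullets above can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not Kognitio's actual agent: the `NODE_BLOCKS` data, `node_agent`, and `pushdown_query` are hypothetical names, and a thread pool stands in for the parallel load streams.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for file blocks held on three Hadoop nodes.
NODE_BLOCKS = {
    "node1": [{"id": 1, "country": "UK", "spend": 120.0, "notes": "a"},
              {"id": 2, "country": "US", "spend": 80.0, "notes": "b"}],
    "node2": [{"id": 3, "country": "UK", "spend": 45.5, "notes": "c"}],
    "node3": [{"id": 4, "country": "DE", "spend": 300.0, "notes": "d"},
              {"id": 5, "country": "UK", "spend": 10.0, "notes": "e"}],
}


def node_agent(rows, predicate, columns):
    """Simulated per-node agent: filter rows as they are read and
    keep only the requested columns (projection)."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def pushdown_query(blocks, predicate, columns):
    """Send the predicate and column list to every node's agent,
    run the agents in parallel, and gather only the surviving data."""
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        futures = [pool.submit(node_agent, rows, predicate, columns)
                   for rows in blocks.values()]
        loaded = []
        for future in futures:
            loaded.extend(future.result())
    return loaded


# Only UK rows, and only the id and spend columns, leave the "nodes".
result = pushdown_query(NODE_BLOCKS, lambda r: r["country"] == "UK", ["id", "spend"])
print(result)
```

In this toy setup the two non-UK rows and the unused `country` and `notes` columns never reach the caller, which is the point of filtering and projecting on each node before anything is transferred.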
Scenario 2: Analytical Productivity
“…want to see a significant improvement in the analytical throughput … from current time frame of 2 weeks … to no more than 1 day”
- A marketing science analytics company
“…we run much of our analytics on a 5% sample of the data. We want to be able to run on 100% of the data in the same time as the 5% sample.”
- A leading Ad Agency
Source: http://www.wired.com/insights/2013/07/the-new-horizon-for-bi-and-analytics/
POLL
Massively parallel in-memory code execution
MPP in-memory code execution
NoSQL external scripting function:
• SQL provides standard data access framework
– Open, adaptable framework; pass data to/from any executable or interpreter
– Fully flexible MPP execution of R, Python, Java, text parsing libraries, etc.
-- Register Perl as an external interpreter that exchanges CSV data
create interpreter perlinterp
command '/usr/bin/perl' sends 'csv' receives 'csv';

-- Split each comment into words in Perl, then count the words in SQL
select top 1000 words, count(*)
from (external script using environment perlinterp
      receives (txt varchar(32000))
      sends (words varchar(100))
      script S'endofperl(
        while(<>)
        {
          chomp();
          s/[\,\.\!\_\\]//g;
          foreach $c (split(/ /))
          { if($c =~ /^[a-zA-Z]+$/) { print "$c\n" } }
        }
      )endofperl'
      from (select comments from customer_enquiry)) dt
group by 1 order by 2 desc;
From the demo: the query reads long comment text from the customer_enquiry table; the inline Perl converts each comment into an output stream of words (one word per row); and the outer query selects the top 1000 words by frequency using standard SQL aggregation.
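For readers who want to try the word-frequency logic outside the platform, here is a rough single-process Python equivalent. It is a sketch that mirrors the Perl tokenization and the SQL aggregation from the demo, not the MPP execution itself; the `words` and `top_words` helpers and the sample comments are made up for illustration.

```python
import re
from collections import Counter

# Same character class the Perl substitution strips: , . ! _ and backslash.
PUNCT = re.compile(r"[,.!_\\]")
# Same test as the Perl match /^[a-zA-Z]+$/: letters only.
WORD = re.compile(r"[a-zA-Z]+")


def words(comment):
    """Yield one word per token, like the Perl script emits one word per row."""
    cleaned = PUNCT.sub("", comment.rstrip("\n"))
    for token in cleaned.split(" "):
        if WORD.fullmatch(token):
            yield token


def top_words(comments, n=1000):
    """Count words across all comments and return the n most frequent,
    playing the role of 'group by 1 order by 2 desc' with 'top n'."""
    counts = Counter(w for comment in comments for w in words(comment))
    return counts.most_common(n)


# Hypothetical stand-in for the comments column of customer_enquiry.
sample = [
    "Great service, great price!",
    "Slow delivery. great support",
    "Order #42 arrived late",
]
print(top_words(sample, 3))
```

As in the Perl version, matching is case-sensitive ("Great" and "great" are counted separately) and tokens containing digits or other symbols, such as "#42", are dropped.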
Accessing Analytics across the business
Scenario #3: Barriers to Deployment
An Ideal Deployment Scenario
Cloud model can provide a way to quickly model, experiment, develop and build
• Deploy to existing reporting tools
• Pass ownership to IT
• Cloud instances can be “temporary”
• Repeatable framework
Business Analyst
Business User
IT Admin
Data Scientist
PRESS HERE
PRESS HERE
…and really cool Big Data stuff happens!
It’s all about flexibility
Flexible data access
Flexible processing
Flexible deployment options
Question & Answer session will be conducted electronically, using the panel to the right of your screen
Learn more, stay connected:
Free Download: kognitio.com/GoTryIt
Request a Meeting: kognitio.com/meeting
Take the Survey: kognitio.com/DSL