The Data Science Lab: Enabling Flexible, Complex Analytics on a Single Platform. Follow the conversation on Twitter: @Kognitio #DataSci

Data science lab enabling flexibility


DESCRIPTION

To empower data scientists, you need a Data Science Lab. This session is presented by Dr. Sharon Kirkham, director of the Kognitio Analytics Center of Excellence and a data scientist with expertise in this topic. Join the session as she explores the kind of environment required to imagine, create, experiment, develop and grow cutting-edge Big Data applications that add value.


Page 1: Data science lab   enabling flexibility

The Data Science Lab: Enabling Flexible, Complex Analytics on a Single Platform

Follow the conversation on Twitter: @Kognitio #DataSci

Page 2: Data science lab   enabling flexibility

• Thank you for joining today’s session!
• The web briefing will start momentarily.

Slides available NOW at www.slideshare.net/kognitio

Teleconference: Use your computer, or call:

US: +1 631 267 4890
Toll-Free: 1-855-299-5224
Passcode: 841 203 797
Other global dial-in numbers are available at: https://kognitio.webex.com/kognitio/globalcallin.php

- Web Briefing -

The Data Science Lab: Enabling Flexible, Complex Analytics

Follow the conversation on Twitter: @Kognitio #DataSci

Today’s call will use the WebEx Q & A feature

Page 3: Data science lab   enabling flexibility


Enabling Flexible, Complex Analytics on a single platform

• The Data Science Lab: Enabling Flexibility
• Demonstrations
• Summary, Question & Answer Session

Presenters:
- Dr. Sharon Kirkham, Data Scientist
- Michael Hiskey, Product Evangelist

Web Briefing

The Data Science Lab


Page 4: Data science lab   enabling flexibility


July 25, 2013

Use Case Scenarios:

1. Data Accessibility
• Hadoop
• Data Mash-Up

2. Analytical Productivity
• MPP in-memory code execution
• R scripts with MPP

3. “Graduate” Projects to B.A.U.
• Data Science and the Business

The Data Science Lab

POLL

Page 5: Data science lab   enabling flexibility


Flexible Platform for Big Data Analytics

[Diagram: the Analytical Platform Layer offers flexible data access, flexible processing and flexible deployment options. Labelled components: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Cloud Storage, Kognitio Storage, Near-line Storage (optional), Reporting, All BI Tools, All OLAP Clients, Excel.]


Page 6: Data science lab   enabling flexibility

Mature Business Intelligence & Reporting

Numbers, tables, charts, indicators

…accessed with ease and simplicity

Historical information, latency

BI tools have plateaued

Decision Support

Advanced analytics and data science

More math…a lot more math


Page 7: Data science lab   enabling flexibility

The Analytical Enterprise

Business Analyst

Systems Admin

Data Scientist

Sexiest job of the 21st Century?

Key: “Graduation”
• Projects will need to easily graduate from the Data Science Lab and become part of Business as Usual


Page 8: Data science lab   enabling flexibility


Data scientists are in demand:

• Telling a story with data
• Build, tune and run complex data projects
• Dealing with big data from multiple sources
• Must overcome IT bottlenecks

Source: http://www.emc.com/microsites/bigdata/infographic.htm


Page 9: Data science lab   enabling flexibility


Scenario 1: Data Accessibility

“… this exercise is to identify if improvements in data preparation can make a significant difference to the productivity and earning capacity of our analytics team”

- Global digital marketing analytics firm

source: http://newvantage.com/wp-content/uploads/2012/12/NVP-Big-Data-Survey-Themes-Trends.pdf

POLL

Page 10: Data science lab   enabling flexibility

SQL querying on Hadoop

Scenario 1: Data Accessibility

Page 11: Data science lab   enabling flexibility


Summary: Data Accessibility

Kognitio Hadoop Integration
• Map/Reduce agent dynamically executes on all Hadoop nodes
• Query passes selections, relevant predicates to the agents
• Data filtering & projection locally on each node
• Data filtered as it is read from file(s)
• Only data of interest is transferred and loaded into memory via parallel load streams
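
The flow in the bullets above (push the predicate and column list to each node, filter and project locally, then load only the reduced rows in parallel) can be illustrated with a minimal Python sketch. This is not Kognitio's agent or API: the `NODES` data, the `scan_node` and `parallel_load` helpers, the column names and the predicate are all hypothetical, and threads merely stand in for Hadoop nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the file rows held on one Hadoop node.
NODES = [
    [{"id": 1, "region": "EU", "spend": 120.0}, {"id": 2, "region": "US", "spend": 80.0}],
    [{"id": 3, "region": "EU", "spend": 45.5}, {"id": 4, "region": "APAC", "spend": 10.0}],
    [{"id": 5, "region": "US", "spend": 300.0}, {"id": 6, "region": "EU", "spend": 75.0}],
]


def scan_node(rows, predicate, columns):
    """Filter and project rows locally on a single node, as each row is read."""
    return [
        {col: row[col] for col in columns}  # projection: keep only the requested columns
        for row in rows
        if predicate(row)                   # selection: drop rows that fail the predicate
    ]


def parallel_load(nodes, predicate, columns):
    """Scan every node concurrently and gather only the reduced rows in memory."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        streams = pool.map(lambda rows: scan_node(rows, predicate, columns), nodes)
        return [row for stream in streams for row in stream]


# Roughly equivalent to: SELECT id, spend FROM ... WHERE region = 'EU'
loaded = parallel_load(NODES, lambda r: r["region"] == "EU", ["id", "spend"])
print(loaded)
```

The point of the design is that rows failing the predicate, and columns the query never asked for, do not leave the node that holds them, so the volume moved into memory scales with the data of interest rather than with the size of the files.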

[Diagram labels: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Kognitio Storage, Reporting, Cloud Storage]


Page 12: Data science lab   enabling flexibility


Scenario 2: Analytical Productivity

“…want to see a significant improvement in the analytical throughput … from current time frame of 2 weeks … to no more than 1 day”

- A marketing science analytics company

“…we run much of our analytics on a 5% sample of the data. We want to be able to run on 100% of the data in the same time as the 5% sample.”

- A leading Ad Agency

Source: http://www.wired.com/insights/2013/07/the-new-horizon-for-bi-and-analytics/

POLL


Page 13: Data science lab   enabling flexibility

Massively parallel in-memory code execution

Scenario 2: Analytical Productivity

Page 14: Data science lab   enabling flexibility


MPP in-memory code execution

NoSQL external scripting function:
• SQL provides standard data access framework
– Open, adaptable framework; pass data to/from any executable or interpreter
– Fully flexible MPP execution of R, Python, Java, text parsing libraries etc.

-- Register Perl as a script environment that exchanges rows as CSV.
create interpreter perlinterp
command '/usr/bin/perl' sends 'csv' receives 'csv' ;

-- Split each comment into words, then count the words with ordinary SQL.
select top 1000 words, count(*)
from (external script using environment perlinterp
      receives (txt varchar(32000))
      sends (words varchar(100))
      script S'endofperl(
        while(<>)
        {
          chomp();
          s/[\,\.\!\_\\]//g;    # strip , . ! _ and backslash
          foreach $c (split(/ /))
          { if($c =~ /^[a-zA-Z]+$/) { print "$c\n" } }    # emit alphabetic words only
        }
      )endofperl'
      from (select comments from customer_enquiry)
     )dt
group by 1 order by 2 desc;

From the demo: the query reads the long comment text from the customer_enquiry table. The inline Perl script converts each comment into an output stream of words (one word per row), and the outer query selects the top 1000 words by frequency using standard SQL aggregation.
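
For readers more familiar with Python than Perl, the same word-splitting and counting logic can be sketched as below. This is an illustrative, single-process equivalent only, not code from the demo: the `words` and `top_words` helpers and the sample comments are hypothetical, and it does not reproduce the parallel execution that the platform provides.

```python
import re
from collections import Counter

# Same characters the Perl script strips: comma, full stop, exclamation mark,
# underscore and backslash.
PUNCTUATION = re.compile(r"[,.!_\\]")
ALPHA_ONLY = re.compile(r"[a-zA-Z]+")


def words(comment):
    """Yield each purely alphabetic word from a comment, one word at a time."""
    cleaned = PUNCTUATION.sub("", comment.rstrip("\n"))
    for token in cleaned.split(" "):
        if ALPHA_ONLY.fullmatch(token):
            yield token


def top_words(comments, n=1000):
    """Count words across all comments and return the n most frequent."""
    counts = Counter(word for comment in comments for word in words(comment))
    return counts.most_common(n)


# Hypothetical stand-in for: select comments from customer_enquiry
sample_comments = [
    "Delivery was late, very late!",
    "Late delivery. Please refund order 1234.",
]
print(top_words(sample_comments, n=3))
```

As in the Perl version, matching is case-sensitive and tokens containing digits are dropped, so "late" and "Late" are counted separately and "1234" is ignored.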

Page 15: Data science lab   enabling flexibility

Accessing Analytics across the business

Scenario #3: Barriers to Deployment

Page 16: Data science lab   enabling flexibility


An Ideal Deployment Scenario

Cloud model can provide a way to quickly model, experiment, develop and build

• Deploy to existing reporting tools
• Pass ownership to IT
• Cloud instances can be “temporary”
• Repeatable framework


Business Analyst

Business User

IT Admin

Data Scientist

PRESS HERE


…and really cool Big Data stuff happens!


Page 17: Data science lab   enabling flexibility


It’s all about flexibility

[Diagram: flexible data access, flexible processing and flexible deployment options. Labelled components: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Cloud Storage, Kognitio Storage, Near-line Storage (optional), Reporting, All BI Tools, All OLAP Clients, Excel.]


Page 18: Data science lab   enabling flexibility

Question & Answer session will be conducted electronically, using the panel to the right of your screen

Learn more, Stay connected:

Free Download: kognitio.com/GoTryIt

Request a Meeting: kognitio.com/meeting

Take the Survey: kognitio.com/DSL

The Data Science Lab: Enabling Flexible, Complex Analytics