AUGUST 2016 Big Data Analytics for BI/BA/QA Dmitry Tolpeko



2

BIG DATA

Why was it invented?

How is it used now?

How will it be used in the near future?

What do we need to do to stay competitive?

3

FIRST QUESTIONS

At what size does it start?

Is it just another vendor technology?

4

IN REALITY

It is very easy to start using Hadoop and the Cloud now.

So it is true that most people are now doing traditional things, just with larger data sets.

And at much lower cost, of course.

So it looks like size matters, and this is just another technology.

5

BUT IT IS …

A completely new mindset and approach to analytics

A solution for the new, “mass market” analytics

And you cannot skip it

And you cannot skip it

6

YOU CAN FEEL THIS AS …

Developers (Java, .NET, etc.), non-BI and even non-IT people talk about and work with analytics today.

That was not the case before.

So what happens?

7

TRADITIONAL ANALYTICS

Expensive

Separate and isolated BI world

Analyzing transactions (data you cannot afford to lose or calculate with errors)

Historical data and strategic decisions

8

AND TODAY THIS IS …

A very small share of analytics (1-5%?)

Analytics Boom

9

EVERYTHING IS ABOUT DATA

Mindset: Data Analysis, not OLTP, DWH, ETL, Kimball/Inmon

Any application: UX + Analytics (e.g., Machine Learning)

Competing on analytics, not just on product and service

Analytics becomes operational and mass market

10

THE NEXT BIG SHIFT?

Digital Transformation of the Economy

IoT, VR, AR, Machine Learning, AI

Personalized UX

Heavily relies on analytics

11

ANALYTICS TODAY

Fast, Advanced and Predictive Analytics

o Personalization and customization: from summary reports to many tailored, data-driven actions (in near real time)

o Fast prototyping, implementation and deployment, and fast performance

o Data lakes

12

EXAMPLE - YESTERDAY

A company sent a promo email to 1M users, paying $1 per email; 50,000 users purchased goods at $25

Profit: 50,000 * $25 - $1M = $250,000

This is what traditional analytics does.

13

EXAMPLE - TODAY

Today

The company identified 100,000 users to send the promo email to; 30,000 of them purchased goods at $25

Profit: 30,000 * $25 - $100K = $650,000

No new customers, no new contracts: just algorithms and more data
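The arithmetic of the two examples can be checked with a few lines of Python. This is a sketch: the function name `campaign_profit` is hypothetical, and, as on the slides, "profit" means purchase revenue minus email cost, ignoring the cost of the goods themselves.

```python
def campaign_profit(emails_sent: int, cost_per_email: float,
                    buyers: int, price: float) -> float:
    """Revenue from buyers minus the total cost of the emails sent.

    Follows the slides' simplified definition of profit (cost of goods ignored).
    """
    revenue = buyers * price
    cost = emails_sent * cost_per_email
    return revenue - cost


# "Yesterday": mail all 1M users; 50,000 of them (5%) buy.
yesterday = campaign_profit(emails_sent=1_000_000, cost_per_email=1.0,
                            buyers=50_000, price=25.0)

# "Today": mail only 100,000 targeted users; 30,000 of them (30%) buy.
today = campaign_profit(emails_sent=100_000, cost_per_email=1.0,
                        buyers=30_000, price=25.0)

print(f"yesterday: ${yesterday:,.0f}")  # yesterday: $250,000
print(f"today:     ${today:,.0f}")      # today:     $650,000
```

Targeting cuts the email cost by $900,000 while revenue drops by only $500,000, so profit rises by $400,000.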

14

USE CASES

o Anomaly Detection

o Recommendation Systems

o Loyalty and Retention Programs

o Optimization

o A/B Testing

o Alarms, Scoring, Diagnosis

o Demand Forecasting and so on.

15

NEW CORE SKILLS

Distributed Data Processing and

Streaming Analytics

Programming (Python, R, Spark)

Math, Statistics

Machine Learning

Deep Learning

16

MACHINE LEARNING

Automation of discovery

Automatically adapt to new

circumstances

Detect patterns

In wide use now. “Self-testing”.

Few lines of code

17

BUILDING BLOCKS

Enriching analysis, development and

quality in software development

o Generic algorithms vs hardcoding

endless IF-ELSE

o Discovering hidden, not obvious

patterns

o Finding anomalies, outliers vs test

cases
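The contrast above, between a generic algorithm and hardcoded IF-ELSE rules, and between finding outliers and writing test cases, can be sketched in Python. The z-score method, the 3-standard-deviation cutoff, the hardcoded limits and the sample `amounts` are all assumptions chosen for illustration; the slides do not name a specific technique.

```python
import statistics


def hardcoded_check(amount: float) -> bool:
    # Hypothetical hand-written rule: the limits are guessed up front and
    # must be edited by hand (more IF-ELSE branches) whenever the data changes.
    if amount < 0:
        return True
    elif amount > 1000:
        return True
    else:
        return False


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[float]:
    """Return values lying more than `threshold` standard deviations from the mean.

    The limits are derived from the data itself rather than hardcoded.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical order amounts: one value is far from the rest.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 52, 48, 49, 51, 50, 250]

print([a for a in amounts if hardcoded_check(a)])  # []
print(zscore_outliers(amounts))                    # [250]
```

The fixed rule misses the unusual value because its limits were written for a different range, while the statistical check adapts to whatever data it is given.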

18

BI TOOLS NOW

Self-service (fewer jobs?)

Advanced analytics (requires understanding of statistics and machine learning fundamentals)

19

SOURCE DATA

Non-transactional systems, weak or no data model

Probabilistic calculations

Raw, unstructured data from diverse data sources

Extracting small, relevant pieces of data from huge data sets

20

PEOPLE

Data engineers

Data scientists

A significant workforce, not just 1-5% as in BI

21

GOOD NEWS

BI people are still a good match, as they love crunching data

But a significant shift in skills is required

22

WHY GET INVOLVED

o Cutting edge

o Challenges

o Cool stuff (predictions, AI, etc.)

o Growth, margin and revenue

23

HOW TO GET INVOLVED

o Mindset

o Skills

o Experience

o Solutions

24

PLATFORMS

25

TRADITIONAL EDW PLATFORMS

o Too expensive ($10,000 per TB and more)

o Large upfront cost

o Difficult procurement, setup and maintenance

o Designed for relational data, SQL interface only, limited schema flexibility

o Data must be loaded first (modeled, prepared and moved)

o Marketing limitations for Appliances

26

TRADITIONAL OPEN SOURCE PLATFORMS

• Designed for relational data, SQL interface

only, limited schema flexibility

• Data must be loaded first (modeled,

prepared and moved)

• Not easily scalable (scale up and down)

27

TRADITIONAL DATA MINING TOOLS

• Expensive

• Smaller community (one more isolated

world)

• Targeted at enterprise users

• Longer release cycles, no way to mix tools and try fresh new stuff, etc.

• Scalability and integration issues

28

WHY BIG DATA AND CLOUD

o Extremely economically attractive

o Scalable and elastic

o Self service

o Rich and diverse data tools

o Good enough quality (and

constantly improving)

29

BIG DATA AND CLOUD DESIGN PRINCIPLES

Decoupling Data Storage and Computing

o The database engine no longer owns the data

o Simplified load/extract

o Schema on read

o Not just a SQL interface

o Any computing engine on top of the data

Commodity Hardware

o Fault tolerant

Scale up and down
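Schema on read can be illustrated without any Hadoop tooling. In the plain-Python sketch below, which is an assumption for illustration only (engines such as Hive or Spark SQL apply a schema to files in distributed storage when querying them), raw records are stored untouched and each reader applies its own schema while parsing. The schemas and the helper `read_with_schema` are hypothetical.

```python
import csv
import io

# Raw data is stored exactly as it arrived (an in-memory stand-in for a file
# in a data lake). Nothing validates or types it on write.
RAW_DATA = """2016-08-01,alice,25.00
2016-08-01,bob,n/a
2016-08-02,carol,40.50
"""

# Hypothetical schemas: each reader decides at read time how to interpret
# the same raw columns.
SALES_SCHEMA = [("date", str), ("user", str), ("amount", float)]
TEXT_SCHEMA = [("date", str), ("user", str), ("amount", str)]


def read_with_schema(raw, schema):
    """Parse raw CSV text, applying column names and types while reading.

    Rows that do not fit the schema are skipped rather than rejected at load.
    """
    for row in csv.reader(io.StringIO(raw)):
        if len(row) != len(schema):
            continue  # wrong number of columns for this schema
        try:
            yield {name: cast(value) for (name, cast), value in zip(schema, row)}
        except ValueError:
            continue  # e.g. "n/a" cannot be read as a float


sales = list(read_with_schema(RAW_DATA, SALES_SCHEMA))
text_rows = list(read_with_schema(RAW_DATA, TEXT_SCHEMA))
print(sum(r["amount"] for r in sales))  # 65.5
print(len(text_rows))                   # 3
```

Because nothing was enforced when the data was written, a second consumer can read the same bytes with a looser schema and keep the row that the numeric schema skips.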

30

GROWTH PATH

From monolithic suites to a diverse and rich tool set

SQL tools on Hadoop and in the Cloud

Advanced Data Analysis and Analytics

o Spark, MapReduce, NoSQL

o Python, R, Java, Scala

o Statistics

o Batch, Streaming, Real-time

Machine Learning and Deep Learning

o Understand use cases

o Understand specific algorithms and their

application

o Implementation

31

GAME (HOMEWORK)

32

LET’S WIN THIS CAR

Suppose you're on a game show, and

you're given the choice of three

doors:

Behind one door is a car; behind the

others, goats.

You pick a door, say No. 3

33

SWITCH OR NOT?

Then the host, who knows what's

behind the doors, opens another

door, say No. 2, which has a goat.

He then says to you, "Do you want

to pick door No. 1?"

Is it to your advantage to switch

your choice?
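Instead of settling the question by argument, you can check your answer with a short simulation. This sketch assumes the host always opens a door that is neither your pick nor the car, choosing at random when two such doors exist; the slide only says the host "knows what's behind the doors", so the result depends on that assumption. The function names and the trial count are illustrative.

```python
import random


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the player ends up with the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumed host behavior: open a goat door other than the player's pick.
    host_options = [d for d in doors if d != pick and d != car]
    opened = rng.choice(host_options)
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 42) -> float:
    """Estimate the probability of winning the car with a given strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


print(f"stay:   {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

Comparing the two printed win rates answers the question empirically, under the stated assumption about the host.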