Analytics and Big Data Analytics

View Dr. Robin Bloor's presentation from the Dec. 2013 Big Data Conference in Rome.

Analytics and Big Data Analytics

Robin Bloor, Ph.D.

The Sequence of Topics…

1 Data Science?

2 The Nature of Analytics

3 Machine Learning Et Al

4 The Business Perspective

5 The Future

1

What Is Data Science?

There is no “data science.” It’s a misnomer

All science is empirical and involves data analysis.

Science implements a method.

So do statisticians

What Is A Data Scientist?

Project manager

Qualified statistician

Domain business expert

Experienced data architect

Software engineer

(It’s a team)

Data Scientists v. Business Analysts

Claims that business analysts can be data scientists are dubious

Good practitioners of statistics understand data (from years of training)

Software understands nothing; it simply implements algorithms

Who Understands Data?

Nevertheless!

You can know more about a business from its data than by any other means

2

The Nature of Analytics

The Field of Business Intelligence

The Driving Force is Insight

A Process Not An Activity

Data Analytics is a multi-disciplinary end-to-end process

Until recently it was a walled garden. But the walls have been torn down by:

Data availability

Scalable technology

Open source tools

The Data Analytics Process - Detail

The CRITICAL Workload Issue

Previously, we viewed database workloads as an I/O optimization problem

With analytics, the workload is a highly variable mix of I/O and calculation

No databases were built for this – not even Big Data databases

3

Machine Learning Et Al

Analytical Latencies

1 Data access

2 Data preparation

3 Model development

4 Execution

5 Implementation

6 Model Audit & Update

Speed = value (probably)
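
The six latencies above can be read as stages of one pipeline. The sketch below assumes the stages run strictly one after another, so the end-to-end latency is their sum; that sequencing is an assumption, not something the slides state. The stage durations are hypothetical placeholders chosen only for illustration, and the function names are invented for this example.

```python
# Hypothetical stage durations in hours; illustrative values, not measurements.
STAGE_HOURS = {
    "data access": 8.0,
    "data preparation": 40.0,
    "model development": 24.0,
    "execution": 2.0,
    "implementation": 16.0,
    "model audit & update": 6.0,
}


def total_latency(stages):
    """End-to-end latency, assuming the stages run sequentially."""
    return sum(stages.values())


def bottleneck(stages):
    """Name of the stage that contributes the most latency."""
    return max(stages, key=stages.get)


def latency_after_speedup(stages, stage, factor):
    """Total latency if a single stage runs `factor` times faster."""
    faster = dict(stages)
    faster[stage] = stages[stage] / factor
    return total_latency(faster)


if __name__ == "__main__":
    print("total hours:", total_latency(STAGE_HOURS))
    print("bottleneck:", bottleneck(STAGE_HOURS))
    print("after 10x faster execution:",
          latency_after_speedup(STAGE_HOURS, "execution", 10))
```

Under these made-up numbers, a tenfold speedup of the execution stage barely moves the total, because a different stage dominates. That may be one reason for the "probably" in "Speed = value": faster processing pays off only where the latency actually sits.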

The Open Source Dynamic

The R Language: over 1 million users

Hadoop and its ecosystem: reduced latency for analytics

Machine learning algorithms: raw power

None of these are engineered for performance

Machine Learning Algorithms - 1

There are many:

Neural network(s)

Bayesian networks

Decision trees/random forests

Support vector machines

K-means clustering

Regression(s)

Etc.
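
To make one item on this list concrete, here is a minimal sketch of k-means clustering (Lloyd's algorithm) on one-dimensional data, using only the Python standard library. It is a teaching sketch, not a production implementation: the function name `kmeans_1d` and the requirement that the caller supply initial centroids are assumptions made for this example.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm in one dimension.

    points: list of numbers to cluster.
    centroids: initial centroid guesses supplied by the caller.
    Returns (final_centroids, labels), where labels[i] is the index
    of the centroid that points[i] was assigned to.
    """
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(sum(members) / len(members))
            else:
                updated.append(centroids[k])  # keep an empty cluster in place
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, labels


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_1d(data, [0.0, 5.0]))
```

Each iteration touches every point once per centroid, which hints at why such algorithms become practical on large data only when enough computing power is available.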

Machine Learning Algorithms - 2

They are not newly invented

We did not previously use them much because we never had the computer power

Now that we have the power (at a price) we can employ them

Machine Learning Algorithms - 3

Machine learning algorithms can check all possibilities

We never had the computer power

Now that we have the power (at a price) we can employ them
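
One way to read "check all possibilities" is exhaustive search: evaluate every candidate and keep the best. The sketch below applies that reading to a very small learner, a single-threshold rule, and tries every observed value as the threshold. The function name `best_threshold` and the rule "predict 1 if value > threshold" are assumptions for this illustration, not something taken from the slides.

```python
def best_threshold(values, labels):
    """Try every observed value as a threshold for the rule
    'predict 1 if value > threshold, else 0'.

    Returns (threshold, errors) for the threshold with the fewest
    misclassifications; ties go to the smallest threshold.
    """
    best = (None, len(values) + 1)
    for t in sorted(set(values)):
        # Count points whose prediction disagrees with the label.
        errors = sum((v > t) != bool(y) for v, y in zip(values, labels))
        if errors < best[1]:
            best = (t, errors)
    return best


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [0, 0, 0, 1, 1, 1]
    print(best_threshold(xs, ys))
```

With one feature the search is cheap. Searching combinations of many features and thresholds multiplies the number of candidates quickly, which is where the computing power comes in.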

The Impact?

Machine learning and processing power (parallelism) will change the data analysis process

The analytics team needs to understand IT
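
As a rough illustration of how parallelism reshapes a calculation, the sketch below computes a mean by splitting the data into chunks, computing a small partial result per chunk in a worker pool, and then combining the partials. The helper names are invented for this example, and the data is assumed to be non-empty. A thread pool keeps the example simple; in CPython, threads do not speed up pure-Python arithmetic, so a real workload would more likely use processes or a cluster framework. Only the split, map, and combine structure is the point.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous slices."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk):
    """Per-chunk (count, sum): a small result that is cheap to combine."""
    return len(chunk), sum(chunk)


def parallel_mean(data, n_workers=4):
    """Map partial_stats over the chunks in a pool, then reduce."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_stats, chunked(data, n_workers)))
    count = sum(c for c, _ in parts)
    total = sum(s for _, s in parts)
    return total / count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 101))))
```

The design choice worth noting is that each worker returns a count and a sum rather than a mean, because partial means cannot be combined correctly when chunk sizes differ. Details like this are part of why an analytics team benefits from understanding the underlying IT.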

4

The Business Perspective

Business Metamorphosis

The role of data analysis has not changed

Only the speed has changed

The process will evolve

It will be disruptive for incumbent vendors

The Data Analysis Budget

Data Analysis is Business R&D

The focus is on business process

The outcome of successful R&D is a changed process

Think of manufacturing for a useful analogy

5

The Future

It isn't over until the fat lady sings

Hardware disruption

Software disruption

Business process disruption

All we know is:

Analytical processing will get faster

Analytic latencies will reduce

Data will continue to grow

Analytics will be a differentiator

In Summary…

1 Data Science?

2 The Nature of Analytics

3 Machine Learning Et Al

4 The Business Perspective

5 The Future

Thank you very much for your attention
