Prescriptive Analytics Part I

Nick Gonzalez, 2/10/14

“It is change, continuing change, inevitable change, that is the dominant factor in society today. No sensible decision can be made any longer without taking into account not only the world as it is, but the world as it will be.”

- Isaac Asimov

Topics Covered

Reference automated prescriptive analytics system

Automated algorithm selection

Distributed algorithm development

Covered in future presentations

Ontology creation and extraction

Representing solutions using ontologies

Business optimization

everything else…

Today’s Data Landscape

Tomorrow’s Data Landscape

Data is outpacing us

Humans cannot keep up

Computers can, but…

Prescriptive Analytics

Scalable

Automated understanding

Automated predictive analytics

Actionable

Closed loop

Example: Video Games

[Diagram: user space and analytics space. Components: game, metrics, learning process, predictive models, deploy, game server, rules, simulations. Arrow labels: start, write, understand, build / update models, generate, modify, copy to production.]
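A toy Python sketch of the closed loop in the diagram, under loudly hypothetical assumptions: `play_sessions` stands in for the game server and returns a made-up engagement metric for a single numeric rule, the "model" is simply the mean observed metric per rule value, and the "simulation" scores each candidate rule with that model (untried rules score +inf so each one is tried once). It is a greedy illustration of the write, build / update, generate and deploy steps, not the system described in the talk.

```python
from statistics import mean


def play_sessions(rule: float) -> float:
    """Hypothetical game server: a made-up engagement metric that peaks at rule = 0.6."""
    return 1.0 - (rule - 0.6) ** 2


def update_model(history: list[tuple[float, float]]) -> dict[float, float]:
    """Learning process (hypothetical): predicted metric = mean observed metric per rule."""
    by_rule: dict[float, list[float]] = {}
    for rule, metric in history:
        by_rule.setdefault(rule, []).append(metric)
    return {rule: mean(metrics) for rule, metrics in by_rule.items()}


def simulate(model: dict[float, float], candidates: list[float]) -> dict[float, float]:
    """Simulation (hypothetical): score candidates; untried rules get +inf so they are tried."""
    return {c: model.get(c, float("inf")) for c in candidates}


def closed_loop(candidates: list[float], rounds: int) -> tuple[float, dict[float, float]]:
    history: list[tuple[float, float]] = []
    model: dict[float, float] = {}
    rule = candidates[0]
    for _ in range(rounds):
        history.append((rule, play_sessions(rule)))  # game writes metrics
        model = update_model(history)                # build / update models
        scores = simulate(model, candidates)         # generate predictions for each rule
        rule = max(scores, key=lambda c: scores[c])  # modify / deploy the best-scoring rule
    return rule, model


if __name__ == "__main__":
    best, model = closed_loop([0.2, 0.4, 0.6, 0.8], rounds=10)
    print(best)  # 0.6
```

Each pass through the loop feeds new metrics back into the model, which is what makes the loop "closed"; a real system would replace every function here with far richer components.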

Problems

Scale

Speed

Adaptability

Automated Learning

“I do not fear computers. I fear the lack of them.”

- Isaac Asimov

Goals

Remove the human element from analysis phases

Generate accurate, actionable, predictive models

Combine predictive models and simulation to solve problems

Guiding Principle

Big data with simple algorithms will outperform sampled data with complex algorithms.

How is this possible?

Focus on a single problem.

Limit scope

Goal must be

Measurable

Actionable

Process

[Diagram: Data → Data Engineering & Understanding → Modeling Prep → Simulation → Actionable Deployment]

1. Automated Understanding

Find the data representation best suited to the problem you are trying to solve.

Automated Understanding

[Diagram: Raw Data → Initial Transform → Clean Data, plus Stats and meta]
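A minimal Python sketch of this step, assuming tabular rows of strings and a single numeric column. The function names and the split between "stats" (summary numbers about the values) and "meta" (facts about the column itself) are assumptions for illustration; the slide does not define them.

```python
import math
from statistics import mean, pstdev


def initial_transform(raw_rows: list[dict[str, str]], column: str) -> list[float]:
    """Hypothetical initial transform: parse one column to floats, dropping bad values."""
    clean: list[float] = []
    for row in raw_rows:
        try:
            value = float(row.get(column, ""))
        except ValueError:
            continue  # blank or malformed value: drop it
        if math.isfinite(value):  # also drop "nan" and "inf"
            clean.append(value)
    return clean


def stats_and_meta(raw_rows: list[dict[str, str]], column: str, clean: list[float]):
    """Summary statistics of the clean values plus (hypothetical) metadata about the column."""
    stats = {
        "count": len(clean),
        "mean": mean(clean),
        "stdev": pstdev(clean),
        "min": min(clean),
        "max": max(clean),
    }
    meta = {
        "column": column,
        "rows": len(raw_rows),
        "missing_rate": 1 - len(clean) / len(raw_rows),
        "all_positive": stats["min"] > 0,
    }
    return stats, meta


if __name__ == "__main__":
    raw = [{"spend": "10"}, {"spend": ""}, {"spend": "30"}, {"spend": "abc"}]
    clean = initial_transform(raw, "spend")
    stats, meta = stats_and_meta(raw, "spend", clean)
    print(clean, stats["mean"], meta["missing_rate"])  # [10.0, 30.0] 20.0 0.5
```

Facts such as `all_positive` are the kind of information a later step can use to decide which representations are worth trying.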

Automated Understanding

[Diagram: Clean Data, Stats and meta feed Representation A, Representation B and Representation C; Representation A expands into A.1, A.2, …]
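One way to sketch the idea in Python, assuming a single numeric feature `xs` and a numeric target `ys`: derive candidate representations using simple stats (a log transform is only offered when all values are positive), then keep the one a straight-line fit explains best. The candidate set and the scoring rule are hypothetical choices, and in-sample RMSE is used only to keep the example short; held-out data would be the safer measure.

```python
import math
from statistics import mean


def candidate_representations(xs: list[float]) -> dict[str, list[float]]:
    """Hypothetical candidate set; simple stats decide which candidates are offered."""
    reps = {"identity": list(xs), "squared": [x * x for x in xs]}
    if min(xs) > 0:  # log is only defined for strictly positive values
        reps["log"] = [math.log(x) for x in xs]
    return reps


def linear_fit_rmse(fs: list[float], ys: list[float]) -> float:
    """In-sample RMSE of the least-squares line ys ≈ a + b * fs (assumes fs is not constant)."""
    f_bar, y_bar = mean(fs), mean(ys)
    sxx = sum((f - f_bar) ** 2 for f in fs)
    b = sum((f - f_bar) * (y - y_bar) for f, y in zip(fs, ys)) / sxx
    a = y_bar - b * f_bar
    return math.sqrt(mean((y - (a + b * f)) ** 2 for f, y in zip(fs, ys)))


def best_representation(xs: list[float], ys: list[float]) -> tuple[str, dict[str, float]]:
    """Score every candidate and return the name with the lowest RMSE."""
    scores = {
        name: linear_fit_rmse(fs, ys)
        for name, fs in candidate_representations(xs).items()
    }
    return min(scores, key=lambda name: scores[name]), scores


if __name__ == "__main__":
    best, scores = best_representation([1, 2, 4, 8, 16], [0, 1, 2, 3, 4])
    print(best)  # log
```

With `xs = [1, 2, 4, 8, 16]` and `ys = [0, 1, 2, 3, 4]`, the log representation wins because `ys` is exactly linear in `log(xs)`.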

2. Automated Algorithm Selection

Find the algorithm that performs best on the problem you are trying to solve while meeting all criteria.

Automated Algorithm Selection

Choose algorithms best suited for this type of problem.

Consider the data: types, sparsity, size, and desired outcome.

Try multiple algorithms.

Calculate the root mean squared error (RMSE) or another appropriate measure.

Consider the problem domain.

Use cross-validation.

Do not just compare the average RMSE; also check how much it varies across folds.

Choose the algorithm(s) that perform best.
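The bullets above can be sketched in plain Python with two deliberately simple algorithms (a mean predictor and single-feature least squares), k-fold cross-validation, and RMSE per fold. Ranking by the mean plus the standard deviation of the fold RMSEs is one assumed way to act on "do not just compare the average RMSE"; the names `fit_mean`, `fit_linear` and `select_algorithm` are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_mean(train):
    """Baseline algorithm: always predict the training mean of y."""
    y_bar = mean(y for _, y in train)
    return lambda x: y_bar


def fit_linear(train):
    """Ordinary least squares on a single feature x."""
    x_bar = mean(x for x, _ in train)
    y_bar = mean(y for _, y in train)
    sxx = sum((x - x_bar) ** 2 for x, _ in train)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def rmse(predict, test):
    return math.sqrt(mean((y - predict(x)) ** 2 for x, y in test))


def cross_validate(fit, data, k=5):
    """Per-fold RMSE; fold i holds every k-th point starting at index i."""
    scores = []
    for i in range(k):
        test = data[i::k]
        train = [p for j, p in enumerate(data) if j % k != i]
        scores.append(rmse(fit(train), test))
    return scores


def select_algorithm(algorithms, data, k=5):
    """Rank by mean + stdev of fold RMSE, so an unstable algorithm is penalised."""
    report = {}
    for name, fit in algorithms.items():
        scores = cross_validate(fit, data, k)
        report[name] = (mean(scores), pstdev(scores))
    best = min(report, key=lambda n: report[n][0] + report[n][1])
    return best, report


if __name__ == "__main__":
    # Synthetic data: y = 2x + 1 with alternating +/-0.5 noise.
    data = [(float(x), 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]
    best, report = select_algorithm({"mean": fit_mean, "linear": fit_linear}, data)
    print(best)  # linear
    for name, (avg, spread) in report.items():
        print(f"{name}: RMSE {avg:.2f} ± {spread:.2f}")
```

Other measures or selection rules (for example, a domain-specific cost) can replace `rmse` and the ranking key without changing the cross-validation loop.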

Distributed Processing

Learning to Scale

Approaching the Problem

Two ways to approach a problem

Bottom up

Top down

Bottom Up Approach

Hardware

Assembly Language

C, Pascal

C++, Java

Design Patterns, Algorithms

Programmer

Top Down

Problem Solver

Problem Representation

Distributed System Abstractions

Functional Languages

Hardware

Building Distributed Algorithms

Identify the simplest concepts that describe data processing

Collections

Collection processing

[Diagram: the top-down stack again (Problem Solver, Problem Representation, Distributed System Abstractions, Functional Languages, Hardware), labeled Single “Box”]

Evolution of thought

[Diagram: Data, Algorithm, Data; then Collection, Collection Processing; labeled No “Box”]

Coming together

[Diagram: collection operations (map, mapcat, reduce, filter, sort, group); platforms (Hadoop, Single PC, MPI, …); algorithms (k-means, density, random forest, gradient boost, …)]
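A single-machine Python sketch of how an algorithm can be written purely in terms of such collection operations: one k-means iteration is a map (tag each point with its nearest centroid), a group (collect points by centroid), and a reduce (sum coordinates and count per cluster). The helpers `cmap`, `group` and `creduce` are hypothetical stand-ins, not the talk's Clojure/Cascalog code; mapcat, filter and sort are not needed for this example.

```python
from functools import reduce


# Hypothetical single-machine versions of three collection operations.
def cmap(f, coll):
    return [f(x) for x in coll]


def group(key, coll):
    groups = {}
    for x in coll:
        groups.setdefault(key(x), []).append(x)
    return groups


def creduce(f, coll, init):
    return reduce(f, coll, init)


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def add_point(acc, tagged):
    """Reducer: accumulate (coordinate sums, count) for one cluster."""
    sums, count = acc
    _, p = tagged
    return tuple(s + c for s, c in zip(sums, p)), count + 1


def kmeans_step(points, centroids):
    tagged = cmap(lambda p: (nearest(p, centroids), p), points)  # map
    clusters = group(lambda t: t[0], tagged)                     # group
    new = list(centroids)  # a centroid with no points keeps its position
    for i, members in clusters.items():
        zero = ((0.0,) * len(centroids[i]), 0)
        sums, count = creduce(add_point, members, zero)          # reduce
        new[i] = tuple(s / count for s in sums)
    return new


def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        centroids = kmeans_step(points, centroids)
    return centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

Because `kmeans_step` only touches these operations, swapping them for distributed implementations would, in principle, leave the algorithm itself unchanged.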

Distributed Processing Interface

Simple concept

Focus on building algorithms

Many ways to implement this concept

Works with both shared memory systems and distributed memory systems
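A Python sketch of such an interface, assuming a single `map_reduce` operation is enough for the example. `SerialBackend` runs everything in one thread, and `ThreadPoolBackend` is a shared-memory variant built on `concurrent.futures.ThreadPoolExecutor`; a distributed-memory backend (for example over Hadoop or MPI) would implement the same method but is not shown. All class and function names are hypothetical.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class Backend(ABC):
    """Hypothetical processing interface: algorithms see only map_reduce."""

    @abstractmethod
    def map_reduce(self, mapper, reducer, items, init):
        """Apply mapper to every item, then fold the results with reducer."""


class SerialBackend(Backend):
    """Everything in one process, one thread."""

    def map_reduce(self, mapper, reducer, items, init):
        return reduce(reducer, map(mapper, items), init)


class ThreadPoolBackend(Backend):
    """Shared-memory backend: mapper calls run on a pool of worker threads."""

    def __init__(self, workers: int = 4):
        self.workers = workers

    def map_reduce(self, mapper, reducer, items, init):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # pool.map yields results in input order, so the fold is deterministic
            return reduce(reducer, pool.map(mapper, items), init)


def mean_and_variance(backend: Backend, xs):
    """Example algorithm written only against the interface."""
    n, total, total_sq = backend.map_reduce(
        lambda x: (1, x, x * x),
        lambda a, b: (a[0] + b[0], a[1] + b[1], a[2] + b[2]),
        xs,
        (0, 0.0, 0.0),
    )
    m = total / n
    # Population variance via E[x^2] - m^2: fine for a sketch, not numerically robust.
    return m, total_sq / n - m * m


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    print(mean_and_variance(SerialBackend(), xs))      # (2.5, 1.25)
    print(mean_and_variance(ThreadPoolBackend(), xs))  # (2.5, 1.25)
```

The algorithm author writes `mean_and_variance` once; choosing where it runs is a matter of passing a different backend.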

Implementation

Functional language - Clojure

Reusable functions as callbacks

Hadoop drivers written on top of Cascalog

Data location and type are abstracted as “collection”
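The talk's implementation is in Clojure with Hadoop drivers on Cascalog, which is not reproduced here. As a rough Python analogue of the last two bullets, the hypothetical `Collection` below hides whether records come from memory or from a text file, and the processing step `total` receives its parsing and filtering behaviour as reusable callbacks (`parse_amount` and `is_positive` are also hypothetical).

```python
import os
import tempfile
from typing import Callable, Iterable, Iterator, Union


class Collection:
    """Hypothetical collection: hides whether records live in memory or in a text file."""

    def __init__(self, source: Union[Iterable[str], str, os.PathLike]):
        self.source = source

    def __iter__(self) -> Iterator[str]:
        if isinstance(self.source, (str, os.PathLike)):
            # A string or path is treated as a file location: one record per line.
            with open(self.source, encoding="utf-8") as fh:
                for line in fh:
                    yield line.rstrip("\n")
        else:
            yield from self.source


def parse_amount(record: str) -> float:
    return float(record)


def is_positive(x: float) -> bool:
    return x > 0


def total(coll: Collection, parse: Callable[[str], float], keep: Callable[[float], bool]) -> float:
    """A processing step that receives its behaviour as callbacks."""
    return sum(v for v in map(parse, coll) if keep(v))


if __name__ == "__main__":
    records = ["5", "-2", "7"]
    in_memory = total(Collection(records), parse_amount, is_positive)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "amounts.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(records) + "\n")
        on_disk = total(Collection(path), parse_amount, is_positive)
    print(in_memory, on_disk)  # 12.0 12.0
```

The same callbacks work unchanged whichever source backs the collection, which is the property the slide attributes to the "collection" abstraction.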

“Part of the inhumanity of the computer is that once it is completely programmed and working smoothly, it is completely honest.”

- Isaac Asimov