Deriving Knowledge from Data at Scale

Barga Data Science lecture 2

Page 1: Barga Data Science lecture 2


Page 2: Barga Data Science lecture 2


An Introduction to Statistical Learning with Applications in R

The Elements of Statistical Learning

A Programmer’s Guide to Data Mining

Probabilistic Programming & Bayesian Methods for Hackers

Think Bayes: Bayesian Statistics Made Simple

Page 3: Barga Data Science lecture 2


Data Mining and Analysis, Fundamental Concepts and Algorithms

An Introduction to Data Science

Machine Learning

Machine Learning – The Complete Guide

Page 4: Barga Data Science lecture 2


Bayesian Reasoning and Machine Learning

A Course in Machine Learning

Information Theory, Inference and Learning Algorithms

Modeling with Data

Mining of Massive Datasets


Page 5: Barga Data Science lecture 2

Page 6: Barga Data Science lecture 2

• Introduction to machine learning

• Overview of the machine learning process

• Introduction to select algorithms

• Introduction to concepts we will examine later in the course: bias/variance, generalization, underfitting, overfitting, ensemble methods, etc.

• Elements of a time series

• Functions to manipulate time series

• Lay a foundation for Prediction and Forecasting…

Page 7: Barga Data Science lecture 2


http://www.cs.waikato.ac.nz/ml/weka/

install by next class, get the developer version…

Page 8: Barga Data Science lecture 2


• Class Discussions 20 minutes

• Machine Learning Primer 60 minutes

• Break 15 minutes

• Time Series & Forecasting, 1/2 60 minutes

• Closing discussion 15 minutes

Page 9: Barga Data Science lecture 2


so we need teams

Page 10: Barga Data Science lecture 2


Discussion on Assigned Reading…

Week One

Page 11: Barga Data Science lecture 2


The Data Science Workflow: Lecture 1 Review

Stay in the immediate zone during exploratory modeling, to the extent possible:

• 5 to 10 minutes per experiment, resulting in hundreds per day;

• Small, statistically sound/relevant samples;

• Linear modelling during feature exploration;

• Don’t write ML algorithms, use packages, do learn to write data manipulation code;

Start with a sample of data, as soon as you can get it…

• You will learn a considerable amount from that first data sample

• Quality of the data, what fields are being collected, possible missing values and/or missing fields, and the questions will begin to flow (dialogue with the customer will be at a much more meaningful level).

If your customer can’t quantify it, you can’t change/improve it…
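To make "small, statistically sound samples" concrete, here is a minimal sketch (the dataset is synthetic and purely illustrative): draw a seeded random sample and sanity-check that it preserves the label distribution before running quick experiments on it.

```python
import random

# Hypothetical full dataset: 1,000,000 (features, label) records.
population = [({"x": i % 97}, i % 2) for i in range(1_000_000)]

# Draw a small, reproducible simple random sample for 5-10 minute experiments.
random.seed(42)
sample = random.sample(population, k=10_000)

# Sanity-check that the sample roughly preserves the label distribution.
pos_rate_pop = sum(label for _, label in population) / len(population)
pos_rate_sample = sum(label for _, label in sample) / len(sample)
print(pos_rate_pop, pos_rate_sample)
```

Seeding the generator keeps each quick experiment reproducible, which matters when you are running hundreds of them per day.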

Page 12: Barga Data Science lecture 2


The Data Science Workflow: Loops within loops…

Page 13: Barga Data Science lecture 2


The Data Science Workflow: Define the Goal

The first task in a data science project is to define a measurable & quantifiable goal. At this stage, learn all that you can about the context of your project.

• Why do the project sponsors want the project in the first place? What do they lack, and what do they need?
• What are they doing to solve the problem now, and why isn't that good enough?
• What resources will you need: what kind of data, how much staff, will you have domain experts to collaborate with, what are the computational resources?
• How do the project sponsors plan to deploy your results? What are the constraints that have to be met for successful deployment?

Not: We want to get better at finding bad loans.

But: We want to reduce our rate of loan charge-offs by at least 10%, using a model that predicts which loan applicants are likely to default.

Page 14: Barga Data Science lecture 2


A concrete goal begets concrete stopping conditions and concrete acceptance criteria. The less specific the goal, the likelier that the project will go unbounded, because no result will be "good enough." If you don't know what you want to achieve, you don't know when to stop trying – or even what to try. When the project eventually terminates – because either time or resources run out – no one will be happy with the outcome…

Page 15: Barga Data Science lecture 2


The Data Science Workflow: Data Collection and Management

This step encompasses identifying the data you need, exploring it, and conditioning it to be suitable for analysis. This stage is often the most time-consuming step in the process. It's also one of the most important.

• What data is available to me?

• Will it help me solve the problem?

• Is it enough?

• Is it of good enough quality?

Rule: The first piece of data is very informative; data set utility is roughly logarithmic in size.

Prefer: Direct measurements, but if they are not available, identify proxy variables.

Page 16: Barga Data Science lecture 2

Page 17: Barga Data Science lecture 2

The Data Science Workflow: Modeling

We get to statistics and machine learning during the modeling stage. Here is where you try to extract useful insights from the data. Many modeling procedures make specific assumptions about data distribution and relationships, so there will be back-and-forth between the modeling and data cleaning stages as you try to find the best way to represent and model the data. The most common data science modeling tasks are:

• Classification: deciding if something belongs to one category or another.

• Scoring: predicting or estimating a numeric value such as a price or probability.

• Ranking: learning to order items by preferences.

• Clustering: grouping items into most similar groups.

• Finding Relations: finding correlations or potential causes of effects seen in the data.

• Characterization: very general plotting and report generation from data.

Page 18: Barga Data Science lecture 2


The Data Science Workflow: Model Evaluation and Critique

Once you have a model, you need to determine if it meets your goals.

• Is it accurate enough for your needs?

• Does it generalize well?

• Does it perform better than "the obvious guess"? Better than whatever estimate you currently use?

• Do the results of the model (coefficients, clusters, rules) make sense in the context of the problem domain?

If you've answered "no" to any of the above questions, it's time to loop back to the modeling step — or decide that the data doesn't support the goal you are trying to achieve.

Page 19: Barga Data Science lecture 2


The Data Science Workflow: Model Deployment and Maintenance

Finally, the model is put into operation.

In many organizations this means the data scientist no longer has primary responsibility for the day-to-day operation of the model. However, you still should ensure that the model will run smoothly and will not make disastrous unsupervised decisions. You also want to make sure that the model can be updated as its environment changes. And in many situations, the model will initially be deployed in a small pilot program.

The test might bring out issues that you didn't anticipate, and you will have to adjust the model appropriately.

Page 20: Barga Data Science lecture 2


The Data Science Workflow: Presentation and Documentation

Once you have a model that meets your success criteria, you will present your results to your project sponsor and other stakeholders. You must also document the model for those in the organization who are responsible for using, running, and maintaining the model once it has been deployed. Model interpretability may be an issue…

Page 21: Barga Data Science lecture 2


The Data Science Workflow: Loops within loops…

Page 22: Barga Data Science lecture 2


• Class Discussions 30 minutes

• Machine Learning Primer 60 minutes

• Break 15 minutes

• Time Series & Forecasting 60 minutes

• Closing discussion 15 minutes

Page 23: Barga Data Science lecture 2


Page 24: Barga Data Science lecture 2
Page 25: Barga Data Science lecture 2

1 1 5 4 3

7 5 3 5 3

5 5 9 0 6

3 5 2 0 0

Page 26: Barga Data Science lecture 2
Page 27: Barga Data Science lecture 2
Page 28: Barga Data Science lecture 2
Page 29: Barga Data Science lecture 2

training data (expensive) synthetic training data (cheaper)

Page 30: Barga Data Science lecture 2

solve hard problems

value from Big Data

human intelligence

Page 31: Barga Data Science lecture 2

Machine learning enables nearly every value proposition of web search.

Page 32: Barga Data Science lecture 2

Hundreds of thousands of machines…

Hundreds of metrics and signals per machine…

Which signals correlate with the real cause of a problem?

How can we extract effective repair actions?

Page 33: Barga Data Science lecture 2

solve hard problems

value from Big Data

human intelligence

business analytics

Page 34: Barga Data Science lecture 2

Predicting future performance from historical data:

• Recommendation engines
• Advertising analysis
• Weather forecasting for business planning
• Social network analysis
• IT infrastructure and web app optimization
• Legal discovery and document archiving
• Pricing analysis
• Fraud detection
• Churn analysis
• Equipment monitoring
• Location-based tracking and services
• Personalized insurance

Predictive analytics addresses the likelihood of something happening in the future, even if it is just an instant later…

Page 35: Barga Data Science lecture 2

An object is encoded with features (think DB attributes / OO member fields of primitive types); d is the feature dimensionality. A classifier produces a prediction (the response/dependent variable), which can be qualitative or quantitative (classification/regression).

Object → Outcome: Entity → Category, Entity → Popularity. Complex decision making:

X → Y, where X = (x0, …, xd) is the input/independent variable.

We may know the relation for certain values of X and Y: (x, y). In fact, we may know the relation for many xs and ys: (x(1), y(1)), …, (x(N), y(N)), where the i-th x is x(i) = (x0(i), …, xd(i)).

Page 36: Barga Data Science lecture 2

TRAINING

[Diagram: an online system applies the model (classifier) to an input x = (x0, …, xd), an object encoded with features, producing the final output y, the prediction (response/dependent variable); an offline training sub-system builds the model from the training data (x(1), y(1)), …, (x(N), y(N)), where x(i) = (x0(i), …, xd(i)).]

The task X → Y is very complex; it is hard to construct a good f with f(X) = Y. So we construct an approximation to f: g(X) ≈ Y, drawn from a hypothesis space g(X) ∈ H.

Page 37: Barga Data Science lecture 2


The decision is driven by both the nature of your data and the question you are trying to answer

Page 38: Barga Data Science lecture 2


Out of Class Reading

Week Two

Page 39: Barga Data Science lecture 2

gender   age   smoker   eye color   lung cancer
male     19    yes      green       no
female   44    yes      gray        yes
male     49    yes      blue        yes
male     12    no       brown       no
female   37    no       brown       no
female   60    no       brown       yes
male     44    no       blue        no
female   27    yes      brown       no
female   51    yes      green       yes
female   81    yes      gray        no
male     22    yes      brown       no
male     29    no       blue        no
male     77    yes      gray        ?
male     19    yes      green       ?
female   44    no       gray        ?

Page 40: Barga Data Science lecture 2

gender   age   smoker   eye color   lung cancer
male     19    yes      green       no
female   44    yes      gray        yes
male     49    yes      blue        yes
male     12    no       brown       no
female   37    no       brown       no
female   60    no       brown       yes
male     44    no       blue        no
female   27    yes      brown       no
female   51    yes      green       yes
female   81    yes      gray        no
male     22    yes      brown       no
male     29    no       blue        no
male     77    yes      gray        ?
male     19    yes      green       ?
female   44    no       gray        ?

Train ML Model

Page 41: Barga Data Science lecture 2

gender   age   smoker   eye color   lung cancer
male     19    yes      green       no
female   44    yes      gray        yes
male     49    yes      blue        yes
male     12    no       brown       no
female   37    no       brown       no
female   60    no       brown       yes
male     44    no       blue        no
female   27    yes      brown       no
female   51    yes      green       yes
female   81    yes      gray        no
male     22    yes      brown       no
male     29    no       blue        no
male     77    yes      gray        yes
male     19    yes      green       no
female   44    no       gray        no

Train ML Model
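As a toy illustration only (the lecture does not specify the model, and this simple nearest-neighbour rule with an ad-hoc, hypothetical distance will not necessarily reproduce the predictions shown), training data like the table above can drive predictions for the unlabeled rows:

```python
# Training rows: (gender, age, smoker, eye_color) -> lung_cancer label.
train = [
    ("male", 19, "yes", "green", "no"),
    ("female", 44, "yes", "gray", "yes"),
    ("male", 49, "yes", "blue", "yes"),
    ("male", 12, "no", "brown", "no"),
    ("female", 37, "no", "brown", "no"),
    ("female", 60, "no", "brown", "yes"),
    ("male", 44, "no", "blue", "no"),
    ("female", 27, "yes", "brown", "no"),
    ("female", 51, "yes", "green", "yes"),
    ("female", 81, "yes", "gray", "no"),
    ("male", 22, "yes", "brown", "no"),
    ("male", 29, "no", "blue", "no"),
]
test_rows = [("male", 77, "yes", "gray"),
             ("male", 19, "yes", "green"),
             ("female", 44, "no", "gray")]

def distance(a, b):
    # Ad-hoc mix of a scaled age gap plus mismatch penalties (hypothetical).
    return abs(a[1] - b[1]) / 10 + (a[0] != b[0]) + (a[2] != b[2]) + (a[3] != b[3])

def predict(row):
    # 1-nearest-neighbour: copy the label of the closest training row.
    return min(train, key=lambda r: distance(row, r))[-1]

predictions = [predict(r) for r in test_rows]
print(predictions)
```

The point is the shape of the process, labeled rows in, a function that maps a new feature vector to a label out, not the particular classifier.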

Page 42: Barga Data Science lecture 2

Always

focuses on a bias-variance decomposition. We have a lecture dedicated to evaluation metrics for models.

Page 43: Barga Data Science lecture 2
Page 44: Barga Data Science lecture 2
Page 45: Barga Data Science lecture 2

key

easily the most important factor

Page 46: Barga Data Science lecture 2

key

CITY 1 LAT. CITY 1 LNG. CITY 2 LAT. CITY 2 LNG. DRIVABLE?

123.24 46.71 121.33 47.34 Yes

123.24 56.91 121.33 55.23 Yes

123.24 46.71 121.33 55.34 No

123.24 46.71 130.99 47.34 No

Page 47: Barga Data Science lecture 2

key

Feature engineering

Page 48: Barga Data Science lecture 2

More data wins

Once you've defined your input fields, there's only so much analytic gymnastics you can do. Computer algorithms trying to learn models have only a relatively few tricks they can do efficiently, and many of them are not so very different. Performance differences between algorithms are typically not large. Thus, if you want better classifiers:

1. Engineer better features

2. Get your hands on more high-quality data

Page 49: Barga Data Science lecture 2

The power of ensembles

models vote for the final prediction

Page 50: Barga Data Science lecture 2

Occam’s Razor

smaller, faster to fit, more interpretable

Page 51: Barga Data Science lecture 2

representable

could will

Page 52: Barga Data Science lecture 2

observational data can only show us that two variables are related, but it cannot tell us the “why”

Freakonomics

tended to have higher standardized test scores

Page 53: Barga Data Science lecture 2


Break, 5 minutes…

Page 54: Barga Data Science lecture 2
Page 55: Barga Data Science lecture 2


boosting)

Page 56: Barga Data Science lecture 2

Page 57: Barga Data Science lecture 2

Ensemble Classification

Page 58: Barga Data Science lecture 2


Let’s look at the Netflix Prize Competition…

Page 59: Barga Data Science lecture 2


Began October 2006

Page 60: Barga Data Science lecture 2

Page 61: Barga Data Science lecture 2

from http://www.research.att.com/~volinsky/netflix/

However, improvement slowed…

Page 62: Barga Data Science lecture 2


Today, the top team has posted an 8.5% improvement.

Ensemble methods are the best performers…

Page 63: Barga Data Science lecture 2


“Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75”

Rookies

Page 64: Barga Data Science lecture 2


“My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression”

Arek Paterek

http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf

Page 65: Barga Data Science lecture 2


“When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.”

U of Toronto

http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf

Page 66: Barga Data Science lecture 2


Gravity

home.mit.bme.hu/~gtakacs/download/gravity.pdf

Page 67: Barga Data Science lecture 2


“Our common team blends the result of team Gravity and team Dinosaur Planet.”

Might have guessed from the name…

When Gravity and Dinosaurs Unite

Page 68: Barga Data Science lecture 2


And, yes, the top team which is from AT&T…

“Our final solution (RMSE = 0.8712) consists of blending 107 individual results.”

BellKor / KorBell

Page 69: Barga Data Science lecture 2

Page 70: Barga Data Science lecture 2

• 83.7% majority vote accuracy

• 99.9% majority vote accuracy

Intuitions
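One way figures like these arise (an assumption, since the slide does not show its parameters): with independent voters that are each 70% accurate, five voters already reach 83.7% majority-vote accuracy, and roughly a hundred exceed 99.9%.

```python
from math import comb

def majority_vote_accuracy(n, p):
    # Probability that more than half of n independent voters,
    # each correct with probability p, vote for the right answer.
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(round(majority_vote_accuracy(5, 0.7), 3))    # 0.837
print(majority_vote_accuracy(101, 0.7) > 0.999)    # True
```

The gain depends critically on the voters being better than chance and not making correlated errors, which is why ensemble methods work hard to decorrelate their members.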

Page 71: Barga Data Science lecture 2


Boosting

Bagging

Page 72: Barga Data Science lecture 2


Leo Breiman

Page 73: Barga Data Science lecture 2


• Resample when you have few data

• Resample the minority class to correct data skew

Page 74: Barga Data Science lecture 2


N=10

Page 75: Barga Data Science lecture 2

Page 76: Barga Data Science lecture 2

Page 77: Barga Data Science lecture 2

Page 78: Barga Data Science lecture 2

Break, 5 minutes…

Page 79: Barga Data Science lecture 2


Time Series & Forecasting, 1/2

Page 80: Barga Data Science lecture 2


The median filter is a windowing technique that moves through the data point by point, replacing each point with the median value calculated for the current window…
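A minimal sketch of such a filter (window size 3 is a common default; the edge handling here, truncating the window at the boundaries, is one of several reasonable choices):

```python
import statistics

def median_filter(series, window=3):
    # Slide a centered window across the series and replace each point
    # with the median of its window (edges use a truncated window).
    half = window // 2
    return [statistics.median(series[max(0, i - half):i + half + 1])
            for i in range(len(series))]

noisy = [1, 1, 1, 50, 1, 2, 2, 2, 9, 2]
smoothed = median_filter(noisy)
print(smoothed)
```

Note how the isolated spike of 50 is removed while the genuine step from 1s to 2s survives, which is the "no signal loss" behaviour described on the next slide.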

Page 81: Barga Data Science lecture 2


Visual inspection reveals smoothing of the outliers without dampening the naturally occurring peaks and troughs (no signal loss). Prior to smoothing, we could see no correlation in our data, but afterwards, Spearman’s Rho was ~0.5 for almost all parameters.

Page 82: Barga Data Science lecture 2


?

?

?

Page 83: Barga Data Science lecture 2


3.7

12.5

9.0

Page 84: Barga Data Science lecture 2


Forecasting

• Quantitative
  – Causal Model
  – Time series: Stationary, Trend, Trend + Seasonality
• Qualitative
  – Expert Judgment
  – Delphi Method
  – Grassroots

Page 85: Barga Data Science lecture 2


[Chart: demand for a product or service over Year 1–Year 4]

Page 86: Barga Data Science lecture 2


[Chart: demand for a product or service over Year 1–Year 4, annotated with the trend component, seasonal peaks, random variation, and the actual demand line]

Page 87: Barga Data Science lecture 2

Page 88: Barga Data Science lecture 2

Page 89: Barga Data Science lecture 2


• Naïve

• Moving Average

• Exponential Smoothing

• Regression

Page 90: Barga Data Science lecture 2

Page 91: Barga Data Science lecture 2

Page 92: Barga Data Science lecture 2

Simple Moving Average

Ft+1 = (At + At-1 + At-2 + … + At-n+1) / n

Ft+1 = forecast for the upcoming period, t+1
n = number of periods to be averaged
At = actual occurrence in period t

Page 93: Barga Data Science lecture 2


n = 3

Ft+1 = (At + At-1 + At-2 + … + At-n+1) / n

Month   Sales (000)
1       4
2       6
3       5
4       ?
5       ?
6       ?

Page 94: Barga Data Science lecture 2


You’re a manager in Amazon’s electronics department. You want to forecast iPod sales for months 4–6 using a 3-period moving average.

Ft+1 = (At + At-1 + At-2 + … + At-n+1) / n

Month   Sales (000)   Moving Average (n=3)
1       4             NA
2       6             NA
3       5             NA
4       ?             (4+6+5)/3 = 5
5       ?
6       ?

Page 95: Barga Data Science lecture 2


Month   Sales (000)   Moving Average (n=3)
1       4             NA
2       6             NA
3       5             NA
4       3             5
5       ?             ?
6       ?

Page 96: Barga Data Science lecture 2


Month   Sales (000)   Moving Average (n=3)
1       4             NA
2       6             NA
3       5             NA
4       3             5
5       ?             (6+5+3)/3 = 4.667
6       ?

Page 97: Barga Data Science lecture 2


Month   Sales (000)   Moving Average (n=3)
1       4             NA
2       6             NA
3       5             NA
4       3             5
5       7             4.667
6       ?             ?

Page 98: Barga Data Science lecture 2


Month   Sales (000)   Moving Average (n=3)
1       4             NA
2       6             NA
3       5             NA
4       3             5
5       7             4.667
6       ?             (5+3+7)/3 = 5
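The worked example above can be reproduced in a few lines; `sales` holds the actual monthly values from the slides:

```python
def moving_average_forecast(actuals, n=3):
    # Forecast for the next period = mean of the n most recent actuals.
    return sum(actuals[-n:]) / n

sales = [4, 6, 5]                     # months 1-3
f4 = moving_average_forecast(sales)   # (4+6+5)/3 = 5.0
sales.append(3)                       # month 4 actual
f5 = moving_average_forecast(sales)   # (6+5+3)/3 ≈ 4.667
sales.append(7)                       # month 5 actual
f6 = moving_average_forecast(sales)   # (5+3+7)/3 = 5.0
print(f4, round(f5, 3), f6)
```

Each new actual pushes the oldest value out of the window, which is what gives the moving average its lag behind turning points.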

Page 99: Barga Data Science lecture 2


Page 100: Barga Data Science lecture 2


Page 101: Barga Data Science lecture 2


Page 102: Barga Data Science lecture 2


Ft+1 = w1·At + w2·At-1 + w3·At-2 + … + wn·At-n+1

Month   Sales (000)   Weighted Moving Average
1       4             NA
2       6             NA
3       5             NA
4       ?             31/6 = 5.167
5       ?
6       ?

Page 103: Barga Data Science lecture 2


Ft+1 = w1·At + w2·At-1 + w3·At-2 + … + wn·At-n+1

Month   Sales (000)   Weighted Moving Average
1       4             NA
2       6             NA
3       5             NA
4       3             31/6 = 5.167
5       7             25/6 = 4.167
6       ?             32/6 = 5.333
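The slides do not state the weights, but w = (3/6, 2/6, 1/6), heaviest on the most recent month, reproduces 31/6, 25/6 and 32/6 exactly; exact fractions make that easy to check:

```python
from fractions import Fraction

def weighted_ma_forecast(actuals, weights):
    # weights[0] applies to the most recent actual, weights[1] to the one
    # before it, and so on; the weights are assumed to sum to 1.
    recent = actuals[::-1][:len(weights)]
    return sum(w * a for w, a in zip(weights, recent))

w = [Fraction(3, 6), Fraction(2, 6), Fraction(1, 6)]
sales = [4, 6, 5]
f4 = weighted_ma_forecast(sales, w)   # (3*5 + 2*6 + 1*4)/6 = 31/6
sales.append(3)
f5 = weighted_ma_forecast(sales, w)   # (3*3 + 2*5 + 1*6)/6 = 25/6
sales.append(7)
f6 = weighted_ma_forecast(sales, w)   # (3*7 + 2*3 + 1*5)/6 = 32/6
print(f4, f5, f6)
```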

Page 104: Barga Data Science lecture 2

Page 105: Barga Data Science lecture 2

Page 106: Barga Data Science lecture 2


• Simple to use, applicable to any time series

• The next forecast uses only the previous actual and forecasted value

Page 107: Barga Data Science lecture 2


• Gives more weight to recent time periods

Ft+1 = Ft + a(At − Ft) = Ft + a·et

Ft+1 = forecast value for time t+1
At = actual value at time t
a = smoothing constant
et = forecast error at time t

Need an initial forecast Ft to start.

Page 108: Barga Data Science lecture 2


Week   Demand
1      820
2      775
3      680
4      655
5      750
6      802
7      798
8      689
9      775
10

Given the weekly demand data, what are the exponential smoothing forecasts for periods 2–10 using a = 0.10? Assume F1 = D1.

Ft+1 = Ft + a(At − Ft)

Page 109: Barga Data Science lecture 2


Week   Demand   Fi (a=0.1)   Fi (a=0.6)
1      820      820.00       820.00
2      775      820.00       820.00
3      680      815.50       793.00
4      655      801.95       725.20
5      750      787.26       683.08
6      802      783.53       723.23
7      798      785.38       770.49
8      689      786.64       787.00
9      775      776.88       728.20
10              776.69       756.28

Ft+1 = Ft + a(At − Ft)

F2 = F1 + a(A1 − F1) = 820 + .1(820 − 820) = 820

Page 110: Barga Data Science lecture 2


F3 = F2 + a(A2 − F2) = 820 + .1(775 − 820) = 815.5

Page 111: Barga Data Science lecture 2


This process continues through week 10.

Page 112: Barga Data Science lecture 2


What if the smoothing constant a equals 0.6?

Page 113: Barga Data Science lecture 2


• The choice of α depends on the emphasis you want to place on the most recent data

Page 114: Barga Data Science lecture 2


Ft+1 = aAt + a(1−a)At-1 + a(1−a)²At-2 + …   or, equivalently, Ft+1 = Ft + a(At − Ft)

Weights:    prior period (w1)   2 periods ago (w2)   3 periods ago (w3)
            a                   a(1−a)               a(1−a)²
a = 0.10    10%                 9%                   8.1%
a = 0.90    90%                 9%                   0.9%
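These percentages are just the geometric decay a(1−a)^k applied to successively older actuals:

```python
def exp_weights(a, k=3):
    # Weight on the actual from (i+1) periods ago is a * (1-a)**i.
    return [a * (1 - a) ** i for i in range(k)]

print([round(w, 3) for w in exp_weights(0.10)])  # [0.1, 0.09, 0.081]
print([round(w, 3) for w in exp_weights(0.90)])  # [0.9, 0.09, 0.009]
```

A small a spreads weight over a long history (heavy smoothing); a large a concentrates it on the last observation (fast reaction).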

Page 115: Barga Data Science lecture 2


Page 116: Barga Data Science lecture 2


Trend

Seasonal

Page 117: Barga Data Science lecture 2


Page 118: Barga Data Science lecture 2


[Chart: a time series of values between 0 and 7 plotted over periods 1–10]

Page 119: Barga Data Science lecture 2


[Chart: the best-fit line y = b0 + b1·t through data plotted over periods 10–20; b0 is the intercept. Best line!]

Page 120: Barga Data Science lecture 2


• Deseasonalize
• Reseasonalize

Page 121: Barga Data Science lecture 2


Example (SI + Regression): Deseasonalize (actual data → deseasonalized data) → Forecast → Reseasonalize
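A sketch of that loop with synthetic quarterly data (the sales and seasonal indices below are hypothetical, constructed so the deseasonalized series is exactly 100 + 10t; the real example that follows derives its SIs with the ratio-to-moving-average method):

```python
# Hypothetical quarterly sales and seasonal indices (SI; mean 1.0).
sales = [132, 96, 117, 154, 180, 128, 153, 198]   # 2 years, Q1..Q4
si = [1.2, 0.8, 0.9, 1.1]

# 1. Deseasonalize: divide each actual by its quarter's SI.
deseason = [s / si[i % 4] for i, s in enumerate(sales)]

# 2. Fit a least-squares trend y = b0 + b1*t to the deseasonalized data.
t = list(range(1, len(deseason) + 1))
t_bar = sum(t) / len(t)
y_bar = sum(deseason) / len(deseason)
b1 = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, deseason)) / \
     sum((ti - t_bar) ** 2 for ti in t)
b0 = y_bar - b1 * t_bar

# 3. Forecast the next four quarters, then reseasonalize with the SIs.
forecasts = [(b0 + b1 * (len(sales) + k + 1)) * si[k % 4] for k in range(4)]
print([round(f, 1) for f in forecasts])
```

The trend is fitted on the deseasonalized series so that the seasonal swings do not distort the slope; multiplying by the SI at the end puts the seasonality back.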

Page 122: Barga Data Science lecture 2


Page 123: Barga Data Science lecture 2


Linear Trend – Using the Least Squares Method: An Example

Year   Sales ($ mil.)
2002   7
2003   10
2004   9
2005   11
2006   13

Coding the years as t = 1, …, 5:

Year   t   Sales ($ mil.)
2002   1   7
2003   2   10
2004   3   9
2005   4   11
2006   5   13
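For the coded data, the least-squares normal equations give b1 = 1.3 and b0 = 6.1, i.e. a trend of Sales = 6.1 + 1.3t; extrapolating to t = 6 (2007, an illustration) is then one line:

```python
t = [1, 2, 3, 4, 5]
sales = [7, 10, 9, 11, 13]

t_bar = sum(t) / len(t)
s_bar = sum(sales) / len(sales)

# Least-squares slope and intercept for sales = b0 + b1*t.
b1 = sum((ti - t_bar) * (si - s_bar) for ti, si in zip(t, sales)) / \
     sum((ti - t_bar) ** 2 for ti in t)
b0 = s_bar - b1 * t_bar
print(round(b0, 1), round(b1, 1))   # 6.1 1.3

forecast_2007 = b0 + b1 * 6         # t = 6
print(round(forecast_2007, 1))      # 13.9
```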

Page 124: Barga Data Science lecture 2


Page 125: Barga Data Science lecture 2


Page 126: Barga Data Science lecture 2


(Excel function: =LOG(x) or =LOG(x,10))

Page 127: Barga Data Science lecture 2


Page 128: Barga Data Science lecture 2


SI

SI

Page 129: Barga Data Science lecture 2


year

raw index

quarter

four

Page 130: Barga Data Science lecture 2


The table below shows the quarterly sales for Toys International for the years 2001 through 2006. The sales are reported in millions of dollars. Determine a quarterly seasonal index using the ratio-to-moving-average method.

Seasonal Index – An Example

Page 131: Barga Data Science lecture 2


Page 132: Barga Data Science lecture 2


Page 133: Barga Data Science lecture 2


Page 134: Barga Data Science lecture 2


deseasonalize

Page 135: Barga Data Science lecture 2


Page 136: Barga Data Science lecture 2


• Class Discussions 15 minutes

• Forecasting, continued (2/2) 45 minutes

• Weka Tutorial 30 minutes

• Break 10 minutes

• Decision Trees and Random Forests 60 minutes

• Hands on, decision tree in Weka 15 minutes

Page 137: Barga Data Science lecture 2


That’s all for tonight….