Understanding Data Mining

Craig A. Stevens, PMP, CC
craigastevens@westbrookstevens.com
www.westbrookstevens.com


Examples of Classical Statistical Methods

Simple linear regression: $Y_i = a + b x_i + e$
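The regression equation above can be illustrated with a minimal sketch that estimates the intercept a and slope b by ordinary least squares. The function name `fit_simple_regression` and the data are made up for illustration; the slides do not say which estimation method or software was used.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Ordinary least squares estimates of a (intercept) and b (slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x, so every residual e is zero.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_simple_regression(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
```

With real data the residuals would not vanish; they are the e term in the equation.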


http://www.ats.ucla.edu/stat/sas/faq/spplot/reg_int_cont.htm

Multiple Regression
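The slides give only the heading here. As a rough sketch, assuming a model with two predictors, Y = b0 + b1*x1 + b2*x2 + e, the coefficients can be estimated by solving the normal equations. The helper names and data below are hypothetical, and this is not the SAS code from the linked UCLA page.

```python
def solve_linear_system(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_multiple_regression(rows, ys):
    """Least squares via the normal equations (X'X) beta = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 with no noise.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
beta = fit_multiple_regression(rows, ys)  # approximately [1, 2, -3]
```

Solving the normal equations directly is fine for a handful of variables; statistical packages generally use more numerically stable factorizations.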


Data Mining

What is Data Mining?
• The process of identifying hidden patterns, trends, and relationships in large quantities of data.

Why Do Data Mining?
• To discover useful information for making decisions.
• Too many variables for classical statistical methods to work.
  – Large number of records (10^8 to 10^12)
    • Gigabyte to terabyte
  – High-dimensional data
    • Lots of variables (10 to 10^4 attributes)

The Huber-Wegman Taxonomy of Data Set Sizes

Descriptor      Data Set Size in Bytes   Storage Mode
Tiny            10^2                     Piece of paper
Small           10^4                     A few pieces of paper
Medium          10^6                     A floppy disk
Large           10^8                     Hard disk
Huge            10^10                    Multiple hard disks
Massive         10^12                    Robotic magnetic tape storage silos
Super Massive   10^15                    Distributed data archives
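As a small illustration of how the taxonomy might be applied, the sketch below maps a byte count to a descriptor. The table lists nominal sizes rather than ranges, so picking the nearest entry on a log10 scale is an assumption made here, and `describe_size` is a hypothetical helper name.

```python
import math

# Nominal sizes in bytes, taken from the Huber-Wegman taxonomy table.
TAXONOMY = [
    ("Tiny", 10**2),
    ("Small", 10**4),
    ("Medium", 10**6),
    ("Large", 10**8),
    ("Huge", 10**10),
    ("Massive", 10**12),
    ("Super Massive", 10**15),
]


def describe_size(n_bytes):
    """Return the descriptor whose nominal size is nearest on a log10 scale.

    Assumption: the taxonomy gives no boundaries between categories,
    so nearest-on-log-scale is only one reasonable reading.
    """
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    log_size = math.log10(n_bytes)
    nearest = min(TAXONOMY, key=lambda entry: abs(math.log10(entry[1]) - log_size))
    return nearest[0]


# 10^8 records at a hypothetical 100 bytes per record is 10^10 bytes.
label = describe_size(10**8 * 100)
```

Under that assumption, even the low end of the record counts typical of data mining lands in the "Huge" category.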

Name      Model Role   Measurement Level   Description
BAD       Target       Binary              1 = client defaulted on loan, 0 = loan repaid
CLAGE     Input        Interval            Age of oldest trade line in months
CLNO      Input        Interval            Number of trade lines
DEBTINC   Input        Interval            Debt-to-income ratio
DELINQ    Input        Interval            Number of delinquent trade lines
DEROG     Input        Interval            Number of major derogatory reports
JOB       Input        Nominal             Six occupational categories
LOAN      Input        Interval            Amount of the loan request
MORTDUE   Input        Interval            Amount due on existing mortgage
NINQ      Input        Interval            Number of recent credit inquiries
REASON    Input        Binary              DebtCon = debt consolidation, HomeImp = home improvement
VALUE     Input        Interval            Value of current property
YOJ       Input        Interval            Years at present job
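The variable table above can be written down as a small metadata structure, which makes the target and the inputs at each measurement level easy to pick out before modeling. This is only a sketch: the `Variable` type and `inputs_by_level` helper are hypothetical and are not part of SAS Enterprise Miner.

```python
from collections import namedtuple

Variable = namedtuple("Variable", "name role level")

# Names, model roles, and measurement levels copied from the table.
VARIABLES = [
    Variable("BAD", "target", "binary"),
    Variable("CLAGE", "input", "interval"),
    Variable("CLNO", "input", "interval"),
    Variable("DEBTINC", "input", "interval"),
    Variable("DELINQ", "input", "interval"),
    Variable("DEROG", "input", "interval"),
    Variable("JOB", "input", "nominal"),
    Variable("LOAN", "input", "interval"),
    Variable("MORTDUE", "input", "interval"),
    Variable("NINQ", "input", "interval"),
    Variable("REASON", "input", "binary"),
    Variable("VALUE", "input", "interval"),
    Variable("YOJ", "input", "interval"),
]


def inputs_by_level(variables):
    """Group the names of input variables by measurement level."""
    groups = {}
    for v in variables:
        if v.role == "input":
            groups.setdefault(v.level, []).append(v.name)
    return groups


target = [v.name for v in VARIABLES if v.role == "target"]
groups = inputs_by_level(VARIABLES)
```

Grouping by level matters because interval inputs can enter a model directly, while nominal inputs such as JOB usually need to be encoded first.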


SAS Enterprise Miner Objects

Shows that the cutoff point is 6 variables


Small Number of Useful Variables


Comparing Methods and Profit vs Marketing Cost

Decision Trees for Predictive Modeling, Padraic G. Neville, SAS Institute Inc., 4 August 1999


Clustering As in Different Brands
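The slides do not say which clustering algorithm was used. As one common possibility, here is a minimal k-means sketch (Lloyd's algorithm) that separates items from two hypothetical brands, each described by two made-up measurements. The function name, the data, and the choice of starting centers are all assumptions for illustration.

```python
def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to the center with the smallest squared distance.
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical measurements for items from two brands.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
initial_centers = [points[0], points[3]]
centers, clusters = kmeans(points, initial_centers)
```

The result depends on the starting centers and on the number of clusters, both of which have to be chosen by the analyst.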

[Figure: pairwise scatter plots of MOIS_I9B, PROT_TR3, FAT_FCLJ, ASH_JOD6, SODI_HGQ, CARB_SZ0, and CAL_JOH4; additional labels PCR1_1, PCR2_1, PCR3_1]


Data Mining Art found at http://datamining.typepad.com/data_mining/dataviz/page/2/


National Energy Research Scientific Computing Center

SurfStat: A Matlab toolbox for the statistical analysis of univariate and multivariate surface and volumetric data using linear mixed effects models and random field theory. Keith J. Worsley

Latitude 36.19N and Longitude 86.78W

Nashville, TN, USA

http://www.youtube.com/watch?v=CnniJR5Ah7g

Genealogical Tree on YouTube